Electronic corpora of the German language as a basis for linguistic research
Tatiana K. Ivanova and Elsara V. Gafiyatova
Additional contact information
Tatiana K. Ivanova: Kazan Federal University
Elsara V. Gafiyatova: Kazan Federal University
Russian Social and Humanitarian Studies, 2025, vol. 17, issue 4, 137–160
Abstract:
Background. This article examines the types and specific features of the best-known German linguistic corpora as tools for studying the German language. The corpora are described to inform the scientific community about the opportunities they offer. The objectives of the study are to describe the electronic portals that host these corpora and their structure, and to present the data: the text base, its volume, and its structure. The conditions and capabilities of linguistic search in these resources are also discussed. The history of the first corpus is outlined, and a modern definition of a corpus is given. The article also reviews the scientific literature on the use of corpus data in linguistics and related disciplines.
Purpose. To describe the volume, structure, and features of German-language corpora, as well as the possibilities for automating the extraction of material in line with researchers' needs.
Materials and methods. The material for the study consists of German-language electronic resources, two of which are examined in detail: the electronic dictionary of the German language, Digitales Wörterbuch der deutschen Sprache (DWDS), and the corpora of the Institute of the German Language (IDS-Korpora): Corpora of Written Language (LIMAS), the DeReKo project and COSMAS II, and the Datenbank für Gesprochenes Deutsch (DGD). The paper also references the NEGRA project, a special corpus of the University of Saarbrücken in the Federal State of Saarland, and Deutscher Wortschatz (German Vocabulary), a corpus dictionary project of the University of Leipzig. The primary research method is structural and descriptive. Quantitative indicators were used to describe and present the data.
Results. Along with the corpus descriptions, the authors present recommendations on the application areas of each German-language resource and on the automated search and processing tools available on its portal. The descriptions also cover the volume, structure, and features of the linguistic markup supported by the corpora, which determines the specifics of the extracted material. The authors emphasize the potential of corpus tools for linguists to test their hypotheses and process empirical data.
Keywords: corpus; German language; linguistic tagging; data volume; verification; prospects for use
Date: 2025
Downloads:
https://soc-journal.ru/jour/index.php/mssi/article/view/540/369 (application/pdf)
https://soc-journal.ru/jour/index.php/mssi/article/view/540 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:cxm:russhs:17:4:2025:137-160
DOI: 10.12731/3033-5981-2025-17-4-540
Russian Social and Humanitarian Studies is currently edited by Fanuza H. Tarasova
More articles in Russian Social and Humanitarian Studies from Science and Innovation Center Publishing House 9 Maya Str., 5 office 192, Krasnoyarsk, 660127, Russian Federation.
Bibliographic data for series maintained by Yan Maksimov.