European Literary Text Collection (ELTeC) in TextGrid Repository (XIII; 2022-08-29)
This is the project site of the European Literary Text Collection (ELTeC) in the TextGrid Repository. The goal of adding the ELTeC to TextGrid Repository is to publish and archive this valuable set of corpora in European languages and to combine them with the technical possibilities that TextGrid Repository offers. Below, we list some of the possibilities that TextGrid Repository offers to researchers and readers who are interested in the ELTeC. Currently, we have imported the 11 subcorpora of the ELTeC that contain more than 50 novels.
Browsing the ELTeC in TextGrid Repository
Here we present some possibilities of how to browse the ELTeC in TextGrid Repository:
- Navigate to the texts of one subcorpus of one specific language
- Navigate to all editions of the ELTeC
- Navigate to all objects of the ELTeC corpus (files and metadata of the editions, works and subcorpora)
In all these cases, you can add further filters with the facets on the left.
Subcorpora and Languages
Here are links to the subcorpus for each language:
- Czech Novel Corpus (ELTeC-cze), collected by the Institute of the Czech National Corpus
- German Novel Corpus (ELTeC-deu), collected by Fotis Jannidis, Leonard Konle and Carolin Odebrecht
- English Novel Corpus (ELTeC-eng), collected by Lou Burnard
- French Novel Corpus (ELTeC-fra), collected by Christof Schöch and Lou Burnard
- Hungarian Novel Corpus (ELTeC-hun), collected by Gábor Pálko
- Norwegian Novel Corpus (ELTeC-nor), collected by Michael Preminger and Christian Emil Smith Ore
- Polish Novel Corpus (ELTeC-pol), collected by Joanna Byszuk and Jan Rybicki
- Portuguese Novel Corpus (ELTeC-por), collected by Diana Santos
- Romanian Novel Corpus (ELTeC-rom), collected by Roxana Patras
- Slovenian Novel Corpus (ELTeC-slv), collected by Tomaž Erjavec, Miran Hladnik, Marko Juvan and Katja Mihurko Poniž
- Spanish Novel Corpus (ELTeC-spa), collected by Borja Navarro Colorado
- Serbian Novel Corpus (ELTeC-srp), collected by Cvetana Krstev
- Swedish Novel Corpus (ELTeC-swe), collected by Ljubica Miočević and Cai Alfredson
- Ukrainian Novel Corpus (ELTeC-ukr), collected by Dmytro Yesypenko and Mykhailo Nazarenko
Filtering through Specific Metadata of the ELTeC (Facets)
Because some specific metadata fields are relevant for the composition of the ELTeC, these have been incorporated as new searchable metadata (facets) in TextGrid Repository. For this, the metadata in the TEI files were incorporated into the TextGrid metadata fields. Here we present some possible queries specific to the ELTeC:
- language
- timeSlot
- firstEdition
- Look for works created in 1840 (work.dateOfCreation.value:1840)
- Look for works created after 1840 (work.dateOfCreation.value:>1840)
- Look for works created before 1910 (work.dateOfCreation.value:<1910)
- Look for works created between 1840 and 1910 (work.dateOfCreation.value:>1840 work.dateOfCreation.value:<1910)
- authorGender
- size
- reprintCount
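The date queries listed above follow a regular pattern, so they can be generated programmatically. The following is a minimal sketch in Python that only assembles the query strings shown in the list; the function name `date_query` and its parameters are hypothetical and not part of any TextGrid tool.

```python
from typing import Optional

# Field name taken from the example queries above.
FIELD = "work.dateOfCreation.value"


def date_query(year: Optional[int] = None,
               after: Optional[int] = None,
               before: Optional[int] = None) -> str:
    """Assemble a date-of-creation query string (hypothetical helper).

    Each given bound becomes one search term; the terms are joined
    with a space, as in the "between 1840 and 1910" example.
    """
    terms = []
    if year is not None:
        terms.append(f"{FIELD}:{year}")
    if after is not None:
        terms.append(f"{FIELD}:>{after}")
    if before is not None:
        terms.append(f"{FIELD}:<{before}")
    return " ".join(terms)


if __name__ == "__main__":
    print(date_query(year=1840))
    print(date_query(after=1840, before=1910))
```

The resulting strings can be pasted into the search field of TextGrid Repository.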
Of course, queries combining these facets are possible. The facet search can be combined with fulltext queries, such as:
- Look for texts in the French ELTeC corpus which contain the word Paris (Paris edition.language:fra)
- Look for texts in the English ELTeC corpus which contain the word London (London edition.language:eng)
For further information about querying TextGrid Repository, consider the documentation.
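As an illustration of combining a full-text term with facets, the sketch below joins both into one query string and URL-encodes it. The helper names `build_query` and `search_url` are hypothetical, and the base URL and the `query` parameter name are assumptions about the web search interface; please check them against the documentation before relying on them.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumption: address and parameter name of the web search interface.
SEARCH_URL = "https://textgridrep.org/search"


def build_query(fulltext: str = "",
                facets: Optional[Dict[str, str]] = None) -> str:
    """Join a full-text term and facet filters into one query string."""
    terms = [fulltext] if fulltext else []
    terms += [f"{field}:{value}" for field, value in (facets or {}).items()]
    return " ".join(terms)


def search_url(query: str) -> str:
    """URL-encode the query and append it to the assumed search address."""
    return f"{SEARCH_URL}?{urlencode({'query': query})}"


if __name__ == "__main__":
    query = build_query("Paris", {"edition.language": "fra"})
    print(query)
    print(search_url(query))
```

Facet names contain dots (for example edition.language), which is why the sketch takes a dictionary rather than keyword arguments.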
Benefits of ELTeC in TextGrid Repository
The ELTeC corpora are already available as GitHub repositories and in Zenodo. So what is the motivation for publishing them in TextGrid Repository as well? In our opinion, TextGrid Repository offers a series of advantages to the ELTeC and its community of users:
- Long-term archive: TextGrid Repository is a long-term repository awarded with the CoreTrustSeal
- Findability through Harvesting: By including the ELTeC editions in TextGrid Repository, these texts can be found on further platforms. Aggregators or registries like re3data, OpenAIRE, VLO (CLARIN Virtual Language Observatory) or DARIAH Collection Registry harvest the information of the TextGrid Repository. As a result, the ELTeC corpora become more visible and easier to find for interested scholars
- Identification: TextGrid Repository assigns persistent identifiers to all subcorpora, works and editions of the ELTeC
- Integration: in TextGrid Repository, the ELTeC is integrated into one of the largest openly available literary corpora
- Combination with other corpora: users can easily combine texts of the ELTeC with other corpora, for example by filtering the entire TextGrid Repository by language or year of publication
- Shelf function: TextGrid Repository offers the shelf function, with which any user can combine selected objects
- Publication in HTML: in contrast to other platforms, the TEI files are also published as HTML, enabling search engines to find them easily
- Transformation: Besides the HTML format, all texts in TextGrid Repository are automatically transformed into other formats (zip, ePUB, plaintext)
- Analysis: TextGrid allows sending single texts or entire subcorpora to Natural Language Processing tools (via Switchboard) and Digital Humanities tools (Voyant)
- Integration in the NFDI Consortium Text+ Portfolio: TextGrid Repository is part of the services of the Consortium Text+ as part of the German National Strategy of Research Data
- Integration in future services: TextGrid Repository is under further development in association with several ongoing projects. Through its integration, the ELTeC will benefit from future features and developments, such as the Python library currently in development
TextGrid Metadata Files
The basic metadata is covered by the TextGrid Metadata schema in Edition and Work metadata; all additional project-specific metadata is covered by the metadata added to the works. Please see the following examples:
Citation Suggestion
To cite an individual corpus, click on it in the links above; you will find a citation suggestion at the bottom of its page. To cite all ELTeC subcorpora in TextGrid Repository, we suggest the following reference:
- European Literary Text Collection (ELTeC) in TextGrid Repository (2023). Edited by Carolin Odebrecht, Lou Burnard and Christof Schöch. Version 1.0.0, based on ELTeC release 1.1.0 (April 2021). COST Action Distant Reading for European Literary History (CA16204) & TextGrid Repository. https://sandbox.dev.textgridrep.org/project/TGPR-5d9f2f27-7019-3901-1ab1-630dc237b4df.
Version
- Project name: DISTANT READING XIII
- Project ID: TGPR-5d9f2f27-7019-3901-1ab1-630dc237b4df
- Further and up-to-date issues can be found in the project's GitLab Issues
- GitLab import project
- Citation Suggestion for this Object
- TextGrid Repository (2023). README.md. European Literary Text Collection (ELTeC). https://hdl.handle.net/21.T11991/0000-001C-E803-9