Distant Reading – 2022-11-22
Distant Reading for European Literary History (COST Action CA16204) is a project aiming to create a vibrant and diverse network of researchers jointly developing the resources and methods necessary to change the way European literary history is written. Grounded in the Distant Reading paradigm (i.e. using computational methods of analysis for large collections of literary texts), the Action will create a shared theoretical and practical framework to enable innovative, sophisticated, data-driven, computational methods of literary text analysis across at least 10 European languages. Fostering insight into cross-national, large-scale patterns and evolutions across European literary traditions, the Action will facilitate the creation of a broader, more inclusive and better-grounded account of European literary history and cultural identity.
European Literary Text Collection (ELTeC) in TextGrid Repository (2021-11-22)
This is the project site of the European Literary Text Collection (ELTeC) in the TextGrid Repository. The goal of adding the ELTeC to TextGrid Repository is to publish and archive this valuable set of corpora in European languages and to combine them with the technical possibilities that TextGrid Repository offers. Below, we list some of the possibilities that TextGrid Repository offers to researchers and readers who are interested in the ELTeC. Currently, we have imported the 11 subcorpora of the ELTeC that contain more than 50 novels.
Browsing the ELTeC in TextGrid Repository
Here we present some possibilities of how to browse the ELTeC in TextGrid Repository:
- Navigate to the subcorpus of one specific language
- Navigate to all editions of the ELTeC
- Navigate to all objects of the ELTeC corpus (files and metadata of the editions, works and subcorpora).
In all these cases, you can add further filters with the options on the left.
Languages
Here are links to the subcorpus for each language (edition.language:"[language]"):
- Romanian Novel Corpus (ELTeC-rom)
- Czech Novel Corpus (ELTeC-cze)
- German Novel Corpus (ELTeC-deu)
- French Novel Corpus (ELTeC-fra)
- Hungarian Novel Corpus (ELTeC-hun)
- Polish Novel Corpus (ELTeC-pol)
- Portuguese Novel Corpus (ELTeC-por)
- Slovenian Novel Corpus (ELTeC-slv)
- Serbian Novel Corpus (ELTeC-srp)
- English Novel Corpus (ELTeC-eng)
- Spanish Novel Corpus (ELTeC-spa)
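The query pattern behind these links, edition.language:"[language]", can be filled in programmatically. The following is a minimal sketch, not the portal's own code: the function names are hypothetical, and the value "deu" is only an assumed example, since this page does not state which language values the field actually contains.

```python
from urllib.parse import quote


def language_query(language: str) -> str:
    """Fill the edition.language:"[language]" pattern with a concrete value."""
    return f'edition.language:"{language}"'


def encode_for_url(query: str) -> str:
    """Percent-encode a query so it can be placed in a URL query string."""
    return quote(query, safe="")


# "deu" is an assumed example value, not confirmed by this page.
query = language_query("deu")
print(query)
print(encode_for_url(query))
```

The encoded form is only needed when the query is placed in a link; in the search box of the portal, the plain form would be typed as is.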
Filtering through Specific Metadata of the ELTeC (Facets)
Because some specific metadata fields are relevant for the composition of the ELTeC, these have been incorporated as new searchable metadata (facets) in TextGrid Repository. For this, the metadata in the TEI files has been incorporated into the TextGrid metadata fields. Here we present some possible queries specific to the ELTeC:
- timeSlot
- firstEdition
- authorGender
- size
- reprintCount
Of course, queries combining these facets are possible. For further information about querying TextGrid Repository, see the [documentation](https://dev.textgridrep.org/docs/syntax?lang=en).
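A combined facet query can be sketched as follows. This is an illustration under stated assumptions: the AND operator, the unprefixed field names, and the example values "F" and "T1" are not confirmed by this page and should be checked against the query syntax documentation. The helper name facet_query is hypothetical.

```python
# Facet names as listed on this page.
ELTEC_FACETS = {"timeSlot", "firstEdition", "authorGender", "size", "reprintCount"}


def facet_query(**facets: str) -> str:
    """Join field:"value" pairs into one query string.

    Assumption: clauses are combined with AND, as in Lucene-style syntax.
    """
    unknown = set(facets) - ELTEC_FACETS
    if unknown:
        raise ValueError(f"Unknown facet(s): {sorted(unknown)}")
    parts = [f'{name}:"{value}"' for name, value in facets.items()]
    return " AND ".join(parts)


# "F" and "T1" are assumed example values.
print(facet_query(authorGender="F", timeSlot="T1"))
```

Rejecting unknown facet names early keeps typos from silently producing a query that matches nothing.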
Benefits of ELTeC in TextGrid Repository
The ELTeC is already available as GitHub repositories and in Zenodo. So, what is the motivation to publish it also in TextGrid Repository? In our opinion, TextGrid Repository can offer a series of advantages to the ELTeC and its community of users:
- Long-term archive: TextGrid Repository is a long-term repository awarded with the CoreTrustSeal
- Findability through Harvesting: By including the ELTeC editions in TextGrid Repository, these texts can be found on further platforms. Aggregators or registries like re3data, OpenAIRE, VLO (CLARIN Virtual Language Observatory) or DARIAH Collection Registry harvest the information of the TextGrid Repository. The corpora of ELTeC will become more visible and easier to find for interested scholars
- Identification: TextGrid Repository assigns persistent identifiers to all subcorpora, works and editions of the ELTeC
- Integration: in TextGrid Repository, the ELTeC is integrated in one of the largest openly available literary corpora
- Combination with other corpora: users can easily combine some texts of the ELTeC with other corpora, for example by filtering the entire TextGrid Repository by language or year of publication
- Shelf function: TextGrid Repository offers the shelf function, with which any user can combine texts
- Publication in HTML: in contrast to other platforms, the TEI files are also published as HTML, enabling search engines to find them easily
- Transformation: Besides the HTML format, all texts in TextGrid Repository are automatically transformed into other formats (zip, ePUB, plaintext)
- Analysis: TextGrid allows sending single texts or entire subcorpora to Natural Language Processing tools (via Switchboard) and Digital Humanities tools (Voyant)
- Integration in the NFDI Consortium Text+ Portfolio: TextGrid Repository is one of the services of the Consortium Text+, which is part of the German National Research Data Infrastructure (NFDI)
- Integration in future services: TextGrid Repository is being developed further in association with several ongoing projects. With its integration, the ELTeC will profit from future features and developments
TextGrid Metadata Files
The basic metadata is covered by the TextGrid Metadata schema in Edition and Work metadata; all additional project-specific metadata is covered by the metadata added to the works. Please see the following two examples:
Technically, there are two parts of metadata: Metadata that can be searched using facets, and metadata that cannot.
Further Internal Documentation
- Currently, ELTeC acts as the collector of the several subcorpora
Example Project
- Project name: DISTANT READING XI
- Project ID: TGPR-d683fb0f-b71d-89fa-6678-61979ac32d0f
Issues
- Further, up-to-date issues can be found in the [project's GitLab issues](https://gitlab.gwdg.de/dariah-de/textgridrep-portal/-/issues?scope=all&state=opened&label_name[]=ELTeC)
- GitLab import project