GLAM/Newsletter/December 2019/Contents/Norway report
Plaintext Wikipedia dumps for the National Library
Text dumps from Wikipedia to the National Library; Språkbanken is a language technology resource collection for Norwegian.
In December 2019, as Wikimedia Norge did last year, we supplied plaintext copies of Wikipedia content to Språkbanken, the division of the National Library that maintains a corpus of texts in Norwegian and the Sámi languages. This time they requested text from talk pages, since talk pages are somewhat closer to natural, informal language than article text is.
Here is more information about the work of Språkbanken (the Language Bank), the National Library division:
"Språkbanken offers digital language resources for use in research and in the development of language technology. The resources can be downloaded from Språkbanken’s website free of charge. The collection is expanding continuously.
Språkbanken is a service to that part of the ICT-industry which works with the development of language-based ICT-solutions, to researchers within language technology and linguistics, and to public enterprises which develop electronic solutions for public services. Among other things, Språkbanken contains corpora of written and spoken language, i.e. large collections of text and speech in machine-readable format.
Språkbanken is a national infra-structure initiative designed to ensure that language technology solutions based on the Norwegian language will be developed, and thereby prevent domain loss of Norwegian in technology-dependent areas." (link)