W2C – Web to Corpus – Corpora
- Author(s): Majliš, Martin
- Subject Terms:
- Document Type:
other/unknown material
- Language:
Afrikaans
unknown
Amharic
Arabic
Aragonese
Asturian; Bable; Leonese; Asturleonese
Azerbaijani
Belarusian
Bengali
Bosnian
Breton
Buginese
Bulgarian
Catalan; Valencian
Cebuano
Czech
Chuvash
Corsican
Welsh
Danish
German
Greek, Modern (1453-)
English
Esperanto
Estonian
Basque
Faroese
Persian
Finnish
French
Western Frisian
Gaelic; Scottish Gaelic
Irish
Galician
Gujarati
Haitian; Haitian Creole
Hebrew
Hindi
Croatian
Hungarian
Armenian
Interlingua (International Auxiliary Language Association)
Indonesian
Icelandic
Italian
Javanese
Japanese
Kannada
Georgian
Korean
Kurdish
Latin
Latvian
Lithuanian
Malayalam
Marathi
Macedonian
Malagasy
Mongolian
Maori
Malay
Burmese
Low German; Low Saxon; German, Low; Saxon, Low
Nepali
Nepal Bhasa; Newari
Dutch; Flemish
Norwegian Nynorsk; Nynorsk, Norwegian
Norwegian
Occitan (post 1500)
Polish
Portuguese
Quechua
Romanian; Moldavian; Moldovan
Russian
Yakut
Sicilian
Scots
Slovak
Slovenian
Spanish; Castilian
Albanian
Serbian
Swahili
Swedish
Tamil
Tatar
Telugu
Tajik
Tagalog
Thai
Turkish
Ukrainian
Urdu
Uzbek
Vietnamese
Waray
Yiddish
Yoruba
Chinese
- Additional Information
- Publication Information:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Publication Date:
2011
- Collection:
LINDAT-Clarin: Repository (Centre for Language Research Infrastructure in the Czech Republic)
- Abstract:
A set of corpora for 120 languages, automatically collected from Wikipedia and the web. Collected using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
- File Description:
application/x-gzip; text/plain; charset=utf-8; downloadable_files_count: 122
- Relation:
http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
- Online Access:
http://hdl.handle.net/11858/00-097C-0000-0022-6133-9
- Rights:
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) ; http://creativecommons.org/licenses/by-sa/3.0/ ; PUB
- Accession Number:
edsbas.2821386D