W2C – Web to Corpus – Corpora

  • Author(s): Majliš, Martin
  • Subject Terms:
  • Document Type:
    other/unknown material
  • Language:
    Afrikaans
    unknown
    Amharic
    Arabic
    Aragonese
    Asturian; Bable; Leonese; Asturleonese
    Azerbaijani
    Belarusian
    Bengali
    Bosnian
    Breton
    Buginese
    Bulgarian
    Catalan; Valencian
    Cebuano
    Czech
    Chuvash
    Corsican
    Welsh
    Danish
    German
    Greek, Modern (1453-)
    English
    Esperanto
    Estonian
    Basque
    Faroese
    Persian
    Finnish
    French
    Western Frisian
    Gaelic; Scottish Gaelic
    Irish
    Galician
    Gujarati
    Haitian; Haitian Creole
    Hebrew
    Hindi
    Croatian
    Hungarian
    Armenian
    Interlingua (International Auxiliary Language Association)
    Indonesian
    Icelandic
    Italian
    Javanese
    Japanese
    Kannada
    Georgian
    Korean
    Kurdish
    Latin
    Latvian
    Lithuanian
    Malayalam
    Marathi
    Macedonian
    Malagasy
    Mongolian
    Maori
    Malay
    Burmese
    Low German; Low Saxon; German, Low; Saxon, Low
    Nepali
    Nepal Bhasa; Newari
    Dutch; Flemish
    Norwegian Nynorsk; Nynorsk, Norwegian
    Norwegian
    Occitan (post 1500)
    Polish
    Portuguese
    Quechua
    Romanian; Moldavian; Moldovan
    Russian
    Yakut
    Sicilian
    Scots
    Slovak
    Slovenian
    Spanish; Castilian
    Albanian
    Serbian
    Swahili
    Swedish
    Tamil
    Tatar
    Telugu
    Tajik
    Tagalog
    Thai
    Turkish
    Ukrainian
    Urdu
    Uzbek
    Vietnamese
    Waray
    Yiddish
    Yoruba
    Chinese
  • Additional Information
    • Publication Information:
      Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
    • Publication Date:
      2011
    • Collection:
      LINDAT-Clarin: Repository (Centre for Language Research Infrastructure in the Czech Republic)
    • Abstract:
      A set of corpora for 120 languages, automatically collected from Wikipedia and the web using the W2C toolset: http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
    • File Description:
      application/x-gzip; text/plain; charset=utf-8; downloadable_files_count: 122
    • Relation:
      http://hdl.handle.net/11858/00-097C-0000-0022-60D6-1
    • Online Access:
      http://hdl.handle.net/11858/00-097C-0000-0022-6133-9
    • Rights:
      Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0); http://creativecommons.org/licenses/by-sa/3.0/; PUB
    • Accession Number:
      edsbas.2821386D