Deltacorpus 1.1

Author(s): Mareček, David; Yu, Zhiwei; Zeman, Daniel; Žabokrtský, Zdeněk
Subject Terms:
part of speech; tagging; semi-supervised; cross-language
Document Type:
text
Language:
Belarusian
Bosnian
Bulgarian
Czech
Croatian
Sorbian languages
Macedonian
Polish
Russian
Slovak
Slovenian
Serbian
Ukrainian
Latvian
Lithuanian
Afrikaans
Danish
German
English
Faroese
Western Frisian
Swiss German; Alemannic; Alsatian
Icelandic
Low German; Low Saxon; German, Low; Saxon, Low
Dutch; Flemish
Norwegian Nynorsk; Nynorsk, Norwegian
Norwegian
Scots
Swedish
Yiddish
Aragonese
Asturian; Bable; Leonese; Asturleonese
Catalan; Valencian
French
Galician
Haitian; Haitian Creole
Italian
Latin
Portuguese
Romanian; Moldavian; Moldovan
Spanish; Castilian
Breton
Welsh
Gaelic; Scottish Gaelic
Irish
Greek, Modern (1453-)
Armenian
Albanian
Persian
Kurdish
Tajik
Bengali
Gujarati
Hindi
Marathi
Nepali
Urdu
Amharic
Arabic
Egyptian (Ancient)
Hebrew
Estonian
Finnish
Hungarian
Basque
Georgian
Chuvash
Azerbaijani
Turkish
Uzbek
Tatar
Yakut
Korean
Mongolian
Telugu
Kannada
Malayalam
Tamil
Nepal Bhasa; Newari
Vietnamese
Indonesian
Javanese
Malagasy
Maori
Malay
Tagalog
Waray
Swahili
Esperanto
Interlingua (International Auxiliary Language Association)

Additional Information
- Publication Information:
  Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Publication Date:
  2016
- Collection:
  OLAC: Open Language Archives Community
- Abstract:
  Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), limited to the first 1,000,000 tokens per language and tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia). Changes in version 1.1: 1. The Universal Dependencies tagset is used instead of the older and smaller Google Universal POS tagset. 2. The SVM classifier is trained on Universal Dependencies 1.2 instead of HamleDT 2.0. 3. Balto-Slavic, Germanic and Romance languages were each tagged by a classifier trained only on the respective group of languages; all other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
- Relation:
  http://hdl.handle.net/11234/1-1662; http://hdl.handle.net/11234/1-1743
- Online Access:
  http://hdl.handle.net/11234/1-1743
- Rights:
  Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); http://creativecommons.org/licenses/by-sa/4.0/
- Accession Number:
  edsbas.E919EEFC
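
The abstract says that in version 1.1 a language is tagged either by a classifier trained on its own group (Balto-Slavic, Germanic or Romance) or by one trained on all available languages. A minimal sketch of that routing follows. The group labels, the function name `pick_classifier` and the partial language map are hypothetical, chosen for illustration only; they are not taken from the Deltacorpus release or from the tagger's code.

```python
# Illustrative sketch only: labels and names are assumptions, not the
# identifiers used by the Deltacorpus tagger.

# Deliberately partial map from language name to language group.
LANGUAGE_GROUPS = {
    "Czech": "balto_slavic",
    "Polish": "balto_slavic",
    "Lithuanian": "balto_slavic",
    "German": "germanic",
    "Swedish": "germanic",
    "French": "romance",
    "Spanish": "romance",
}

# Hypothetical label for the classifier trained on all available languages.
FALLBACK = "all_languages"


def pick_classifier(language: str) -> str:
    """Return the label of the classifier that would tag this language.

    Languages in one of the three groups get their group-specific
    classifier; every other language falls back to the general one.
    """
    return LANGUAGE_GROUPS.get(language, FALLBACK)


if __name__ == "__main__":
    for name in ("Czech", "German", "French", "Finnish"):
        print(name, "->", pick_classifier(name))
```

Languages outside the three groups, such as Finnish or Tamil, fall through to the general classifier, matching the abstract's description of how "other languages" were handled.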
