Fifth Workshop on NLP for Similar Languages, Varieties and Dialects

VarDial 2018 will be co-located with COLING 2018 in Santa Fe, United States. 

VarDial is a well-established series of workshops that provides a forum for scholars working on a range of topics related to the study of diatopic linguistic variation from a computational perspective.

Previous editions were the first workshop, VarDial 2014, co-located with COLING; the joint workshop LT4VarDial 2015, co-located with RANLP; VarDial 2016, co-located with COLING; and VarDial 2017, co-located with EACL.

We anticipate discussion on computational methods and on language resources for closely related languages and language varieties. Corpus-driven exploitation of linguistic variation at different levels, such as lexicon and grammar, is another topic of interest.

Papers presented at previous editions of VarDial focused on machine translation between closely related languages, adaptation of POS taggers and parsers for similar languages and language varieties, compilation of corpora for language varieties, spelling normalization, computational approaches to the study of mutual intelligibility, and the discrimination or identification of similar languages and dialects.

We welcome papers dealing with one or more of the following topics:

  • Language resources and tools for similar languages, varieties and dialects;
  • Adaptation of tools (taggers, parsers) for similar languages, varieties and dialects;
  • Evaluation of language resources and tools when applied to language varieties;
  • Reusability of language resources in NLP applications (e.g., machine translation, POS tagging, syntactic parsing);
  • Corpus-driven studies in dialectology and language variation;
  • Computational approaches to the study of mutual intelligibility between dialects and similar languages;
  • Automatic identification of lexical variation;
  • Automatic classification of language varieties;
  • Text similarity and adaptation between language varieties;
  • Linguistic issues in the adaptation of language resources and tools (e.g., semantic discrepancies, lexical gaps, false friends);
  • Machine translation between closely related languages, language varieties and dialects.

Examples of language varieties include pluricentric languages such as English, Spanish, French or Portuguese. Examples of pairs of related languages include Swedish-Norwegian, Bulgarian-Macedonian, Serbian-Bosnian, Russian-Ukrainian, Irish-Scottish Gaelic, Malay-Indonesian, Turkish-Azerbaijani, Mandarin-Cantonese, Hindi-Urdu, and many others.

Together with VarDial 2018 we are organizing an evaluation campaign with five shared tasks.