Modern Standard Arabic Pronunciation Lexicon
Kaldi Gale Recipe
This package includes files for building Arabic ASR using the GALE database from LDC and the Kaldi Speech Recognition Toolkit. The test set is a mix of conversational and report speech.
QCRI Educational Domain (QED) Corpus
The QED Corpus is an open multilingual collection of subtitles for educational videos and lectures collaboratively transcribed and translated over the AMARA web-based platform. The current release, QED Corpus v1.4, contains 20 languages distributed over 44620 files.
Annotated Al Jazeera Dialectal Speech Corpus
This corpus contains speech from Al Jazeera with both human-annotated and automatically-assigned labels for MSA and four major dialect groups (Egyptian, Levantine, North African, Gulf).
Bilingual Corpus of Parallel Tweets
A collection of parallel Arabic-English tweets and an additional list of Twitter accounts that post parallel tweets.
Arabic Fact-Checking and Stance Detection Corpus
Rationale, relevant document retrieval and fact checking. The corpus contains 422 claims made about the war in Syria and related Middle East political issues, where each claim is labeled for factuality, indicating whether it is True or False.
WAW Corpus
The WAW Corpus is a bilingual translation and interpretation corpus in Arabic and English. It comprises recordings from three international conferences, namely WISE 2013, ARC’14 and WISH. These recordings contain both the original speaker and the interpreter, along with their transcripts and translations.
QCRI Arabic Dialects Identification (QADI) Corpus
QCRI Arabic Dialects Identification (QADI) is a country-level Arabic dialect identification (DI) dataset. It provides a collection for benchmarking the DI task.
Tanbih
The Tanbih mega-project aims to limit the effect of 'fake news', propaganda and media bias by making users aware of what they are reading. The team believes that promoting media literacy and critical thinking is the best way to address disinformation and 'fake news'.
QCRI Dialectal Arabic Resources
A list of resources for dialectal Arabic open to researchers. These resources have been compiled at QCRI for research purposes and pilot experiments for various Arabic dialects.
QATIP
Performs continuous text recognition and works best for entire pages of historic documents with a challenging script.
AraBench
AraBench offers 4 coarse, 15 fine-grained and 25 city-level dialect categories belonging to diverse genres, such as media, chat, religion and travel, with varying levels of dialectness.
International Workshop on Semantic Evaluation
Barcelona, Spain
Collocated with the 28th International Conference on Computational Linguistics (COLING-2020).
International Workshop on Semantic Evaluation
Minneapolis, USA
Collocated with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019).
International Workshop on Semantic Evaluation
New Orleans, LA, USA
Collocated with the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)
International Workshop on Semantic Evaluation (SemEval-2017)
Vancouver, Canada
Collocated with the 55th annual meeting of the Association for Computational Linguistics (ACL)
International Workshop on Semantic Evaluation
San Diego, California
Collocated with the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
International Workshop on Semantic Evaluation
Denver, Colorado
Collocated with NAACL-2015
International Workshop on Semantic Evaluation (SemEval-2014)
Dublin, Ireland
Collocated with COLING and *Sem