Resources – ALT Website

Resources that have been created by our team!

We have created multiple packages that have been used with various projects.

Modern Standard Arabic Pronunciation Lexicon

Kaldi Gale Recipe

This package includes files for building Arabic ASR using the GALE database from LDC and the Kaldi Speech Recognition Toolkit. The test set is a mix of conversational and report speech.

QCRI Educational Domain (QED) Corpus

The QED Corpus is an open multilingual collection of subtitles for educational videos and lectures, collaboratively transcribed and translated over the AMARA web-based platform. The current release, QED Corpus v1.4, contains 20 languages distributed over 44620 files.

Annotated Al Jazeera Dialectal Speech Corpus

This corpus contains speech from Al Jazeera with both human-annotated and automatically-assigned labels for MSA and four major dialect groups (Egyptian, Levantine, North African, Gulf).

Bilingual Corpus of Parallel Tweets

A collection of parallel Arabic-English tweets and an additional list of Twitter accounts that post parallel tweets.

Arabic Fact-Checking and Stance Detection Corpus

Covers rationale, relevant document retrieval and fact checking. The corpus contains 422 claims made about the war in Syria and related Middle East political issues; each claim is labeled for factuality as True or False.

WAW Corpus

WAW Corpus is a bilingual translation and interpretation corpus in Arabic and English. It comprises recordings from three international conferences, namely WISE 2013, ARC’14 and WISH. These recordings contain both the original speaker and the interpreter, along with their transcripts and translations.

QCRI Arabic Dialects Identification (QADI) Corpus

QCRI Arabic Dialects Identification (QADI) is a country-level Arabic dialect identification (DI) dataset. It provides a collection for benchmarking the DI task.

Tanbih

The Tanbih mega-project aims to limit the effect of 'fake news', propaganda and media bias by making users aware of what they are reading. The team believes that promoting media literacy and critical thinking is the best way to address disinformation and 'fake news'.

QCRI Dialectal Arabic Resources

A list of resources for dialectal Arabic, open to researchers. These resources have been compiled at QCRI for research purposes and pilot experiments on various Arabic dialects.

QATIP

QATIP performs continuous text recognition and works best on entire pages of historic documents with a challenging script.

AraBench

AraBench offers 4 coarse, 15 fine-grained and 25 city-level dialect categories, belonging to diverse genres such as media, chat, religion and travel, with varying levels of dialectness.

Conference contributions

We have contributed to many conferences held worldwide.

SEMEval-2020

International Workshop on Semantic Evaluation

Barcelona, Spain

Collocated with the 28th International Conference on Computational Linguistics (COLING-2020).

SEMEval-2019

International Workshop on Semantic Evaluation

Minneapolis, USA

Collocated with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019).

SEMEval-2018

International Workshop on Semantic Evaluation

New Orleans, LA, USA

Collocated with the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)

SEMEval-2017

International Workshop on Semantic Evaluation

Vancouver, Canada

Collocated with the 55th annual meeting of the Association for Computational Linguistics (ACL)

SEMEval-2016

International Workshop on Semantic Evaluation

San Diego, California

Collocated with the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

SEMEval-2015

International Workshop on Semantic Evaluation

Denver, Colorado

Collocated with NAACL-2015

SEMEval-2014

International Workshop on Semantic Evaluation (SemEval-2014)

Dublin, Ireland

Collocated with COLING and *SEM