Modern Standard Arabic Pronunciation Lexicon

This package includes a pronunciation dictionary for Modern Standard Arabic ASR. It has been used in combination with the Kaldi Gale Recipe.

Kaldi Gale Recipe

This package includes files for building an Arabic ASR system using the GALE database from LDC and the Kaldi Speech Recognition Toolkit. The test set is a mix of conversational and report speech.

QCRI Educational Domain (QED) Corpus

The QED Corpus is an open multilingual collection of subtitles for educational videos and lectures, collaboratively transcribed and translated over the AMARA web-based platform. The current release, QED Corpus v1.4, contains 20 languages distributed over 44,620 files.

Annotated Al Jazeera Dialectal Speech Corpus

This corpus contains speech from Al Jazeera with both human-annotated and automatically assigned labels for Modern Standard Arabic (MSA) and four major dialect groups (Egyptian, Levantine, North African, Gulf).