QCRI Dialectal Arabic Resources
Abstract
This page lists dialectal Arabic resources that are open to researchers. These resources were compiled at QCRI for research purposes and pilot experiments on various Arabic dialects.
Related publications
- K. Darwish, H. Mubarak, A. Abdelali, M. Eldesouki, Y. Samih, R. Alharbi, M. Attia, W. Magdy, and L. Kallmeyer, “Multi-dialect Arabic POS tagging: a CRF approach,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 7-12, 2018.
[BibTeX]@InProceedings{DARWISH18.562, author = {Kareem Darwish and Hamdy Mubarak and Ahmed Abdelali and Mohamed Eldesouki and Younes Samih and Randah Alharbi and Mohammed Attia and Walid Magdy and Laura Kallmeyer}, title = {Multi-Dialect Arabic POS Tagging: A CRF Approach}, booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, year = {2018}, month = {may}, date = {7-12}, location = {Miyazaki, Japan}, editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga}, publisher = {European Language Resources Association (ELRA)}, address = {Paris, France}, isbn = {979-10-95546-00-9}, language = {english} }
- Y. Samih, M. Attia, M. Eldesouki, H. Mubarak, A. Abdelali, L. Kallmeyer, and K. Darwish, “A neural architecture for dialectal Arabic segmentation,” in The Third Arabic Natural Language Processing Workshop (WANLP-2017), EACL 2017, 2017.
[BibTeX]@INPROCEEDINGS{ysamihNDASeg, author={Samih, Younes and Attia, Mohammed and Eldesouki, Mohamed and Mubarak, Hamdy and Abdelali, Ahmed and Kallmeyer, Laura and Darwish, Kareem}, booktitle={The Third Arabic Natural Language Processing Workshop (WANLP-2017). EACL 2017}, title={A Neural Architecture for Dialectal Arabic Segmentation}, year={2017}, month={Apr}, pages={000-000} }
- Y. Samih, M. Eldesouki, M. Attia, K. Darwish, A. Abdelali, H. Mubarak, and L. Kallmeyer, “Learning from relatives: unified dialectal Arabic segmentation,” in CoNLL 2017, 2017.
[BibTeX]@INPROCEEDINGS{ysamihLDASegs, author={Younes Samih and Mohamed Eldesouki and Mohammed Attia and Kareem Darwish and Ahmed Abdelali and Hamdy Mubarak and Laura Kallmeyer}, booktitle={CoNLL 2017}, title={Learning from Relatives: Unified Dialectal Arabic Segmentation}, year={2017}, month={Aug}, pages={000-000} }
Download
- Egyptian Arabic Segmentation Training Dataset. Released on Feb. 21, 2017.
- Annotation Guidelines and Four Arabic Dialects Segmentation Dataset. Released on Jun. 12, 2017.
- Four Arabic Dialects POS-Tagged Dataset (or clone from git). Released on May 7, 2018.
License
The resources provided herein by QCRI, a member of Qatar Foundation (All Rights Reserved), are licensed under the Apache License, Version 2.0 (the "License"); you may not use them except in compliance with the License. You may obtain a copy of the License here.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.