Farasa – ALT Website

Farasa (which means “insight” in Arabic) is a fast and accurate toolkit for processing Arabic text. It supports segmentation, lemmatization, POS tagging, Arabic diacritization, dependency parsing, constituency parsing, named-entity recognition, and spell-checking.

The Farasa toolkit is available as a RESTful API. The following code snippets show how to call the Farasa web API from your preferred programming language:

Segmentation

# Python
import http.client

conn = http.client.HTTPSConnection("farasa-api.qcri.org")

# Encode the JSON body as UTF-8: http.client encodes str bodies as ISO-8859-1, which cannot represent Arabic text
payload = '{"text": "هذا مثال بسيط"}'.encode("utf-8")

headers = {"content-type": "application/json", "cache-control": "no-cache"}

conn.request("POST", "/msa/webapi/segmenter", payload, headers)

res = conn.getresponse()

data = res.read()

print(data.decode("utf-8"))
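
The request body in this snippet is hand-written JSON, which stops being valid as soon as the input text contains a double quote or a backslash. A minimal sketch of building the body with Python's standard `json` module instead; the helper name `build_payload` is hypothetical and not part of Farasa:

```python
import json


def build_payload(text: str) -> bytes:
    """Serialize the request body as UTF-8 encoded JSON.

    json.dumps escapes quotes and backslashes in the text, and
    ensure_ascii=False keeps Arabic characters readable instead of
    turning them into \\uXXXX escapes.
    """
    return json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")


# The resulting bytes can be passed as the body argument of conn.request(...)
payload = build_payload("هذا مثال بسيط")
print(payload.decode("utf-8"))
```

Returning bytes rather than a string matters here because `http.client` would otherwise try to encode the body as ISO-8859-1.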

Lemmatization

# Python
import http.client
conn = http.client.HTTPSConnection("farasa-api.qcri.org")
# Encode the JSON body as UTF-8: http.client encodes str bodies as ISO-8859-1, which cannot represent Arabic text
payload = '{"text": "هذا مثال بسيط"}'.encode("utf-8")
headers = {"content-type": "application/json", "cache-control": "no-cache"}
conn.request("POST", "/msa/webapi/lemma", payload, headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

// JavaScript (jQuery)
var settings = {
  "async": true,
  "crossDomain": true,
  "url": "https://farasa-api.qcri.org/msa/webapi/lemma",
  "method": "POST",
  "headers": {
    "content-type": "application/json",
    "cache-control": "no-cache"
  },
  "processData": false,
  "data": "{\"text\": \"هذا مثال بسيط\"}"
};

$.ajax(settings).done(function (response) {
  console.log(response);
});

// Java (Unirest)
HttpResponse<String> response = Unirest.post("https://farasa-api.qcri.org/msa/webapi/lemma")
  .header("content-type", "application/json")
  .header("cache-control", "no-cache")
  .body("{\"text\": \"هذا مثال بسيط\"}")
  .asString();

Part-Of-Speech Tagging

# Python
import http.client
conn = http.client.HTTPSConnection("farasa-api.qcri.org")
# Encode the JSON body as UTF-8: http.client encodes str bodies as ISO-8859-1, which cannot represent Arabic text
payload = '{"text": "هذا مثال بسيط"}'.encode("utf-8")
headers = {"content-type": "application/json", "cache-control": "no-cache"}
conn.request("POST", "/msa/webapi/pos", payload, headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

// JavaScript (jQuery)
var settings = {
  "async": true,
  "crossDomain": true,
  "url": "https://farasa-api.qcri.org/msa/webapi/pos",
  "method": "POST",
  "headers": {
    "content-type": "application/json",
    "cache-control": "no-cache"
  },
  "processData": false,
  "data": "{\"text\": \"هذا مثال بسيط\"}"
};

$.ajax(settings).done(function (response) {
  console.log(response);
});

// Java (Unirest)
HttpResponse<String> response = Unirest.post("https://farasa-api.qcri.org/msa/webapi/pos")
  .header("content-type", "application/json")
  .header("cache-control", "no-cache")
  .body("{\"text\": \"هذا مثال بسيط\"}")
  .asString();

Diacritization

# Python
import http.client
conn = http.client.HTTPSConnection("farasa-api.qcri.org")
# Encode the JSON body as UTF-8: http.client encodes str bodies as ISO-8859-1, which cannot represent Arabic text
payload = '{"text": "هذا مثال بسيط"}'.encode("utf-8")
headers = {"content-type": "application/json", "cache-control": "no-cache"}
conn.request("POST", "/msa/webapi/diacritizeV2", payload, headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

// JavaScript (jQuery)
var settings = {
  "async": true,
  "crossDomain": true,
  "url": "https://farasa-api.qcri.org/msa/webapi/diacritizeV2",
  "method": "POST",
  "headers": {
    "content-type": "application/json",
    "cache-control": "no-cache"
  },
  "processData": false,
  "data": "{\"text\": \"هذا مثال بسيط\"}"
};

$.ajax(settings).done(function (response) {
  console.log(response);
});

// Java (Unirest)
HttpResponse<String> response = Unirest.post("https://farasa-api.qcri.org/msa/webapi/diacritizeV2")
  .header("content-type", "application/json")
  .header("cache-control", "no-cache")
  .body("{\"text\": \"هذا مثال بسيط\"}")
  .asString();
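
The Python snippets above differ only in the endpoint path, so they could be folded into a single helper. The following is a sketch under that assumption, not an official client: the short task names, `build_request`, and `call_farasa` are hypothetical, and treating any non-200 status as an error is an assumption about how the API reports failures. The host and paths are copied from the snippets above.

```python
import http.client
import json

HOST = "farasa-api.qcri.org"

# Endpoint paths taken from the snippets above; the keys are
# illustrative names chosen for this sketch.
ENDPOINTS = {
    "segmentation": "/msa/webapi/segmenter",
    "lemmatization": "/msa/webapi/lemma",
    "pos": "/msa/webapi/pos",
    "diacritization": "/msa/webapi/diacritizeV2",
}


def build_request(task, text):
    """Return (path, body, headers) for a task, without touching the network."""
    path = ENDPOINTS[task]  # raises KeyError for an unknown task name
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    headers = {"content-type": "application/json", "cache-control": "no-cache"}
    return path, body, headers


def call_farasa(task, text, timeout=30):
    """POST the text to the chosen endpoint and return the decoded response body."""
    path, body, headers = build_request(task, text)
    conn = http.client.HTTPSConnection(HOST, timeout=timeout)
    try:
        conn.request("POST", path, body, headers)
        res = conn.getresponse()
        data = res.read().decode("utf-8")
        if res.status != 200:
            # Assumption: anything other than 200 is treated as a failure.
            raise RuntimeError(f"Farasa API returned HTTP {res.status}: {data}")
        return data
    finally:
        conn.close()


# Example call (requires network access):
# print(call_farasa("pos", "هذا مثال بسيط"))
```

Keeping request construction separate from the network call makes the path and payload easy to inspect before anything is sent.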

For more details about Farasa usage, visit this link

Publications

A. Abdelali, K. Darwish, N. Durrani, and H. Mubarak, “Farasa: A Fast and Furious Segmenter for Arabic,” in 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 11–16.
[BibTeX]

@inproceedings{abdelali2016farasa,
title={Farasa: A Fast and Furious Segmenter for Arabic},
author={Abdelali, Ahmed and Darwish, Kareem and Durrani, Nadir and Mubarak, Hamdy},
booktitle={15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages={11--16},
year={2016},
organization={Association for Computational Linguistics}
}

K. Darwish and H. Mubarak, “Farasa: A New Fast and Accurate Arabic Word Segmenter,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016.
[BibTeX]

@inproceedings{darwish2016farasa,
title={Farasa: A New Fast and Accurate Arabic Word Segmenter},
author={Darwish, Kareem and Mubarak, Hamdy},
booktitle={Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
year={2016},
organization={European Language Resources Association (ELRA)}
}

Y. Zhang, C. Li, R. Barzilay, and K. Darwish, “Randomized greedy inference for joint segmentation, POS tagging and dependency parsing,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015, pp. 42–52.
[BibTeX]

@inproceedings{zhang2015randomized,
title={Randomized greedy inference for joint segmentation, POS tagging and dependency parsing},
author={Zhang, Yuan and Li, Chengtao and Barzilay, Regina and Darwish, Kareem},
booktitle={Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
pages={42--52},
year={2015}
}

K. Darwish, “Named Entity Recognition using Cross-lingual Resources: Arabic as an Example,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013, pp. 1558–1567.
[BibTeX]

@inproceedings{darwish2013named,
title={Named Entity Recognition using Cross-lingual Resources: Arabic as an Example},
author={Darwish, Kareem},
booktitle={Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics},
pages={1558--1567},
year={2013},
organization={Association for Computational Linguistics}
}

K. Darwish and W. Gao, “Simple Effective Microblog Named Entity Recognition: Arabic as an Example,” in International Conference on Language Resources and Evaluation, 2014.
[BibTeX]

@inproceedings{darwish2014simple,
title={Simple Effective Microblog Named Entity Recognition: Arabic as an Example},
author={Darwish, Kareem and Gao, Wei},
booktitle={International Conference on Language Resources and Evaluation},
year={2014}
}

H. Mubarak and K. Darwish, “Automatic correction of Arabic text: a cascaded approach,” in Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP), 2014, pp. 132–136.
[BibTeX]

@inproceedings{mubarak2014automatic,
title={Automatic correction of Arabic text: a cascaded approach},
author={Mubarak, Hamdy and Darwish, Kareem},
booktitle={Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing (ANLP)},
pages={132--136},
year={2014}
}

H. Mubarak, K. Darwish, and A. Abdelali, “QCRI @ QALB-2015 Shared Task: Correction of Arabic Text for Native and Non-Native Speakers’ Errors,” in Proceedings of the Second Workshop on Arabic Natural Language Processing, 2015, pp. 150–154.
[BibTeX]

@inproceedings{mubarak2015qcri,
title={QCRI $@$ QALB-2015 Shared Task: Correction of Arabic Text for Native and Non-Native Speakers’ Errors},
author={Mubarak, Hamdy and Darwish, Kareem and Abdelali, Ahmed},
booktitle={Proceedings of the Second Workshop on Arabic Natural Language Processing},
pages={150--154},
year={2015}
}

K. Darwish, A. Abdelali, H. Mubarak, and M. Eldesouki, “Arabic Diacritic Recovery Using a Feature-Rich biLSTM Model,” arXiv preprint arXiv:2002.01207, 2020.
[BibTeX]

@article{darwish2020arabic,
title={Arabic Diacritic Recovery Using a Feature-Rich biLSTM Model},
author={Darwish, Kareem and Abdelali, Ahmed and Mubarak, Hamdy and Eldesouki, Mohamed},
journal={arXiv preprint arXiv:2002.01207},
year={2020}
}

H. Mubarak, A. Abdelali, K. Darwish, M. Eldesouki, Y. Samih, and H. Sajjad, “A System for Diacritizing Four Varieties of Arabic,” in Proceedings of the Empirical Methods in Natural Language Processing (EMNLP), 2019.
[BibTeX]

@inproceedings{diacritic2019emnlp,
title={A System for Diacritizing Four Varieties of {Arabic}},
author={Hamdy Mubarak and Ahmed Abdelali and Kareem Darwish and Mohamed Eldesouki and Younes Samih and Hassan Sajjad},
booktitle={Proceedings of the Empirical Methods in Natural Language Processing (EMNLP)},
year={2019},
month={November},
}

H. Mubarak, A. Abdelali, H. Sajjad, Y. Samih, and K. Darwish, “Highly Effective Arabic Diacritization using Sequence to Sequence Modeling,” in Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2019.
[BibTeX]

@InProceedings{mubarak:2019:NAACL,
title={{Highly Effective Arabic Diacritization using Sequence to Sequence Modeling}},
author={Hamdy Mubarak and Ahmed Abdelali and Hassan Sajjad and Younes Samih and Kareem Darwish},
booktitle={Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
year={2019},
Month = {June},
}

A. Abdelali, M. Attia, Y. Samih, K. Darwish, and H. Mubarak, “Diacritization of Maghrebi Arabic sub-dialects,” arXiv preprint arXiv:1810.06619, 2018.
[BibTeX]

@article{abdelali2018diacritization,
title={Diacritization of Maghrebi Arabic sub-dialects},
author={Abdelali, Ahmed and Attia, Mohammed and Samih, Younes and Darwish, Kareem and Mubarak, Hamdy},
journal={arXiv preprint arXiv:1810.06619},
year={2018}
}

K. Darwish, H. Mubarak, A. Abdelali, M. Eldesouki, Y. Samih, R. Alharbi, M. Attia, W. Magdy, and L. Kallmeyer, “Multi-Dialect Arabic POS Tagging: A CRF Approach,” in 11th edition of the Language Resources and Evaluation Conference, 2018.
[BibTeX]

@inproceedings{darwish2018multi,
title={Multi-Dialect Arabic POS Tagging: A CRF Approach},
author={Darwish, Kareem and Mubarak, Hamdy and Abdelali, Ahmed and Eldesouki, Mohamed and Samih, Younes and Alharbi, Randah and Attia, Mohammed and Magdy, Walid and Kallmeyer, Laura},
booktitle={11th edition of the Language Resources and Evaluation Conference},
year={2018},
organization={Miyazaki (Japan).}
}

K. Darwish, A. Abdelali, H. Mubarak, Y. Samih, and M. Attia, “Diacritization of Moroccan and Tunisian Arabic Dialects: A CRF Approach,” in Proceedings of the 4th Arabic Natural Language Processing Workshop (WANLP-2018), the 11th edition of the Language Resources and Evaluation Conference, 2018.
[BibTeX]

@inproceedings{darwish2018diacritization,
title={Diacritization of Moroccan and Tunisian Arabic Dialects: A CRF Approach},
author={Darwish, Kareem and Abdelali, Ahmed and Mubarak, Hamdy and Samih, Younes and Attia, Mohammed},
booktitle={Proceedings of The 4th Arabic Natural Language Processing Workshop (WANLP-2018), the 11th edition of the Language Resources and Evaluation Conference},
year={2018},
organization={Miyazaki (Japan).}
}

M. Eldesouki, Y. Samih, A. Abdelali, M. Attia, H. Mubarak, K. Darwish, and L. Kallmeyer, “Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM,” arXiv preprint arXiv:1708.05891, 2017.
[BibTeX]

@article{eldesouki2017arabic,
title={Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM},
author={Eldesouki, Mohamed and Samih, Younes and Abdelali, Ahmed and Attia, Mohammed and Mubarak, Hamdy and Darwish, Kareem and Kallmeyer, Laura},
journal={arXiv preprint arXiv:1708.05891},
year={2017}
}

H. Mubarak, K. Darwish, and W. Magdy, “Abusive language detection on Arabic social media,” in Proceedings of the First Workshop on Abusive Language Online, 2017, pp. 52–56.
[BibTeX]

@inproceedings{mubarak2017abusive,
title={Abusive language detection on Arabic social media},
author={Mubarak, Hamdy and Darwish, Kareem and Magdy, Walid},
booktitle={Proceedings of the First Workshop on Abusive Language Online},
pages={52--56},
year={2017}
}