Farasa (which means “insight” in Arabic) is a fast and accurate text processing toolkit for Arabic. It supports segmentation, lemmatization, POS tagging, Arabic diacritization, dependency parsing, constituency parsing, named-entity recognition, and spell-checking.
Try Farasa
The Farasa toolkit is available as a RESTful API. The following code snippets show how to call the Farasa web API from your preferred programming language:

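import http.client
import json

conn = http.client.HTTPSConnection("farasa-api.qcri.org")

# Send the body as UTF-8 bytes: http.client encodes str bodies as ISO-8859-1,
# which cannot represent Arabic characters.
payload = json.dumps({"text": "هذا مثال بسيط"}, ensure_ascii=False).encode("utf-8")

headers = {"content-type": "application/json", "cache-control": "no-cache"}

conn.request("POST", "/msa/webapi/segmenter", payload, headers)

res = conn.getresponse()

data = res.read()

print(data.decode("utf-8"))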

// Requires jQuery.
var settings = {
    "async": true,
    "crossDomain": true,
    "url": "https://farasa-api.qcri.org/msa/webapi/segmenter",
    "method": "POST",
    "headers": {
        "content-type": "application/json",
        "cache-control": "no-cache"
    },
    "processData": false,
    // Serialize the request body as a JSON string.
    "data": JSON.stringify({ "text": "هذا مثال بسيط" })
};

$.ajax(settings).done(function (response) {
    console.log(response);
});

<?php
// Uses the HttpRequest class from the pecl_http 1.x extension.
$request = new HttpRequest();

$request->setUrl('https://farasa-api.qcri.org/msa/webapi/segmenter');

$request->setMethod(HTTP_METH_POST);

$request->setHeaders(array(
    'cache-control' => 'no-cache',
    'content-type' => 'application/json'
));

$request->setBody('{"text": "هذا مثال بسيط"}');

try {
    $response = $request->send();
    echo $response->getBody();
} catch (HttpException $ex) {
    echo $ex;
}

// Uses the Unirest HTTP client library for Java.
HttpResponse<String> response = Unirest.post("https://farasa-api.qcri.org/msa/webapi/segmenter")
    .header("content-type", "application/json")
    .header("cache-control", "no-cache")
    .body("{\"text\": \"هذا مثال بسيط\"}")
    .asString();

curl --request POST \
     --url https://farasa-api.qcri.org/msa/webapi/segmenter \
     --header 'cache-control: no-cache' \
     --header 'content-type: application/json' \
     --data '{"text": "هذا مثال بسيط"}'
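
All of the snippets above send the same request: a POST to the segmenter endpoint with a JSON body holding a single "text" field. As a minimal sketch of how that call could be wrapped for reuse, the Python helper below uses only the standard library. The names build_segmenter_request, segment, and SEGMENTER_URL, as well as the 10-second timeout, are assumptions made for illustration and are not part of Farasa. The response format is not described on this page, so the sketch returns the raw body without parsing it.

```python
import json
import urllib.request

# Endpoint taken from the snippets above.
SEGMENTER_URL = "https://farasa-api.qcri.org/msa/webapi/segmenter"


def build_segmenter_request(text: str) -> urllib.request.Request:
    """Build the POST request for the segmenter endpoint (hypothetical helper)."""
    # json.dumps escapes quotes and backslashes in the input text;
    # encoding to UTF-8 bytes keeps the Arabic characters intact on the wire.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        SEGMENTER_URL,
        data=body,
        headers={"Content-Type": "application/json", "Cache-Control": "no-cache"},
        method="POST",
    )


def segment(text: str, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text.

    The timeout value is an assumption; the body is returned unparsed
    because the response format is not documented here.
    """
    request = build_segmenter_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as res:
        return res.read().decode("utf-8")


# Example call (requires network access):
#     print(segment("هذا مثال بسيط"))
```

Building the request separately from sending it makes it possible to inspect the method, headers, and body without contacting the server.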

Publications

  • Ahmed Abdelali, Kareem Darwish, Nadir Durrani, Hamdy Mubarak. 2016. Farasa: A Fast and Furious Segmenter for Arabic. NAACL-2016.
  • Kareem Darwish, Hamdy Mubarak. 2016. Farasa: A New Fast and Accurate Arabic Word Segmenter. LREC-2016.
  • Yuan Zhang, Chengtao Li, Regina Barzilay, Kareem Darwish. 2015. Randomized Greedy Inference for Joint Segmentation, POS Tagging and Dependency Parsing. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 42-52.
  • Hamdy Mubarak, Kareem Darwish, Ahmed Abdelali. 2015. QCRI@QALB-2015 Shared Task: Correction of Arabic Text for Native and Non-Native Speakers’ Errors. Proceedings of the ACL 2015 Second Workshop on Arabic Natural Language Processing.
  • Kareem Darwish. 2013. Named Entity Recognition using Cross-lingual Resources: Arabic as an Example. ACL-2013.
  • Kareem Darwish, Wei Gao. 2014. Simple Effective Microblog Named Entity Recognition: Arabic as an Example. LREC-2014.