HBKU - QCRI

How to use FARASA Packages

The usage of all Farasa packages (segmentation module, POS tagger, and parser) is almost the same. You can use each package as a standalone application or as a library inside other software. The following instructions cover Linux, Mac, and Windows only. We built and ran the Farasa packages with Java 7 and Java 8. Earlier versions of Java may not be suitable for building the packages because of some dependencies.

 


 

Farasa Segmenter Module

There are two options for downloading Farasa Segmenter: download just the jar file, or download the entire source code as the archive “FarasaSegmenter.tar.gz”. If you download the jar file, you can skip the building and compiling step.

If you download the source code of the Farasa Segmenter module from the link sent to your email (through the registration), extract the archive, change to the home directory of the project “FarasaSegmenter”, and execute the following commands in the terminal to compile the source code and build the jar file:
ant clean
ant jar

There are two ways to run the package as a standalone application. For interactive mode, run the following command:

java -jar dist/farasaSeg.jar

Alternatively, pass a text file (encoded in UTF-8) as input to the package and specify the output file name as follows:

java -jar dist/farasaSeg.jar -i <inputfile> -o <output_file>
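
If you want to launch the batch mode from another Java program rather than from a shell, the same command line can be assembled and started with ProcessBuilder. The sketch below is only an illustration and is not part of Farasa: the class FarasaBatchRunner and its two helper methods are hypothetical names, and the jar path is assumed to be the dist/farasaSeg.jar produced by “ant jar”. It writes the input explicitly as UTF-8, since that is the encoding the package expects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the Farasa distribution.
public class FarasaBatchRunner {

    // Builds the command shown above: java -jar <jar> -i <input> -o <output>
    public static List<String> buildCommand(String jarPath, Path input, Path output) {
        return Arrays.asList("java", "-jar", jarPath,
                "-i", input.toString(), "-o", output.toString());
    }

    // Writes the text to the file as UTF-8, regardless of the platform default encoding.
    public static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path input = Files.createTempFile("farasa-in", ".txt");
        Path output = Files.createTempFile("farasa-out", ".txt");
        writeUtf8(input, "النص المراد معالجته");

        // Assumed jar location; adjust it to where your jar actually is.
        List<String> command = buildCommand("dist/farasaSeg.jar", input, output);
        System.out.println(command);

        // Uncomment once the jar exists at the assumed location:
        // int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
        // System.out.println("exit code: " + exitCode);
    }
}
```

Keeping the command as a list of separate arguments avoids quoting problems when file names contain spaces.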

To use the Farasa segmentation package as a library in your application, build it as shown above with the shell command “ant jar”, then import the jar file farasaSeg.jar into your project. The following few lines of code show how to use the Farasa segmentation package:

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;

public class TryingSeg {

    ...

    public static void main(String[] args) throws IOException, FileNotFoundException, ClassNotFoundException {
         ...

        Farasa farasa = new Farasa();
        ArrayList<String> output = farasa.segmentLine("النص المراد معالجته");
        // Print each element returned by the segmenter on its own line.
        for (String s : output) {
            System.out.println(s);
        }
         ...
     }
     ...
}

Farasa POS Tagger

As with Farasa Segmenter, there are two options for using Farasa POS Tagger: download just the jar file, or download the entire source code as the archive “FarasaPOS.tar.gz”. If you download the jar file, you can skip the building and compiling step. You only need to create a directory named “lib” at the same level as the jar file and copy the jar file of the Farasa Segmenter module into it. In addition, download the file “weka.jar” and place it in the same directory.
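
A missing dependency in the “lib” directory is easy to overlook, so it can help to check the directory before running the tagger. The following is a minimal sketch under stated assumptions: the class LibDirCheck and the method missingJars are hypothetical and not part of Farasa, and the file names farasaSeg.jar and weka.jar are taken from the text above (use whatever names your downloaded files actually have).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the Farasa distribution.
public class LibDirCheck {

    // Returns the names in 'required' that do not exist as regular files in libDir.
    public static List<String> missingJars(Path libDir, List<String> required) {
        List<String> missing = new ArrayList<String>();
        for (String name : required) {
            if (!Files.isRegularFile(libDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The lib directory is expected next to the POS tagger jar.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        // Assumed file names, based on the description above.
        List<String> required = Arrays.asList("farasaSeg.jar", "weka.jar");

        List<String> missing = missingJars(libDir, required);
        if (missing.isEmpty()) {
            System.out.println("All required jars found in " + libDir);
        } else {
            System.out.println("Missing from " + libDir + ": " + missing);
        }
    }
}
```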

If you download the source code of Farasa POS Tagger from the link sent to your email (through the registration), extract the archive, change to the home directory of the project “FarasaPOS”, and execute the following commands in the terminal to compile the source code and build the jar file:
ant clean
ant jar

To run the package as a standalone application, pass a text file (encoded in UTF-8) as input to the package and specify the output file name as follows:

java -jar dist/FarasaPOSJar.jar -i <inputfile> -o <output_file>

To use Farasa POS Tagger as a library in your application, build it as before using the shell script file “make.sh”, then import the jar file FarasaPOS.jar into your project. The following few lines of code show how to use the Farasa POS module:

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import com.qcri.farasa.pos.FarasaPOSTagger;
import com.qcri.farasa.pos.Sentence;
import com.qcri.farasa.pos.Clitic;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;

public class TryingFarasaPOS {
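    public static void main(String[] args) throws IOException, FileNotFoundException, ClassNotFoundException,
    UnsupportedEncodingException, InterruptedException, Exception {

        // The POS tagger is constructed from a segmenter instance.
        Farasa farasa = new Farasa();
        FarasaPOSTagger farasaPOS = new FarasaPOSTagger(farasa);

        // Segment the input line first, then tag the segmenter output.
        ArrayList<String> segOutput = farasa.segmentLine("النص المراد معالجته");

        Sentence sentence = farasaPOS.tagLine(segOutput);

        // Print each clitic as surface/POS, followed by -genderNumber when that field is not empty.
        for (Clitic w : sentence.clitics) {
            // Compare string contents with equals(); != only compares object references.
            String suffix = "".equals(w.genderNumber) ? "" : "-" + w.genderNumber;
            System.out.println(w.surface + "/" + w.guessPOS + suffix + " ");
        }
    }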
}
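
The print statement in the example above emits each clitic as surface/POS and appends -genderNumber only when that field is not empty. To make the formatting rule easy to try without the Farasa jars, the sketch below reproduces it with a hypothetical helper, formatTag, that takes plain strings instead of Clitic objects. The sample values are made up for illustration and are not real Farasa output. Unlike the example above, this helper also treats a null genderNumber as empty, which is an assumption on our part.

```java
// Hypothetical standalone demo of the output format; not part of Farasa.
public class TagFormatDemo {

    // Formats one token as surface/POS, adding "-genderNumber" when it is present.
    public static String formatTag(String surface, String pos, String genderNumber) {
        // isEmpty() checks the string content; comparing with != "" would compare references.
        boolean hasGenderNumber = genderNumber != null && !genderNumber.isEmpty();
        return surface + "/" + pos + (hasGenderNumber ? "-" + genderNumber : "");
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println(formatTag("كتاب", "NOUN", "MS")); // prints كتاب/NOUN-MS
        System.out.println(formatTag("في", "PREP", ""));     // prints في/PREP
    }
}
```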

Farasa Diacritizer

Farasa Diacritizer is built as two cascaded steps: it first performs word-core diacritization and then restores case endings. The steps use a combination of a Viterbi decoder and an SVM ranker to restore the diacritization. See the paper “Arabic Diacritization: Stats, Rules, and Hacks” for more details. The Farasa diacritizer package works in the same manner as the other Farasa packages. To compile the diacritizer, use the command “ant jar”. To run the diacritizer, pass a text file as input to the package and specify the output file name as follows:
java -jar dist/farasaSeg.jar -i <inputfile> -o <output_file>
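
For readers unfamiliar with the Viterbi decoder mentioned in the description above, the following generic sketch shows the technique on a toy hidden Markov model. It is not Farasa's implementation: the class ViterbiSketch, the method decode, and all probabilities are illustrative assumptions, and the SVM ranker is not shown. In a diacritization setting one could think of the hidden states as candidate diacritized forms and the observations as the undiacritized words, but that mapping is our simplification; the paper describes the actual model.

```java
import java.util.Arrays;

// Generic Viterbi decoding over a small HMM; illustrative only, not Farasa code.
public class ViterbiSketch {

    // Returns the most likely state sequence for the observations.
    // start[s], trans[prev][s] and emit[s][o] are plain probabilities.
    public static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log-probability of a path ending in state s at step t
        int[][] back = new int[n][k];        // predecessor state on that best path

        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int bestPrev = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double candidate = score[t - 1][p] + Math.log(trans[p][s]);
                    if (candidate > best) {
                        best = candidate;
                        bestPrev = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = bestPrev;
            }
        }
        // Pick the best final state, then follow the back-pointers to the start.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy model with two states and three observation symbols (made-up numbers).
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.5, 0.4, 0.1}, {0.1, 0.3, 0.6}};
        int[] path = decode(new int[] {0, 1, 2}, start, trans, emit);
        System.out.println(Arrays.toString(path)); // prints [0, 0, 1]
    }
}
```

Working in log space keeps the scores from underflowing on long inputs, which is why the sketch adds logarithms instead of multiplying probabilities.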

To use Farasa Diacritizer as a library in your application, build it (or download the prebuilt jar) and then import the jar file FarasaDiacritize.jar into your project. The following few lines of code show how to use the Farasa Diacritizer:

package tryingfarasa;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import com.qcri.farasa.segmenter.Farasa;
import com.qcri.farasa.pos.FarasaPOSTagger;
import com.qcri.farasa.diacritize.DiacritizeText;


public class TryingFarasaDiacritizer {
    public static void main(String[] args) throws IOException, FileNotFoundException, ClassNotFoundException,
    UnsupportedEncodingException, InterruptedException, Exception {

        // The diacritizer needs both the segmenter and the POS tagger.
        Farasa farasa = new Farasa();
        FarasaPOSTagger farasaPOS = new FarasaPOSTagger(farasa);

        // Directory holding the diacritizer data files; adjust it to your installation.
        String dataDirectory = "/var/www/farasa/data/";
        DiacritizeText dt = new DiacritizeText(dataDirectory, "all-text.txt.nocase.blm", farasa, farasaPOS);
        String diacritized = dt.diacritize("النص المراد معالجته");
        System.out.println(diacritized);
    }
}

Farasa Constituency Parser

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import com.qcri.farasa.pos.FarasaPOSTagger;
import com.qcri.farasa.pos.Sentence;
import com.qcri.farasa.pos.Clitic;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;

public class TryingFarasaPOS {
    public static void main(String[] args) throws IOException, FileNotFoundException, ClassNotFoundException,
    UnsupportedEncodingException, InterruptedException, Exception {


        Farasa farasa = new Farasa();
        FarasaPOSTagger farasaPOS = new FarasaPOSTagger(farasa);

        ArrayList<String> segOutput = farasa.segmentLine("النص المراد معالجته");

        Sentence sentence = farasaPOS.tagLine(segOutput);

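        for (Clitic w : sentence.clitics) {
            // Compare string contents with equals(); != only compares object references.
            String suffix = "".equals(w.genderNumber) ? "" : "-" + w.genderNumber;
            System.out.println(w.surface + "/" + w.guessPOS + suffix + " ");
        }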
     }
}

Farasa Named-Entity Recognizer

If you use just the jar file, download it and create a directory named “lib” next to it. You also need to download the following set of jar files and place them in the lib directory:


 

In the command prompt, navigate to the directory containing the jar file and type the following command:
java -jar FarasaNERJar.jar -i <inputfile> -o <output_file>

To use Farasa NER within your application, follow this example code:

package tryingfarasa;

import com.qcri.farasa.segmenter.Farasa;
import com.qcri.farasa.pos.FarasaPOSTagger;
import com.qcri.farasa.ner.ArabicNER;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;

public class TryingFarasaNER {
    public static void main(String[] args) throws IOException, FileNotFoundException, ClassNotFoundException,
    UnsupportedEncodingException, InterruptedException, Exception {

        // The NER module is constructed from the segmenter and the POS tagger.
        Farasa segmenter = new Farasa();
        FarasaPOSTagger tagger = new FarasaPOSTagger(segmenter);
        ArabicNER ner = new ArabicNER(segmenter, tagger);

        // Declared as ArrayList<String> so the loop below can iterate over String elements.
        ArrayList<String> output = ner.tagLine("النص المراد معالجته");

        int loc = 0;
        for (String s : output) {
            // Every item except the first is prefixed with a space.
            String plusSign = " ";
            if (loc == 0) {
                plusSign = "";
            }
            System.out.println(plusSign + s.trim());

            loc++;
        }
    }
}
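
The loop above trims each returned item and separates items with a space. If you would rather collect the result as a single line, the same logic can be written as a small joining helper. This is only a sketch: the class JoinTokensDemo and the method joinTokens are hypothetical, and the sample tokens are placeholders rather than real NER output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone demo; not part of Farasa.
public class JoinTokensDemo {

    // Trims every token and joins the tokens with a single space.
    public static String joinTokens(List<String> tokens) {
        StringBuilder line = new StringBuilder();
        for (String token : tokens) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(token.trim());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Placeholder tokens standing in for the list returned by the tagger.
        List<String> tokens = Arrays.asList(" first ", "second ", "third");
        System.out.println(joinTokens(tokens)); // prints first second third
    }
}
```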