Data and Tools

Target Domain

The four selected target domains are:
- Chemical: the taxonomy rooted at "chemical"; example terms are "ammonium carbonate", "beta hydroxybutyric acid", and "butyl rubber";
- Equipment: the taxonomy rooted at "equipment"; example terms are "acoustic modem", "parasail", and "clock pendulum";
- Food: the taxonomy rooted at "food"; example terms are "jacket potato", "lemonade", and "bolognese pasta sauce";
- Science: the taxonomy rooted at "science"; example terms are "neuropsychiatry", "craniometry", and "microelectronics".


TExEval_testdata_1.0

The terminologies can be downloaded here. Please note that the 5-day time limit does not apply, so outputs may be uploaded at any time before 20 December.

The provided domain terminologies are tab-separated:

term_id <TAB> term
where:
- term_id: is a term identifier (numeric);
- term: is a domain term.
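
As a minimal sketch of reading this format, the snippet below loads a terminology file into a dictionary. The function name `read_terms` and the two sample ids are illustrative assumptions and are not part of the released data; only the two-column, tab-separated layout with a numeric term_id comes from the description above.

```python
import io

def read_terms(handle):
    """Return {term_id: term} from lines of the form term_id<TAB>term."""
    terms = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip blank or malformed lines
        # term_id is described as numeric, so it is converted to int here
        terms[int(fields[0])] = fields[1].strip()
    return terms

# Hypothetical two-line sample; a real file would be opened with open(path, encoding="utf-8")
sample = io.StringIO("0\tammonium carbonate\n1\tbutyl rubber\n")
terms = read_terms(sample)
print(terms)
```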
 

 

TExEval_trialdata_1.2

Download the trial data here.

The trial data package contains the following:

README.txt                      A description
ontolearn_AI.taxo            Artificial Intelligence taxonomy[¹]
ontolearn_AI.taxo.eval   Human evaluation for the Artificial Intelligence taxonomy relations[¹]
WN_plants.taxo               WordNet plants taxonomy 
WN_plants.terms             WordNet plants terminology
WN_vehicles.taxo           WordNet vehicles taxonomy
WN_vehicles.terms         WordNet vehicles terminology

 


FILE FORMAT


The input file format for the taxonomies (.taxo) consists of tab-separated fields:

relation_id <TAB> term <TAB> hypernym

where:
- relation_id: is a relation identifier;
- term: is a term of the taxonomy;
- hypernym: is a hypernym for the term.

e.g.

0<TAB>cat<TAB>animal
1<TAB>dog<TAB>animal
2<TAB>car<TAB>animal
....
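
A taxonomy in this format could be loaded along the following lines. This is only a sketch: the names `read_taxonomy` and `hyponyms` are hypothetical, and relation identifiers are kept as strings because the description does not state that they are numeric.

```python
import io

def read_taxonomy(handle):
    """Return a list of (relation_id, term, hypernym) tuples from a .taxo file."""
    relations = []
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank lines and lines without exactly three fields
        relation_id, term, hypernym = (f.strip() for f in fields)
        relations.append((relation_id, term, hypernym))
    return relations

# The three example relations shown above, with real tab characters
sample = io.StringIO("0\tcat\tanimal\n1\tdog\tanimal\n2\tcar\tanimal\n")
relations = read_taxonomy(sample)

# Group terms under their hypernym to inspect the hierarchy
hyponyms = {}
for _, term, hypernym in relations:
    hyponyms.setdefault(hypernym, []).append(term)
print(hyponyms)
```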

The input file format for the system relation evaluation (.taxo.eval) consists of tab-separated fields:

relation_id <TAB> eval

where:
- relation_id: is a relation identifier;
- eval: is an empty string if the relation is good, an "x" otherwise

e.g.
0<TAB>
1<TAB>
2<TAB>x
....


The input file format for the terminologies (.terms) consists of
tab-separated fields:

term_id <TAB> term

where:
- term_id: is a term identifier;
- term: is a domain term.

 

TExEval_tool_1.0

Download the tool package here.

The tool package contains the following files:

README.txt                        A description file
TExEval.jar                          Program for scoring the outputs
runExample.sh                   Linux script for running the example evaluation
example/gold1.taxo           Example gold standard taxonomy
example/sys1.taxo             Example system produced taxonomy
example/sys1.taxo.eval    Example system taxonomy relation evaluation
example/results.txt             Example of the output of the scoring system

INPUT FORMAT

The input file format for the system and gold standard taxonomies consists of
tab-separated fields:

relation_id <TAB> term <TAB> hypernym

where:
- relation_id: is a relation identifier;
- term: is a term of the taxonomy;
- hypernym: is a hypernym for the term.

e.g.

0<TAB>cat<TAB>animal
1<TAB>dog<TAB>animal
2<TAB>car<TAB>animal
....

The input file format for the system relation evaluation consists of
tab-separated fields:

relation_id <TAB> eval

where:
- relation_id: is a relation identifier;
- eval: is an empty string if the relation is good, an "x" otherwise

e.g.
0<TAB>
1<TAB>
2<TAB>x
....

EVALUATION METRICS

TExEval.jar is a runnable jar that measures a system-generated taxonomy against a gold standard taxonomy. The measures reported by the program are:
1) A measure that compares the overall structure of the taxonomy against the gold standard, using an approach for comparing hierarchical clusters[¹];
2) Precision: the number of correct relations over the number of given relations;
3) Recall: the number of relations in common with the gold standard over the number of gold standard relations.


To run TExEval.jar on your Linux machine, open a terminal and enter:
"java -jar TExEval.jar system.taxo goldstandard.taxo root results"
or
"java -jar TExEval.jar system.taxo.eval results"

where:
- system.taxo: is the taxonomy produced by your system;
- system.taxo.eval: is the evaluation of the system-produced relations;
- goldstandard.taxo: is the gold standard taxonomy;
- root: is the common root node for the system and the gold standard taxonomies;
- results: is the file where the program will write the results.
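
For batch runs, the two command lines above could be assembled from a script. The sketch below is an assumption-laden convenience wrapper: the helper name `build_command` is hypothetical, the jar is assumed to sit in the current directory, and the root "entity" is borrowed from the example taxonomies. It only launches Java if TExEval.jar is actually present.

```python
import os
import subprocess

def build_command(system, results, gold=None, root=None, jar="TExEval.jar"):
    """Build the argument list for one of the two documented invocations."""
    if gold is not None and root is not None:
        # Structural comparison and recall: system.taxo goldstandard.taxo root results
        return ["java", "-jar", jar, system, gold, root, results]
    # Precision from a relation evaluation file: system.taxo.eval results
    return ["java", "-jar", jar, system, results]

cmd = build_command("system.taxo", "results", gold="goldstandard.taxo", root="entity")
print(" ".join(cmd))

# Run only when the jar has been downloaded into the working directory
if os.path.exists("TExEval.jar"):
    subprocess.run(cmd, check=True)
```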

Running runExampleVSGoldStandard.sh makes TExEval.jar compare the following system-produced taxonomy:

example/sys1.taxo
0 a entity
1 b a
2 c b
3 d b
4 e b

against the following gold standard taxonomy:

example/gold1.taxo
0 a entity
1 b a
2 c b
3 d b
4 e b
5 f e
6 g e
7 h e

producing the following results.txt file:

example/results.txt
Taxonomy file ./example/sys1.taxo
Gold Standard file ./example/gold1.taxo
Root entity
level B Weight BxWeight
0 0.18257418583505536 1.0 0.18257418583505536
1 0.18257418583505536 0.5 0.09128709291752768
2 0.18257418583505536 0.3333333333333333 0.06085806194501845
3 0.0 0.25 0.0
Cumulative Measure 0.16066528353484874
Recall from relation overlap 0.625

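where:
1) the first three lines report the arguments passed to the jar application (system taxonomy, gold standard taxonomy, and root);
2) the per-level table and the Cumulative Measure give a structural comparison of the system taxonomy against the gold standard taxonomy[¹];
3) the last line reports the estimated Recall.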
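
The figures in this example can be checked by hand. The sketch below rests on two assumptions inferred from the output itself rather than from the tool's source: that the weight of a level is 1/(level + 1), and that the Cumulative Measure is the weighted average of the per-level B values. Recall is computed as the share of gold standard (term, hypernym) pairs that also appear in the system taxonomy.

```python
# Per-level B values copied from example/results.txt
levels = [
    (0, 0.18257418583505536),
    (1, 0.18257418583505536),
    (2, 0.18257418583505536),
    (3, 0.0),
]

# Assumed weighting: 1 / (level + 1), which matches the Weight column
weights = [1.0 / (level + 1) for level, _ in levels]
cumulative = sum(b * w for (_, b), w in zip(levels, weights)) / sum(weights)
print(cumulative)  # agrees with the reported 0.16066528353484874 up to rounding

# (term, hypernym) pairs from example/sys1.taxo and example/gold1.taxo
system = {("a", "entity"), ("b", "a"), ("c", "b"), ("d", "b"), ("e", "b")}
gold = system | {("f", "e"), ("g", "e"), ("h", "e")}

recall = len(system & gold) / len(gold)
print(recall)  # 0.625
```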


Running runExamplePrecision.sh makes TExEval.jar compute the Precision from the following evaluation file for the system-produced relations:

example/sys1.taxo.eval
0
1
2
3
4 x

and produces the following results file:

Taxonomy relation evaluation file ./example/sys1.taxo.eval
Precision from relation evaluation 0.8
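
The Precision value follows directly from the evaluation file: four of the five relations carry no "x" mark. A minimal sketch of that count is given below; the function name `precision_from_eval` is hypothetical, and the parsing assumes the tab-separated relation_id/eval layout described under INPUT FORMAT.

```python
import io

def precision_from_eval(handle):
    """Fraction of relations whose eval field is empty (judged correct)."""
    good = total = 0
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0].strip():
            continue  # skip blank lines
        total += 1
        mark = fields[1].strip() if len(fields) > 1 else ""
        if mark == "":
            good += 1  # an empty eval field means the relation is good
    return good / total if total else 0.0

# Same content as example/sys1.taxo.eval, with real tab characters
sample = io.StringIO("0\t\n1\t\n2\t\n3\t\n4\tx\n")
precision = precision_from_eval(sample)
print(precision)  # 0.8
```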

[¹] Paola Velardi, Stefano Faralli, Roberto Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665-707.
 

Contact Info

Organizers

  • Dr. Paul Buitelaar - Insight, Centre for Data Analytics, National University of Ireland, Galway
  • Dr. Georgeta Bordea - Insight, Centre for Data Analytics, National University of Ireland, Galway
  • Prof. Roberto Navigli - Linguistic Computing Laboratory Dept. of Computer Science Sapienza University of Rome, Italy
  • Stefano Faralli - Linguistic Computing Laboratory Dept. of Computer Science Sapienza University of Rome, Italy

Email:
  • Paul Buitelaar: paul[dot]buitelaar[at]insight-centre[dot]org
  • Georgeta Bordea: georgeta[dot]bordea[at]insight-centre[dot]org
  • Roberto Navigli: navigli[at]di[dot]uniroma1[dot]it
  • Stefano Faralli: faralli[at]di[dot]uniroma1[dot]it

Other Info

Announcements

  • Terminologies released on December 06, 2014
  • Target Domains announced on November 06, 2014
  • Important Dates updated
  • Trial data and tools released on May 30, 2014