Task 13: Taxonomy Extraction Evaluation (TExEval-2)

Important Announcements


DEADLINE EXTENSION The deadline for submitting system description papers was extended to March 4, 2016.

SYSTEM RANKING The final ranking of the submitted systems is now available here. The complete test data, including term lists and gold standard taxonomies, has also been released.

We encourage all participants to submit a system description paper, which is due on February 26, 2016. Task participants are asked to cite the task description paper in their system description papers as follows:

  @inproceedings{bordea2016semeval,
    title={SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)},
    author={Bordea, Georgeta and Lefever, Els and Buitelaar, Paul},
    booktitle={Proceedings of the 10th International Workshop on Semantic Evaluation},
    year={2016},
    organization={Association for Computational Linguistics}
  }

SUBMISSION PROCEDURE All systems will be submitted through the START conference management system (https://www.softconf.com/naacl2016/SemEval2016/user/), the same system later used to submit the system description papers.


DEADLINE EXTENSION The deadline for submitting system runs was extended to January 17, 2016. Participants are asked to submit only one system run for each domain and language.


TEST DATA RELEASE Test data is now available from the Data and Tools page. Participants are instructed not to use any of the resources used to construct the gold standards:

  • hypernym-hyponym relations from WordNet
  • skos:broader and skos:narrower relations from EuroVoc
  • the Google product taxonomy
  • the Taxonomy of Fields and Their Subfields provided by the National Academies of Sciences, Engineering, and Medicine

System run files should use the same name as the corresponding test data file, with the file extension changed from ".terms" to ".taxo". The following format should be used for the system output for all the subtasks:

relation_id <TAB> term <TAB> hypernym
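As a minimal sketch of the submission format above, the following writes and re-parses a few tab-separated relations; the example relations and identifiers are invented for illustration.

```python
# Hypothetical relations in the required "relation_id<TAB>term<TAB>hypernym" format.
relations = [
    (0, "lion", "animal"),
    (1, "lion", "mammal"),
    (2, "mammal", "animal"),
]

# Serialise one relation per line, fields separated by a single tab.
lines = ["{}\t{}\t{}".format(rid, term, hyper) for rid, term, hyper in relations]
run = "\n".join(lines)

# Parsing back: each line must split into exactly three tab-separated fields.
parsed = [tuple(line.split("\t")) for line in run.splitlines()]
assert all(len(fields) == 3 for fields in parsed)
print(parsed[0])  # ('0', 'lion', 'animal')
```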


Test data is released for the following subtasks:

  • Taxonomy construction
  • Hypernym identification
  • Multilingual taxonomy construction
  • Multilingual hypernym identification

Please note that the previously announced relation directionality subtask has been cancelled.




Taxonomies are useful tools for content organisation, navigation, and retrieval, providing valuable input for semantically intensive tasks such as question answering and textual entailment. In general, a hierarchical relation is any asymmetric relation that indicates subordination between two terms, but in this task we focus on hyponym-hypernym relations.

Taxonomy learning from text is a challenging task that can be divided into several subtasks, including term extraction, relation discovery, and taxonomy construction. This task is concerned with automatically extracting hierarchical relations from text and the subsequent taxonomy construction; we therefore assume that a list of terms is readily available. This simplifies the evaluation by providing a common ground for all systems. Nevertheless, participants are allowed to add further nodes, i.e. terms, to the hierarchy as they consider appropriate. Terms will be extracted from existing, well-known taxonomies, providing participants with a domain lexicon that has to be organised in a hierarchical structure.

Existing approaches for relation discovery from text rely on lexico-syntactic patterns, co-occurrence information, or substring inclusion, or exploit semantic relations provided in textual definitions. This stage usually produces a large number of noisy, inconsistent relations, which assign multiple parents to a node and contain cycles. Hence, the third stage of taxonomy learning, taxonomy construction, focuses on the overall structure of the resulting graph and aims to organise terms in a hierarchical structure, more specifically a directed acyclic graph.
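Since taxonomy construction must turn a noisy relation set into a directed acyclic graph, a first step a system might take is checking candidate relations for cycles. The following is a minimal DFS-based sketch under assumed data (the edge list is invented; edges point from hyponym to hypernym), not a reference implementation.

```python
from collections import defaultdict

# Hypothetical noisy relation set containing a cycle.
edges = [("lion", "animal"), ("animal", "organism"), ("organism", "lion")]

graph = defaultdict(list)
for child, parent in edges:
    graph[child].append(parent)

def has_cycle(graph):
    """Return True if the directed graph contains a cycle (DFS colouring)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # all nodes start WHITE

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge: cycle found
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))

print(has_cycle(graph))  # True for the cyclic example above
```

A real system would go further and prune edges (e.g. by confidence) until the graph is acyclic, but the check above is the core invariant the final taxonomy must satisfy.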



The first TExEval shared task [¹], organised as part of SemEval 2015, introduced a monolingual dataset covering terms and hierarchical relations from four domains that had not previously been considered for this task, bringing together 6 teams that submitted 45 automatically constructed taxonomies. Performance was evaluated across domains, considering commonsense knowledge as well as technical domains gathered from WordNet and other well-known taxonomies. The second TExEval shared task extends this experimental setting to a multilingual one, covering English, French, Italian and Dutch. A main challenge faced by participants in the first TExEval was that no corpus was provided by the task organisers. We plan to address this issue by providing a Wikipedia-based corpus of domain-specific documents. Depending on the selected approach, a system may or may not require large amounts of text to extract relations between terms; participants will therefore be allowed to extend this corpus as they consider appropriate. The task will be structured into several subtasks, including relation directionality, hypernym identification, taxonomy construction, and a multilingual subtask.


For the relation directionality subtask, the input is a list of pairs of terms and the task is to identify which term is the hypernym and which is the hyponym, in other words to determine the direction of the relation.
Input: (lion, animal) Output: lion -> animal

For hypernym identification, the input is a list of terms and the task is to find all the hypernyms of each term.
Input: lion Output: animal, mammal
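One classic family of approaches to hypernym identification uses lexico-syntactic (Hearst-style) patterns over text. The sketch below applies a single "X such as Y" pattern to an invented sentence; real systems use many more patterns plus filtering, and none of the names here come from the task itself.

```python
import re

# Toy corpus snippet (invented for illustration).
text = "Predators such as lions and wolves hunt mammals such as deer."

# One Hearst-style pattern: "<hypernym> such as <hyponym> (and <hyponym>)?"
pattern = re.compile(r"(\w+) such as (\w+)(?: and (\w+))?")

pairs = set()
for m in pattern.finditer(text):
    hypernym = m.group(1).lower()
    for g in (2, 3):  # one or two captured hyponyms per match
        if m.group(g):
            pairs.add((m.group(g).lower(), hypernym))

print(sorted(pairs))
# [('deer', 'mammals'), ('lions', 'predators'), ('wolves', 'predators')]
```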


The construction of taxonomies is a challenging task even for humans, and evaluating a taxonomy is not trivial either. In this shared task, taxonomies are evaluated through comparison with gold standard relations collected from WordNet and other well-known, openly available taxonomies. Gold standard relations will be gathered from manually constructed taxonomies, classification schemes and/or ontologies, where available depending on the domain. This will be complemented by a manual evaluation of relations that are not covered by the gold standard, and by a quantitative and qualitative structural analysis of the resulting graph. Structural criteria will include the presence of cycles, the number of intermediate nodes compared to leaf nodes, and the number of over-generic relations with the root node. Submitted relations between terms will be evaluated against the collected gold standards using standard precision, recall and F1 measures. In addition, an evaluation approach used for comparing hierarchical clusters (Velardi et al., 2013) will be used to evaluate the overall structure of a taxonomy against the structure of the gold standard taxonomy. The final ranking of the systems will be computed through a voting approach that combines the results of each type of evaluation.
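The edge-level part of the evaluation described above can be sketched as set comparison between submitted and gold (term, hypernym) pairs; both relation sets below are invented for illustration.

```python
# Hypothetical gold standard and system relations as (term, hypernym) pairs.
gold = {("lion", "animal"), ("tiger", "animal"), ("lion", "mammal")}
system = {("lion", "animal"), ("tiger", "cat"), ("lion", "mammal")}

# Standard edge-level precision, recall and F1.
tp = len(gold & system)              # relations found in both sets
precision = tp / len(system)         # fraction of submitted relations that are correct
recall = tp / len(gold)              # fraction of gold relations that were found
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.67 0.67 0.67
```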



The TExEval shared task will be supported by the following projects:

Unit for NLP, Insight Centre for Data Analytics at the National University of Ireland, Galway, led by Dr. Paul Buitelaar

PARIS (Personalised AdveRtisements buIlt from web Sources), led by Prof. dr. Veronique Hoste, part of the IWT SBO Programme.



[¹] Georgeta Bordea, Paul Buitelaar, Stefano Faralli and Roberto Navigli (2015). SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval). In Proceedings of SemEval 2015, co-located with NAACL HLT 2015, Denver, Colorado, USA.

Contact Info


  • Georgeta Bordea - Insight, Centre for Data Analytics, National University of Ireland, Galway
  • Els Lefever - LT3 language and translation team at the Faculty of Arts and Philosophy at Ghent University
  • Paul Buitelaar - Insight, Centre for Data Analytics, National University of Ireland, Galway


Other Info


  • The deadline for submitting system description papers was extended to March 4, 2016
  • Final rankings made available on February 5, 2016
  • English, Dutch, French and Italian test data released on December 15, 2015
  • The task is organised in an unsupervised setting; therefore, no training data will be provided
  • Updated Dutch trial taxonomy and released Italian and French trial terms and taxonomy on July 31, 2015
  • Dutch trial terms and taxonomy released on July 12, 2015
  • Trial data and tools released on June 30, 2015