SemEval-2015 Task 17: Taxonomy Extraction Evaluation

Important Announcement

We encourage all the participants to submit a system description paper, which is due on January 30, 2015. Task participants are asked to cite the task description paper in system description papers.

@inproceedings{task17semeval2015,
title={Semeval-2015 task 17: Taxonomy Extraction Evaluation (TExEval)},
author={Bordea, Georgeta and Buitelaar, Paul and Faralli, Stefano and Navigli, Roberto},
booktitle={Proceedings of the 9th International Workshop on Semantic Evaluation},
year={2015},
organization={Association for Computational Linguistics}
}

Google Group

Please register with the following Google group:
https://groups.google.com/forum/#!forum/semeval-task17

The results of the automatic evaluation and the gold standard taxonomies are now available here.

TExEval_testdata_1.0 released

The terminologies can be downloaded here. Please note that the 5-day time limit does not apply, so outputs may be uploaded before December 20.

Following the schedule updates from the SemEval-2015 organizers, we decided to announce the target domains before the release of the domain terminologies (scheduled for December 5, 2014).

You are invited to visit the following pages:
- Important Dates: for the new schedule of the task;
- Data and tools: for the target domains used for evaluation.

INTRODUCTION

Taxonomies are useful tools for content organisation, navigation, and retrieval, providing valuable input for semantically intensive tasks such as question answering [1] and textual entailment [2]. We implemented a task concerned with automatically extracting hierarchical relations from text and subsequent taxonomy construction. A hierarchical relation is any asymmetrical relation that indicates subordination between two terms. However, in this task, the focus is on hyponym-hypernym relations.
Taxonomy learning from text is a challenging task that can be divided into several subtasks, including term extraction, relation discovery, taxonomy construction and taxonomy cleaning.
Although term extraction is an important step when constructing a domain taxonomy, this shared task assumes that a list of terms is readily available. Nevertheless, participants are allowed to add further nodes, i.e. terms, to the hierarchy as they consider appropriate. Terms will be extracted from a domain-specific corpus using an existing term extraction tool, providing the participants with a list of manually filtered terms. In this way, taxonomy learning is limited to finding relations between pairs of terms and organising them in a hierarchical structure. This simplifies the evaluation by providing common ground for all the systems. Participants are encouraged to consider polyhierarchies when organising terms, as multiple perspectives can be equally valid when organising concepts. Because nodes can have more than one parent, the final structure of the taxonomy is not necessarily a tree.
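To make the notion of a polyhierarchy concrete, the following minimal Python sketch stores a taxonomy as a mapping from each term to its set of hypernyms, lists the terms with more than one parent, and checks that the hypernym links contain no cycle. The function names and the example terms are hypothetical illustrations, not part of the task data or tools.

```python
from collections import defaultdict

def build_parent_map(pairs):
    """Map each term to the set of its direct hypernyms (parents)."""
    parents = defaultdict(set)
    for term, hypernym in pairs:
        parents[term].add(hypernym)
    return parents

def multi_parent_terms(parents):
    """Return the terms that have more than one hypernym, sorted by name."""
    return sorted(t for t, ps in parents.items() if len(ps) > 1)

def has_cycle(parents):
    """Return True if following hypernym links can lead back to a visited term."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = defaultdict(int)

    def visit(node):
        state[node] = GREY
        for parent in parents.get(node, ()):
            if state[parent] == GREY:
                return True
            if state[parent] == WHITE and visit(parent):
                return True
        state[node] = BLACK
        return False

    return any(state[n] == WHITE and visit(n) for n in list(parents))

# Hypothetical example: "tomato" has two parents, so the structure is not a tree.
example_pairs = [
    ("tomato", "fruit"),
    ("tomato", "vegetable"),
    ("fruit", "food"),
    ("vegetable", "food"),
]
example_parents = build_parent_map(example_pairs)
print(multi_parent_terms(example_parents))
print(has_cycle(example_parents))
```

A structure like this is a directed acyclic graph rather than a tree, which is why the cycle check is kept separate from the multiple-parent check.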

TASK DESCRIPTION

In this shared task, taxonomies are evaluated through comparison with gold standard relations collected from BabelNet [3], a multilingual semantic network built by merging WordNet with Wikipedia. Additionally, gold standard relations will be gathered from manually constructed taxonomies, classification schemes and/or ontologies, where available depending on the domain. Expert evaluation will be performed as well by pooling a subset of the relations submitted by the participants. Recall will be estimated based on the combined set of relations identified by all the systems. We will evaluate the performance of systems across domains, by considering three domains that were not previously considered for this task including commonsense knowledge as well as technical domains.
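The pooled recall estimate described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes that each system's relations have already been judged, and it takes the union of all relations judged correct as the reference set. The function name, the system names and the example relations are hypothetical, and the organisers' actual procedure may differ.

```python
def pooled_recall(correct_by_system):
    """Estimate recall per system against the pool of all correct relations.

    correct_by_system maps a system name to the set of (term, hypernym)
    pairs from that system that were judged correct.
    """
    pool = set().union(*correct_by_system.values())
    return {
        name: (len(relations) / len(pool) if pool else 0.0)
        for name, relations in correct_by_system.items()
    }

# Hypothetical judged output of two systems.
judged = {
    "system_a": {("oak", "tree"), ("tree", "plant")},
    "system_b": {("tree", "plant"), ("rose", "flower"), ("flower", "plant")},
}
print(pooled_recall(judged))
```

Because the pool contains only relations that at least one system found, an estimate of this kind is an upper bound on the true recall.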

Depending on the selected approach, the task may or may not require large amounts of text to extract relations between terms; therefore, no corpus is provided by the organisers of the task. Trial/training data will consist of terms and hierarchical relations selected from one of the WordNet domains that was previously considered for this task, such as plants or vehicles, as well as from a technical domain, such as AI.
The domains will be revealed only when test data is available, so that systems will not be overfitted to the domain. Possible domains include politics, sociology, rock music, etc.
For each domain, the test data will consist of a list of domain terms that the systems will have to structure into a taxonomy, with the possibility of adding further intermediate terms. Each system will return a list of pairs (term, hypernym).
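As an illustration of this output, the sketch below represents a submission as a list of (term, hypernym) pairs, identifies the intermediate terms a system has added beyond the provided list, and serialises the pairs one per line. The tab-separated layout, the function names and the example terms are assumptions made for this sketch; the official submission format is the one specified by the organisers.

```python
def added_terms(pairs, given_terms):
    """Return terms used in the pairs that are not in the provided term list."""
    given = set(given_terms)
    used = {t for pair in pairs for t in pair}
    return sorted(used - given)

def to_lines(pairs):
    """Serialise unique pairs as 'term<TAB>hypernym' lines (assumed layout)."""
    return [f"{term}\t{hypernym}" for term, hypernym in sorted(set(pairs))]

# Hypothetical domain terms and system output.
given_terms = ["bebop", "jazz", "music"]
pairs = [
    ("bebop", "jazz"),
    ("jazz", "music genre"),   # "music genre" is an added intermediate term
    ("music genre", "music"),
]
print(added_terms(pairs, given_terms))
print("\n".join(to_lines(pairs)))
```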

EVALUATION METHODOLOGY

Not only the construction of taxonomies but also their evaluation is difficult, so we consider two different evaluation methodologies.
We will evaluate the relations between terms using standard precision, recall and F1 measures. We will also use the evaluation scheme presented in [4] to compare the overall structure of the taxonomy against a gold standard, with an approach used for comparing hierarchical clusters.
As a baseline, we will use hypernym relations from WordNet. We expect low recall, as WordNet has only partial coverage of most technical domains.
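The relation-level precision, recall and F1 measures mentioned above can be computed over sets of (term, hypernym) pairs, as in the minimal sketch below. It assumes exact matching of pairs against the gold standard and does not cover the structural comparison of [4]; the function name and the example relations are hypothetical.

```python
def precision_recall_f1(system_pairs, gold_pairs):
    """Compare system relations with gold relations by exact pair match."""
    system, gold = set(system_pairs), set(gold_pairs)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical gold standard and system output.
gold = [("car", "vehicle"), ("truck", "vehicle"), ("bus", "vehicle"), ("sedan", "car")]
system = [("car", "vehicle"), ("sedan", "car"), ("bus", "car")]

p, r, f1 = precision_recall_f1(system, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this example two of the three submitted relations are in the gold standard, and two of the four gold relations are recovered.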

Acknowledgement

The TExEval shared task will be supported by the following projects:
The “MultiJEDI” ERC Starting Grant (http://multijedi.org/), led by Prof. Roberto Navigli at the Linguistic Computing Laboratory of the Sapienza University of Rome, Italy.
The Linked Data and Text Mining research area, led by Dr. Paul Buitelaar at INSIGHT, the Irish National Centre for Data Analytics, National University of Ireland, Galway.

REFERENCES

[1] S. M. Harabagiu, S. J. Maiorano and M. A. Pasca. 2003. Open-Domain Textual Question Answering Techniques. Natural Language Engineering 9 (3): 1-38.
[2] M. Geffet and I. Dagan. 2005. The Distributional Inclusion Hypotheses and Lexical Entailment. ACL’05.
[3] Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence 193, 217-250.
[4] Paola Velardi, Stefano Faralli, Roberto Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665-707.
