Task 14: Semantic Taxonomy Enrichment
Semantic Networks and Ontologies are key resources in Natural Language Processing, especially for work in Lexical Semantics, where they provide an important source of information on concepts and how they relate to one another. Of these resources, WordNet (Fellbaum, 1998) has remained in widespread use over the past two decades, in part due to its broad-coverage semantic network, which includes over 200K senses of 155K word forms. However, despite this coverage, WordNet still omits many lemmas and senses, such as those from domain-specific lexicons (e.g., law or medicine), creative slang usages, or terms for recently emerged technologies and entities. Therefore, a variety of techniques have been proposed for extending the current ontology structure with new terminology and senses (Snow et al., 2006; Toral et al., 2008; Ponzetto and Navigli, 2009; Yamada et al., 2011; Jurgens and Pilehvar, 2015).
A key question remains: how should the quality of ontology extension algorithms be measured? Despite the interest in enriching WordNet, no dataset currently exists for testing accuracy on the types of terms and senses that might be added, such as slang or technical jargon. Performance could be measured by removing existing terms and senses from WordNet and testing accuracy at their reinsertion. However, the senses of the novel words currently missing from WordNet may be very different from those already in it (e.g., more fine-grained or more concentrated in a particular part of the ontology), and the definitions associated with these new senses may not match the writing style, descriptive length, or language of WordNet's current glosses. As a result, measuring the accuracy of WordNet enrichment through ablation testing does not reflect the full difficulty of the task, and hence may misestimate a method's true accuracy.
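The ablation-style evaluation described above can be sketched as follows. The toy taxonomy, the held-out senses, and the definition-overlap attachment heuristic are all hypothetical stand-ins for illustration, not the task's actual data or protocol.

```python
# Toy taxonomy: sense -> (hypernym sense or None for a root, gloss)
taxonomy = {
    "animal.n.01":  (None, "a living organism that feeds on organic matter"),
    "dog.n.01":     ("animal.n.01", "a carnivorous mammal organism kept as a pet"),
    "cat.n.01":     ("animal.n.01", "a small feline organism kept as a pet"),
    "vehicle.n.01": (None, "a conveyance that transports people or goods"),
    "car.n.01":     ("vehicle.n.01", "a motorized conveyance with four wheels"),
}

def attach(gloss, candidates):
    """Hypothetical heuristic: attach a new sense under the candidate
    whose gloss shares the most words with the new sense's gloss."""
    words = set(gloss.split())
    return max(candidates, key=lambda c: len(words & set(candidates[c].split())))

def ablation_accuracy(taxonomy, held_out):
    """Remove each held-out sense, reattach it from its gloss alone,
    and score how often the predicted hypernym matches the original."""
    # Candidate attachment points: senses that already act as hypernyms
    parents = {h for (h, _) in taxonomy.values() if h is not None}
    correct = 0
    for sense in held_out:
        gold_parent, gloss = taxonomy[sense]
        candidates = {s: taxonomy[s][1] for s in parents if s != sense}
        if attach(gloss, candidates) == gold_parent:
            correct += 1
    return correct / len(held_out)

acc = ablation_accuracy(taxonomy, ["dog.n.01", "car.n.01"])
```

The sketch also makes the paragraph's objection visible: the held-out glosses were written in the same style as the rest of the taxonomy, so reinsertion is easier than attaching genuinely novel senses whose definitions follow different conventions.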