SemEval-2015 Task 16: Detecting Nocuous Ambiguity (DNA)
Overview
Although many utterances are ambiguous in one way or another, it is often the case that human readers do not recognise the ambiguity. For example, in:
They wore black hats and boots
does black modify both nouns, hats and boots, or only hats?
Nocuous ambiguity was investigated in requirements documents by Chantree et al (2006), and this work was later extended by Yang et al (2010) to cover pronoun resolution.
Detecting innocuous and nocuous ambiguity is of great importance in domains such as Requirements Engineering and Safety Critical Systems, which rely on a large number of human-written documents that are supposed to be unambiguous.
Although the syntactic structure of a sentence often permits such ambiguity, the context frequently means that human readers are likely to agree on a single interpretation. For example, when presented with the sentence:
The procedure shall convert the 24 bit image to an 8 bit image, then display it in a dynamic window.
around 25% of respondents selected the 24 bit image as the antecedent for it, compared to 75% of respondents who selected the 8 bit image.
Thus, although no unanimous agreement was achieved, there is significant agreement on one interpretation. The task of identifying nocuous ambiguity recognises that humans may not agree on a particular interpretation, and that it is the job of the system to identify that disagreement.
In this proposed task, teams will be given a collection of texts, each of which will contain an ambiguity. The task will be to identify which of those texts display nocuous ambiguity at a variety of thresholds. We follow Willis et al (2008):
Definition: Given an ambiguous phrase or sentence S, a collection of judgements of the correct interpretation of that sentence, and an ambiguity threshold T (where 0 ≤ T ≤ 100%):
if there is at least one non-ambiguous interpretation of S which has a certainty greater than T, then S exhibits innocuous ambiguity at threshold T. Otherwise, S exhibits nocuous ambiguity at threshold T.
where the “certainty” of an interpretation is simply the proportion of annotators who selected that interpretation as the intended meaning, so that it can be compared directly with the threshold T.
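The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: certainty is expressed as a fraction of all judgements so that it is comparable with T, the AMBIGUOUS label for annotators who could not choose a reading is hypothetical, and the judgement counts are invented to match the 25% / 75% split reported for the image example.

```python
from collections import Counter

# Hypothetical label for a judgement that the text cannot be resolved.
AMBIGUOUS = "AMBIGUOUS"


def certainties(judgements):
    """Map each non-ambiguous interpretation to its share of all judgements."""
    total = len(judgements)
    counts = Counter(j for j in judgements if j != AMBIGUOUS)
    return {interp: n / total for interp, n in counts.items()}


def is_nocuous(judgements, threshold):
    """Return True if no interpretation has certainty greater than threshold.

    threshold is a fraction in [0, 1], so T = 70% is passed as 0.7.
    """
    return not any(c > threshold for c in certainties(judgements).values())


# Illustrative counts only: 20 judgements split 25% / 75%.
judgements = ["24 bit image"] * 5 + ["8 bit image"] * 15

for t in (0.6, 0.7, 0.8):
    label = "nocuous" if is_nocuous(judgements, t) else "innocuous"
    print(f"T = {t:.0%}: {label}")
```

With these counts the most popular reading has a certainty of 75%, so the sentence is innocuous at T = 60% and T = 70% but nocuous at T = 80%.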
Datasets
Building on the earlier work of Willis et al (2008), sentences containing a high degree of potential nocuous ambiguity will be collected. We aim to annotate approximately 3,000 such sentences. Each sentence will be annotated by 10 or more volunteers to generate sufficient ambiguity statistics.
Evaluation methodology
The submitted systems will be evaluated using standard precision and recall measures, on the task of identifying which sentences or phrases out of a provided list are nocuous for a given range of thresholds (say, T at 60%, 70% and 80%).
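As a rough sketch of how such an evaluation might be computed, the following treats "nocuous" as the positive class and scores a system separately at each threshold. The sentence IDs and gold and system labels are hypothetical, and returning 0.0 when a set is empty is an assumed convention rather than part of the task definition.

```python
def precision_recall(gold_nocuous, predicted_nocuous):
    """Precision and recall for the set of items labelled nocuous."""
    gold = set(gold_nocuous)
    predicted = set(predicted_nocuous)
    true_positives = len(gold & predicted)
    # Assumed convention: score 0.0 when the denominator would be zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold and system outputs: IDs of sentences judged nocuous
# at each threshold (thresholds given as fractions).
gold_by_threshold = {
    0.6: {"s1"},
    0.7: {"s1", "s2"},
    0.8: {"s1", "s2", "s3"},
}
system_by_threshold = {
    0.6: {"s1", "s4"},
    0.7: {"s2"},
    0.8: {"s1", "s2", "s3", "s4"},
}

scores = {
    t: precision_recall(gold_by_threshold[t], system_by_threshold[t])
    for t in gold_by_threshold
}

for t, (p, r) in sorted(scores.items()):
    print(f"T = {t:.0%}: precision = {p:.2f}, recall = {r:.2f}")
```

Note that the gold set can only grow as T increases, since a sentence that is nocuous at one threshold is also nocuous at every higher threshold.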
Organisers
Suresh Manandhar, University of York, UK (suresh@cs.york.ac.uk)
Alistair Willis, Open University, UK (alistair.willis@open.ac.uk) [Primary Contact]
Bibliography
Chantree, Francis, Bashar Nuseibeh, Anne De Roeck, and Alistair Willis. 2006. “Identifying Nocuous Ambiguities in Natural Language Requirements.” In Proceedings of the 14th IEEE International Requirements Engineering Conference (RE’06). Minneapolis/St Paul, Minnesota, USA.
Willis, Alistair, Francis Chantree, and Anne De Roeck. 2008. “Automatic Identification of Nocuous Ambiguity.” Research on Language and Computation 6 (3-4).
Yang, Hui, Anne De Roeck, Alistair Willis, and Bashar Nuseibeh. 2010. “A Methodology for Automatic Identification of Nocuous Ambiguity.” In 23rd International Conference on Computational Linguistics (COLING 2010). Beijing, China.