Task 2: Interpretable Semantic Textual Similarity
Semantic Textual Similarity (STS) measures the degree of semantic equivalence between two snippets of text. Interpretable STS (iSTS) adds an explanatory layer: given a pair of sentences, participants first identify the chunks in each sentence and then align chunks across the two sentences, indicating the relation and similarity score of each alignment.
For instance, given the following two sentences (drawn from a corpus of headlines):
- 12 killed in bus accident in Pakistan
- 10 killed in road accident in NW Pakistan
A participant system would first split each sentence into chunks:
- [12] [killed] [in bus accident] [in Pakistan]
- [10] [killed] [in road accident] [in NW Pakistan]
It would then provide the alignments between chunks, indicating the relation and similarity score of each alignment, as follows:
- [12] <=> [10] : (SIMILAR 4)
- [killed] <=> [killed] : (EQUIVALENT 5)
- [in bus accident] <=> [in road accident] : (MORE-SPECIFIC 4)
- [in Pakistan] <=> [in NW Pakistan] : (MORE-GENERAL 4)
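As a minimal sketch, the alignment lines above could be read into a small data structure like this. The regular expression targets only the illustrative notation used on this page, which is not necessarily the task's official file format, and the names `Alignment` and `parse_alignment` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches the illustrative notation "[left] <=> [right] : (RELATION score)".
# Assumption: this is the page's example notation, not the official file format.
ALIGNMENT_RE = re.compile(
    r"\[(.+?)\]\s*<=>\s*\[(.+?)\]\s*:\s*\(([A-Z-]+)\s+(\d)\)"
)


@dataclass(frozen=True)
class Alignment:
    """One aligned chunk pair with its relation label and similarity score."""
    left: str
    right: str
    relation: str
    score: int


def parse_alignment(line: str) -> Alignment:
    """Parse a single alignment line; raise ValueError if it does not match."""
    match = ALIGNMENT_RE.search(line)
    if match is None:
        raise ValueError(f"not an alignment line: {line!r}")
    left, right, relation, score = match.groups()
    return Alignment(left, right, relation, int(score))


EXAMPLE_LINES = [
    "- [12] <=> [10] : (SIMILAR 4)",
    "- [killed] <=> [killed] : (EQUIVALENT 5)",
    "- [in bus accident] <=> [in road accident] : (MORE-SPECIFIC 4)",
    "- [in Pakistan] <=> [in NW Pakistan] : (MORE-GENERAL 4)",
]

alignments = [parse_alignment(line) for line in EXAMPLE_LINES]
for alignment in alignments:
    print(alignment)
```

Keeping the relation as a plain string avoids committing to a fixed label set; the full set of relation labels is defined in the detailed task description.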
Given such an alignment, an automatic system could explain why the two sentences are very similar but not equivalent, for instance by phrasing the differences as follows:
- the first sentence mentions "12" instead of "10",
- "bus accident" is more specific than "road accident" in the second,
- "Pakistan" is more general than "NW Pakistan" in the second.
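The explanations above can be sketched as simple templates keyed on the relation label. This is only an illustration under stated assumptions: it covers just the four labels that appear in the example, the function names are hypothetical, and stripping a leading "in " is a simplification hard-coded for this sentence pair rather than a general rule.

```python
# Aligned chunk pairs from the example: (left chunk, right chunk, relation, score).
ALIGNMENTS = [
    ("12", "10", "SIMILAR", 4),
    ("killed", "killed", "EQUIVALENT", 5),
    ("in bus accident", "in road accident", "MORE-SPECIFIC", 4),
    ("in Pakistan", "in NW Pakistan", "MORE-GENERAL", 4),
]


def strip_preposition(chunk: str) -> str:
    """Drop a leading 'in ' (a simplification for this example only)."""
    return chunk[3:] if chunk.startswith("in ") else chunk


def explain(left: str, right: str, relation: str):
    """Return a one-line explanation, or None when the chunks are equivalent."""
    left, right = strip_preposition(left), strip_preposition(right)
    if relation == "EQUIVALENT":
        return None  # nothing to explain for identical meaning
    if relation == "SIMILAR":
        return f'the first sentence mentions "{left}" instead of "{right}"'
    if relation == "MORE-SPECIFIC":
        return f'"{left}" is more specific than "{right}" in the second'
    if relation == "MORE-GENERAL":
        return f'"{left}" is more general than "{right}" in the second'
    raise ValueError(f"unhandled relation label: {relation}")


explanations = [
    text
    for left, right, relation, _score in ALIGNMENTS
    if (text := explain(left, right, relation)) is not None
]
for text in explanations:
    print("-", text)
```

A real system would need templates for the remaining relation labels and a more principled way to turn chunks into readable phrases.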
While giving such explanations comes naturally to people, constructing algorithms and computational models that match human-level performance is a difficult natural language understanding (NLU) problem, with applications in dialogue systems, interactive systems and educational systems.
Please check the detailed task descriptions for more on chunking, alignment, relation labels and scores.
Datasets
Two datasets are currently covered, comprising pairs of sentences from news headlines and image captions. The pairs are a subset of the datasets released in the STS tasks. Please check the iSTS train dataset for details.
New in 2016
The 2015 STS task offered a pilot subtask on interpretable STS, which showed that the task is feasible, with high inter-annotator agreement and system scores well above baselines.
For 2016, the pilot subtask has been promoted to a standalone task. The restriction to 1:1 alignments has been lifted. Annotation guidelines have been updated, and new training data has been released.
Please check out the updates for more details.
Participants
If you are interested in participating, you should:
(Note that registration and mailing list management are independent; please do both.)