Task 1: Semantic Textual Similarity: A Unified Framework for Semantic Processing and Evaluation


Semantic Textual Similarity (STS) measures the degree of equivalence in the underlying semantics of paired snippets of text. While making such an assessment is trivial for humans, constructing algorithms and computational models that mimic human-level performance represents a difficult and deep natural language understanding (NLU) problem.


To stimulate research in this area and encourage the development of creative new approaches to modeling sentence-level semantics, the STS shared task has been held annually since 2012, as part of the SemEval/*SEM family of workshops. Each year the competition brings together numerous participating teams, diverse approaches, and ongoing improvements to state-of-the-art methods.


Task Definition


Given two sentences, participating systems are asked to return a continuous-valued similarity score on a scale from 0 to 5, with 0 indicating that the semantics of the sentences are completely independent and 5 signifying semantic equivalence. Performance is assessed by computing the Pearson correlation between machine-assigned semantic similarity scores and human judgements.
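As a concrete illustration of the evaluation, the sketch below computes the Pearson correlation between a set of system scores and gold human judgements on the 0-5 scale. The scores here are invented for illustration only; official evaluation uses the organizers' scoring scripts.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical gold (human) and system scores for five sentence pairs.
gold   = [5.0, 3.8, 2.2, 0.0, 4.5]
system = [4.7, 3.5, 2.8, 0.4, 4.1]

print(round(pearson(gold, system), 3))  # → 0.988
```

Because Pearson correlation is invariant to linear rescaling, a system is rewarded for ranking pairs consistently with humans even if its absolute scores are shifted or scaled.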


Evaluation Tracks 


STS 2016 offers two tracks: STS Core and Cross-lingual STS. The former is the traditional STS task, with paired monolingual sentences drawn from English data sources; the latter assesses paired English and Spanish sentences.


  • STS Core, with English sentence pairs on Plagiarism Detection, Q&A Question-Question, Q&A Answer-Answer, Post-Edited Machine Translations and Headlines.
  • Cross-lingual STS, with Spanish-English bilingual sentence pairs on Plagiarism Detection, Q&A Answer-Answer, Post-Edited Machine Translations and Headlines, as well as parallel and comparable literary works in English and Spanish.

Unsupervised Modeling of the Evaluation Data New!


Participants are allowed to train purely unsupervised models on the evaluation data sources. 


Training unsupervised model components on the evaluation data sources is permissible as long as neither manual annotations of any kind nor the structure of the data sources is used. Participants must check with the organizers to ensure their use of the data is compliant.
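To make the policy concrete, here is a minimal sketch of what "purely unsupervised" use of raw evaluation-source text might look like: corpus word statistics are gathered from unlabeled sentences (no gold scores or source structure involved) and then used to weight a simple bag-of-words similarity. The sentences and weighting scheme are invented for illustration; real systems are far more sophisticated.

```python
from collections import Counter
from math import sqrt

# Hypothetical raw, unannotated sentences from an evaluation source.
corpus = [
    "a man is playing a guitar",
    "a woman is slicing an onion",
    "two dogs are running on the beach",
]

# "Training" here is just collecting document frequencies from the raw text.
doc_freq = Counter(w for doc in corpus for w in set(doc.split()))

def vectorize(sentence):
    # Down-weight words that are common across the corpus.
    return {w: c / (1 + doc_freq[w])
            for w, c in Counter(sentence.split()).items()}

def sts_score(s1, s2):
    """Cosine similarity of weighted bags of words, rescaled to the 0-5 STS range."""
    v1, v2 = vectorize(s1), vectorize(s2)
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return 5.0 * dot / (n1 * n2) if n1 and n2 else 0.0

print(sts_score("a man plays a guitar", "a man is playing a guitar"))
```

The key point is that nothing in this pipeline touches human similarity labels; only raw text statistics are learned, which is the kind of use the rule permits (subject to organizer approval).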


See the Data and Tools tab for a list of evaluation sources. 




The Semantic Textual Similarity Wiki details previous tasks, open-source software systems, and tools.


Join the STS mailing list for updates at http://groups.google.com/group/STS-semeval.




Organizers

Eneko Agirre (University of the Basque Country)
Daniel Cer (Google)
Mona Diab (George Washington University)
Aitor Gonzalez-Agirre (University of the Basque Country)
German Rigau (University of the Basque Country)




Previous Task Papers

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria, Janyce Wiebe. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. Proceedings of SemEval 2015. [pdf]

Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, Janyce Wiebe. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. Proceedings of SemEval 2014. [pdf]

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo. *SEM 2013 Shared Task: Semantic Textual Similarity. Proceedings of *SEM 2013. [pdf]

Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. Proceedings of SemEval 2012. [pdf]

Contact Info

STS Core

Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre

Cross-lingual STS

Carmen Banea, Daniel Cer, Rada Mihalcea, Janyce Wiebe

Wiki: STS Wiki
Discussion Group: STS-semeval

Other Info


  • The official cross-lingual STS results have been posted! New!
  • The gold standard cross-lingual STS files have been released! New!