Semantic Textual Similarity


The competition is over! The results are posted here

STS benchmark comprising 2012-2017 data just released

Semantic Textual Similarity (STS) measures the degree of equivalence in the underlying semantics of paired snippets of text. While making such an assessment is trivial for humans, constructing algorithms and computational models that mimic human-level performance remains a difficult and deep natural language understanding (NLU) problem.


To stimulate research in this area and encourage the development of creative new approaches to modeling sentence-level semantics, the STS shared task has been held annually since 2012 as part of the SemEval/*SEM family of workshops. Each year the competition brings together numerous participating teams and diverse approaches, and drives ongoing improvements to state-of-the-art methods.


Task Definition


Given two sentences, participating systems are asked to return a continuous-valued similarity score on a scale from 0 to 5, with 0 indicating that the semantics of the sentences are completely independent and 5 signifying semantic equivalence. Performance is assessed by computing the Pearson correlation between machine-assigned semantic similarity scores and human judgements.
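As an illustration of the evaluation metric, the following minimal sketch computes the Pearson correlation between a list of system scores and a list of human scores. It is not the official evaluation script; the function name `pearson` and the example scores are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores on the 0-5 scale for five sentence pairs.
gold = [5.0, 3.2, 0.0, 1.5, 4.0]      # human judgements
system = [4.6, 3.0, 0.4, 2.1, 3.5]    # machine-assigned scores
score = pearson(system, gold)
print(f"Pearson r = {score:.3f}")
```

Because Pearson correlation is invariant to linear rescaling, a system is rewarded for ranking and spacing pairs like the human annotators do, not for matching the absolute 0 to 5 values.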




STS 2017 will assess the ability of systems to determine the degree of semantic similarity between monolingual and cross-lingual sentences in Arabic, English and Spanish.

The shared task is organized into a set of secondary sub-tracks and a single combined primary track. Each secondary sub-track involves providing STS scores for monolingual sentence pairs in a particular language or for cross-lingual sentence pairs from the combination of two particular languages. Participation in the primary track is achieved by submitting results for all of the secondary sub-tracks.


The complete list of tracks is given below:

  • Primary Track
    Cross-lingual and monolingual pairs: Arabic-English, Spanish-English, Arabic-Arabic, English-English and Spanish-Spanish
  • Track 1
    Arabic monolingual pairs
  • Track 2
    Arabic-English cross-lingual pairs
  • Track 3
    Spanish monolingual pairs
  • Track 4
    Spanish-English cross-lingual pairs
  • Track 5
    English monolingual pairs
  • Track 6
    Surprise language track (announced during the evaluation period)
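The track structure above can be sketched as a small lookup table. This is only an illustration: the track keys, the two-letter language codes, and the helper names are assumptions, and Track 6 is left out because its language is only announced during the evaluation period and it is not among the pairs listed for the primary track.

```python
# Language pair covered by each secondary sub-track, as listed above.
# Keys and language codes are illustrative shorthand, not official identifiers.
SUB_TRACKS = {
    "track1": ("ar", "ar"),  # Arabic monolingual
    "track2": ("ar", "en"),  # Arabic-English cross-lingual
    "track3": ("es", "es"),  # Spanish monolingual
    "track4": ("es", "en"),  # Spanish-English cross-lingual
    "track5": ("en", "en"),  # English monolingual
}


def missing_tracks(submitted):
    """Return the sub-tracks (sorted) for which no results were submitted."""
    return sorted(set(SUB_TRACKS) - set(submitted))


def qualifies_for_primary(submitted):
    """True if results were submitted for every sub-track in SUB_TRACKS."""
    return not missing_tracks(submitted)


def is_cross_lingual(track):
    """True if the two sentences of a pair are in different languages."""
    first, second = SUB_TRACKS[track]
    return first != second
```

For example, a team that submits only `track5` takes part in the English sub-track alone, and `missing_tracks(["track5"])` lists the four sub-tracks it would still need for the primary evaluation.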

We strongly encourage all participants to build systems capable of participating in the primary evaluation. This can be done by building models that can assess pairs in different languages (Aldarmaki and Diab, 2016; Ataman et al., 2016; Bicici, 2016; Lo et al., 2016) or by using machine translation to convert pairs into a single language understood by an otherwise monolingual STS system.


However, we hope the individual secondary sub-tracks will provide teams with specific linguistic expertise the opportunity to more deeply explore the languages that interest them.
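The machine translation route mentioned above can be sketched as a thin wrapper around a monolingual scorer. Everything here is a stated assumption: `cross_lingual_sts` is a hypothetical helper, and `toy_translate` and `toy_score` are deliberately trivial stand-ins for a real MT system and a real English STS model.

```python
def cross_lingual_sts(s1, lang1, s2, lang2, translate, score_english):
    """Translate any non-English sentence to English, then score the pair.

    translate(sentence, source_lang) -> English sentence
    score_english(a, b) -> similarity score on the 0-5 scale
    """
    if lang1 != "en":
        s1 = translate(s1, lang1)
    if lang2 != "en":
        s2 = translate(s2, lang2)
    return score_english(s1, s2)


# Toy stand-in for an MT system: word-by-word dictionary lookup.
TOY_LEXICON = {"es": {"el": "the", "gato": "cat", "duerme": "sleeps"}}


def toy_translate(sentence, source_lang):
    lexicon = TOY_LEXICON[source_lang]
    return " ".join(lexicon.get(tok, tok) for tok in sentence.lower().split())


# Toy stand-in for a monolingual STS model: token overlap scaled to 0-5.
def toy_score(a, b):
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 5.0
    return 5.0 * len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


result = cross_lingual_sts(
    "El gato duerme", "es", "The cat sleeps", "en", toy_translate, toy_score
)
```

The wrapper keeps the translation step and the scoring step independent, so either stand-in can be swapped for a real component without changing the other.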




The Semantic Textual Similarity Wiki details previous tasks and open source software systems and tools.


Join the STS mailing list for updates at


Organizers (alpha. order)


Eneko Agirre (University of the Basque Country)
Daniel Cer (Google Research)
Mona Diab (George Washington University)
Iñigo Lopez-Gazpio (University of the Basque Country)
Lucia Specia (University of Sheffield)




References


Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau and Janyce Wiebe. SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation. Proceedings of SemEval 2016.


Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Iñigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, German Rigau, Larraitz Uria and Janyce Wiebe. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability. Proceedings of SemEval 2015.


Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau and Janyce Wiebe. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity. Proceedings of SemEval 2014.


Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre and Weiwei Guo. *SEM 2013 shared task: Semantic Textual Similarity. Proceedings of *SEM 2013.


Eneko Agirre, Daniel Cer, Mona Diab and Aitor Gonzalez-Agirre. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. Proceedings of SemEval 2012.


Hanan Aldarmaki and Mona Diab. GWU NLP at SemEval-2016 Shared Task 1: Matrix factorization for crosslingual STS. Proceedings of SemEval 2016.


Duygu Ataman, Jose G. C. de Souza, Marco Turchi and Matteo Negri. FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual semantic similarity measurement using quality estimation features and compositional bilingual word embeddings. Proceedings of SemEval 2016.


Ergun Bicici. RTM at SemEval-2016 Task 1: Predicting semantic similarity with referential translation machines and related statistics. Proceedings of SemEval 2016.


Chi-kiu Lo, Cyril Goutte and Michel Simard. CNRC at SemEval-2016 Task 1: Experiments in crosslingual semantic textual similarity. Proceedings of SemEval 2016.


Contact Info

Organizers (alpha. order)

Eneko Agirre, Daniel Cer, Mona Diab, Iñigo Lopez-Gazpio and Lucia Specia

Wiki: STS Wiki

Discussion Group: STS-semeval
