Data and Tools

 

 

RESULTS

 

 

TEST DATA

 

 

DATA VISUALIZATION

 

 

INSTRUCTIONS TO ANNOTATORS

 

 

DEVELOPMENT DATA

 

General English Sentiment Modifiers Set:

The terms for this dataset are taken from the Sentiment Composition Lexicon for Negators, Modals, and Degree Adverbs (SCL-NMA). The dataset contains phrases formed by combining a word and a modifier, where the modifier is a negator, an auxiliary verb, a degree adverb, or a combination of these, for example, 'would be very easy', 'did not harm', and 'would have been nice'. (See development data for more examples.) The dataset also includes single-word terms (as separate entries). These terms are chosen from the set of words that are part of the multi-word phrases, for example, 'easy', 'harm', and 'nice'. The terms in the test set will have the same form as the terms in the development set, but can involve different words and modifiers.

 

English Twitter Mixed Polarity Set:

The terms for this dataset are taken in part from the Sentiment Composition Lexicon for Opposing Polarity Phrases (SCL-OPP). This dataset focuses on phrases made up of opposing polarity words, for example, 'lazy sundays', 'best winter break', 'happy accident', and 'couldn't stop smiling'. Observe that 'lazy' is associated with negative sentiment whereas 'sundays' is associated with positive sentiment. Automatic systems have to determine the degree of association of the whole phrase with positive sentiment. The dataset also includes single-word terms (as separate entries). These terms are chosen from the set of words that are part of the multi-word phrases, for example, 'lazy', 'sundays', 'best', 'winter', and so on. This allows the evaluation to measure how well automatic systems determine the sentiment association of individual words, as well as how well they determine the sentiment of phrases formed by combining those words. The multi-word phrases and single-word terms are drawn from a corpus of tweets, and may include a small number of hashtag words and creatively spelled words. However, the majority of the terms are ones that would be used in everyday English.

 

You can also use the English Twitter datasets provided as part of last year's competition (SemEval-2015 Task 10 Subtask E). You are free to use them for any purpose (development, training, etc.).

 

Arabic Twitter Set:

This dataset includes single words and phrases commonly found in Arabic tweets. The phrases in this set are formed only by combining a negator and a word. See development data for examples.

 

 

EVALUATION SCRIPT:

  • checks the format of the prediction file
  • evaluates the predictions against the gold ratings
  • outputs the following statistics: Kendall rank correlation coefficient and Spearman rank correlation coefficient
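
The three steps above can be illustrated with a short Python sketch. This is not the official evaluation script: the file format (one `term<TAB>score` pair per line), the function names, and the toy scores are all assumptions made for illustration, and the official script may handle ties or malformed input differently. The sketch computes Kendall's tau in its tau-b form and Spearman's rho as the Pearson correlation of average ranks.

```python
from itertools import combinations
from math import sqrt


def parse_scores(text):
    """Format check: parse assumed 'term<TAB>score' lines into a dict."""
    scores = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        term = parts[0].strip()
        value = float(parts[1])  # raises ValueError if the score is not numeric
        if term in scores:
            raise ValueError(f"line {line_no}: duplicate term {term!r}")
        scores[term] = value
    return scores


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b), computed over all pairs of items."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: contributes to neither factor
        if dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom if denom else float("nan")


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else float("nan")


def evaluate(gold_text, pred_text):
    """Check both inputs, align them by term, and return (tau, rho)."""
    gold, pred = parse_scores(gold_text), parse_scores(pred_text)
    if set(gold) != set(pred):
        raise ValueError("prediction terms do not match gold terms")
    terms = sorted(gold)
    g = [gold[t] for t in terms]
    p = [pred[t] for t in terms]
    return kendall_tau_b(g, p), spearman_rho(g, p)


# Toy scores invented for illustration; these are NOT actual gold ratings.
GOLD = "easy\t0.8\nharm\t0.1\nnice\t0.9\ndid not harm\t0.6\n"
PRED = "easy\t0.7\nharm\t0.2\nnice\t0.6\ndid not harm\t0.5\n"

tau, rho = evaluate(GOLD, PRED)
print(f"Kendall tau: {tau:.3f}  Spearman rho: {rho:.3f}")
```

Both coefficients depend only on the ordering of the scores, not on their absolute values, so a system whose predictions are on a different scale from the gold ratings is not penalized as long as it ranks the terms correctly.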

 

Contact Info

Organizers


  • Svetlana Kiritchenko
    National Research Council Canada
  • Saif M Mohammad
    National Research Council Canada
  • Mohammad Salameh
    University of Alberta

email: SemEval-SentimentIntensity@googlegroups.com

Other Info

Announcements

  • Results have been announced.
  • Test data has been released.
  • Task 7 (all three subtasks) will have the following evaluation period: Jan 11 (Mon) to Jan 18 (Mon).
  • Trial data has been released for all three domains.