Data and Tools


A Scorer for Task 11 in Python (November 24th)

 

A Python Scorer is now available to download for this task (here).


The scorer can be tested on these gold and test inputs with the following command-line call:


    python semevalscorer.py gold.csv test.csv

 

Please see below (under Java scorer) for general comments about the scorer and its operation. The Python and Java scorers are functionally equivalent.


A Scorer for Task 11 in Java (November 23rd)


A Java Scorer is now available to download for this task (here).

 

The scorer can be tested on these gold and test inputs with the following command-line call:


    java -jar semevalscorer.jar gold.csv test.csv

 

This scorer was created to evaluate your system output for SemEval-2015 Task 11. Prepare your system output in the same format as the training data provided online (integer version); see the gold and test files for examples. Then run the scorer with the command above.

 

The scorer expects the system output as integers on an 11-point scale (ranging from -5 to +5), so that systems based on either a regression model or a classification model can be scored in the same way. If your system produces real-valued output, please make sure that the value for each tweet in your submission is properly scaled and rounded.
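As an illustration of the scaling and rounding step, here is a minimal Python sketch for systems with real-valued output. The function name to_task_scale is hypothetical, and rounding halves away from zero is an assumption: this page does not say which rounding convention was used for the integer version of the data. Note that Python 3's built-in round() rounds halves to the nearest even integer, which is why the sketch does not use it.

```python
import math


def to_task_scale(score, low=-5, high=5):
    """Clip a real-valued score to [low, high] and round it to an integer.

    Assumption: halves are rounded away from zero (2.5 -> 3, -2.5 -> -3).
    The task page does not specify the rounding convention.
    """
    # Clip first so that out-of-range predictions land on the scale ends.
    clipped = max(low, min(high, score))
    # Round the magnitude half up, then restore the sign.
    magnitude = int(math.floor(abs(clipped) + 0.5))
    return magnitude if clipped >= 0 else -magnitude


if __name__ == "__main__":
    for raw in (2.5, -2.5, 7.2, -0.4, -4.6):
        print(raw, "->", to_task_scale(raw))
```

Clipping before rounding keeps out-of-range regression output (for example 7.2) on the scale instead of producing a value the scorer does not expect.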


The gold standard is derived from the averaged human judgments of trusted users. The script will transform the gold standard and the system output into a vector space representation, based on which the scorer will evaluate the output by calculating the cosine similarity of the two input vectors. Note that we will employ a linear penalty for submissions that do not cover all tweets. For example, if your submission provides sentiment judgments for only half of the tweets appearing in the test data set, then your final score will be halved, that is, 0.5*cosine(gold, test).
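The scoring described above can be sketched roughly as follows. This is not the code of the released scorers: the function name penalized_cosine and the dict-based input are hypothetical, and the sketch assumes that the cosine is computed over the tweets the submission covers and then multiplied by the fraction of gold tweets covered.

```python
import math


def penalized_cosine(gold, test):
    """Cosine similarity of gold and system scores with a linear coverage penalty.

    gold, test: dicts mapping tweet-id -> sentiment score.
    Assumption: only tweets present in both dicts enter the cosine.
    """
    # Tweets that the submission actually provides a judgment for.
    common = [tweet_id for tweet_id in gold if tweet_id in test]
    if not common:
        return 0.0

    dot = sum(gold[t] * test[t] for t in common)
    norm_gold = math.sqrt(sum(gold[t] ** 2 for t in common))
    norm_test = math.sqrt(sum(test[t] ** 2 for t in common))
    if norm_gold == 0 or norm_test == 0:
        # Cosine is undefined for an all-zero vector; treat it as no agreement.
        return 0.0

    cosine = dot / (norm_gold * norm_test)
    # Linear penalty: fraction of gold tweets covered by the submission.
    coverage = len(common) / float(len(gold))
    return coverage * cosine


if __name__ == "__main__":
    gold = {"t1": 3, "t2": -2, "t3": 4, "t4": -1}
    half = {"t1": 3, "t2": -2}
    print(penalized_cosine(gold, gold))
    print(penalized_cosine(gold, half))
```

With the hypothetical data in the usage lines, a submission that matches the gold scores on two of the four tweets gets a cosine of about 1 on those tweets, which the coverage factor then halves.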

 

Important Note regarding Training Data (August 30th):

A transcription error in the representation of tweet-ids was observed after the training data files were first uploaded. In addition, some duplicates were observed in the training data. If you downloaded the training data files before August 30th, please download them again to ensure that you have the correct data files with the correct tweet-ids and without duplicate entries.


Important Note regarding Training Data (October 15th):

Some prospective participants have noted the perishability of our training tweets (as all tweets are potentially perishable). Please see the bottom of this page for a discussion of tweet perishability, and our approach to ensuring that ALL training tweets are made available to participating systems.

 

Training Data

 

Training data for this task (8000 figurative tweets annotated with sentiment scores in the range -5...+5) is now available as a spreadsheet here (rounded integer scores) and here (real-valued scores).


Training data is now also available in .tsv format here (rounded integer scores) and here (real-valued scores).

 


An .RTF README document for this trial data is available here.

 

Please see the important note about tweet perishability (and what you can do about it) at the bottom of this page!

 

Trial Data

Trial data for this task (1000 figurative tweets annotated with sentiment scores in the range -5...+5) is now available as a spreadsheet here.

An .RTF README document for this trial data is available here.

 

Tweet Text

The actual text of each tweet is not included, due to copyright/privacy concerns that come as standard with the use of Twitter data. A script is available here for retrieving the text of each tweet given its tweet-id. 

For Python 2.x

Tweet downloader script

For Python 3.x

Tweet downloader script

 

Note: Tweets are a perishable commodity and may be deleted, archived or otherwise made inaccessible by their creators. Participants are encouraged to download the text of the tweets via their tweet-ids, using the script provided, at their earliest convenience.


As of October 15, 2014, approximately 15% of our training tweets have already perished for one of the above reasons. For this reason we have created a mapping from the published tweet-ids in the training data (above) to a new set of imperishable copies. This mapping of tweet-ids (perishable to imperishable, with weighted sentiment score) is downloadable here:

 

Tweet id mapping (TSV format)
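A mapping file of this kind could be read along the following lines. The column layout is an assumption, since this page does not document it: the sketch expects three tab-separated columns (original tweet-id, imperishable tweet-id, sentiment score) and skips any row that does not fit. The function name load_id_mapping is hypothetical.

```python
import csv


def load_id_mapping(lines):
    """Build {original_id: (imperishable_id, score)} from tab-separated lines.

    Assumption: each data row has at least three columns in the order
    original tweet-id, imperishable tweet-id, sentiment score.
    """
    mapping = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # blank or malformed line
        old_id, new_id, score = row[0], row[1], row[2]
        try:
            mapping[old_id] = (new_id, float(score))
        except ValueError:
            continue  # header row or non-numeric score
    return mapping


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not real tweet-ids.
    sample = ["111\t999\t-2.5", "", "222\t888\t3"]
    print(load_id_mapping(sample))
```

To read a downloaded file, pass an open file object instead of the list, since csv.reader accepts any iterable of lines. Tweet-ids are kept as strings so that long ids are not altered by numeric conversion.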

 

Contact Info

Organizers

  • John Barnden (J.A.Barnden@cs.bham.ac.uk) University of Birmingham, UK.
  • Antonio Reyes (antonioreyes@isit.edu.mx) Superior Institute of Interpreters and Translators
  • Ekaterina Shutova (shutova.e@gmail.com) ICSI, UC Berkeley
  • Paolo Rosso (prosso@dsic.upv.es) Technical University of Valencia
  • Tony Veale (tony.veale@ucd.ie) University College Dublin

email : tony.veale@UCD.ie

Other Info

Announcements

  • Initial Analysis of Results is now available here
  • Test data for this task will be available from Dec 5th. To obtain the test data, you must register for the task. Here is the link
    Note: you have 5 days to submit your results from the time you download the data. Do not download it until you are ready to use it!
  • We have now released a Java scorer for download: please see the Data and Tools page.
  • Note: the dates for the evaluation period for SemEval-2015 have changed! (Dec. 5 -- 22, 2014)
  • Training data for this task (8000 figurative tweets annotated with sentiment scores in the range -5...+5) is now available.
  • Trial data for this task (1000 figurative tweets annotated with sentiment scores in the range -5...+5) is now available.
  • Follow @MetaphorMagnet -- a Twitterbot that uses metaphor theory to automatically generate novel metaphors