TEST dataset for SemEval-2015 Task #10 Subtask E

Task organizers:
Sara Rosenthal, Columbia University
Alan Ritter, The Ohio State University
Veselin Stoyanov, Facebook
Svetlana Kiritchenko, NRC Canada
Saif Mohammad, NRC Canada
Preslav Nakov, Qatar Computing Research Institute

Version 1.0: December 8, 2014


IMPORTANT

To use this test dataset, participants should download (1), and most likely also (2):

1. the official scorer and format checker;
2. the trial dataset.

Both are available here:
http://alt.qcri.org/semeval2015/task10/index.php?id=data-and-tools

Use the released format checker to check your output before submitting your results.


INPUT DATA FORMAT

The test dataset has the following format:
- each line corresponds to a unique term (a single word or a phrase);
- the terms are given in random order.


SUBMISSION FORMAT

Your submission should have the same format as the trial dataset:
- each line should correspond to a unique term;
- each line should have the format:
  <term><TAB><score>
  where <score> is the strength of association with positive sentiment, a number between 0 and 1.

The terms in the submission file can be in any order. Keeping the order of the terms in the test file, or sorting them in ascending or descending order of sentiment score, are reasonable options, but neither is required.


EVALUATION

System ratings for terms will be evaluated by first ranking the terms according to sentiment score and then comparing this ranked list to a ranked list obtained from human annotations. Kendall's Tau will be used as the metric to compare the ranked lists. (We will also report Spearman's Rank Correlation, but participating teams will be ranked by Kendall's Tau.)

We have released an evaluation script so that participants can:
- make sure their output is in the right format;
- track their system's performance on the trial data.


DATASET USE

No training data will be released. You are free to use the trial data for training.
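The submission format and evaluation described above can be sketched in Python as a quick self-check on the trial data. This is NOT the official scorer or format checker; it is a minimal sketch under these assumptions: a TAB separates the term from its score (as in the trial data), and the Kendall's Tau variant is tau-b, which corrects for tied scores (the official scorer's tie handling may differ). The function names `write_submission`, `read_scores`, `kendall_tau_b`, and `evaluate` are hypothetical. Because Kendall's Tau depends only on the pairwise ordering of terms, computing it directly on the scores gives the same value as comparing the two ranked lists.

```python
import math


def write_submission(scores: dict[str, float], path: str) -> None:
    """Write one 'term<TAB>score' line per term, highest score first.

    Assumes a TAB separates term and score, as in the trial data.
    """
    with open(path, "w", encoding="utf-8") as out:
        for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
            if not 0.0 <= score <= 1.0:
                raise ValueError(f"score for {term!r} is outside [0, 1]: {score}")
            out.write(f"{term}\t{score}\n")


def read_scores(path: str) -> dict[str, float]:
    """Parse a 'term<TAB>score' file; terms may be multi-word phrases."""
    scores: dict[str, float] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last TAB only, so the term itself is kept intact.
            term, score = line.rsplit("\t", 1)
            scores[term] = float(score)
    return scores


def kendall_tau_b(xs: list[float], ys: list[float]) -> float:
    """Kendall's tau-b over all pairs (O(n^2), fine for a few thousand terms)."""
    n = len(xs)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx == 0 or dy == 0:
                continue  # tied pairs are neither concordant nor discordant
            if (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    n0 = n * (n - 1) / 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def evaluate(system: dict[str, float], gold: dict[str, float]) -> float:
    """Compare system scores with gold scores on the gold terms."""
    terms = sorted(gold)
    missing = [t for t in terms if t not in system]
    if missing:
        raise ValueError(f"{len(missing)} gold terms missing from submission")
    return kendall_tau_b([system[t] for t in terms], [gold[t] for t in terms])


if __name__ == "__main__":
    gold = {"good": 0.9, "bad": 0.1, "not bad": 0.6}
    system = {"good": 0.8, "bad": 0.3, "not bad": 0.5}
    print(evaluate(system, gold))  # 1.0: same ordering of all three terms
```

Validating the [0, 1] range when writing, rather than silently clipping, surfaces scoring bugs early; the official format checker should still be run before submitting.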
TEST PROCEDURE

Task participants must submit their runs by the final deadline of December 14, 2014 (23:59 at Midway, Midway Islands, United States: see http://www.timeanddate.com/worldclock/city.html?n=1890) AND no later than 7 days after downloading the data. Late submissions will not be counted.

Each team is allowed only ONE official submission. Before the deadline, you can upload new submissions to the FTP server as many times as you like; each new upload replaces your earlier ones. Only the submission with the latest timestamp will be counted as official. We therefore advise you to submit your runs early and, if time allows, resubmit later.


SUBMISSION PROCEDURE

1. The email you received when registering for the test data download contains instructions on how to upload your submission and your system description.
2. ZIP your submission file and name it "task10-subtaskE-TEAMID.zip", where 'TEAMID' is your team ID.


USEFUL LINKS:

Google group: semevaltweet@googlegroups.com
Task website: http://alt.qcri.org/semeval2015/task10/
SemEval-2015 website: http://alt.qcri.org/semeval2015/
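Step 2 of the submission procedure (packaging the submission file as a ZIP archive with the required name) can be sketched with Python's standard zipfile module. This is a minimal sketch: the function names `submission_zip_name` and `package_submission` are hypothetical, and the README does not say what the file inside the archive must be called, so the sketch simply keeps the original file name at the archive root.

```python
import zipfile
from pathlib import Path


def submission_zip_name(team_id: str) -> str:
    """Return the required archive name, e.g. task10-subtaskE-myteam.zip."""
    if not team_id:
        raise ValueError("team_id must be non-empty")
    return f"task10-subtaskE-{team_id}.zip"


def package_submission(submission_path: str, team_id: str, out_dir: str = ".") -> Path:
    """ZIP a single submission file under the name required for Subtask E."""
    src = Path(submission_path)
    zip_path = Path(out_dir) / submission_zip_name(team_id)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root, without its directory path.
        zf.write(src, arcname=src.name)
    return zip_path
```

For example, `package_submission("output.txt", "myteam")` would create `task10-subtaskE-myteam.zip` in the current directory; upload it as described in the registration email.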