Data and Tools < SemEval-2017 Task 6

Data and Tools

Download the evaluation scripts

Download the data

Trial and Training Data

The training/trial data consists of a single directory containing several files. Each file corresponds to a single hashtag and is named accordingly. For example, the file for the hashtag #DogSongs is called Dog_Songs. We add an underscore between hashtag tokens to make the hashtags easier to parse, since we believe a better semantic understanding of the hashtag will contribute to better performance on the task.
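As an illustration of why the underscore helps, the naming convention can be inverted with a few lines of Python. This is a minimal sketch, not part of the official tools: the helper names hashtag_tokens and hashtag_from_file_name are hypothetical, and the handling of a file extension is an assumption, since the example name Dog_Songs has none.

```python
from pathlib import Path


def hashtag_tokens(file_name: str) -> list[str]:
    """Split a data file name such as 'Dog_Songs' into its hashtag tokens."""
    # Path(...).stem drops any directory part and a trailing extension.
    # ASSUMPTION: an extension may or may not be present in the real data.
    stem = Path(file_name).stem
    return [token for token in stem.split("_") if token]


def hashtag_from_file_name(file_name: str) -> str:
    """Rebuild the original hashtag, e.g. 'Dog_Songs' -> '#DogSongs'."""
    return "#" + "".join(hashtag_tokens(file_name))


if __name__ == "__main__":
    print(hashtag_tokens("Dog_Songs"))
    print(hashtag_from_file_name("Dog_Songs"))
```

The token list can then be fed to whatever lexical resource a system uses, without needing a separate hashtag segmenter.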

The tweets are labeled 0, 1, or 2. 0 corresponds to a tweet not in the top 10 (most of the tweets in a file). 1 corresponds to a tweet in the top 10, but not the winning tweet (usually, 9 tweets per hashtag). 2 corresponds to the winning tweet (one tweet per hashtag).
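The labeling scheme might be read into memory roughly as follows. This is a sketch under a loud assumption: the text above does not specify the file layout, so the code assumes one tweet per line with tab-separated fields in the order id, text, label. The names LabeledTweet, parse_line and label_counts are hypothetical, and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Meaning of each label, as described in the text above.
LABEL_MEANING = {
    0: "not in the top 10",
    1: "in the top 10, but not the winning tweet",
    2: "the winning tweet",
}


class LabeledTweet(NamedTuple):
    tweet_id: str
    text: str
    label: int


def parse_line(line: str) -> LabeledTweet:
    """Parse one line. ASSUMPTION: the layout is id<TAB>text<TAB>label."""
    # Split the id off the front and the label off the back, so that
    # a stray tab inside the tweet text does not break parsing.
    tweet_id, rest = line.rstrip("\n").split("\t", 1)
    text, raw_label = rest.rsplit("\t", 1)
    label = int(raw_label)
    if label not in LABEL_MEANING:
        raise ValueError(f"unexpected label {label!r} for tweet {tweet_id}")
    return LabeledTweet(tweet_id, text, label)


def label_counts(lines: Iterable[str]) -> Counter:
    """Count how many tweets carry each label, skipping blank lines."""
    return Counter(parse_line(line).label for line in lines if line.strip())


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real data.
    sample = [
        "101\tan invented winning pun\t2\n",
        "102\tan invented top-10 pun\t1\n",
        "103\tan invented other tweet\t0\n",
    ]
    for label, count in sorted(label_counts(sample).items()):
        print(label, LABEL_MEANING[label], count)
```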

Annotating Trial/Training Data

Doing well on this task may require solving several subtasks. For example, most of the tweets in #DogSongs are dog-related puns on existing songs. To understand why a pun is funny, one needs to know the song it references. We therefore allow participants to manually annotate the trial/training data, for example by marking the proper nouns referenced in a tweet. Manual annotation of any kind is not permitted on the evaluation data.

Evaluation Data

For evaluation, tweets with different labels will be paired, and the goal is to determine which tweet in each pair is funnier. We ask that participants not use knowledge of the label distributions directly when building their systems.
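The pairing described above can be sketched for a single hashtag as follows. This is only an illustration and not the official evaluation script: the function name make_pairs is hypothetical, and the code assumes that every pair of tweets with different labels is used and that the tweet with the higher label (2 over 1 over 0) counts as the funnier one.

```python
from itertools import combinations
from typing import List, Tuple


def make_pairs(labeled: List[Tuple[str, int]]) -> List[Tuple[str, str, str]]:
    """Build (tweet_a, tweet_b, funnier_tweet) triples for one hashtag.

    `labeled` holds (tweet_id, label) tuples with labels 0, 1 or 2.
    ASSUMPTION: the higher label marks the funnier tweet of a pair.
    """
    pairs = []
    for (id_a, label_a), (id_b, label_b) in combinations(labeled, 2):
        if label_a == label_b:
            # Tweets with the same label are not compared.
            continue
        funnier = id_a if label_a > label_b else id_b
        pairs.append((id_a, id_b, funnier))
    return pairs


if __name__ == "__main__":
    # Made-up ids and labels for illustration.
    tweets = [("a", 2), ("b", 1), ("c", 0), ("d", 0)]
    for pair in make_pairs(tweets):
        print(pair)
```

A system trained on such pairs sees only relative judgments, which is in keeping with the request not to exploit the label distributions directly.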

SemEval-2017 Task 6

#HashtagWars: Learning a Sense of Humor
