Subtasks
For this year's task, we are offering the following subtasks:
Subtask A: Pairwise Comparison
Given two tweets, a successful system will predict which tweet is funnier according to the gold labels of the tweets. During evaluation, only pairs of tweets with differing labels will be scored, and within such a pair the tweet with the higher label is considered the funnier one. The sample script released with the Trial Data performs this task specifically.
For evaluation, we will release data formatted exactly like the Trial/Training data, but without labels. To be evaluated on this subtask, teams will produce predictions for every possible pair of tweets in a given Evaluation file. The evaluation script will then select the appropriate pairs for scoring. The Evaluation metric is accuracy, micro-averaged across all Evaluation files.
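As a rough illustration of this scoring (not the official evaluation script), the following Python sketch computes the per-file counts needed for micro-averaged accuracy, assuming gold labels and system predictions are held in dictionaries keyed by tweet id; the data structures and function name here are hypothetical, not part of the released scripts:

import itertools

def score_pairs(gold, predictions):
    """Return (correct, total) pairwise counts for one Evaluation file.

    gold        -- dict mapping tweet_id -> gold label (e.g. 0, 1, or 2)
    predictions -- dict mapping (id_a, id_b) -> id of the tweet predicted funnier
    Only pairs with differing gold labels are scored; the tweet with the
    higher label is the gold "funnier" tweet.
    """
    correct, total = 0, 0
    for id_a, id_b in itertools.combinations(sorted(gold), 2):
        if gold[id_a] == gold[id_b]:
            continue  # pairs with equal labels are skipped during evaluation
        gold_winner = id_a if gold[id_a] > gold[id_b] else id_b
        if predictions.get((id_a, id_b)) == gold_winner:
            correct += 1
        total += 1
    return correct, total

Micro-averaging then amounts to summing the correct and total counts over all Evaluation files before dividing, rather than averaging per-file accuracies.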
CodaLab competition: https://competitions.codalab.org/competitions/15682
Subtask B: Semi-Ranking
Given an input file of tweets for a given hashtag, systems will produce a ranking of tweets from funniest to least funny. Since the tweet files do not provide an explicit ranking, we will evaluate whether tweets have been placed in the appropriate bucket: winning tweet, top 10 but not winning, and not top 10. In a certain sense this can be thought of as labeling; however, there is a known cardinality for each bucket: 1 tweet, 9 tweets, and the rest of the tweets, respectively.
System evaluation will use a measure inspired by edit distance: for each tweet, how many moves must occur for it to be placed in the right bucket. For example, if the winning tweet has been placed in the top 10 but not winning bucket, and a tweet from the top 10 but not winning bucket has been placed in the winning tweet bucket, the total edit error is 2, 1 for each tweet. The final Evaluation measure is the edit error normalized by 22, the maximum edit error. This Evaluation metric is averaged across all Evaluation files to produce the final metric.
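For illustration only (again, not the official scorer), a minimal Python sketch of this bucket-based edit error, assuming gold and predicted bucket assignments are dictionaries keyed by tweet id with hypothetical bucket names 'win', 'top10', and 'other':

def bucket_edit_error(gold_buckets, predicted_buckets):
    """Normalized edit error for one hashtag's Evaluation file.

    Each argument maps tweet_id -> bucket, where the bucket is one of
    'win' (1 tweet), 'top10' (9 tweets), or 'other' (the remaining tweets).
    Each tweet placed in the wrong bucket contributes one move.
    """
    moves = sum(1 for tid, bucket in predicted_buckets.items()
                if gold_buckets[tid] != bucket)
    return moves / 22.0  # 22 is the maximum edit error stated in the task description

The per-file values produced this way would then be averaged across all Evaluation files to obtain the final metric.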
CodaLab competition: https://competitions.codalab.org/competitions/15689