RumourEval: Determining rumour veracity and support for rumours
Media is full of false claims. Oxford Dictionaries even named "post-truth" its word of the year for 2016. This makes it more important than ever to build systems that can assess the veracity of a story and characterise the discourse around it.
RumourEval invites individuals and groups to create systems that can do just that. We provide data for training and testing a system, and in January 2017 we will release evaluation data. You will build your system ahead of that release, run it on the evaluation data, and send us the results. We will share the results, and all participating systems will be invited to present their work and analysis at a major conference in summer 2017 as part of the large SemEval workshop.
This task aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics -- each having their own families of claims and replies -- and concrete subtasks.
The task of analysing and determining the veracity of social media content has been of recent interest to the field of natural language processing. After initial work, increasingly advanced systems and annotation schemas have been developed to support the analysis of rumour and misinformation in text. Veracity judgment can be decomposed intuitively into a comparison between assertions made in -- and entailments from -- a candidate text, and external world knowledge; this comparison leads to a veracity judgment. Intermediate linguistic cues have also been shown to play a role. Critically, recent work suggests that the task is deeply nuanced and very challenging, while having important applications in, for example, journalism and disaster mitigation.
We propose a shared task where participants analyse rumours in the form of claims made in user-generated content, and where users respond to one another within conversations attempting to resolve the veracity of the rumour. We define a rumour as a "circulating story of questionable veracity, which is apparently credible but hard to verify, and produces sufficient skepticism and/or anxiety so as to motivate finding out the actual truth". As breaking news unfolds and communities react, gathering opinions and evidence from as many sources as possible becomes crucial for determining the veracity of rumours and, consequently, for reducing the impact of the spread of misinformation.
Within this scenario, where one needs to listen to, and assess the testimony of, different sources to make a final decision with respect to a rumour's veracity, we propose to run a task in SemEval consisting of the following two subtasks:
- determining whether statements from different sources support, deny, query or comment on rumours
- veracity prediction
There is a Google Group for the task - see https://groups.google.com/forum/#!forum/rumoureval
Subtask A: SDQC
Related to the objective of predicting a rumour's veracity, the first subtask will deal with the complementary objective of tracking how other sources orient to the accuracy of the rumourous story. A key step in the analysis of the surrounding discourse is to determine how other users in social media regard the rumour. We propose to tackle this analysis by looking at the replies to the tweet that presented the rumourous statement, i.e. the originating rumourous (source) tweet.
We will provide participants with a tree-structured conversation formed of tweets replying to the originating rumourous tweet, where each tweet presents its own type of support with regard to the rumour. We frame this in terms of supporting, denying, querying or commenting on (SDQC) the claim. We therefore introduce a subtask where the goal is to label the type of interaction between a given statement (the rumourous tweet) and a reply tweet (the latter can be either a direct or a nested reply).
Each tweet in the tree-structured thread will have to be categorised into one of the following four categories:
- Support: the author of the response supports the veracity of the rumour they are responding to.
- Deny: the author of the response denies the veracity of the rumour they are responding to.
- Query: the author of the response asks for additional evidence in relation to the veracity of the rumour they are responding to.
- Comment: the author of the response makes their own comment without a clear contribution to assessing the veracity of the rumour they are responding to.
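The thread structure and the four categories above can be illustrated with a minimal Python sketch. The `Tweet` class, its field names, the `walk` helper and the example texts are all assumptions made for illustration; they are not the format of the released task data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Optional, Tuple


class SDQC(Enum):
    """The four response categories used in Subtask A."""
    SUPPORT = "support"
    DENY = "deny"
    QUERY = "query"
    COMMENT = "comment"


@dataclass
class Tweet:
    """Hypothetical node in a conversation tree (not the official data format)."""
    tweet_id: str
    text: str
    label: Optional[SDQC] = None  # None until a label has been assigned
    replies: List["Tweet"] = field(default_factory=list)


def walk(tweet: Tweet, depth: int = 0) -> Iterator[Tuple[int, Tweet]]:
    """Yield (depth, tweet) pairs depth-first, starting with the source tweet."""
    yield depth, tweet
    for reply in tweet.replies:
        yield from walk(reply, depth + 1)


# Invented example thread: one source tweet, three direct replies, one nested reply.
source = Tweet("1", "Reports say the bridge has collapsed.", SDQC.SUPPORT, [
    Tweet("2", "Is there a source for this?", SDQC.QUERY, [
        Tweet("3", "Local police just confirmed it.", SDQC.SUPPORT),
    ]),
    Tweet("4", "Not true, I just drove across it.", SDQC.DENY),
    Tweet("5", "Scary times.", SDQC.COMMENT),
])

for depth, tweet in walk(source):
    print("  " * depth + f"[{tweet.label.value}] {tweet.text}")
```

A classifier for this subtask would assign one `SDQC` value to every node visited by `walk`, whether the node is a direct or a nested reply.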
Evaluation
The evaluation of the SDQC task needs care, as the distribution of the categories is clearly skewed towards comments. Systems are scored by classification accuracy.
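To make the effect of the skew concrete, the following Python sketch computes classification accuracy and compares a system against a baseline that always predicts the most frequent label. The function names and the toy label lists are assumptions for illustration; this is not the official scorer.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of instances")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def majority_baseline(gold: Sequence[str]) -> List[str]:
    """Predict the most frequent gold label for every instance."""
    label, _ = Counter(gold).most_common(1)[0]
    return [label] * len(gold)


# Invented toy data, skewed towards "comment" as described above.
gold = ["comment"] * 7 + ["support", "deny", "query"]
system = ["comment"] * 6 + ["support", "support", "comment", "query"]

print("system accuracy:  ", accuracy(gold, system))
print("majority baseline:", accuracy(gold, majority_baseline(gold)))
```

On data this skewed, a system that only ever outputs "comment" already scores well, so an accuracy figure is most informative when read alongside such a baseline.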
Submit your system on Codalab: RumourEval task A
Subtask B: Veracity prediction
The goal of this subtask is to predict the veracity of a given rumour. The rumour is presented as a tweet, reporting an update associated with a newsworthy event, but deemed unsubstantiated at the time of release. Given such a tweet/claim, and a set of other resources provided, systems should return a label describing the anticipated veracity of the rumour as true or false. The ground truth of this task is manually established by journalist members of the team who identify official statements or other trustworthy sources of evidence that resolve the veracity of the given rumour.
The participants in this subtask will be able to choose between two variants. In the first case -- the closed variant -- the veracity of a rumour will have to be predicted solely from the tweet itself. In the second case -- the open variant -- additional context will be provided as input to veracity prediction systems; this context will consist of snapshots of relevant sources retrieved immediately before the rumour was reported, including a snapshot of an associated Wikipedia article, a Wikipedia dump, news articles from digital news outlets retrieved from NewsDiffs, as well as preceding tweets from the same event. Critically, no external resources may be used that contain information from after the rumour's resolution. To control this, we will specify precise versions of external information that participants may use. This is important to make sure we introduce time sensitivity into the task of veracity prediction.
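The time constraint in the open variant can be pictured as a filter over timestamped snapshots. The Python sketch below is illustrative only: the `Resource` class, its fields, the resource names and all dates are invented, and it assumes the cut-off is the time the rumour was reported, which is how the provided context is described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Resource:
    """A context snapshot; both fields are hypothetical, not the official format."""
    name: str
    snapshot_time: datetime


def usable_context(resources: List[Resource], rumour_time: datetime) -> List[Resource]:
    """Keep only snapshots taken strictly before the rumour was reported."""
    return [r for r in resources if r.snapshot_time < rumour_time]


# Invented timestamps for illustration.
rumour_time = datetime(2015, 3, 12, 14, 0, tzinfo=timezone.utc)
resources = [
    Resource("wikipedia_article_snapshot", datetime(2015, 3, 12, 13, 55, tzinfo=timezone.utc)),
    Resource("news_article_snapshot", datetime(2015, 3, 12, 9, 30, tzinfo=timezone.utc)),
    Resource("later_news_update", datetime(2015, 3, 14, 8, 0, tzinfo=timezone.utc)),
]

for resource in usable_context(resources, rumour_time):
    print(resource.name)
```

In the task itself the organisers specify the permitted versions of external information, so participants would not need to derive the cut-off themselves.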
We take a simple approach to this task, using only true/false labels for rumours. In practice, however, many claims are hard to verify; for example, there were many rumours concerning Vladimir Putin's activities in early 2015, many of them wholly unsubstantiable. Therefore, we also expect systems to return a confidence value in the range 0 to 1 for each rumour; if the rumour is unverifiable, a confidence of 0 should be returned.
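A system's output for one rumour can be pictured as a label plus a confidence value, as in the Python sketch below. The record layout and the helper `make_prediction` are assumptions for illustration, not the official submission format. In particular, deriving confidence from a model probability and emitting the label "false" for an unverifiable rumour are arbitrary choices made here; the task only states that the confidence should be 0 in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VeracityPrediction:
    """Hypothetical output record: a true/false label and a confidence in [0, 1]."""
    rumour_id: str
    label: str
    confidence: float

    def __post_init__(self) -> None:
        if self.label not in ("true", "false"):
            raise ValueError("label must be 'true' or 'false'")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in the range 0 to 1")


def make_prediction(rumour_id: str, p_true: Optional[float]) -> VeracityPrediction:
    """Build a prediction from a hypothetical model probability that the rumour is true.

    Pass p_true=None when the system judges the rumour unverifiable.
    """
    if p_true is None:
        # Assumption: the label is arbitrary here; only confidence 0 is required.
        return VeracityPrediction(rumour_id, "false", 0.0)
    label = "true" if p_true >= 0.5 else "false"
    # Assumption: confidence is the probability of the chosen label.
    confidence = p_true if label == "true" else 1.0 - p_true
    return VeracityPrediction(rumour_id, label, confidence)


print(make_prediction("rumour-1", 0.9))
print(make_prediction("rumour-2", 0.2))
print(make_prediction("rumour-3", None))
```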
Evaluation
The predicted veracity, which will be one of true or false for each instance, will be evaluated with micro-averaged accuracy, that is, the proportion of instances for which a correct prediction is made.
Submit your system on Codalab: