SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering (pilot task)
The evaluation has ended.
Introduction
In any domain, professionals need to have access to knowledge in order to take well-informed decisions. An insightful way of presenting information in an easily updatable and complete manner is to present it on a timeline that is continuously updated with new information.
The aim of the task is to build timelines from written news in English. More specifically, the goal is to order on a timeline all the events in which a target entity is involved. We focus mainly on cross-document event coreference resolution and cross-document temporal relation extraction.
Temporal relation extraction has been the topic of the three past TempEval tasks as part of SemEval:
- TempEval-1 (2007): Temporal Relation Identification
- TempEval-2 (2010): Evaluating Events, Time Expressions, and Temporal Relations
- TempEval-3 (2013): Temporal Annotation
In addition, temporal relation extraction has been the focus of the 6th i2b2 NLP Challenge for clinical records (Sun et al., 2013).
The cross-document aspect, on the other hand, has rarely been explored. Ji et al. (2009) worked on a similar task using the ACE 2005 training corpora: the task was to link pre-defined events involving the same centroid entities (i.e. entities frequently participating in events) on a timeline. Nominal coreference resolution was the topic of the SemEval-2010 task on Coreference Resolution in Multiple Languages.
Partially motivated by the work in the NewsReader project, TimeLine goes beyond the above-mentioned tasks by addressing coreference resolution for events and temporal relation identification at a cross-document level.
Task Description
Given a set of documents and a target entity, the task is to build an event TimeLine related to that entity, i.e. to detect, anchor in time and order the events involving the target entity.
Input data. As input data, we provide a set of documents and a set of target entities (person, organization, product or financial entity). Only entities of interest will be selected as target entities, i.e. entities that are involved in many events across different documents and for which it is relevant to build a timeline.
Tracks. Two different tracks are proposed on the basis of the data used as input. For Track A only raw texts are provided to the participants, while for Track B gold event mentions are also given.
For both tracks the expected output is one TimeLine for each target entity. Each TimeLine consists of an ordered list of events in which each event is associated with a time anchor.
A sub-track is proposed for both tracks, in which the events are not associated to a time anchor.
- Track A (main track):
  - input data: raw texts
  - output: full TimeLines (ordering of events and assignment of time anchors)
- Subtrack A:
  - input data: raw texts
  - output: TimeLines consisting of ordered events only (no assignment of time anchors)
- Track B:
  - input data: texts with manual annotation of event mentions
  - output: full TimeLines (ordering of events and assignment of time anchors)
- Subtrack B:
  - input data: texts with manual annotation of event mentions
  - output: TimeLines consisting of ordered events only (no assignment of time anchors)
Participants can choose to participate in any track or subtrack.
Participants can submit up to two runs for each track/subtrack.
TimeLine. A TimeLine is represented in a simple tab-separated format:
ordering time_anchor event(s)
The first column (ordering) contains a cardinal number indicating the position of the event in the TimeLine (two events can share the same number if they are simultaneous). The second column (time_anchor) contains a time anchor. The third and following columns (event) contain one event or a list of coreferring events separated by tabs. Each event is represented by the id of the file (<DOCID>), the id of the sentence and the extent of the event mention, in the following format: docid-sentid-event (e.g. 11778-2-launch).
For multi-word events, tokens are separated by an underscore:
16844-12-showed_off
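The event identifier format can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that file ids and sentence ids never contain a hyphen, which the description above does not state explicitly.

```python
def format_event_id(doc_id: str, sent_id: int, tokens: list[str]) -> str:
    """Build a docid-sentid-event string; multi-word events are joined with '_'."""
    return f"{doc_id}-{sent_id}-{'_'.join(tokens)}"


def parse_event_id(event_id: str) -> tuple[str, int, list[str]]:
    """Split an event id back into (doc id, sentence id, event tokens).

    Only the first two hyphens are used as separators, so a hyphen inside
    the event text itself is preserved (assumption: ids have no hyphens).
    """
    doc_id, sent_id, event = event_id.split("-", 2)
    return doc_id, int(sent_id), event.split("_")
```

For instance, format_event_id("16844", 12, ["showed", "off"]) produces the identifier shown above.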
TimeLine example:
iTunes
1 2003 11778-3-launch 11778-4-launch
2 2007 11778-4-pass
3 2008-01 11778-7-hold
4 2008-02 11778-2-pass 11778-5-pass
4 2008-02 11778-3-accounts_for
Note: events placed in position 0 are not considered part of the TimeLine and will not be evaluated.
The format for the subtracks (ordering of events only) is the same. The second column can contain "XXXX" or "ZZZZ" for each event, but should not be empty.
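As an illustration of the format, the following Python sketch reads a TimeLine file into memory. It is not an official tool: the names TimelineEntry, parse_timeline and evaluated_entries are hypothetical, and the sketch assumes that the first non-empty line holds the target entity and that columns are separated by tab characters.

```python
from dataclasses import dataclass


@dataclass
class TimelineEntry:
    position: int        # ordering column; 0 means "not ordered"
    anchor: str          # time anchor, or "XXXX"/"ZZZZ" in the subtracks
    events: list[str]    # one or more coreferring docid-sentid-event ids


def parse_timeline(text: str) -> tuple[str, list[TimelineEntry]]:
    """Parse a TimeLine: entity name on the first line, then tab-separated rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    entity = lines[0].strip()
    entries = []
    for ln in lines[1:]:
        position, anchor, *events = ln.strip().split("\t")
        entries.append(TimelineEntry(int(position), anchor, events))
    return entity, entries


def evaluated_entries(entries: list[TimelineEntry]) -> list[TimelineEntry]:
    """Drop rows in position 0, which are not part of the evaluated TimeLine."""
    return [e for e in entries if e.position != 0]


if __name__ == "__main__":
    sample = (
        "iTunes\n"
        "1\t2003\t11778-3-launch\t11778-4-launch\n"
        "2\t2007\t11778-4-pass\n"
    )
    entity, entries = parse_timeline(sample)
    print(entity, entries)
```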
Target Entities. Each TimeLine is associated with one target entity. The entity can be of type organization, person, product or financial entity.
The TimeLine contains events in which the target entity explicitly participates in a has_participant relation, according to the NewsReader Guidelines (section 10.2), with the semantic role ARG0 (i.e. agent) or ARG1 (i.e. patient), according to the PropBank Guidelines (Bonial et al., 2010). In sentence (1), iPhone 4 is ARG0 of the verb use, and in sentence (2) it is ARG1 of the verb unveil.
(1) The iPhone 4 will use iOS.
(2) Yesterday, Steve Jobs unveiled iPhone 4.
Entity coreference must be resolved. Besides events involving the target entity itself, a TimeLine should contain events involving its coreferring mentions (including pronominal coreference). For example, in a TimeLine about “Cook”, the events involving “Cook” in the first sentence and “He” in the second should both be part of the TimeLine.
(3) Before his post at Apple, Cook held positions at IBM and Compaq. He is known for staying out of the spotlight.
member_of relations are not considered coreference.
In sentence (4), “the parties” refers to the two companies “Apple Inc.” and “Apple Corps”, but “the parties” corefers with neither “Apple Inc.” nor “Apple Corps”.
(4) On September 21, 2004 the parties agreed to have the case heard by the UK court.
Events. Not all events can be part of a TimeLine; counterfactual events, among others, will not appear in a TimeLine. The Manual Annotation Guidelines provide details about candidate events for TimeLines.
Event coreference must be resolved. For two coreferring events there is only one position (i.e. one line) in the TimeLine.
Sentences (5) and (6) contain two event mentions that corefer: “introduced” and “introducing”. They appear at the same position in the TimeLine:
1 2010-06-07 16844-5-introducing 16900-11-introduced
(5) The newest iPhone, [iPhone 4] was introduced by [Apple CEO Steve Jobs] at the company's 2010 Worldwide Developer's Conference less than two weeks ago.
(6) While introducing [iPhone 4], at the annual conference, [Jobs] [...]
Time Anchors. In a TimeLine each event is associated with a time anchor; the annotation of time anchors is based on TimeML.
A time anchor is always a DATE (as defined in TimeML) and its format follows the ISO 8601 standard: YYYY-MM-DD (year, month and day). The finest granularity admitted is DAY.
For anchors with a coarser granularity, only months and years are admitted: months are written as YYYY-MM and years as YYYY.
The place-holder character, X, is used for each unfilled position in the value of a component.
Examples:
- February 6, 2007 → 2007-02-06
- April 2010 → 2010-04
- in 2009 → 2009
- May 23 → XXXX-05-23
A time anchor takes as value the point in time when the event occurred (in case of punctual events) or began (in case of durative events).
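The anchor format lends itself to a simple shape check. The Python sketch below is an illustration under assumptions, not part of the task tools: the names is_valid_anchor and granularity are hypothetical, the pattern checks only the shape of the string (not whether the date exists in the calendar), and it accepts X in any position, including partially filled components.

```python
import re

# Four year characters, optionally followed by a month and then a day.
# Each character is a digit or the placeholder X.
ANCHOR_RE = re.compile(r"[0-9X]{4}(-[0-9X]{2}){0,2}")


def is_valid_anchor(anchor: str) -> bool:
    """Return True if the anchor has the shape YYYY, YYYY-MM or YYYY-MM-DD."""
    return ANCHOR_RE.fullmatch(anchor) is not None


def granularity(anchor: str) -> str:
    """Return YEAR, MONTH or DAY depending on how many components are present."""
    if not is_valid_anchor(anchor):
        raise ValueError(f"not a valid time anchor: {anchor!r}")
    return ("YEAR", "MONTH", "DAY")[anchor.count("-")]
```

With this check, the four example values listed above are accepted, while a value such as 2007-2-6 is rejected because the month and day are not zero-padded.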
Ordering. Event ordering is based on temporal relations between events; more specifically on the before/after and includes/simultaneous relations as defined by ISO-TimeML.
Note: If the available information does not allow an event to be ordered, the event is placed at the beginning of the TimeLine with position 0. Such events are ignored by the evaluation tool.
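To make the position numbering concrete, here is a minimal Python sketch that writes a TimeLine once a system has decided on an ordering. The function name and the input structure are assumptions made for this illustration: ordered_slots is a chronologically ordered list of slots, each slot holding one or more (anchor, mentions) clusters judged simultaneous, and unordered holds clusters that could not be ordered.

```python
def render_timeline(entity, ordered_slots, unordered=()):
    """Serialize a TimeLine in the tab-separated format.

    Clusters in the same slot share a position; each cluster of coreferring
    mentions occupies exactly one line. Unordered clusters get position 0
    and are written first.
    """
    lines = [entity]
    for anchor, mentions in unordered:
        lines.append("\t".join(["0", anchor, *mentions]))
    for position, slot in enumerate(ordered_slots, start=1):
        for anchor, mentions in slot:
            lines.append("\t".join([str(position), anchor, *mentions]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    slots = [
        [("2003", ["11778-3-launch", "11778-4-launch"])],
        [("2007", ["11778-4-pass"])],
        [("2008-01", ["11778-7-hold"])],
        # Two simultaneous clusters: both lines receive position 4.
        [("2008-02", ["11778-2-pass", "11778-5-pass"]),
         ("2008-02", ["11778-3-accounts_for"])],
    ]
    print(render_timeline("iTunes", slots), end="")
```

How the slots themselves are obtained (temporal relation extraction, anchoring, coreference) is the actual task and is left open here.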
Evaluation methodology
Participants will submit the TimeLines produced by their system for all target entities. Systems will be ranked by their temporal awareness score (UzZaman and Allen, 2011).
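The sketch below gives a rough feel for how an ordering can be scored. It is a simplified pairwise approximation written for illustration, not the official scorer and not the full temporal awareness metric, which compares temporal graphs using temporal closure. The function names and the input structure, a list of (position, event ids) pairs, are assumptions; time anchors are ignored.

```python
from itertools import combinations


def pairwise_relations(timeline):
    """Derive BEFORE / SIMULTANEOUS relations from (position, [event ids]) rows.

    Rows in position 0 are skipped, since those events are not evaluated.
    """
    placed = [(pos, ev) for pos, events in timeline if pos != 0 for ev in events]
    relations = set()
    for (pos_a, ev_a), (pos_b, ev_b) in combinations(placed, 2):
        if pos_a == pos_b:
            # Store simultaneous pairs in a canonical order.
            relations.add((min(ev_a, ev_b), max(ev_a, ev_b), "SIMULTANEOUS"))
        elif pos_a < pos_b:
            relations.add((ev_a, ev_b, "BEFORE"))
        else:
            relations.add((ev_b, ev_a, "BEFORE"))
    return relations


def pairwise_f1(system, gold):
    """F1 over the pairwise relations of a system TimeLine and a gold TimeLine."""
    sys_rel = pairwise_relations(system)
    gold_rel = pairwise_relations(gold)
    correct = len(sys_rel & gold_rel)
    if correct == 0:
        return 0.0
    precision = correct / len(sys_rel)
    recall = correct / len(gold_rel)
    return 2 * precision * recall / (precision + recall)
```

Under this approximation, a system that swaps the last two of three gold events keeps two of the three pairwise relations, so both precision and recall are 2/3.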
Examples
In this section, we give two examples of the task. Each example gives excerpts of the documents together with the document creation time (DCT), which is available in each document. The events in which the target entity participates with the semantic role ARG0 (i.e. agent) or ARG1 (i.e. patient) are in bold. The resulting TimeLine is then given, with the time anchor and the order of the events.
1. Given the entity Steve Jobs as an input and a set of documents, a TimeLine is built.
Entity: Steve Jobs
Relevant sentences:
- (file id: 1664; DCT: June 6, 2005; sentence id: 2) Apple Computer CEO and co-founder Steve Jobs gave his annual opening keynote to the World Wide Developers Conference (WWDC) at Moscone Center in San Francisco, California on Monday.
- (file id: 18315; DCT: August 24, 2011; sentence id: 2) Steve Jobs, founder of Apple, has chosen to step down from his post as CEO of the company.
- (file id: 18315; DCT: August 24, 2011; sentence id: 7) Steve Jobs has been fighting pancreatic cancer since 2004 and has been on medical leave since January of this year.
- (file id: 18355; DCT: October 6, 2011; sentence id: 4) He has been fighting pancreatic cancer since 2004.
TimeLine:
Steve Jobs
1 2004 18315-7-fighting 18355-4-fighting
2 2005-06-05 1664-2-keynote
3 2011-01 18315-7-leave
4 2011-08-24 18315-2-step_down
2. For the second example, the entity of interest is Beatles’ Apple Corps.
Entity: Beatles’ Apple Corps
Relevant sentences:
- (file id: 4954; DCT: May 8, 2006; sentence id: 2) The Beatles' label Apple Corps lost its court case against Apple Computer today in the High Court.
- (file id: 4954; DCT: May 8, 2006; sentence id: 6) During the case Apple Corps showed the court just how many times the Apple Computer logo appeared during a typical download.
- (file id: 4596; DCT: March 28, 2006; sentence id: 3) Apple Corps claims that Apple Computer's iTunes Music Store violates an agreement reached between the two companies in 1991.
TimeLine:
Beatles Apple Corps
1 2006-03-28 4596-3-claims
2 XXXX-XX-XX 4954-6-showed
3 2006-05-08 4954-2-lost
Submission
Participants can submit up to two runs for each track/subtrack.
The submission is a single ZIP file.
The ZIP should contain one directory for each track/subtrack and run; each directory must be named as follows: TRACK-ID_SYSTEM-NAME_RUN-ID
The TRACK-ID values are: "TrackA", "SubtrackA", "TrackB" and "SubtrackB".
Each directory should have 3 sub-directories containing the timelines produced for each set of documents, named respectively "corpus_1_timelines/", "corpus_2_timelines/" and "corpus_3_timelines/".
The name of each timeline file must be the mention of the target entity in lower case, followed by the extension “.txt”. For multi-word entities, tokens are separated by an underscore (e.g. steve_jobs.txt).
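The naming rules can be sketched as follows in Python. The function names are hypothetical, as are the system name and run id values used when calling them, since the exact form of SYSTEM-NAME and RUN-ID is not specified here. The sketch also assumes that entity tokens are separated by whitespace and does nothing special with punctuation such as apostrophes.

```python
from pathlib import PurePosixPath

TRACK_IDS = ("TrackA", "SubtrackA", "TrackB", "SubtrackB")


def run_directory(track_id: str, system_name: str, run_id) -> str:
    """Directory name for one run: TRACK-ID_SYSTEM-NAME_RUN-ID."""
    if track_id not in TRACK_IDS:
        raise ValueError(f"unknown track id: {track_id!r}")
    return f"{track_id}_{system_name}_{run_id}"


def timeline_filename(entity: str) -> str:
    """Lower-cased entity mention, tokens joined by '_', with a .txt extension."""
    return "_".join(entity.lower().split()) + ".txt"


def timeline_path(track_id, system_name, run_id, corpus: int, entity: str) -> PurePosixPath:
    """Path of one timeline file inside the ZIP (forward slashes)."""
    return (
        PurePosixPath(run_directory(track_id, system_name, run_id))
        / f"corpus_{corpus}_timelines"
        / timeline_filename(entity)
    )
```

For example, timeline_filename("Steve Jobs") gives steve_jobs.txt, matching the example above.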