Data

 

Evaluation Data: Test and Gold data

The evaluation data consist of three sets of documents annotated with event mentions, together with a set of 38 target entities. Each set contains 30 documents from Wikinews, for a total of around 30,000 tokens.

Evaluation data are available by filling in the download form.

 

Trial Data

The trial data consist of a set of 30 documents about Apple Inc. collected from Wikinews (http://en.wikinews.org). A set of target entities (the input) and the corresponding ordered list of events (the output TimeLine) are provided with the documents.

The trial data have been annotated with the extents of event mentions.

 


 

We also provide, separately, the three files used to compute inter-annotator agreement on event mention annotation, together with the two TimeLines built from these files for the agreement study. The three files are also included in the full corpus, but the TimeLines are not. Both the annotations and the TimeLines have been reviewed.

 

No training data are provided beyond the trial data.

 

Format

Documents. The documents are available in two formats: the CAT (Content Annotation Tool) labelled format (Bartalesi Lenzi et al., 2012) and a format that mimics the TimeML format (http://timeml.org/site/publications/specs.html).

The CAT labelled format is an XML-based stand-off format in which the annotation layers are stored in separate document sections and are related to each other and to the source data through pointers. The trial data are annotated with event mentions and the document creation time, so each document contains two sections: one with the tokens and one with the markables.

The XSD schema of the annotated documents in CAT labelled format is available here.
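To illustrate how the stand-off pointers work, here is a minimal sketch that resolves event markables back to their tokens. It assumes element and attribute names (`token`, `t_id`, `Markables`, `EVENT_MENTION`, `m_id`, `token_anchor`) that are hypothetical here; check them against the XSD schema before relying on them. The document creation time markable is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical CAT-style document. The element and attribute names below are
# assumptions for illustration; verify them against the official XSD schema.
CAT_SAMPLE = """<Document doc_name="sample.txt" doc_id="1">
  <token t_id="1" sentence="0" number="0">Apple</token>
  <token t_id="2" sentence="0" number="1">unveiled</token>
  <token t_id="3" sentence="0" number="2">the</token>
  <token t_id="4" sentence="0" number="3">iPhone</token>
  <Markables>
    <EVENT_MENTION m_id="10">
      <token_anchor t_id="2"/>
    </EVENT_MENTION>
  </Markables>
</Document>"""


def event_mentions(xml_text):
    """Map each event markable id to the text of the tokens it points to."""
    root = ET.fromstring(xml_text)
    # Token layer: token id -> surface form.
    tokens = {tok.get("t_id"): tok.text for tok in root.iter("token")}
    mentions = {}
    markables = root.find("Markables")
    for mention in markables.iter("EVENT_MENTION"):
        # Each markable refers back to the token layer through token_anchor ids.
        ids = [anchor.get("t_id") for anchor in mention.iter("token_anchor")]
        mentions[mention.get("m_id")] = " ".join(tokens[i] for i in ids)
    return mentions


print(event_mentions(CAT_SAMPLE))  # {'10': 'unveiled'}
```

Because the markables only store token ids, the same token layer can be shared by any number of annotation layers without modifying the text.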

In the TimeML-like format, events are annotated using only the EVENT element (not the MAKEINSTANCE element, as in TimeML). Sentence elements (s) have been added to mark sentence boundaries and associate each sentence with a unique id. The text is tokenized.
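A minimal sketch of reading this inline format, assuming a hypothetical root element `Document`, an `id` attribute on each `s` element, and an `eid` attribute on each `EVENT` element (these names are illustrative, not taken from the task files):

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-like snippet: the root name, the id attribute on <s>,
# and the eid attribute on <EVENT> are assumptions for illustration.
TIMEML_LIKE_SAMPLE = """<Document>
<s id="0">Apple <EVENT eid="e1">unveiled</EVENT> the iPhone .</s>
<s id="1">Sales <EVENT eid="e2">rose</EVENT> quickly .</s>
</Document>"""


def events_by_sentence(xml_text):
    """Return (sentence id, event id, event text) triples in document order."""
    root = ET.fromstring(xml_text)
    triples = []
    for sentence in root.iter("s"):
        # Events are inline, so each EVENT sits inside its sentence element.
        for event in sentence.iter("EVENT"):
            triples.append((sentence.get("id"), event.get("eid"), event.text))
    return triples


for triple in events_by_sentence(TIMEML_LIKE_SAMPLE):
    print(triple)
```

Unlike the stand-off CAT format, the annotation here is embedded in the text, so an event's sentence is simply its enclosing `s` element.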

 

TimeLine. One file must be created for each TimeLine. The first line of the file contains the target entity.
The file name must be the mention of the target entity in lower case, with the extension “.txt”. For multi-word entities, tokens are separated by an underscore.
E.g.: steve_jobs.txt
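The naming rule above can be sketched as follows. The helper names are hypothetical, and `event_lines` stands for already-formatted TimeLine rows whose internal layout is not described in this section.

```python
from pathlib import Path


def timeline_filename(entity_mention):
    """Lower-case the mention and join its whitespace-separated tokens with '_'."""
    return "_".join(entity_mention.lower().split()) + ".txt"


def write_timeline(entity_mention, event_lines, directory="."):
    """Write one TimeLine file whose first line is the target entity.

    event_lines is a hypothetical list of pre-formatted TimeLine rows;
    this sketch does not define their layout.
    """
    path = Path(directory) / timeline_filename(entity_mention)
    content = "\n".join([entity_mention, *event_lines]) + "\n"
    path.write_text(content, encoding="utf-8")  # UTF-8 is an assumption
    return path


print(timeline_filename("Steve Jobs"))  # steve_jobs.txt
```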

 

Set of target entities. For each set of documents, one file is provided containing the list of target entities, one per line.
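A minimal sketch for loading such a list; the function names, the UTF-8 encoding, and the sample entities are assumptions for illustration.

```python
import io


def read_target_entities(stream):
    """Return the target entities listed one per line, skipping blank lines."""
    return [line.strip() for line in stream if line.strip()]


def load_target_entities(path):
    # UTF-8 is an assumption; the encoding is not stated in the task description.
    with open(path, encoding="utf-8") as f:
        return read_target_entities(f)


# Hypothetical sample content standing in for a real entity-list file.
sample = io.StringIO("Apple Inc.\nSteve Jobs\n\niPhone\n")
print(read_target_entities(sample))  # ['Apple Inc.', 'Steve Jobs', 'iPhone']
```

Each entity read this way is the input for which one TimeLine file is expected.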

 

Contact Info

Organizers

  • Anne-Lyse Minard
  • Eneko Agirre
  • Itziar Aldabe
  • Marieke van Erp
  • Bernardo Magnini
  • German Rigau
  • Manuela Speranza
  • Rubén Urizar

email: semeval-task4-timeline@googlegroups.com

google group: semeval-task4-timeline
