Tools
Evaluation tool
The evaluation script builds heavily on the TempEval-3 evaluation script (UzZaman et al., 2013) for scoring relations.
For each timeline, we use the evaluation metric introduced at TempEval-3 to score relations and obtain an F1 score. The metric captures the temporal awareness of an annotation (UzZaman and Allen, 2011). Our evaluation script reports the micro-averaged F1 score.
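As a rough illustration of how a micro-averaged F1 is obtained, the sketch below pools per-timeline counts before computing precision and recall. The counts and the function name are hypothetical; the official script derives them from the temporal-awareness comparison of graph closures, which is not reproduced here.

```python
def micro_f1(counts):
    """Micro-averaged F1 over per-timeline counts.

    counts: list of (true_positives, n_system_relations, n_reference_relations)
    tuples, one per timeline. Illustrative only; the real counts come from the
    temporal-awareness evaluation of TempEval-3.
    """
    tp = sum(c[0] for c in counts)
    n_sys = sum(c[1] for c in counts)
    n_ref = sum(c[2] for c in counts)
    precision = tp / n_sys if n_sys else 0.0
    recall = tp / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Micro-averaging sums the counts across all timelines first, so larger timelines weigh more than in a macro average over per-timeline F1 scores.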
- Download the evaluation script used to compute the official results: evaluation_tool_timeline_task4_v3.zip
Before evaluating temporal awareness, each timeline must be transformed into a corresponding graph representation. For that, we define the following transformation steps:
- ordering and time anchors
  - Each time anchor is represented as a TIMEX3.
  - Each event is related to one TIMEX3 with the "SIMULTANEOUS" relation type.
  - If one event happens before another, a "BEFORE" relation is created between the two events.
  - If two events happen at the same time, a "SIMULTANEOUS" relation is created between them.
- ordering only
  - If one event happens before another, a "BEFORE" relation is created between the two events.
  - If two events happen at the same time, a "SIMULTANEOUS" relation is created between them.
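The steps above can be sketched as follows. The input representation is an assumption: each timeline entry is a `(position, time_anchor, event)` tuple, where a smaller position means an earlier event and equal positions mean simultaneous events; the function name is illustrative and not the official script's API.

```python
from itertools import combinations

def timeline_to_graph(timeline, with_anchors=True):
    """Turn a timeline into a list of (source, relation, target) edges.

    timeline: list of (position, time_anchor, event) tuples (assumed format).
    with_anchors=True corresponds to the "ordering and time anchors" variant,
    with_anchors=False to the "ordering only" variant.
    """
    relations = []
    if with_anchors:
        # Each event is tied to its TIMEX3 time anchor as SIMULTANEOUS.
        relations += [(ev, "SIMULTANEOUS", anchor) for _, anchor, ev in timeline]
    # Pairwise event relations: BEFORE for strictly earlier positions,
    # SIMULTANEOUS for equal positions.
    ordered = sorted(timeline, key=lambda t: t[0])
    for (p1, _, e1), (p2, _, e2) in combinations(ordered, 2):
        if p1 < p2:
            relations.append((e1, "BEFORE", e2))
        else:  # equal positions after sorting
            relations.append((e1, "SIMULTANEOUS", e2))
    return relations
```

With three events where the first two share a position, the "ordering and time anchors" variant yields three anchor edges plus three pairwise event edges, while the "ordering only" variant yields just the three pairwise edges.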
References
Naushad UzZaman and James Allen (2011), "Temporal Evaluation." In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (Short Papers), Portland, Oregon, USA.
Naushad UzZaman, Hector Llorens, Leon Derczynski, Marc Verhagen, James Allen, and James Pustejovsky (2013), "SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations." In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 1–9, Atlanta, Georgia, June 14–15, 2013. http://anthology.aclweb.org//S/S13/S13-2001.pdf