Evaluation
Participants in Clinical TempEval may take part in any or all of the 6 tasks (TS, ES, TA, EA, DR, CR). Additionally, Clinical TempEval will have a two-phase evaluation, allowing systems either to start from the plain text or to incorporate some manual annotations. Specifically, the phases will be:
- Only the plain text is given
- Manually annotated event and time expression spans and attributes are given (i.e., manual TS, ES, TA and EA)
The evaluation metrics that will be applied for each of these phases are:
- Only the plain text is given
  - TS, ES: precision, recall and F1
  - TA, EA: precision, recall and F1 for each attribute, and an overall precision, recall and F1 where a time/event is marked correct only if all attributes are correct
  - DR: precision, recall and F1
  - CR: precision, recall and F1, and closure-based precision, recall and F1, where temporal closure is run to infer additional relations on both the system and the reference relations and scores are calculated on the post-closure relations.
- Manually annotated event and time expression spans and attributes are given
  - DR: accuracy
  - CR: precision, recall and F1, and closure-based precision, recall and F1.
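The metrics above can be illustrated with a minimal sketch. It is not the official evaluation script: it assumes annotations can be compared as exact-match items, that relations are plain (source, target) pairs, and that temporal closure amounts to transitive closure over those pairs. The function names and the example identifiers (`t1`, `e1`, `e2`) are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def precision_recall_f1(
    system: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Score system items against reference items by exact match."""
    sys_set, ref_set = set(system), set(reference)
    correct = len(sys_set & ref_set)
    precision = correct / len(sys_set) if sys_set else 0.0
    recall = correct / len(ref_set) if ref_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def transitive_closure(pairs: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


# Hypothetical relations, written as (source, target) pairs.
reference = {("t1", "e1"), ("e1", "e2")}
system = {("t1", "e1"), ("t1", "e2")}

# Scores on the relations as annotated.
plain = precision_recall_f1(system, reference)

# Closure-based scores: infer extra relations on both sides, then score.
closed = precision_recall_f1(
    transitive_closure(system), transitive_closure(reference)
)

print("plain:", plain)
print("closure-based:", closed)
```

In this example the system relation ("t1", "e2") is not in the reference as annotated, but it is inferred by closure, so it counts as correct only in the closure-based score. The same `precision_recall_f1` function could cover the "all attributes correct" case by comparing items that bundle a span with all of its attribute values.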
Phase 1 Submissions
If you have completed the data use agreement process with the Mayo Clinic and received the THYME corpus, then you already have the test.zip file, which contains the raw text we will be using as a test set this year.
- The password for the test.zip file will be posted to the Clinical TempEval Group on January 18, 2016.
- Participants will upload their system output before January 24, 2016 GMT -12:00:
  - Go to the SemEval submission site.
  - Select "Make a new Submission".
  - Under "Submission Category" select "Task 12: Clinical TempEval (phase 1)".
  - Fill in the remaining form details (team name, team members, system description).
  - Upload a zip file containing the system output (see "System Output Format" below).
Participants may update their system output at any time during the evaluation period by returning to the SemEval submission site.
Phase 2 Submissions
The phase 2 data includes EVENT and TIMEX3 annotations corresponding to texts from the test.zip file. Even if you do not intend to participate in Phase 1, you will still need the password for test.zip released as part of that phase to get the raw texts.
- A download link for the EVENT and TIMEX3 annotations will be posted to the Clinical TempEval Group on January 25, 2016.
- Participants will upload their system output before January 31, 2016 GMT -12:00:
  - Go to the SemEval submission site.
  - Select "Make a new Submission".
  - Under "Submission Category" select "Task 12: Clinical TempEval (phase 2)".
  - Fill in the remaining form details (team name, team members, system description). If you participated in phase 1, these will likely be the same as that phase.
  - Upload a zip file containing the system output (see "System Output Format" below).
Participants may update their system output at any time during the evaluation period by returning to the SemEval submission site.
System Output Format
The format of submissions is the same for both phase 1 and phase 2. Your system output should take the same format and organization as the Anafora XML files in the training data. Your directory structure should look like:
- SystemName-RunName
  - ID004_clinic_010
    - ID004_clinic_010.Temporal-Relation.system.completed.xml
  - ID004_clinic_012
    - ID004_clinic_012.Temporal-Relation.system.completed.xml
  - ID004_path_011
    - ID004_path_011.Temporal-Relation.system.completed.xml
  - ID005_clinic_013
    - ID005_clinic_013.Temporal-Relation.system.completed.xml
  - ...
"SystemName" should be the same name that you registered on the SemEval site. "RunName" may be anything you like, though short and alphanumeric is best. Each team may submit up to 2 system runs for each phase.
Before uploading your results, please check that your Anafora XML files are valid and are read correctly by the evaluation script.
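As a first sanity check before running the evaluation script, something like the following sketch could flag files that are not well-formed XML. It only tests that each file parses; it does not check the Anafora schema and is no substitute for the evaluation script itself. The function name `find_malformed_xml` and the run directory name in the usage comment are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict


def find_malformed_xml(run_dir: Path) -> Dict[str, str]:
    """Return {file path: parser error} for every XML file that fails to parse."""
    problems = {}
    for xml_file in sorted(run_dir.rglob("*.xml")):
        try:
            ET.parse(xml_file)
        except ET.ParseError as error:
            problems[str(xml_file)] = str(error)
    return problems


# Example usage (the directory name is an assumption):
# for path, message in find_malformed_xml(Path("SystemName-RunName")).items():
#     print(f"{path}: {message}")
```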
Note that you only need to submit system output on the ID* files as we will only be evaluating on colon cancer notes this year. However, there's no harm if you include system output for the other files - they will be automatically ignored during the official evaluation.
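The directory layout described above could be packaged into a zip file with a short script along these lines. This is a sketch under stated assumptions: it assumes your system writes one directory per document, each holding a file named `<document>.Temporal-Relation.system.completed.xml`, and the function name `package_run` and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List

SUFFIX = ".Temporal-Relation.system.completed.xml"


def package_run(
    output_root: Path, system_name: str, run_name: str, zip_path: Path
) -> List[str]:
    """Zip system output under a top-level SystemName-RunName directory.

    Expects output_root/<document>/<document><SUFFIX> and returns the list
    of archive member names that were written.
    """
    top = f"{system_name}-{run_name}"
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for doc_dir in sorted(p for p in output_root.iterdir() if p.is_dir()):
            xml_file = doc_dir / f"{doc_dir.name}{SUFFIX}"
            if xml_file.is_file():
                arcname = f"{top}/{doc_dir.name}/{xml_file.name}"
                archive.write(xml_file, arcname)
                written.append(arcname)
    return written


# Example usage (paths and names are assumptions):
# package_run(Path("output"), "SystemName", "RunName", Path("submission.zip"))
```

Document directories without a matching XML file are skipped rather than treated as errors, so it is worth comparing the returned list against the documents you expected to submit.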