Data and Tools
English TRAIN+DEV data v3.2 -- same as for SemEval-2016 Task 3 (subtasks A, B and C)
- Data for all English subtasks v3.2 is here
- It includes a TRAIN/DEV split with a reliable, double-checked DEV set
- Subtask A (6,398 questions + 40,288 comments) + unannotated (189,941 questions + 1,894,456 comments)
- Subtask B (317 original + 3,169 related questions)
- Subtask C (317 original questions + 3,169 related questions + 31,690 comments)
Arabic TRAIN+DEV data v1.3 -- same as for SemEval-2016 Task 3 (Subtask D)
- The Arabic TRAIN+DEV data v1.3 can be found here
- It includes a TRAIN/DEV split with a reliable, double-checked DEV set (1,281 original questions and 37,795 potentially related question-answer pairs) + unannotated data (163,383 question-answer pairs)
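The releases above are distributed as XML. The sketch below shows one way to pull out (related question, comment, relevance label) triples with the standard library; the tag and attribute names (`Thread`, `RelQuestion`, `RelComment`, `RELC_RELEVANCE2RELQ`, etc.) are assumptions modeled on the SemEval-2016 Task 3 release and should be verified against the README in the actual download.

```python
import xml.etree.ElementTree as ET

# Toy fragment; tag/attribute names are ASSUMED from the SemEval-2016
# Task 3 format -- check them against the real data files.
sample = """
<xml>
  <OrgQuestion ORGQ_ID="Q1">
    <OrgQSubject>Visa renewal</OrgQSubject>
    <OrgQBody>How do I renew my visa?</OrgQBody>
    <Thread THREAD_SEQUENCE="Q1_R1">
      <RelQuestion RELQ_ID="Q1_R1" RELQ_RELEVANCE2ORGQ="PerfectMatch">
        <RelQSubject>Visa renewal process</RelQSubject>
        <RelQBody>What is the process to renew a visa?</RelQBody>
      </RelQuestion>
      <RelComment RELC_ID="Q1_R1_C1" RELC_RELEVANCE2RELQ="Good">
        <RelCText>Go to the immigration office with your passport.</RelCText>
      </RelComment>
    </Thread>
  </OrgQuestion>
</xml>
"""

root = ET.fromstring(sample)
pairs = []
for thread in root.iter("Thread"):
    relq = thread.find("RelQuestion")
    for comment in thread.findall("RelComment"):
        pairs.append((relq.get("RELQ_ID"),
                      comment.get("RELC_ID"),
                      comment.get("RELC_RELEVANCE2RELQ")))
print(pairs)  # [('Q1_R1', 'Q1_R1_C1', 'Good')]
```

For the full files, `ET.iterparse` is a better fit than `fromstring`, since the unannotated collections run to millions of comments.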
Test data from SemEval-2016 Task 3 -- can also be used for training (subtasks A, B, C and D)
Multi-domain SAMPLE data for Task 3 -- data for the new Subtask E
- The StackExchange multi-domain SAMPLE data can be found here
- The sample data is taken from a StackExchange subforum that is not in the DEV, TRAIN or TEST sets.
Multi-domain TRAIN and DEV data for Task 3 -- data for the new Subtask E
- The StackExchange multi-domain TRAIN and DEV data can be found here
- UPDATE 9/9/2016: user data has been added. The current version is v1_2.
Test data for SemEval-2017 Task 3
- Now available here for subtasks A, B, C, and D (check the README file)
- For subtask E, it will be available on January 21, 2017
- For instructions on how to submit the test results of your systems, please read this page
- UPDATE 24/9/2016: test data for subtask E is now available here
Scorer v2.2 and random baselines (subtasks A, B, C, D and E) -- same as for SemEval-2016 Task 3
- Can be found here
- UPDATE 3/12/2016: A new scorer (v2.3) is available and can be used for all subtasks, including Subtask E.
- It can be found here
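The official scorer's primary ranking metric is Mean Average Precision (MAP). As a quick sanity check on system output, textbook MAP can be computed as below; note this is only a sketch, not the official implementation, which may apply a rank cutoff and also reports further metrics such as AvgRec and MRR.

```python
def average_precision(relevance):
    """AP for one query; `relevance` is the ranked list of 0/1 gold labels,
    ordered by the system's score (best first)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(queries):
    """MAP: mean of per-query average precision."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two toy queries: relevant items at ranks 1 and 3, and at rank 2.
print(mean_average_precision([[1, 0, 1], [0, 1]]))  # → 0.6666...
```

Use the official v2.3 scorer for any reported numbers; the sketch is only for debugging prediction files locally.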
Results
- The gold labels, submissions and scores for all teams can be found here
- The gold labels inside the test XML can be found here
- The task description paper is here.
@InProceedings{SemEval-2017:task3,
author = {Nakov, Preslav and Hoogeveen, Doris and M\`{a}rquez, Llu\'{i}s and Moschitti, Alessandro and Mubarak, Hamdy and Baldwin, Timothy and Verspoor, Karin},
title = {{SemEval}-2017 Task 3: Community Question Answering},
booktitle = {Proceedings of the 11th International Workshop on Semantic Evaluation},
series = {SemEval '17},
month = {August},
year = {2017},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
}