Data and Tools
I. Codalab
The following development sets and instructions are provided so that you can practice submitting your output to Codalab. We encourage you to start testing uploads immediately; as a first step, you may want to upload the baseline included in each zip file.
- 4A English (Codalab): 4a-english.zip
- 4A Arabic (Codalab): 4a-arabic.zip
- 4B English (Codalab): 4b-english.zip
- 4B Arabic (Codalab): 4b-arabic.zip
- 4C English (Codalab): 4c-english.zip
- 4C Arabic (Codalab): 4c-arabic.zip
- 4D English (Codalab): 4d-english.zip
- 4D Arabic (Codalab): 4d-arabic.zip
- 4E English (Codalab): 4e-english.zip
- 4E Arabic (Codalab): 4e-arabic.zip
II. English Training Data
- Download the English data from prior years, reorganized for this year's task.
- Link to the data as posted for last year (SemEval-2016).
III. Arabic Training Data
IV. Download Scripts
- Script to download tweets and user information for the above datasets
V. Test Input
- Test input v3.0 for phase 1 (January 11-22): subtasks A, C, and E (deadline: passed)
- Test input for phase 2 (January 23-30): subtasks B and D (deadline: passed)
VI. Arabic and English Training Data
NOTES:
1. For English, we provide a default split of the data from previous years into training, development, and development-time testing datasets. Participants are free to use this data in any way they find useful when training and tuning their systems, e.g., use a different split, perform cross-validation, or train on all datasets.
2. For English, unlike in previous years, SemEval-2017 Task 4 had no progress testing, so all of the provided data could be used for training and development.
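Note 1 mentions cross-validation as an alternative to the default split. As a minimal sketch only, the Python below pools rows from several data files and yields k-fold train/held-out splits. It assumes each file is tab-separated with a tweet ID, a label, and the tweet text in that order; that column layout and the function names `load_rows` and `k_fold_splits` are hypothetical, so check them against the files you actually download.

```python
import csv
import random
from typing import Iterator, List, Tuple

# ASSUMED row layout: (tweet_id, label, text). Check the actual column
# order of the downloaded files before relying on this.
Row = Tuple[str, str, str]


def load_rows(paths: List[str]) -> List[Row]:
    """Pool rows from several tab-separated files into one list."""
    rows: List[Row] = []
    for path in paths:
        with open(path, encoding="utf-8", newline="") as handle:
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for fields in reader:
                if len(fields) < 3:
                    continue  # skip blank or malformed lines
                # Re-join any tabs that appeared inside the tweet text.
                rows.append((fields[0], fields[1], "\t".join(fields[2:])))
    return rows


def k_fold_splits(
    rows: List[Row], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[Row], List[Row]]]:
    """Yield k (train, held_out) pairs; each row is held out exactly once."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible folds
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        held = set(folds[i])
        train = [rows[j] for j in indices if j not in held]
        held_out = [rows[j] for j in folds[i]]
        yield train, held_out
```

For example, `k_fold_splits(load_rows(["train.tsv", "dev.tsv"]), k=5)` (file names hypothetical) yields five (train, held-out) pairs over the pooled data, which is one way to "train on all datasets" while still getting an estimate of held-out performance.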
RESULTS:
- All training data can be found here.
- The test data can be found here.
- The gold labels, submissions and scores for all teams can be found here.
- The task paper can be found here.
@InProceedings{SemEval:2017:task4,
author = {Sara Rosenthal and Noura Farra and Preslav Nakov},
title = {{SemEval}-2017 Task 4: Sentiment Analysis in {T}witter},
booktitle = {Proceedings of the 11th International Workshop on Semantic Evaluation},
series = {SemEval '17},
month = {August},
year = {2017},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics},
}