Setting up CodaLab Competition Websites for SemEval-2020

(This page is mostly for task organizers. Participants should see the "Directions for participating in a CodaLab competition" section at the bottom, and may benefit from reading the rest of the page to get a general understanding of the motivations and functionality of CodaLab.)

Benefits of CodaLab:

You don’t have to manually download and manage system submissions.
Your evaluation scripts will be run and the results will be collected fully automatically.
Participants can easily test their output formats (e.g., on the trial data) without any help from you.
If you want to set your own task start and end dates, CodaLab supports that.
If you need to run a multi-phase evaluation, CodaLab supports that.
CodaLab allows upload, download, and versioning of your datasets and scoring programs.
CodaLab leaderboards can include multiple different scores and can be anonymous if desired.
The CodaLab website can remain active even after the competition, allowing others to upload their submissions and be automatically evaluated.

Note that CodaLab Competitions is not the same as CodaLab Worksheets. Participants are not required to upload any code to CodaLab. In the vast majority of competitions, participants simply upload their output files, as in all past SemEvals.

Tutorials:

This tutorial by Tristan Miller is likely to be extremely useful:
Slides: https://www.hse.ru/data/2017/05/31/1171931089/CodaLabCompetitions.pdf
Video: https://www.youtube.com/watch?v=Ptx93cSBdNY
https://www.youtube.com/watch?v=mU1yEEMrMvY
Quickstart: https://github.com/codalab/codalab-worksheets/wiki/Quickstart
Running a Competition: https://github.com/codalab/codalab-competitions/wiki/User_Running-a-Competition
https://github.com/codalab/codalab-competitions/wiki/User_Competition-Roadmap

Some resources that should help you in this process:

We have put together a sample CodaLab competition that you can use as a starting point for your task’s competition: https://github.com/bethard/semeval-codalab
Example Competition on CodaLab for Emotion Intensity: https://competitions.codalab.org/competitions/16380
The code can be found here: https://github.com/felipebravom/EmoInt/tree/master/codalab
Example Competition on CodaLab for Clinical TempEval:
https://competitions.codalab.org/competitions/15621
The code can be found here: https://github.com/bethard/clinical-tempeval
Post your questions and issues on CodaLab here:
https://github.com/codalab/codalab-competitions/issues
Search first to see if a similar question has already been asked and answered.
You can also post your questions to the semeval-task-organizers@googlegroups.com mailing list, where other task organizers and the SemEval organizers may be able to help.

Important Notes:

Experiment with the testing version of CodaLab before uploading an official competition: https://competitions-test.codalab.org/
Create only a single CodaLab competition, even if you have multiple subtasks. Multiple CodaLab competitions solve very few problems and make things more complicated.
- If you have multiple subtasks, define a submission file format that is the same regardless of how many subtasks someone participates in.
- If you have multiple subtasks, your evaluation script will have to handle cases where a participant makes submissions for only one or some subtasks.
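
One way to handle partial submissions is sketched below. It assumes the evaluation script has already parsed the gold data and the submission into dictionaries keyed by subtask name. The function names, the subtask names, and the use of accuracy as the metric are illustrative assumptions, not part of CodaLab or of any SemEval task.

```python
def accuracy(gold, predicted):
    """Fraction of gold items whose predicted label matches."""
    if len(gold) != len(predicted):
        raise ValueError(
            f"expected {len(gold)} predictions, got {len(predicted)}")
    if not gold:
        raise ValueError("gold labels are empty")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def score_submission(gold_by_subtask, predictions_by_subtask):
    """Score only the subtasks that the participant submitted.

    Subtasks missing from the submission are reported as None so the
    caller can decide how to display them (e.g., as a placeholder score).
    """
    unknown = set(predictions_by_subtask) - set(gold_by_subtask)
    if unknown:
        raise ValueError(f"unknown subtask(s): {sorted(unknown)}")
    if not predictions_by_subtask:
        raise ValueError("submission contains no subtask predictions")

    scores = {}
    for subtask, gold in gold_by_subtask.items():
        if subtask in predictions_by_subtask:
            scores[subtask] = accuracy(gold, predictions_by_subtask[subtask])
        else:
            scores[subtask] = None  # subtask not attempted
    return scores
```

With this design, a submission covering only one subtask is still scored, while an empty submission or an unrecognized subtask name raises an error that the script can report to the participant.
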
Ensure that your CodaLab competition defines a detailed overview.html, data.html, and evaluation.html. These are the key documentation for your task, and what potential participants will be looking at.
While the CodaLab browser-based interface can update many parts of a competition, there are several known circumstances where the only way to modify part of a CodaLab competition is to delete and recreate it, uploading a new competition.yaml, etc. Some such circumstances are:
- Adding or removing .html files from the CodaLab website
- Modifying the leaderboard (e.g., adding new evaluation metrics)
- Adding a new phase (if this is done via the browser-based interface, leaderboards will not display)
Make sure your evaluation script throws errors (and exits with an error status) for all formatting issues you are able to detect in system submissions. This will ensure that ill-formatted submissions do not count against a participant’s submission limit.
- A very common error is for participants to include an extra subdirectory in their submission. Please make sure your evaluation script detects and handles this error.
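
A minimal sketch of such a check follows. It assumes the scoring program is called with an input directory and an output directory, that the unzipped submission sits in a `res` subdirectory of the input directory, and that the task expects a file named `predictions.txt`. The file name is hypothetical, and you should confirm the directory layout against the sample competition linked above. This version rejects nested files with a clear message; accepting them instead would also be a reasonable choice.

```python
import os
import sys


class SubmissionFormatError(Exception):
    """Raised when a submission does not follow the expected layout."""


def find_submission_file(res_dir, expected_name):
    """Return the path of expected_name at the top level of res_dir.

    Raises SubmissionFormatError with a specific message when the file is
    nested inside an extra subdirectory or is missing altogether.
    """
    top_level = os.path.join(res_dir, expected_name)
    if os.path.isfile(top_level):
        return top_level

    # Search below the top level so the error message can be precise.
    for dirpath, _dirnames, filenames in os.walk(res_dir):
        if expected_name in filenames:
            nested = os.path.relpath(
                os.path.join(dirpath, expected_name), res_dir)
            raise SubmissionFormatError(
                f"found {nested!r}, but {expected_name!r} must be at the top "
                "level of the zip file, not inside a subdirectory")

    raise SubmissionFormatError(f"{expected_name!r} not found in submission")


def main(argv):
    """Return a process exit status: 0 on success, 1 on a format error."""
    if len(argv) != 3:
        print("usage: evaluate.py INPUT_DIR OUTPUT_DIR", file=sys.stderr)
        return 1
    input_dir, output_dir = argv[1], argv[2]
    try:
        path = find_submission_file(
            os.path.join(input_dir, "res"), "predictions.txt")
    except SubmissionFormatError as error:
        print(f"Submission format error: {error}", file=sys.stderr)
        return 1
    # Scoring of `path` would go here, writing results under output_dir.
    return 0

# In the real script, finish with: sys.exit(main(sys.argv))
```

Returning a non-zero status from `main` and passing it to `sys.exit` is what makes the run count as failed rather than as a scored submission.
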
Consider setting up at least the following three phases for your competition:
- Practice phase:
  - Runs from now until 10 Jan 2020
  - Uses the official evaluation script, but on the trial data
  - Set maximum submissions to something high like 999
  - Make the leaderboard public
  - Allows participants to check their formatting
- Evaluation phase:
  - Runs from 10 Jan until 31 Jan 2020, or for a shorter period within those dates if your evaluation is shorter
  - Uses the official evaluation script and the official test data
  - Set maximum submissions to a number less than or equal to 10. If the number is greater than 1, a suggested option is to tell the participants that only their final valid submission on CodaLab will be taken as the official submission to the competition. The participants can still describe contrastive runs in their system paper. If you choose to accept more than one official submission per team, then you will have to look for the other submissions in the 'submissions' tab (the leaderboard only shows the latest valid submission).
  - Hide the leaderboard (leaderboard_management_mode: hide_results)
  - Determines the official leaderboard rankings for SemEval
  - At the end of the evaluation period, make a copy of the leaderboard and save it as backup in case the leaderboard gets updated (especially needed if you have not set up a post-evaluation phase)
- Post-Evaluation phase:
  - Runs from 31 Jan (or earlier, if your evaluation length is shorter than the maximum allowed time)
  - Uses the official evaluation script and the official test data
  - Enable “Auto migration” of submissions from Evaluation phase to this phase
  - Set maximum submissions to something high like 999
  - Make the leaderboard public
  - Allows participants to score “contrastive runs” that can be included as part of the analysis in system description papers. Also allows future systems to be scored on the task after SemEval-2020
  - At most one submission for each participant can be displayed on the leaderboard.
Participants must click the Submit to Leaderboard button underneath one of their submissions to display those results on the leaderboard. (Task organizers may override the participants using the “SHOW” setting on the “Submissions” page.)
If you choose to allow more than 1 submission in your phases, organizers can see all submissions in the ‘Admin Features’ -> ‘Submissions’ tab.
If you choose to allow more than 1 submission, it is especially important to consider hiding the leaderboard during the Evaluation phase to prevent participants from making a large number of submissions, viewing their results, and then choosing the best to place on the leaderboard (i.e., tuning to the test data).
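
The recommended three-phase setup can also be written down as a small checklist and verified before you build the competition. The sketch below is only a planning aid: `PhasePlan` and `check_plan` are hypothetical names, and the fields do not correspond to the competition.yaml schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PhasePlan:
    label: str
    start: Optional[date]  # None means "starts now"
    max_submissions: int
    leaderboard_public: bool
    data: str  # "trial" or "test"


# The three phases suggested above, with the SemEval-2020 dates.
PLAN = [
    PhasePlan("Practice", None, 999, True, "trial"),
    PhasePlan("Evaluation", date(2020, 1, 10), 10, False, "test"),
    PhasePlan("Post-Evaluation", date(2020, 1, 31), 999, True, "test"),
]


def check_plan(plan):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    starts = [phase.start for phase in plan if phase.start is not None]
    if starts != sorted(starts):
        problems.append("phase start dates are not in chronological order")
    for phase in plan:
        if phase.label == "Evaluation":
            if phase.max_submissions > 10:
                problems.append(
                    "Evaluation phase allows more than 10 submissions")
            if phase.leaderboard_public:
                problems.append("Evaluation phase leaderboard is public")
    return problems
```

Calling `check_plan(PLAN)` on the plan above returns an empty list. Raising the Evaluation phase limit above 10 or making its leaderboard public would add a message for each problem.
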

Directions for participating in a CodaLab competition:

Create an account in CodaLab (https://competitions.codalab.org/). Sign in.
Edit your profile appropriately. Make sure to add a team name, and enter names of team members. (Go to "Settings", and look under "Competition settings".)
Proceed to the task webpage on CodaLab. Read the information on all the pages.
Download data: training, development, and test (when released)
Run your system on the data and generate a submission file, which must follow the official submission format outlined for your task. Zip the file for upload. CodaLab does not place any restrictions on the name of the zip file.
Make submissions on the development set (Phase 1).
- Wait a few moments for the submission to execute.
- Click on the ‘Refresh Status’ button to check status.
- Check to make sure submission is successful.
  - System will show status as “Finished”.
  - Click on ‘Download evaluation output from scoring step’ to examine the result. If you choose to, you can submit the result to the leaderboard.
  - If unsuccessful, check the error log, fix any format issues, and resubmit the updated zip.
Once the evaluation period begins, you can make submissions for the test set. The procedure is similar to that on the dev set. These differences apply:
- The leaderboard will be disabled until the end of the evaluation period.
- You cannot see the results of your submission. They will be posted at a later date after the evaluation period ends.
- You can still see if your submission was successful or resulted in some error.
- In case of error, you can view the error log.
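
When preparing the zip file, a common mistake is to zip a folder rather than the file itself, which leaves an extra subdirectory inside the archive. A minimal sketch of building the zip with the file at the top level follows. The function name and the file name `predictions.txt` are hypothetical; use whatever name your task's submission format specifies.

```python
import os
import zipfile


def make_submission_zip(prediction_path, zip_path):
    """Write prediction_path into zip_path at the top level of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname drops the directory part, so no subdirectory ends up
        # inside the zip even if the file lives in a nested folder.
        archive.write(prediction_path,
                      arcname=os.path.basename(prediction_path))
    return zip_path
```

For example, `make_submission_zip("output/run1/predictions.txt", "submission.zip")` produces an archive whose only entry is `predictions.txt`.
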

SemEval-2020

International Workshop on Semantic Evaluation
Sponsored by SIGLEX