Task Results and Initial Analysis
1. Overview of Results for Task 11
Fifteen teams participated in the task, submitting a total of 35 runs. The best-performing system, from team CLaC, achieves a score of 0.758 on the Cosine Similarity measure and 2.117 on the Mean Squared Error (MSE) measure. Across all systems, scores range from 0.059 to 0.758 on Cosine Similarity and from 2.117 to 11.274 on MSE (for which lower is better).
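For concreteness, here is a minimal sketch of how the two measures can be computed over a system's output, assuming the gold and predicted sentiment scores are aligned per tweet. The example vectors (and the 11-point [-5, 5] sentiment scale they suggest) are illustrative assumptions, not values drawn from the task data.

```python
import numpy as np

def cosine_similarity(gold: np.ndarray, pred: np.ndarray) -> float:
    """Cosine of the angle between the gold and predicted score vectors
    (1.0 = perfectly aligned, 0.0 = orthogonal, negative = opposed)."""
    return float(gold @ pred / (np.linalg.norm(gold) * np.linalg.norm(pred)))

def mean_squared_error(gold: np.ndarray, pred: np.ndarray) -> float:
    """Average squared difference between gold and predicted scores."""
    return float(np.mean((gold - pred) ** 2))

# Hypothetical scores for four tweets on a [-5, 5] scale.
gold = np.array([-4.0, -3.0, 2.0, -1.0])
pred = np.array([-3.0, -3.5, 1.0, 0.0])

print(cosine_similarity(gold, pred))   # higher is better
print(mean_squared_error(gold, pred))  # lower is better
```

Note that the two measures need not agree on rankings: Cosine Similarity rewards predictions that point in the right direction overall, while MSE penalizes the magnitude of every individual error.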
The following table gives an overview (sorted by Cosine Similarity score) of the performance of each participating team's best run. Note that a team's best MSE score may come from a different run than the one shown here (see Section 2 for more details).
| Team      | Cosine | MSE   |
|-----------|--------|-------|
| CLaC      | 0.758  | 2.117 |
| UPF       | 0.710  | 2.458 |
| LLT_PolyU | 0.678  | 2.600 |
| LT3       | 0.6581 | 3.398 |
| elirf     | 0.6579 | 3.096 |
| ValenTo   | 0.634  | 2.999 |
| HLT       | 0.630  | 4.088 |
| CPH       | 0.625  | 3.079 |
| prhlt     | 0.623  | 3.023 |
| DsUniPi   | 0.601  | 3.925 |
| PKU       | 0.574  | 3.746 |
| KELabTeam | 0.552  | 6.090 |
| RGU       | 0.523  | 8.602 |
| SHELLFBK  | 0.431  | 7.701 |
| BUAP      | 0.059  | 6.785 |
2. Category Analysis Results by Figurative Kind (Sarcasm, Irony, Metaphor, and Other)
The dataset contains figurative tweets and non-figurative tweets (the latter labeled as the Other category). Three types of figurative language appear in the dataset -- sarcasm, irony, and metaphor. The following table reports the performance of each system on each kind of language. Note that the four conditions -- Sarcasm, Irony, Metaphor, and Other -- are not necessarily mutually exclusive.
These conditions refer to the labels according to which the dataset was collated and organized. Data labeled Other is simply intended to be representative of language on Twitter in general, and may in fact contain figurative phenomena -- the point is that Other data is not selected for having (or lacking) any particular quality; it is merely a default set. In contrast, the subset of the data labeled Sarcasm was selected because each tweet exhibits sarcasm; data labeled Irony was selected because each tweet exhibits irony; and data labeled Metaphor was selected because each tweet employs one or more metaphors.
Note also that the goal of the task is to understand how well a computational system can recognize sentiment under different figurative conditions, and this makes the null case an important factor in the analysis.
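As a companion to the table below, the following is a minimal sketch of how such a per-condition breakdown can be computed, assuming each tweet carries one or more of the four labels. The tiny data structure, label names, and scores are all hypothetical; because the conditions can overlap, a single tweet may contribute to more than one column.

```python
import numpy as np

# Hypothetical labeled predictions; a tweet may carry several labels.
tweets = [
    {"gold": -4.0, "pred": -3.0, "labels": {"sarcasm"}},
    {"gold": -3.0, "pred": -3.5, "labels": {"sarcasm", "irony"}},
    {"gold":  2.0, "pred": -1.0, "labels": {"metaphor"}},
    {"gold":  1.0, "pred":  0.5, "labels": {"other"}},
]

for condition in ("sarcasm", "irony", "metaphor", "other"):
    # Select every tweet annotated with this condition (overlaps allowed).
    subset = [t for t in tweets if condition in t["labels"]]
    if not subset:
        continue
    gold = np.array([t["gold"] for t in subset])
    pred = np.array([t["pred"] for t in subset])
    mse = float(np.mean((gold - pred) ** 2))
    cos = float(gold @ pred / (np.linalg.norm(gold) * np.linalg.norm(pred)))
    print(f"{condition}: MSE = {mse:.3f}, cosine = {cos:.3f}")
```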
| Team | MSE Rank | MSE Overall | MSE Sarcasm | MSE Irony | MSE Metaphor | MSE Other | Cosine Rank | Cosine Overall | Cosine Sarcasm | Cosine Irony | Cosine Metaphor | Cosine Other |
|------|----------|-------------|-------------|-----------|--------------|-----------|-------------|----------------|----------------|--------------|-----------------|--------------|
| CLaC | 1 | 2.117 | 1.023 | 0.779 | 3.155 | 3.411 | 1 | 0.758 | 0.892 | 0.904 | 0.655 | 0.584 |
| UPF-Dec-19 | 2 | 2.458 | 0.934 | 1.041 | 4.186 | 3.772 | 2 | 0.711 | 0.903 | 0.873 | 0.520 | 0.486 |
| UPF-Dec-19 |  | 2.458 | 0.934 | 1.041 | 4.186 | 3.772 |  | 0.711 | 0.903 | 0.873 | 0.520 | 0.486 |
| LLT_PolyU-Dec-20_7_31_46 |  | 2.602 | 0.997 | 0.671 | 3.917 | 4.617 | 3 | 0.687 | 0.896 | 0.918 | 0.535 | 0.290 |
| LLT_PolyU-Dec-20_7_10_29 |  | 2.673 | 1.021 | 0.702 | 4.102 | 4.685 |  | 0.677 | 0.892 | 0.914 | 0.506 | 0.293 |
| LLT_PolyU-Dec-20_14_42_31 | 3 | 2.600 | 1.018 | 0.673 | 3.917 | 4.587 |  | 0.687 | 0.893 | 0.917 | 0.535 | 0.301 |
| LT3-dec-19-10-21-28-run1 |  | 3.398 | 1.287 | 1.224 | 5.670 | 5.444 | 4 | 0.6581 | 0.891 | 0.897 | 0.443 | 0.346 |
| LT3-dec-19-10-21-28-run2 | 4 | 2.912 | 1.286 | 1.083 | 4.793 | 4.503 |  | 0.648 | 0.872 | 0.861 | 0.355 | 0.357 |
| LT3-dec-19-12-11-44-run1 |  | 3.398 | 1.287 | 1.224 | 5.670 | 5.444 |  | 0.6581 | 0.891 | 0.897 | 0.443 | 0.346 |
| LT3-dec-19-12-11-44-run2 |  | 2.912 | 1.286 | 1.083 | 4.793 | 4.503 |  | 0.648 | 0.872 | 0.861 | 0.355 | 0.357 |
| elirf | 8 | 3.096 | 1.349 | 1.034 | 4.565 | 5.235 | 5 | 0.6579 | 0.904 | 0.905 | 0.411 | 0.247 |
| ValenTo | 5 | 2.999 | 1.004 | 0.777 | 4.730 | 5.315 | 6 | 0.634 | 0.895 | 0.901 | 0.393 | 0.202 |
| HLT | 11 | 4.088 | 1.327 | 1.184 | 6.589 | 7.119 | 7 | 0.630 | 0.887 | 0.907 | 0.379 | 0.365 |
| CPH-ridge |  | 3.079 | 1.041 | 0.904 | 4.916 | 5.343 | 8 | 0.625 | 0.897 | 0.886 | 0.325 | 0.218 |
| CPH-ensemble | 7 | 3.078 | 0.971 | 0.774 | 5.014 | 5.429 |  | 0.623 | 0.900 | 0.903 | 0.308 | 0.226 |
| CPH-specialensemble |  | 11.274 | 19.267 | 9.124 | 7.806 | 7.027 |  | 0.298 | -0.148 | 0.281 | 0.535 | 0.612 |
| Prhlt-ETR-ngram | 6 | 3.023 | 1.028 | 0.784 | 5.446 | 4.888 | 9 | 0.623 | 0.891 | 0.901 | 0.167 | 0.218 |
| Prhlt-ETR-word |  | 3.112 | 1.041 | 0.791 | 5.031 | 5.448 |  | 0.611 | 0.890 | 0.901 | 0.294 | 0.129 |
| Prhlt-RFR-word |  | 3.107 | 1.060 | 0.809 | 5.115 | 5.345 |  | 0.613 | 0.888 | 0.898 | 0.282 | 0.170 |
| Prhlt-RFR-ngram |  | 3.229 | 1.059 | 0.811 | 5.878 | 5.243 |  | 0.597 | 0.888 | 0.898 | 0.135 | 0.192 |
| Prhlt-BRR-word |  | 3.299 | 1.146 | 0.934 | 5.178 | 5.773 |  | 0.592 | 0.883 | 0.880 | 0.280 | 0.110 |
| Prhlt-BRR-ngram |  | 3.266 | 1.100 | 0.941 | 5.925 | 5.205 |  | 0.593 | 0.886 | 0.879 | 0.119 | 0.186 |
| DsUniPi | 10 | 3.925 | 1.499 | 1.656 | 7.106 | 5.744 | 10 | 0.601 | 0.870 | 0.839 | 0.359 | 0.271 |
| PKU | 9 | 3.746 | 1.148 | 1.015 | 5.876 | 6.743 | 11 | 0.574 | 0.883 | 0.877 | 0.350 | 0.137 |
| KELabTeam |  | 5.552 | 1.198 | 1.255 | 7.264 | 9.905 |  | 0.531 | 0.883 | 0.895 | 0.341 | 0.117 |
| KELabTeam-content based |  | 6.090 | 1.756 | 1.811 | 8.707 | 11.526 | 12 | 0.552 | 0.896 | 0.915 | 0.341 | 0.115 |
| KELabTeam-emotional pattern based | 12 | 4.177 | 1.189 | 0.809 | 6.829 | 7.628 |  | 0.533 | 0.874 | 0.900 | 0.289 | 0.135 |
| RGU-testsentfinal | 13 | 5.143 | 1.954 | 1.867 | 8.015 | 8.602 | 13 | 0.523 | 0.829 | 0.832 | 0.291 | 0.165 |
| RGU-testsentwarppred |  | 5.323 | 1.855 | 1.541 | 8.033 | 9.505 |  | 0.509 | 0.842 | 0.861 | 0.280 | 0.090 |
| RGU-testsentpredictions |  | 5.323 | 1.855 | 1.541 | 8.033 | 9.505 |  | 0.509 | 0.842 | 0.861 | 0.280 | 0.090 |
| SHELLFBK-run3 | 15 | 7.701 | 4.375 | 4.516 | 9.219 | 12.160 | 14 | 0.431 | 0.669 | 0.625 | 0.350 | 0.167 |
| SHELLFBK-run2 |  | 9.265 | 5.183 | 5.047 | 11.058 | 15.055 |  | 0.427 | 0.681 | 0.652 | 0.346 | 0.146 |
| SHELLFBK-run1 |  | 10.486 | 12.326 | 9.853 | 10.649 | 8.957 |  | 0.145 | -0.013 | 0.104 | 0.167 | 0.308 |
| SHELLFBK-run1_Dec_9 |  | 10.486 | 12.326 | 9.853 | 10.649 | 8.957 |  | 0.145 | -0.013 | 0.104 | 0.167 | 0.308 |
| BUAP | 14 | 6.785 | 4.339 | 7.609 | 8.930 | 7.253 | 15 | 0.058 | 0.412 | -0.209 | -0.023 | -0.025 |