Results


 

Official results

The main evaluation metric is the "Mean" column. The "Rank" column gives each submission's rank when all submissions are ordered by "Mean", highest first.
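As an illustration of how the rank follows from the mean, here is a minimal sketch that parses a few rows copied from the English STS table and re-derives their relative order. The helper names `parse_row` and `rank_by_mean` are hypothetical and not part of any official evaluation script. The sketch also assumes that every row is whitespace-separated with the mean and rank as its last two fields.

```python
def parse_row(line):
    """Split one whitespace-separated results row into (run, scores, mean, rank)."""
    fields = line.split()
    run = fields[0].lstrip("*")                 # drop the * / ** note markers
    scores = [float(x) for x in fields[1:-2]]   # per-dataset correlations
    mean = float(fields[-2])                    # published mean, taken as given
    rank = int(fields[-1])                      # published rank
    return run, scores, mean, rank


def rank_by_mean(rows):
    """Return {run: position}, where position 1 has the highest mean."""
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    return {r[0]: i for i, r in enumerate(ordered, start=1)}


# Four rows copied verbatim from the English STS table (the overall top four).
sample = [
    "DLS@CU-S2 0.7241 0.7569 0.7223 0.8250 0.8631 0.7921 3",
    "Samsung-alpha 0.6589 0.7827 0.7029 0.8342 0.8701 0.7920 4",
    "DLS@CU-S1 0.7390 0.7725 0.7491 0.8250 0.8644 0.8015 1",
    "ExBThemis-themisexp 0.6946 0.7784 0.7482 0.8245 0.8527 0.7942 2",
]

rows = [parse_row(line) for line in sample]
ranks = rank_by_mean(rows)
for run, _, mean, published in rows:
    print(f"{run:22s} mean={mean:.4f} derived={ranks[run]} published={published}")
```

The sketch uses the published mean directly rather than recomputing it. The published mean is not the plain average of the five dataset columns: for DLS@CU-S1 the unweighted average is 0.7900, while the table gives 0.8015, so the mean is presumably weighted. Ties, such as the three RTM-DCU runs, are not handled here; the sort would simply keep them in input order.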


English STS

 

RUN answers-forums answers-students belief headlines images Mean Rank
Baseline-tokencos 0.4453 0.6647 0.6517 0.5312 0.6039 0.5871 61
A96T-RUN1 0.6686 0.7192 0.7117 0.7357 0.7896 0.7337 29
ASAP-FIRSTRUN 0.2304 0.6503 0.3928 0.6614 0.6548 0.5695 63
ASAP-SECONDRUN 0.2374 0.7095 0.3986 0.7039 0.7294 0.6152 56
**ASAP-THIRDRUN 0.2303 0.6719 0.4342 0.7156 0.7250 0.6112 57
AZMAT-RUNABS 0.3099 0.4282 0.3568 0.5280 0.5118 0.4503 70
AZMAT-RUNCAP 0.2932 0.4282 0.3526 0.5350 0.5186 0.4512 69
AZMAT-RUNSCALE 0.2933 0.4293 0.3587 0.5264 0.5145 0.4490 71
BLCUNLP-1stRUN 0.4231 0.5152 0.5510 0.5651 0.7163 0.5709 62
BLCUNLP-2ndRUN 0.5725 0.6586 0.5510 0.7238 0.8271 0.6928 44
BLCUNLP-3rdRUN 0.5725 0.5753 0.4462 0.7309 0.8070 0.6556 49
BUAP-RUN1 0.5564 0.6901 0.6473 0.7167 0.7658 0.6936 43
DalGTM-run1 0.2902 -0.0534 0.0625 0.0598 0.0663 0.0623 74
DalGTM-run2 0.3537 0.1189 0.0625 0.2354 0.2042 0.1917 72
DalGTM-run3 0.1533 0.1189 -0.1319 -0.0395 0.2021 0.0731 73
DCU-RUN1 0.5556 0.6582 0.5464 0.8284 0.8394 0.7192 34
DCU-RUN2 0.5628 0.6233 0.7549 0.8187 0.8350 0.7340 28
DCU-RUN3 0.6530 0.6108 0.6977 0.8181 0.8434 0.7369 26
DLS@CU-S1 0.7390 0.7725 0.7491 0.8250 0.8644 0.8015 1
DLS@CU-S2 0.7241 0.7569 0.7223 0.8250 0.8631 0.7921 3
DLS@CU-U 0.6821 0.7879 0.7325 0.8238 0.8485 0.7919 5
ECNU-1stSVMALL 0.7145 0.7122 0.7282 0.7980 0.8467 0.7696 19
ECNU-2ndSVMONE 0.6865 0.7329 0.6977 0.8196 0.8358 0.7701 18
ECNU-3rdMTL 0.6919 0.7515 0.6951 0.8049 0.8575 0.7769 16
ExBThemis-default 0.6946 0.7505 0.7521 0.8245 0.8527 0.7878 8
ExBThemis-themis 0.6946 0.7505 0.7482 0.8245 0.8527 0.7873 9
ExBThemis-themisexp 0.6946 0.7784 0.7482 0.8245 0.8527 0.7942 2
FBK-HLT-RUN1 0.7131 0.7442 0.7327 0.8079 0.8574 0.7831 12
FBK-HLT-RUN2 0.7101 0.7410 0.7377 0.8008 0.8545 0.7801 13
FBK-HLT-RUN3 0.6555 0.7362 0.7460 0.7083 0.8389 0.7461 23
FCICU-Run1 0.6152 0.6686 0.6109 0.7418 0.7853 0.7022 41
FCICU-Run2 0.3659 0.6460 0.5896 0.6448 0.6194 0.5970 59
FCICU-Run3 0.7091 0.7096 0.7184 0.7922 0.8223 0.7595 20
IITNLP-FirstRun 0.3728 0.6605 0.7717 0.5996 0.8523 0.6712 47
MathLingBudapest-embedding 0.7039 0.7004 0.7325 0.7690 0.8038 0.7478 22
MathLingBudapest-hybrid 0.7231 0.7513 0.7473 0.8037 0.8442 0.7836 11
MathLingBudapest-machines 0.6977 0.7455 0.7363 0.8046 0.8414 0.7771 15
MiniExperts-Run1 0.6781 0.7304 0.6294 0.6912 0.8109 0.7216 33
MiniExperts-Run2 0.6454 0.7093 0.5165 0.6084 0.7999 0.6746 45
MiniExperts-Run3 0.6179 0.6977 0.3236 0.5775 0.7954 0.6353 55
NeRoSim-R1 0.5260 0.7251 0.6311 0.8131 0.8585 0.7438 24
NeRoSim-R2 0.6940 0.7446 0.7512 0.8077 0.8647 0.7849 10
NeRoSim-R3 0.6778 0.7357 0.7220 0.8123 0.8570 0.7762 17
RTM-DCU-1stPLS.svr 0.5484 0.5549 0.6223 0.7281 0.7189 0.6468 50
RTM-DCU-2ndST.svr 0.5484 0.5549 0.6223 0.7281 0.7189 0.6468 51
RTM-DCU-3rdST.rr 0.5484 0.5549 0.6223 0.7281 0.7189 0.6468 52
Samsung-alpha 0.6589 0.7827 0.7029 0.8342 0.8701 0.7920 4
Samsung-beta 0.6586 0.7819 0.6995 0.8342 0.8713 0.7916 7
Samsung-delta 0.6639 0.7825 0.6952 0.8417 0.8634 0.7918 6
SemantiKLUE-RUN1 0.4913 0.7005 0.5617 0.6681 0.7915 0.6717 46
SopaLipnIimas-MLP 0.6178 0.5864 0.6886 0.8121 0.8184 0.7175 36
SopaLipnIimas-RF 0.6709 0.5914 0.7238 0.8123 0.8414 0.7356 27
SopaLipnIimas-SVM 0.5918 0.5718 0.7028 0.7985 0.8104 0.7070 39
T2a-TrWP-run1 0.6857 0.6618 0.6769 0.7709 0.7865 0.7251 31
T2a-TrWP-run2 0.6857 0.6618 0.7245 0.7709 0.7865 0.7311 30
T2a-TrWP-run3 0.6857 0.6612 0.6772 0.7710 0.7865 0.7250 32
TATO-1stWTW 0.6796 0.6853 0.7206 0.7667 0.8167 0.7422 25
UBC-RUN1 0.4764 0.5459 0.6788 0.6368 0.7852 0.6364 53
UMDuluth-BlueTeam-Run1 0.6561 0.7816 0.7363 0.8085 0.8236 0.7775 14
UQeResearch-AllRuns-run1 0.5923 0.6876 0.5904 0.7521 0.7817 0.7032 40
UQeResearch-AllRuns-run2 0.6132 0.6882 0.6229 0.7602 0.7855 0.7130 37
UQeResearch-AllRuns-run3 0.6188 0.6757 0.7178 0.7549 0.7769 0.7189 35
USAAR_SHEFFIELD-modelx 0.3706 0.3609 0.4767 0.5183 0.5436 0.4616 68
USAAR_SHEFFIELD-modely 0.6264 0.7386 0.7050 0.7927 0.8162 0.7533 21
USAAR_SHEFFIELD-modelz 0.4237 0.6757 0.6994 0.5239 0.6833 0.6111 58
WSL-run1 0.3759 0.5269 0.6387 0.5462 0.5710 0.5379 66
WSL-run2 0.4287 0.6028 0.5231 0.6029 0.4879 0.5424 65
WSL-run3 0.3709 0.5437 0.6478 0.5752 0.6407 0.5672 64
Yamraj-1stRUNNAME 0.5634 0.6727 0.6387 0.6067 0.7425 0.6558 48
Yamraj-2ndRUNNAME 0.4367 0.4716 0.4890 0.5533 0.4799 0.4919 67
Yamraj-3rdRUNNAME 0.5168 0.5835 0.6540 0.5861 0.6097 0.5912 60
yiGou-midbaitu 0.5797 0.6571 0.6473 0.7115 0.8036 0.6964 42
yiGou-xiaobaitu 0.6102 0.6872 0.6065 0.7369 0.8133 0.7114 38
*UBC-RUN1 0.4764 0.5459 0.6788 0.6368 0.7852 0.6364 54

 


Spanish STS

 

RUN Wikipedia Newswire Mean Rank
Baseline-tokencos 0.52869 0.49493 0.50621 12
BUAP-run1 0.48873 0.40451 0.43266 15
ExBThemis-trainEn 0.67630 0.67054 0.67247 3
ExBThemis-trainEs 0.70545 0.68295 0.69047 1
ExBThemis-trainMini 0.70550 0.68113 0.68927 2
RTM-DCU-1stST.tree 0.58233 0.52513 0.54425 8
RTM-DCU-2ndST.rr 0.58233 0.52513 0.54425 7
RTM-DCU-3rdST.SVR 0.58233 0.52513 0.54425 6
SopaLipnIimas-MLP 0.25257 0.53416 0.44005 13
SopaLipnIimas-RF 0.56371 0.56545 0.56487 5
SopaLipnIimas-SVM 0.41941 0.40067 0.40693 16
UMDuluth-BlueTeam-run1 0.59364 0.65471 0.63430 4
MiniExperts-run1 0.52390 0.50760 0.51305 11
MiniExperts-run2 0.46707 0.54370 0.51809 9
MiniExperts-run3 0.44015 0.55243 0.51490 10
Yamraj-1stNoConfidence 0.57681 0.36541 0.43606 14
Yamraj-1stWithConfidence 0.53240 0.34154 0.40533 17

 


Pilot on Interpretable STS

 

GOLD CHUNKS

RUN F1 ALI F1 TYPE F1 SCORE F1 TYP + SCO F1 ALI F1 TYPE F1 SCORE F1 TYP + SCO
baseline 0.8448 0.5556 0.7551 0.5556 0.8388 0.4328 0.721 0.4326
ExBThemis__avgScorer 0.8146 0.4943 0.7171 0.4885 0.8057 0.4413 0.6992 0.4246
ExBThemis__mostFreqScorer 0.8146 0.4943 0.714 0.4884 0.8057 0.4413 0.7007 0.4296
ExBThemis__regressionScorer 0.8146 0.4943 0.7158 0.4883 0.8052 0.4406 0.6989 0.4288
FCICU__Run1 0.8455 0.448 0.716 0.4325 0.8457 0.474 0.7273 0.4482
NeRoSim__R1 0.8984 0.6543 0.8262 0.6389 0.887 0.6143 0.7877 0.5841
NeRoSim__R2 0.8972 0.6558 0.8263 0.6401 0.88 0.5854 0.7818 0.5619
NeRoSim__R3 0.8976 0.6666 0.8157 0.6426 0.8834 0.6035 0.7837 0.5759
**RTM-DCU__1stIBM2Alignment 0.4914 0.3712 0.455 0.3712 0.354 0.2283 0.3187 0.2282
SimCompass__combined 0.871 0.5813 0.7651 0.5239 0.849 0.4555 0.7294 0.3965
SimCompass__prefix 0.836 0.5834 0.7474 0.5338 0.8361 0.4708 0.7269 0.4157
SimCompass__word2vec 0.8716 0.5806 0.7654 0.5253 0.8624 0.4599 0.7405 0.4017
UMDuluth_BlueTeam__1 0.8861 0.5962 0.796 0.5887 0.8853 0.5842 0.7932 0.5729
UMDuluth_BlueTeam__2 0.8861 0.5962 0.7968 0.5883 0.8853 0.6095 0.7968 0.5964
UMDuluth_BlueTeam__3 0.8861 0.59 0.798 0.5834 0.8853 0.5964 0.7909 0.5822
*UBC__RUN1 0.8991 0.5882 0.8031 0.5882 0.8846 0.4749 0.7709 0.4746
*UBC__RUN2 0.8991 0.6402 0.8211 0.6185 0.8846 0.6557 0.8085 0.6159

 

SYSTEM CHUNKS

RUN F1 ALI F1 TYPE F1 SCORE F1 TYP + SCO F1 ALI F1 TYPE F1 SCORE F1 TYP + SCO
baseline 0.6701 0.4571 0.6066 0.4571 0.706 0.3696 0.6092 0.3693
ExBThemis__avgScorer 0.7032 0.4331 0.6224 0.429 0.6966 0.397 0.6068 0.3806
ExBThemis__mostFreqScorer 0.7032 0.4331 0.62 0.4288 0.6966 0.397 0.6106 0.387
ExBThemis__regressionScorer 0.7032 0.4331 0.6209 0.4284 0.6966 0.397 0.6092 0.3867
**RTM-DCU__1stIBM2Alignment 0.4914 0.3712 0.455 0.3712 0.354 0.2283 0.3187 0.2282
SimCompass__combined 0.6467 0.4333 0.5636 0.387 0.5433 0.2854 0.4545 0.2421
SimCompass__prefix 0.631 0.4284 0.5526 0.3872 0 0 0 0
SimCompass__word2vec 0.6461 0.4334 0.5619 0.3878 0.5428 0.2831 0.4561 0.2427
UMDuluth_BlueTeam__1 0.782 0.5058 0.6968 0.5004 0.8336 0.5529 0.7498 0.5431
UMDuluth_BlueTeam__2 0.782 0.5109 0.6986 0.5049 0.8336 0.5759 0.7511 0.5634
UMDuluth_BlueTeam__3 0.782 0.5154 0.7024 0.5098 0.8336 0.5605 0.7456 0.5473
*UBC__RUN1 0.7709 0.5019 0.6892 0.5019 0.8388 0.445 0.728 0.4447
*UBC__RUN2 0.7709 0.4865 0.7014 0.4705 0.8388 0.6019 0.7634 0.5643

 



Notes

* Marks submissions that involve organizers of the task.

** Marks post-deadline submissions or fixes.

Contact Info

email list: sts-semeval@googlegroups.com
