Task Results and Initial Analysis

 

1. Overview of Results for Task 11

Fifteen teams participated in the task, submitting a total of 35 runs. The best-performing system, from team CLaC, achieves a score of 0.758 on the Cosine Similarity measure and 2.117 on the Mean Squared Error (MSE) measure. Across all systems, scores range from 0.059 to 0.758 on Cosine Similarity (higher is better) and from 11.274 down to 2.117 on MSE (lower is better).
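For reference, the two evaluation measures can be sketched as follows. This is a minimal illustration, not the official task scorer; it assumes gold and predicted sentiment scores are given as parallel lists of numbers:

```python
import math

def cosine_similarity(gold, pred):
    """Cosine of the angle between the gold and predicted score vectors (1.0 = identical direction)."""
    dot = sum(g * p for g, p in zip(gold, pred))
    norm = math.sqrt(sum(g * g for g in gold)) * math.sqrt(sum(p * p for p in pred))
    return dot / norm if norm else 0.0

def mean_squared_error(gold, pred):
    """Average squared difference between gold and predicted scores (0.0 = perfect)."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)

# Toy example using sentiment scores in the task's -5..+5 range:
gold = [-3.0, -1.0, 2.0, 4.0]
pred = [-2.5, -1.5, 1.0, 3.5]
print(round(cosine_similarity(gold, pred), 3))   # 0.979
print(round(mean_squared_error(gold, pred), 3))  # 0.438
```

Note the two measures reward different things: cosine similarity ignores the overall magnitude of the predictions, while MSE penalizes any numeric deviation, which is why a team's best run can differ between the two columns below.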


The following overview table (sorted by Cosine Similarity score) lists the performance of the best run of each participating team. Note that a team's best MSE score may come from a different run than the one shown here (see Section 2 for more details).

Team         Cosine   MSE
---------    ------   ------
CLaC         0.758    2.117
UPF          0.710    2.458
LLT_PolyU    0.678    2.600
LT3          0.6581   3.398
elirf        0.6579   3.096
ValenTo      0.634    2.999
HLT          0.630    4.088
CPH          0.625    3.079
prhlt        0.623    3.023
DsUniPi      0.601    3.925
PKU          0.574    3.746
KELabTeam    0.552    6.090
RGU          0.523    8.602
SHELLFBK     0.431    7.701
BUAP         0.059    6.785

 

2. Category Analysis Results by Figurative Kind (Sarcasm, Irony, Metaphor, and Other)

 

The dataset contains figurative tweets and non-figurative tweets (the latter labeled as the Other category). There were three types of figurative language in the dataset: sarcasm, irony, and metaphor. The following table reports the performance of each system on each kind of language context. Note that the four conditions -- Sarcasm, Irony, Metaphor, and Other -- are not necessarily mutually exclusive.


These conditions refer to the labels according to which the dataset was collated and organized. Data labeled Other is simply intended to be representative of language on Twitter in general, and may in fact contain figurative phenomena -- the point is that Other data was not selected for the presence or absence of any particular quality; it is merely a default set. In contrast, the subset of the data labeled Sarcasm was selected because each tweet exhibits sarcasm; data labeled Irony was selected because each tweet exhibits irony; and data labeled Metaphor was selected because each tweet employs one or more metaphors.


Note also that the goal of the task is to understand how well a computational system can recognize sentiment under different figurative conditions, which makes the null (Other) case an important factor in the analysis.
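Because a single tweet may carry more than one label, a per-category breakdown like the table below can be computed by filtering the gold/predicted score pairs on each (possibly overlapping) label set. The sketch below is a hypothetical illustration of that bookkeeping, not the official analysis script; `per_category_mse` and its data layout are assumptions for the example:

```python
def mse(gold, pred):
    """Average squared difference between gold and predicted scores."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)

def per_category_mse(examples, predictions):
    """examples: list of (gold_score, set_of_labels); predictions: parallel list of scores.
    Categories may overlap, so the same tweet can count toward several columns."""
    categories = ("sarcasm", "irony", "metaphor", "other")
    results = {}
    for cat in categories:
        pairs = [(g, p) for (g, labels), p in zip(examples, predictions) if cat in labels]
        if pairs:
            gold, pred = zip(*pairs)
            results[cat] = mse(gold, pred)
    return results

# A sarcastic-and-ironic tweet is scored under both of those columns:
examples = [(-4.0, {"sarcasm", "irony"}), (2.0, {"metaphor"}), (1.0, {"other"})]
predictions = [-3.0, 1.0, 0.5]
print(per_category_mse(examples, predictions))
```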

 

 

 

 

                                  |        Mean Squared Error measure         |         Cosine Similarity measure
Team / Run                        | Rank | Overall | Sarcasm | Irony | Metaphor | Other | Rank | Overall | Sarcasm | Irony | Metaphor | Other
----------------------------------|------|---------|---------|-------|----------|-------|------|---------|---------|-------|----------|------
CLaC                              | 1    | 2.117   | 1.023   | 0.779 | 3.155    | 3.411 | 1    | 0.758   | 0.892   | 0.904 | 0.655    | 0.584
UPF-Dec-19                        | 2    | 2.458   | 0.934   | 1.041 | 4.186    | 3.772 | 2    | 0.711   | 0.903   | 0.873 | 0.520    | 0.486
UPF-Dec-19                        | -    | 2.458   | 0.934   | 1.041 | 4.186    | 3.772 | -    | 0.711   | 0.903   | 0.873 | 0.520    | 0.486
LLT_PolyU-Dec-20_7_31_46          | -    | 2.602   | 0.997   | 0.671 | 3.917    | 4.617 | 3    | 0.687   | 0.896   | 0.918 | 0.535    | 0.290
LLT_PolyU-Dec-20_7_10_29          | -    | 2.673   | 1.021   | 0.702 | 4.102    | 4.685 | -    | 0.677   | 0.892   | 0.914 | 0.506    | 0.293
LLT_PolyU-Dec-20_14_42_31         | 3    | 2.600   | 1.018   | 0.673 | 3.917    | 4.587 | -    | 0.687   | 0.893   | 0.917 | 0.535    | 0.301
LT3-dec-19-10-21-28-run1          | -    | 3.398   | 1.287   | 1.224 | 5.670    | 5.444 | 4    | 0.6581  | 0.891   | 0.897 | 0.443    | 0.346
LT3-dec-19-10-21-28-run2          | 4    | 2.912   | 1.286   | 1.083 | 4.793    | 4.503 | -    | 0.648   | 0.872   | 0.861 | 0.355    | 0.357
LT3-dec-19-12-11-44-run1          | -    | 3.398   | 1.287   | 1.224 | 5.670    | 5.444 | -    | 0.6581  | 0.891   | 0.897 | 0.443    | 0.346
LT3-dec-19-12-11-44-run2          | -    | 2.912   | 1.286   | 1.083 | 4.793    | 4.503 | -    | 0.648   | 0.872   | 0.861 | 0.355    | 0.357
elirf                             | 8    | 3.096   | 1.349   | 1.034 | 4.565    | 5.235 | 5    | 0.6579  | 0.904   | 0.905 | 0.411    | 0.247
ValenTo                           | 5    | 2.999   | 1.004   | 0.777 | 4.730    | 5.315 | 6    | 0.634   | 0.895   | 0.901 | 0.393    | 0.202
HLT                               | 11   | 4.088   | 1.327   | 1.184 | 6.589    | 7.119 | 7    | 0.630   | 0.887   | 0.907 | 0.379    | 0.365
CPH-ridge                         | -    | 3.079   | 1.041   | 0.904 | 4.916    | 5.343 | 8    | 0.625   | 0.897   | 0.886 | 0.325    | 0.218
CPH-esemble                       | 7    | 3.078   | 0.971   | 0.774 | 5.014    | 5.429 | -    | 0.623   | 0.900   | 0.903 | 0.308    | 0.226
CPH-specialesemble                | -    | 11.274  | 19.267  | 9.124 | 7.806    | 7.027 | -    | 0.298   | -0.148  | 0.281 | 0.535    | 0.612
Prhlt-ETR-ngram                   | 6    | 3.023   | 1.028   | 0.784 | 5.446    | 4.888 | 9    | 0.623   | 0.891   | 0.901 | 0.167    | 0.218
Prhlt-ETR-word                    | -    | 3.112   | 1.041   | 0.791 | 5.031    | 5.448 | -    | 0.611   | 0.890   | 0.901 | 0.294    | 0.129
Prhlt-RFR-word                    | -    | 3.107   | 1.060   | 0.809 | 5.115    | 5.345 | -    | 0.613   | 0.888   | 0.898 | 0.282    | 0.170
Prhlt-RFR-ngram                   | -    | 3.229   | 1.059   | 0.811 | 5.878    | 5.243 | -    | 0.597   | 0.888   | 0.898 | 0.135    | 0.192
Prhlt-BRR-word                    | -    | 3.299   | 1.146   | 0.934 | 5.178    | 5.773 | -    | 0.592   | 0.883   | 0.880 | 0.280    | 0.110
Prhlt-BRR-ngram                   | -    | 3.266   | 1.100   | 0.941 | 5.925    | 5.205 | -    | 0.593   | 0.886   | 0.879 | 0.119    | 0.186
DsUniPi                           | 10   | 3.925   | 1.499   | 1.656 | 7.106    | 5.744 | 10   | 0.601   | 0.87    | 0.839 | 0.359    | 0.271
PKU                               | 9    | 3.746   | 1.148   | 1.015 | 5.876    | 6.743 | 11   | 0.574   | 0.883   | 0.877 | 0.350    | 0.137
KELabTeam                         | -    | 5.552   | 1.198   | 1.255 | 7.264    | 9.905 | -    | 0.531   | 0.883   | 0.895 | 0.341    | 0.117
KELabTeam-content based           | -    | 6.090   | 1.756   | 1.811 | 8.707    | 11.526| 12   | 0.552   | 0.896   | 0.915 | 0.341    | 0.115
KELabTeam-emotiona pattern based  | 12   | 4.177   | 1.189   | 0.809 | 6.829    | 7.628 | -    | 0.533   | 0.874   | 0.900 | 0.289    | 0.135
RGU-testsentfinal                 | 13   | 5.143   | 1.954   | 1.867 | 8.015    | 8.602 | 13   | 0.523   | 0.829   | 0.832 | 0.291    | 0.165
RGU-testsentwarppred              | -    | 5.323   | 1.855   | 1.541 | 8.033    | 9.505 | -    | 0.509   | 0.842   | 0.861 | 0.280    | 0.090
RGU-testsentpredictions           | -    | 5.323   | 1.855   | 1.541 | 8.033    | 9.505 | -    | 0.509   | 0.842   | 0.861 | 0.280    | 0.090
SHELLFBK-run3                     | 15   | 7.701   | 4.375   | 4.516 | 9.219    | 12.16 | 14   | 0.431   | 0.669   | 0.625 | 0.35     | 0.167
SHELLFBK-run2                     | -    | 9.265   | 5.183   | 5.047 | 11.058   | 15.055| -    | 0.427   | 0.681   | 0.652 | 0.346    | 0.146
SHELLFBK-run1                     | -    | 10.486  | 12.326  | 9.853 | 10.649   | 8.957 | -    | 0.145   | -0.013  | 0.104 | 0.167    | 0.308
SHELLFBK-run1_Dec_9               | -    | 10.486  | 12.326  | 9.853 | 10.649   | 8.957 | -    | 0.145   | -0.013  | 0.104 | 0.167    | 0.308
BUAP                              | 14   | 6.785   | 4.339   | 7.609 | 8.93     | 7.253 | 15   | 0.058   | 0.412   | -0.209| -0.023   | -0.025

A "-" in a Rank column indicates that the run was not assigned a rank under that measure.


 

Contact Info

Organizers

  • John Barnden (J.A.Barnden@cs.bham.ac.uk) University of Birmingham, UK.
  • Antonio Reyes (antonioreyes@isit.edu.mx) Superior Institute of Interpreters and Translators
  • Ekaterina Shutova (shutova.e@gmail.com) ICSI, UC Berkeley
  • Paolo Rosso (prosso@dsic.upv.es) Technical University of Valencia
  • Tony Veale (tony.veale@ucd.ie ) University College Dublin

email : tony.veale@UCD.ie

Other Info

Announcements

  • Initial Analysis of Results is now available here
  • Test data for this task will be available from Dec 5th. To obtain the test data, you must register for the task. Here is the link
    Note: you have 5 days to submit your results from the time you download the data. Do not download the data until you are ready to use it!
  • We have now released a Java scorer for download: please see the Data and Tools page.
  • Note: the dates for the evaluation period for SemEval-2015 have changed! (Dec. 5 -- 22, 2014)
  • Training data for this task (8000 figurative tweets annotated with sentiment scores in the range -5...+5) is now available.
  • Trial data for this task (1000 figurative tweets annotated with sentiment scores in the range -5...+5) is now available.
  • Follow @MetaphorMagnet -- a Twitterbot that uses metaphor theory to automatically generate novel metaphors