Evaluation

Evaluation Measures

Structural Measures:

|V|: number of distinct vertices;
|E|: number of distinct edges;
#c.c.: number of connected components;
cycles: YES = the taxonomy contains cycles, NO = the taxonomy is a Directed Acyclic Graph (DAG);
#intermediate nodes: |V| - |L|, where L is the set of leaves.
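The structural measures above can be sketched in a few lines of Python. This is a minimal illustration, not the task's official scorer: the function name `structural_measures` is hypothetical, edges are assumed to be (term, hypernym) pairs, and a leaf is assumed to be a vertex with no hyponyms.

```python
from collections import defaultdict


def structural_measures(edges):
    """Compute the structural measures for an iterable of (term, hypernym) pairs."""
    edge_set = set(edges)                      # distinct edges
    vertices = {v for e in edge_set for v in e}  # distinct vertices

    children = defaultdict(set)    # hypernym -> its hyponyms
    parents = defaultdict(set)     # term -> its hypernyms
    undirected = defaultdict(set)  # adjacency ignoring direction
    for term, hyper in edge_set:
        children[hyper].add(term)
        parents[term].add(hyper)
        undirected[term].add(hyper)
        undirected[hyper].add(term)

    # Connected components: iterative DFS over the undirected graph.
    seen, components = set(), 0
    for start in vertices:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for w in undirected[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)

    # Cycle detection: Kahn's algorithm on the directed graph term -> hypernym.
    indegree = {v: 0 for v in vertices}
    for term, hyper in edge_set:
        indegree[hyper] += 1
    queue = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for w in parents[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    has_cycles = removed < len(vertices)

    # Assumption: leaves are vertices that have no hyponyms.
    leaves = {v for v in vertices if not children.get(v)}

    return {
        "V": len(vertices),
        "E": len(edge_set),
        "cc": components,
        "cycles": has_cycles,
        "intermediate": len(vertices) - len(leaves),
    }
```

Note that under this definition the root of the taxonomy counts as an intermediate node, since only leaves are subtracted from |V|.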

Comparison against gold standard:

# vertices in common: |{vertices in common with the gold standard taxonomy}|;

vertex coverage: |{vertices in common with the gold standard taxonomy}| / |{gold standard vertices}|;

# edges in common: |{edges in common with the gold standard taxonomy}|;

edge coverage: |{edges in common with the gold standard taxonomy}| / |{gold standard edges}|;

ratio of novel edges: ( |{taxonomy edges}| - |{edges in common with the gold standard taxonomy}| ) / |{gold standard edges}|;

P = | {edges in common with the gold standard taxonomy} | / |{system edges}|

R = | {edges in common with the gold standard taxonomy} | / |{gold standard edges}|

F = 2(P*R)/(P+R)

Cumulative Fowlkes&Mallows Measure: cumulative measure of similarity⁷.
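The set-based comparison measures listed above (everything except the Fowlkes&Mallows measure) can be illustrated as follows. This is a sketch under stated assumptions, not the official evaluation script: the function name `compare_to_gold` is hypothetical, an edge matches only if its (term, hypernym) pair is string-identical in both taxonomies, and the gold standard vertices are taken to be those appearing in gold standard edges.

```python
def compare_to_gold(system_edges, gold_edges):
    """Compare a system taxonomy with the gold standard.

    Both arguments are iterables of (term, hypernym) pairs.
    """
    sys_e, gold_e = set(system_edges), set(gold_edges)
    sys_v = {v for e in sys_e for v in e}
    # Assumption: gold vertices are the ones that occur in gold edges.
    gold_v = {v for e in gold_e for v in e}

    common_v = sys_v & gold_v
    common_e = sys_e & gold_e

    precision = len(common_e) / len(sys_e) if sys_e else 0.0
    recall = len(common_e) / len(gold_e) if gold_e else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0

    return {
        "vertices_in_common": len(common_v),
        "vertex_coverage": len(common_v) / len(gold_v) if gold_v else 0.0,
        "edges_in_common": len(common_e),
        "edge_coverage": recall,  # same ratio as R by definition
        "novel_edge_ratio": (len(sys_e) - len(common_e)) / len(gold_e) if gold_e else 0.0,
        "P": precision,
        "R": recall,
        "F": f_score,
    }
```

As a sanity check against the counts reported for LT3 on the equipment domain (198 edges in common, 282 system edges, 615 gold standard edges), these formulas give P ≈ 0.702, R ≈ 0.322 and F ≈ 0.441.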

Manual quality assessment of novel edges

correct ISA = ISA AND domain specific AND not over-generic

P = |correct ISA| / |sample|
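The manual precision can be sketched as below. The judgement format (a dictionary with the boolean keys `is_a`, `domain_specific` and `over_generic`) is a hypothetical encoding of the three criteria, chosen only for illustration.

```python
def novel_edge_precision(judgements):
    """Fraction of sampled novel edges judged to be a correct ISA relation.

    Each judgement is a dict with boolean keys (hypothetical names):
    'is_a', 'domain_specific', 'over_generic'.
    """
    if not judgements:
        return 0.0
    correct = sum(
        1
        for j in judgements
        if j["is_a"] and j["domain_specific"] and not j["over_generic"]
    )
    return correct / len(judgements)
```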

Gold Standard

Each gold standard taxonomy file (.taxo) lists one relation per line as tab-separated fields:
relation_id <TAB> term <TAB> hypernym
where:
- relation_id: is a relation identifier;
- term: is a term of the taxonomy;
- hypernym: is a hypernym for the term.
e.g.
0<TAB>cat<TAB>animal
1<TAB>dog<TAB>animal
2<TAB>car<TAB>animal
....
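A file in this format can be read with a few lines of Python. The sketch below assumes UTF-8 text and exactly three tab-separated fields per relation; lines that do not fit that shape (blank lines, for instance) are skipped. The function name `parse_taxo` is hypothetical.

```python
def parse_taxo(lines):
    """Parse .taxo lines into (relation_id, term, hypernym) tuples."""
    relations = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        relation_id, term, hypernym = (f.strip() for f in fields)
        relations.append((relation_id, term, hypernym))
    return relations


def load_taxo(path):
    """Read a .taxo file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return parse_taxo(handle)
```

The (term, hypernym) pairs needed for edge-level comparison are then simply `[(t, h) for _, t, h in parse_taxo(lines)]`.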

Comparative Evaluation

	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR
Cycles	3	4	2	1	3	4
Cumulative Fowlkes&Mallows Measure	2	1	6	3	4	5
Intermediate nodes	2	5	3	6	4	1
"Gold Standard evaluation (F-score ranking)"	2	1	4	5	6	3
No of domains submitted	1	3	1	2	1	1
"Manual evaluation (Precision ranking)"	2	1	4	5	6	3
Final Ranking	1	2	4	5	6	3
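The page does not state how the final ranking is derived from the individual rankings. One reading that is consistent with the table above is a sum of ranks over the six criteria, with the lowest total ranked first. The sketch below checks that reading; the aggregation rule is an assumption, not a documented procedure.

```python
systems = ["INRIASAC", "LT3", "ntnu", "QASSIT", "TALN-UPF", "USAAR"]

# Per-criterion ranks copied from the table above, in the order of `systems`.
criteria = {
    "Cycles": [3, 4, 2, 1, 3, 4],
    "Cumulative Fowlkes&Mallows Measure": [2, 1, 6, 3, 4, 5],
    "Intermediate nodes": [2, 5, 3, 6, 4, 1],
    "Gold Standard evaluation (F-score ranking)": [2, 1, 4, 5, 6, 3],
    "No of domains submitted": [1, 3, 1, 2, 1, 1],
    "Manual evaluation (Precision ranking)": [2, 1, 4, 5, 6, 3],
}

# Assumed rule: add up the ranks of each system; a lower total is better.
totals = {
    name: sum(row[i] for row in criteria.values())
    for i, name in enumerate(systems)
}
ordered = sorted(systems, key=lambda name: totals[name])
final_ranking = {name: position + 1 for position, name in enumerate(ordered)}

for name in systems:
    print(name, totals[name], final_ranking[name])
```

With these inputs the rank sums contain no ties, and the resulting order matches the Final Ranking row.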

vertex coverage

	chemical	wn_chemical	equipment	wn_equipment	food	wn_food	science	wn_science
INRIASAC	0.7037	0.9829	0.8300	0.9789	0.8425	0.9730	0.9159	0.8531
LT3	n.a.	n.a.	0.4248	0.9726	0.4389	0.9899	0.6327	0.8624
ntnu	0.0333	0.5965	0.1046	0.5242	0.1356	0.4973	0.2300	0.6480
QASSIT	n.a.	0.9985	0.9918	1.0000	0.8695	1.0000	0.9977	0.8624
TALN-UPF	1.0000	0.9970	1.0000	1.0000	0.8695	1.0000	0.9977	0.8624
USAAR-WLV	0.7838	0.8675	0.5490	0.7431	0.6092	0.8068	0.7831	0.7132

	Avg chemical	Avg equipment	Avg food	Avg science	Avg
INRIASAC	0.8433	0.90445	0.90775	0.8845	0.885
LT3	n.a	0.6987	0.7144	0.74755	0.7202
ntnu	0.3149	0.314385	0.31645	0.439	0.34618375
QASSIT	n.a	0.9959	0.93475	0.93005	0.9609
TALN-UPF	0.9985	1.0000	0.93475	0.93005	0.965825
USAAR-WLV	0.82565	0.64605	0.708	0.74815	0.7319625
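The averaging rule behind these summary columns is not spelled out. The reported numbers appear consistent with the following reading: each "Avg <domain>" is the mean of the domain and its wn_ counterpart (n.a. if either is missing), and the overall "Avg" is the mean over all domains a team actually submitted. The sketch below applies that assumed rule to the LT3 vertex coverage row; the helper names are hypothetical.

```python
def pair_average(a, b):
    """Mean of a domain and its wn_ counterpart; None (n.a.) if either is missing."""
    if a is None or b is None:
        return None
    return (a + b) / 2


def overall_average(values):
    """Mean over the submitted domains only (None marks n.a.)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# LT3 vertex coverage: chemical, wn_chemical, equipment, wn_equipment,
# food, wn_food, science, wn_science (None stands for n.a.).
lt3 = [None, None, 0.4248, 0.9726, 0.4389, 0.9899, 0.6327, 0.8624]

pair_avgs = [pair_average(lt3[i], lt3[i + 1]) for i in range(0, len(lt3), 2)]
print(pair_avgs)             # chemical pair is None, i.e. n.a.
print(overall_average(lt3))  # close to the reported 0.7202
```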

edge coverage

	chemical	wn_chemical	equipment	wn_equipment	food	wn_food	science	wn_science
INRIASAC	0.0969	0.4657	0.4959	0.3793	0.5179	0.4735	0.4494	0.5442
LT3	n.a.	n.a.	0.3219	0.9484	0.2974	0.9719	0.3806	0.8639
ntnu	0.0013	0.5594	0.0065	0.4597	0.0541	0.4664	0.0451	0.6122
QASSIT	n.a.	0.0843	0.2455	0.1979	0.0655	0.0593	0.2559	0.2902
TALN-UPF	0.0004	0.0930	0.1577	0.0453	0.0359	0.0782	0.0172	0.1111
USAAR-WLV	0.0977	0.3835	0.3691	0.3072	0.2696	0.3581	0.3720	0.3537

	Avg chemical	Avg equipment	Avg food	Avg science	Avg
INRIASAC	0.2813	0.4376	0.4957	0.4968	0.42785
LT3	n.a	0.63515	0.63465	0.62225	0.6306
ntnu	0.28035	0.2331	0.26025	0.32865	0.2755875
QASSIT	n.a	0.2217	0.0624	0.27305	0.1712
TALN-UPF	0.0467	0.1015	0.05705	0.06415	0.06735
USAAR-WLV	0.2406	0.33815	0.31385	0.36285	0.3138625

ratio of novel edges

	chemical	wn_chemical	equipment	wn_equipment	food	wn_food	science	wn_science
INRIASAC	1.0491	2.8586	1.4032	2.4432	2.2312	2.2909	2.0537	1.9546
LT3	n.a.	n.a.	0.1365	2.0453	0.7309	3.5375	0.5677	2.7029
ntnu	0.0616	0.7779	0.3951	2.2886	0.7189	1.3339	0.7849	0.9319
QASSIT	n.a.	0.9105	0.7528	0.8123	0.9174	0.9445	0.8430	0.6984
TALN-UPF	0.7089	0.9531	0.9235	6.9030	0.9527	0.9315	3.4731	0.7800
USAAR-WLV	1.1268	1.8566	0.5219	0.8206	1.4265	1.9021	1.6752	1.6689

	Avg chemical	Avg equipment	Avg food	Avg science	Avg
INRIASAC	1.95385	1.9232	2.26105	2.00415	2.0355625
LT3	n.a	1.0909	2.1342	1.6353	1.6201
ntnu	0.41975	1.34185	1.0264	0.8584	0.9116
QASSIT	n.a	0.78255	0.93095	0.7707	0.8398
TALN-UPF	0.831	3.91325	0.9421	2.12655	1.953225
USAAR-WLV	1.4917	0.67125	1.6643	1.67205	1.374825

Average Precision, Recall, and F-measure against gold standard

	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
Avg. P	0.1721	0.3612	0.1754	0.1563	0.0720	0.2014
Avg. R	0.4279	0.6307	0.2756	0.1588	0.1165	0.3139
Avg. F	0.2427	0.3886	0.2075	0.1575	0.0798	0.2377

Cumulative Fowlkes&Mallows Measure

	chemical	wn_chemical	equipment	wn_equipment	food	wn_food	science	wn_science
INRIASAC	0.2353	0.0084	0.4905	0.0700	0.4522	0.4804	0.4706	0.4153
LT3	n.a	n.a	0.1137	0.6892	0.2163	0.5899	0.3303	0.5391
ntnu	0.0009	0.0719	0.0000	0.0935	0.0076	0.2673	0.0088	0.0158
QASSIT	n.a	0.3947	0.4881	0.3637	0.3405	0.3153	0.5232	0.2921
TALN-UPF	0.2225	0.2787	0.4482	0.0901	0.3267	0.3091	0.2202	0.2126
USAAR-WLV	0.00001	0.2103	0.0000	0.0015	0.0037	0.0036	0.2249	0.1721

	Avg chemical	Avg equipment	Avg food	Avg science	Avg
INRIASAC	0.12185	0.28025	0.4663	0.44295	0.3278375
LT3	n.a	0.40145	0.4031	0.4347	0.4130
ntnu	0.0364	0.04675	0.13745	0.0123	0.058225
QASSIT	n.a.	0.4259	0.3279	0.40765	0.3882
TALN-UPF	0.2506	0.26915	0.3179	0.2164	0.2635125
USAAR-WLV	0.105155	0.00075	0.00365	0.1985	0.07701375
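The cumulative measure itself is defined in reference 7 and is not reproduced on this page. For orientation only, the sketch below computes the basic Fowlkes-Mallows index between two flat clusterings of the same items, which is the building block that a cumulative variant would aggregate over successive cuts of the two hierarchies. It is not the task's scorer, and the function name and input format are assumptions.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_a, labels_b):
    """Basic Fowlkes-Mallows index between two flat clusterings.

    labels_a, labels_b: dicts mapping item -> cluster id. Only items
    present in both clusterings are compared.
    """
    items = sorted(set(labels_a) & set(labels_b))
    tp = fp = fn = 0
    for x, y in combinations(items, 2):
        same_a = labels_a[x] == labels_a[y]
        same_b = labels_b[x] == labels_b[y]
        if same_a and same_b:
            tp += 1   # pair grouped together in both clusterings
        elif same_a:
            fp += 1   # grouped together only in A
        elif same_b:
            fn += 1   # grouped together only in B
    if tp == 0:
        return 0.0
    # Geometric mean of pairwise precision and recall.
    return tp / sqrt((tp + fp) * (tp + fn))
```

The index is 1.0 when both clusterings group the items identically and falls towards 0 as fewer item pairs are grouped together in both.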

Precision of novel edges

	equipment	food	science	wn_equipment	wn_food	wn_science	Avg. prec.
INRIASAC	59	37	51	63	37	41	48.0
LT3	94	58	69	53	44	40	59.6
ntnu	40	32	23	27	41	49	35.3
QASSIT	44	1	38	21	2	42	24.7
TALN-UPF	14	2	13	12	11	9	10.2
USAAR	80	34	34	45	25	34	42.0

Detailed Evaluation

Domain: chemical

Gold Standard download
The gold standard is an excerpt of the ChEBI¹ chemical ontology.

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	17584	12432	n.a	1114	n.a.	17584	13785
|E|	24817	28444	n.a	1563	n.a.	17606	30392
# c.c.	1	293	n.a	116	n.a.	1	302
cycles	NO	YES	n.a	NO	n.a.	NO	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	12374	n.a	586	n.a.	17584	13784
vertex coverage	0.7037	n.a	0.0333	n.a.	1.0	0.7838
# edges in common	2407	n.a	34	n.a.	11	2427
edge coverage	0.0969	n.a	0.0013	n.a.	0.0004	0.0977
ratio of novel edges	1.0491	n.a	0.0616	n.a.	0.7089	1.1268

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.0846	n.a	0.0217	n.a	0.0006	0.0798
R	0.0969	n.a	0.0013	n.a	0.0004	0.0977
F	0.0903	n.a	0.0025	n.a	0.0005	0.0879

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.2353
LT3	n.a.
ntnu	0.0009
QASSIT	n.a.
TALN-UPF	0.2225
USAAR-WLV	0.00001

Domain: equipment

Gold Standard download
The gold standard is an excerpt of the Material Handling Equipment² taxonomy combined with IS-A relations from WiBi³.

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	612	520	260	251	610	612	337
|E|	615	1168	282	247	614	665	548
# c.c.	1	6	10	35	1	1	28
cycles	NO	NO	YES	NO	NO	YES	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	508	260	64	607	612	336
vertex coverage	0.8300	0.4248	0.10457	0.9918	1.0	0.5490
# edges in common	305	198	4	151	97	227
edge coverage	0.4959	0.3219	0.0065	0.2455	0.1577	0.3691
ratio of novel edges	1.4032	0.1365	0.3951	0.7528	0.9235	0.5219

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.2611	0.7021	0.0161	0.2459	0.1458	0.4142
R	0.4959	0.3219	0.0065	0.2455	0.1577	0.3691
F	0.3421	0.4414	0.0092	0.2457	0.1515	0.3903

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.4905
LT3	0.1137
ntnu	0
QASSIT	0.4881
TALN-UPF	0.4482
USAAR-WLV	0.0018

Domain: food

Gold Standard download
The gold standard is an excerpt of the Google product taxonomy⁴ combined with IS-A relations from WiBi³.

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	1156	1518	819	834	1550	1549	1118
|E|	1587	4363	1632	1227	1560	1569	2692
# c.c.	1	2	6	27	1	1	23
cycles	NO	YES	YES	YES	YES	NO	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	1311	683	211	1353	1353	948
vertex coverage	0.8425	0.4389	0.1356	0.8695	0.8695	0.6092
# edges in common	822	472	86	104	57	428
edge coverage	0.5179	0.2974	0.0541	0.0655	0.0359	0.2696
ratio of novel edges	2.2312	0.7309	0.7189	0.9174	0.9527	1.4265

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.1884	0.2892	0.0700	0.0666	0.0363	0.1589
R	0.5179	0.2974	0.0541	0.0655	0.0359	0.2696
F	0.2763	0.2932	0.0611	0.0660	0.0361	0.2000

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.4522
LT3	0.2163
ntnu	0.0076
QASSIT	0.3405
TALN-UPF	0.3267
USAAR-WLV	0.0037

Domain: science

Gold Standard download
The gold standard is an excerpt of the TAXONOMY OF FIELDS AND THEIR SUBFIELDS⁵ combined with IS-A relations from WiBi³.

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	452	417	287	338	453	1280	355
|E|	465	1164	441	386	511	1623	952
# c.c.	1	3	8	23	1	1	14
cycles	NO	NO	YES	NO	NO	YES	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	414	286	104	451	451	354
vertex coverage	0.9159	0.6327	0.2300	0.9977	0.9977	0.7831
# edges in common	209	177	21	104	119	173
edge coverage	0.4494	0.3806	0.0451	0.2559	0.0172	0.3720
ratio of novel edges	2.0537	0.5677	0.7849	0.8430	3.4731	1.6752

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.1795	0.4013	0.0544	0.2035	0.0733	0.1817
R	0.4494	0.3806	0.0451	0.2236	0.2559	0.3720
F	0.2565	0.3907	0.0493	0.2131	0.1139	0.2441

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.4706
LT3	0.3303
ntnu	0.0088
QASSIT	0.5232
TALN-UPF	0.2202
USAAR-WLV	0.2249

Domain: wn_chemical

Gold Standard download
The gold standard relations were extracted from the WordNet⁶ taxonomy under the node "chemical".

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	1351	1913	n.a	1475	1351	1347	1173
|E|	1387	4611	n.a	1855	1380	1451	3107
# c.c.	1	2	n.a	28	1	1	31
cycles	NO	YES	n.a	YES	NO	YES	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	1328	n.a	806	1349	1347	1172
vertex coverage	0.9829	n.a	0.5965	0.9985	0.9970	0.8675
# edges in common	646	n.a	776	117	129	532
edge coverage	0.4657	n.a	0.5594	0.0843	0.0930	0.3835
ratio of novel edges	2.8586	n.a	0.7779	0.9105	0.9531	1.8566

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.1400	n.a.	0.4183	0.0847	0.0889	0.1712
R	0.4657	n.a.	0.5594	0.0843	0.0930	0.3835
F	0.2154	n.a.	0.4787	0.0845	0.0909	0.2367

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.0084
LT3	n.a.
ntnu	0.0719
QASSIT	0.3947
TALN-UPF	0.2787
USAAR-WLV	0.2103

Domain: wn_equipment

Gold Standard download
The gold standard relations were extracted from the WordNet⁶ taxonomy under the node "equipment".

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	475	468	462	1081	476	2574	354
|E|	485	1369	1452	1333	490	3370	547
# c.c.	1	1	1	12	1	1	43
cycles	NO	YES	YES	YES	NO	YES	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	465	462	249	475	475	353
vertex coverage	0.9789	0.9726	0.5242	1.0	1.0	0.7431
# edges in common	184	460	223	96	97	149
edge coverage	0.3793	0.9484	0.4597	0.1979	0.0453	0.3072
ratio of novel edges	2.4432	2.0453	2.2886	0.8123	6.9030	0.8206

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.1344	0.3168	0.1672	0.1959	0.0287	0.2723
R	0.3793	0.9484	0.4597	0.1979	0.2000	0.3072
F	0.1984	0.4749	0.2453	0.1969	0.0503	0.2887

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.0700
LT3	0.6892
ntnu	0.0935
QASSIT	0.3637
TALN-UPF	0.0901
USAAR-WLV	0.0015

Domain: wn_food

Gold Standard download
The gold standard relations were extracted from the WordNet⁶ taxonomy under the node "food".

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	1486	1458	1471	1834	1478	1486	1200
|E|	1533	4238	6913	2760	1539	1548	3465
# c.c.	1	2	1	35	1	1	23
cycles	NO	NO	YES	YES	NO	YES	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	1446	1471	739	1486	1486	1199
vertex coverage	0.9730	0.9899	0.4973	1.0	1.0	0.8068
# edges in common	726	1490	715	91	120	549
edge coverage	0.4735	0.9719	0.4664	0.0593	0.0782	0.3581
ratio of novel edges	2.2909	3.5375	1.3339	0.9445	0.9315	1.9021

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.1713	0.2155	0.2590	0.0591	0.0775	0.1584
R	0.4735	0.9719	0.4664	0.0593	0.0782	0.3581
F	0.2516	0.3528	0.3331	0.0592	0.0778	0.2196

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.4804
LT3	0.5899
ntnu	0.2673
QASSIT	0.3153
TALN-UPF	0.3091
USAAR-WLV	0.0036

Domain: wn_science

Gold Standard download
The gold standard relations were extracted from the WordNet⁶ taxonomy under the node "science".

Structural measures

Measure	gold standard	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
|V|	429	366	370	524	371	370	307
|E|	441	1102	1573	681	436	393	892
# c.c.	1	1	1	11	1	1	8
cycles	NO	YES	YES	NO	NO	NO	YES

Comparison against gold standard

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
# vertices in common	366	370	278	370	370	306
vertex coverage	0.8531	0.8624	0.6480	0.8624	0.8624	0.7132
# edges in common	240	381	270	104	49	156
edge coverage	0.5442	0.8639	0.6122	0.2902	0.1111	0.3537
ratio of novel edges	1.9546	2.7029	0.9319	0.6984	0.7800	1.6689

precision, recall and F-measure

Measure	INRIASAC	LT3	ntnu	QASSIT	TALN-UPF	USAAR-WLV
P	0.2177	0.2422	0.3964	0.2385	0.1246	0.1748
R	0.5442	0.8639	0.6122	0.2358	0.1111	0.3537
F	0.3110	0.3783	0.4812	0.2371	0.1175	0.2340

	Cumulative Fowlkes&Mallows Measure:
INRIASAC	0.4153
LT3	0.5391
ntnu	0.0158
QASSIT	0.2921
TALN-UPF	0.2126
USAAR-WLV	0.1721

References

1. Degtyarenko Kirill, de Matos Paula, Ennis Marcus, Hastings Janna, Zbinden Martin, McNaught Alan, Alcantara Rafael, Darsow Michael, Guedj Mickael and Ashburner Michael. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research, 36:suppl 1, D344-D350, 2008.
2. Material Handling Equipment taxonomy, http://www.ise.ncsu.edu/kay/mhetax/index.htm.
3. Tiziano Flati, Daniele Vannella, Tommaso Pasini, Roberto Navigli. Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project. Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, Maryland, USA, June 22-27, 2014.
4. The Google product taxonomy, http://www.google.com/basepages/producttype/taxonomy.en-US.txt
5. TAXONOMY OF FIELDS AND THEIR SUBFIELDS, http://sites.nationalacademies.org/PGA/Resdoc/PGA_044522
6. Fellbaum, Christiane (2005). WordNet and wordnets. In: Brown, Keith et al. (eds.), Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, 665-670.
7. P. Velardi, S. Faralli, R. Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665-707.

SemEval-2015 Task 17
