Evaluation


Evaluation Measures


Structural Measures:

  • |V|: number of distinct vertices;
  • |E|: number of distinct edges;
  • #c.c.: number of connected components;
  • cycles: YES = the taxonomy contains cycles, NO = the taxonomy is a Directed Acyclic Graph (DAG);
  • #intermediate nodes = |V| - |L|, where L is the set of leaves.
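
The structural measures above can be sketched in a few lines of Python. This is only an illustration, not the evaluation tool used by the organizers. It assumes that a taxonomy is given as (term, hypernym) pairs, that connected components ignore edge direction, and that a leaf is a vertex that never appears as a hypernym; the function name `structural_measures` is hypothetical.

```python
from collections import defaultdict


def structural_measures(edges):
    """Compute the structural measures for (term, hypernym) edge pairs."""
    edge_set = set(edges)                          # distinct edges
    vertices = {v for edge in edge_set for v in edge}

    hypernyms = defaultdict(set)                   # term -> its hypernyms
    neighbours = defaultdict(set)                  # undirected adjacency
    in_degree = {v: 0 for v in vertices}           # number of hyponyms per vertex
    for term, hypernym in edge_set:
        hypernyms[term].add(hypernym)
        neighbours[term].add(hypernym)
        neighbours[hypernym].add(term)
        in_degree[hypernym] += 1

    # Connected components of the underlying undirected graph.
    seen, components = set(), 0
    for start in vertices:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

    # Assumption: a leaf is a vertex that is never used as a hypernym.
    leaves = [v for v in vertices if in_degree[v] == 0]

    # Kahn's algorithm: repeatedly remove vertices with no remaining hyponyms.
    # If some vertices can never be removed, the graph contains a cycle.
    remaining = dict(in_degree)
    queue = list(leaves)
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for h in hypernyms[v]:
            remaining[h] -= 1
            if remaining[h] == 0:
                queue.append(h)

    return {
        "|V|": len(vertices),
        "|E|": len(edge_set),
        "#c.c.": components,
        "cycles": removed < len(vertices),
        "#intermediate nodes": len(vertices) - len(leaves),
    }
```

Note that under the definition |V| - |L| the root of a taxonomy is counted as an intermediate node.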

Comparison against gold standard:

  • # vertices in common: |{vertices in common with the gold standard taxonomy}|;
  • vertex coverage: |{vertices in common with the gold standard taxonomy}| / |{gold standard vertices}|;
  • # edges in common: |{edges in common with the gold standard taxonomy}|;
  • edge coverage: |{edges in common with the gold standard taxonomy}| / |{gold standard edges}|;
  • ratio of novel edges: ( |{system edges}| - |{edges in common with the gold standard taxonomy}| ) / |{gold standard edges}|;
  • P (precision) = |{edges in common with the gold standard taxonomy}| / |{system edges}|;
  • R (recall) = |{edges in common with the gold standard taxonomy}| / |{gold standard edges}|;
  • F (F-measure) = 2(P*R)/(P+R);
  • Cumulative Fowlkes&Mallows Measure: cumulative measure of similarity [7].
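
As a rough sketch, the coverage, novelty and P/R/F definitions above translate directly into set operations. The code below assumes that edges are compared as exact (term, hypernym) pairs with no normalization and that the gold standard is non-empty; the names `prf` and `compare_to_gold` are hypothetical. The Cumulative Fowlkes&Mallows Measure is not covered here.

```python
def prf(n_common, n_system, n_gold):
    """Precision, recall and F-measure from raw edge counts."""
    p = n_common / n_system if n_system else 0.0
    r = n_common / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def compare_to_gold(system_edges, gold_edges):
    """Compare a system taxonomy with a gold standard, both as edge pairs."""
    system_edges, gold_edges = set(system_edges), set(gold_edges)
    system_vertices = {v for edge in system_edges for v in edge}
    gold_vertices = {v for edge in gold_edges for v in edge}

    common_vertices = system_vertices & gold_vertices
    common_edges = system_edges & gold_edges
    p, r, f = prf(len(common_edges), len(system_edges), len(gold_edges))

    return {
        "# vertices in common": len(common_vertices),
        "vertex coverage": len(common_vertices) / len(gold_vertices),
        "# edges in common": len(common_edges),
        "edge coverage": len(common_edges) / len(gold_edges),
        "ratio of novel edges":
            (len(system_edges) - len(common_edges)) / len(gold_edges),
        "P": p,
        "R": r,
        "F": f,
    }


# Counts taken from the equipment tables below for INRIASAC:
# 305 edges in common, 1168 system edges, 615 gold standard edges.
p, r, f = prf(305, 1168, 615)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.2611 0.4959 0.3421
```

Edge coverage and R share the same formula, so the two values should coincide whenever they are computed from the same edge sets.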

    Manual quality assessment of novel edges

  • correct ISA = ISA AND domain specific AND not over-generic
  • P = |correct ISA| / |sample|
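
The manual assessment rule can be illustrated with a small sketch. It assumes that each sampled novel edge receives three yes/no judgments from an annotator; the `Judgment` record and its field names are hypothetical and only mirror the three conditions listed above.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Hypothetical annotation record for one sampled novel edge."""
    is_a: bool             # the edge expresses a valid ISA relation
    domain_specific: bool  # the edge belongs to the target domain
    over_generic: bool     # the hypernym is too general to be useful


def is_correct(j):
    # correct ISA = ISA AND domain specific AND not over-generic
    return j.is_a and j.domain_specific and not j.over_generic


def manual_precision(sample):
    # P = |correct ISA| / |sample|
    return sum(is_correct(j) for j in sample) / len(sample)
```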

    Gold Standard

    The gold standard taxonomies (.taxo files) contain one relation per line, with tab-separated fields:
    relation_id <TAB> term <TAB> hypernym
    where:
    - relation_id: is a relation identifier;
    - term: is a term of the taxonomy;
    - hypernym: is a hypernym for the term.
    e.g.
    0<TAB>cat<TAB>animal
    1<TAB>dog<TAB>animal
    2<TAB>car<TAB>vehicle
    ....
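
A file in this format can be read with a few lines of Python. This is a minimal sketch under the assumption that every non-empty line has exactly three tab-separated fields; the function name `parse_taxo` is hypothetical.

```python
def parse_taxo(lines):
    """Parse .taxo lines into (relation_id, term, hypernym) tuples."""
    relations = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        relation_id, term, hypernym = fields
        relations.append((relation_id, term, hypernym))
    return relations


# Usage with an in-memory example; a real file could be passed as an open
# file object, e.g. open(path, encoding="utf-8"), since it iterates by line.
example = ["0\tcat\tanimal\n", "1\tdog\tanimal\n"]
edges = [(term, hypernym) for _, term, hypernym in parse_taxo(example)]
```

Dropping the relation identifier, as in the last line, yields the (term, hypernym) pairs on which the evaluation measures are defined.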


    Comparative Evaluation


      INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    Cycles 3 4 2 1 3 4
    Cumulative Fowlkes&Mallows Measure 2 1 6 3 4 5
    Intermediate nodes 2 5 3 6 4 1
    Gold Standard evaluation (F-score ranking) 2 1 4 5 6 3
    No. of domains submitted 1 3 1 2 1 1
    Manual evaluation (Precision ranking) 2 1 4 5 6 3
    Final Ranking 1 2 4 5 6 3

    vertex coverage

      chemical wn_chemical equipment wn_equipment food wn_food science wn_science
    INRIASAC 0.7037 0.9829 0.8300 0.9789 0.8425 0.9730 0.9159 0.8531
    LT3 n.a. n.a. 0.4248 0.9726 0.4389 0.9899 0.6327 0.8624
    ntnu 0.0333 0.5965 0.1046 0.5242 0.1356 0.4973 0.2300 0.6480
    QASSIT n.a. 0.9985 0.9918 1.0000 0.8695 1.0000 0.9977 0.8624
    TALN-UPF 1.0000 0.9970 1.0000 1.0000 0.8695 1.0000 0.9977 0.8624
    USAAR-WLV 0.7838 0.8675 0.5490 0.7431 0.6092 0.8068 0.7831 0.7132
    Averages over each domain pair (e.g. chemical and wn_chemical) and over all pairs:
      Avg chemical Avg equipment Avg food Avg science Avg
    INRIASAC 0.8433 0.90445 0.90775 0.8845 0.885
    LT3 n.a. 0.6987 0.7144 0.74755 0.7202
    ntnu 0.3149 0.314385 0.31645 0.439 0.34618375
    QASSIT n.a. 0.9959 0.93475 0.93005 0.9609
    TALN-UPF 0.9985 1.0000 0.93475 0.93005 0.965825
    USAAR-WLV 0.82565 0.64605 0.708 0.74815 0.7319625

    edge coverage

      chemical wn_chemical equipment wn_equipment food wn_food science wn_science
    INRIASAC 0.0969 0.4657 0.4959 0.3793 0.5179 0.4735 0.4494 0.5442
    LT3 n.a. n.a. 0.3219 0.9484 0.2974 0.9719 0.3806 0.8639
    ntnu 0.0013 0.5594 0.0065 0.4597 0.0541 0.4664 0.0451 0.6122
    QASSIT n.a. 0.0843 0.2455 0.1979 0.0655 0.0593 0.2559 0.2902
    TALN-UPF 0.0004 0.0930 0.1577 0.0453 0.0359 0.0782 0.0172 0.1111
    USAAR-WLV 0.0977 0.3835 0.3691 0.3072 0.2696 0.3581 0.3720 0.3537
      Avg chemical Avg equipment Avg food Avg science Avg
    INRIASAC 0.2813 0.4376 0.4957 0.4968 0.42785
    LT3 n.a. 0.63515 0.63465 0.62225 0.6306
    ntnu 0.28035 0.2331 0.26025 0.32865 0.2755875
    QASSIT n.a. 0.2217 0.0624 0.27305 0.1712
    TALN-UPF 0.0467 0.1015 0.05705 0.06415 0.06735
    USAAR-WLV 0.2406 0.33815 0.31385 0.36285 0.3138625

    ratio of novel edges

      chemical wn_chemical equipment wn_equipment food wn_food science wn_science
    INRIASAC 1.0491 2.8586 1.4032 2.4432 2.2312 2.2909 2.0537 1.9546
    LT3 n.a. n.a. 0.1365 2.0453 0.7309 3.5375 0.5677 2.7029
    ntnu 0.0616 0.7779 0.3951 2.2886 0.7189 1.3339 0.7849 0.9319
    QASSIT n.a. 0.9105 0.7528 0.8123 0.9174 0.9445 0.8430 0.6984
    TALN-UPF 0.7089 0.9531 0.9235 6.9030 0.9527 0.9315 3.4731 0.7800
    USAAR-WLV 1.1268 1.8566 0.5219 0.8206 1.4265 1.9021 1.6752 1.6689
      Avg chemical Avg equipment Avg food Avg science Avg
    INRIASAC 1.95385 1.9232 2.26105 2.00415 2.0355625
    LT3 n.a. 1.0909 2.1342 1.6353 1.6201
    ntnu 0.41975 1.34185 1.0264 0.8584 0.9116
    QASSIT n.a. 0.78255 0.93095 0.7707 0.8398
    TALN-UPF 0.831 3.91325 0.9421 2.12655 1.953225
    USAAR-WLV 1.4917 0.67125 1.6643 1.67205 1.374825

     Average Precision, Recall, and F-measure against gold standard

      INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    Avg. P 0.1721 0.3612 0.1754 0.1563 0.0720 0.2014
    Avg. R 0.4279 0.6307 0.2756 0.1588 0.1165 0.3139
    Avg. F 0.2427 0.3886 0.2075 0.1575 0.0798 0.2377

    Cumulative Fowlkes&Mallows Measure

      chemical wn_chemical equipment wn_equipment food wn_food science wn_science
    INRIASAC 0.2353 0.0084 0.4905 0.0700 0.4522 0.4804 0.4706 0.4153
    LT3 n.a. n.a. 0.1137 0.6892 0.2163 0.5899 0.3303 0.5391
    ntnu 0.0009 0.0719 0.0000 0.0935 0.0076 0.2673 0.0088 0.0158
    QASSIT n.a. 0.3947 0.4881 0.3637 0.3405 0.3153 0.5232 0.2921
    TALN-UPF 0.2225 0.2787 0.4482 0.0901 0.3267 0.3091 0.2202 0.2126
    USAAR-WLV 0.00001 0.2103 0.0000 0.0015 0.0037 0.0036 0.2249 0.1721
      Avg chemical Avg equipment Avg food Avg science Avg
    INRIASAC 0.12185 0.28025 0.4663 0.44295 0.3278375
    LT3 n.a. 0.40145 0.4031 0.4347 0.4130
    ntnu 0.0364 0.04675 0.13745 0.0123 0.058225
    QASSIT n.a. 0.4259 0.3279 0.40765 0.3882
    TALN-UPF 0.2506 0.26915 0.3179 0.2164 0.2635125
    USAAR-WLV 0.105155 0.00075 0.00365 0.1985 0.07701375

    Precision of novel edges (%)

      equipment food science wn_equipment wn_food wn_science Avg. prec.
    INRIASAC 59 37 51 63 37 41 48.0
    LT3 94 58 69 53 44 40 59.6
    ntnu 40 32 23 27 41 49 35.3
    QASSIT 44 1 38 21 2 42 24.7
    TALN-UPF 14 2 13 12 11 9 10.2
    USAAR-WLV 80 34 34 45 25 34 42.0

     

     


    Detailed Evaluation


    Domain: chemical

    Gold Standard download
    The gold standard is an excerpt of the ChEBI [1] chemical ontology.


    Structural measures
     

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 17584 12432 n.a. 1114 n.a. 17584 13785
    |E| 24817 28444 n.a. 1563 n.a. 17606 30392
    # c.c. 1 293 n.a. 116 n.a. 1 302
    cycles NO YES n.a. NO n.a. NO YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 12374 n.a. 586 n.a. 17584 13784
    vertex coverage 0.7037 n.a. 0.0333 n.a. 1.0 0.7838
    # edges in common 2407 n.a. 34 n.a. 11 2427
    edge coverage 0.0969 n.a. 0.0013 n.a. 0.0004 0.0977
    ratio of novel edges 1.0491 n.a. 0.0616 n.a. 0.7089 1.1268

    precision, recall and F-measure

     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.0846 n.a. 0.0217 n.a. 0.0006 0.0798
    R 0.0969 n.a. 0.0013 n.a. 0.0004 0.0977
    F 0.0903 n.a. 0.0025 n.a. 0.0005 0.0879

     

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.2353
    LT3 n.a.
    ntnu 0.0009
    QASSIT n.a.
    TALN-UPF 0.2225
    USAAR-WLV 0.00001
     
     
     

    Domain: equipment

    Gold Standard download
    The gold standard is an excerpt of the Material Handling Equipment taxonomy [2] combined with IS-A relations from WiBi [3].

    Structural measures
     

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 612 520 260 251 610 612 337
    |E| 615 1168 282 247 614 665 548
    # c.c. 1 6 10 35 1 1 28
    cycles NO NO YES NO NO YES YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 508 260 64 607 612 336
    vertex coverage 0.8300 0.4248 0.1046 0.9918 1.0 0.5490
    # edges in common 305 198 4 151 97 227
    edge coverage 0.4959 0.3219 0.0065 0.2455 0.1577 0.3691
    ratio of novel edges 1.4032 0.1365 0.3951 0.7528 0.9235 0.5219

    precision, recall and F-measure

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.2611 0.7021 0.0161 0.2459 0.1458 0.4142
    R 0.4959 0.3219 0.0065 0.2455 0.1577 0.3691
    F 0.3421 0.4414 0.0092 0.2457 0.1515 0.3903


     

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.4905
    LT3 0.1137
    ntnu 0
    QASSIT 0.4881
    TALN-UPF 0.4482
    USAAR-WLV 0.0018
     
     

    Domain: food

    Gold Standard download
    The gold standard is an excerpt of the Google product taxonomy [4] combined with IS-A relations from WiBi [3].

    Structural measures
     

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 1556 1518 819 834 1550 1549 1118
    |E| 1587 4363 1632 1227 1560 1569 2692
    # c.c. 1 2 6 27 1 1 23
    cycles NO YES YES YES YES NO YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 1311 683 211 1353 1353 948
    vertex coverage 0.8425 0.4389 0.1356 0.8695 0.8695 0.6092
    # edges in common 822 472 86 104 57 428
    edge coverage 0.5179 0.2974 0.0541 0.0655 0.0359 0.2696
    ratio of novel edges 2.2312 0.7309 0.7189 0.9174 0.9527 1.4265

    precision, recall and F-measure

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.1884 0.2892 0.0700 0.0666 0.0363 0.1589
    R 0.5179 0.2974 0.0541 0.0655 0.0359 0.2696
    F 0.2763 0.2932 0.0611 0.0660 0.0361 0.2000

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.4522
    LT3 0.2163
    ntnu 0.0076
    QASSIT 0.3405
    TALN-UPF 0.3267
    USAAR-WLV 0.0037
     
     

    Domain: science

    Gold Standard download
    The gold standard is an excerpt of the TAXONOMY OF FIELDS AND THEIR SUBFIELDS [5] combined with IS-A relations from WiBi [3].

    Structural measures
     

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 452 417 287 338 453 1280 355
    |E| 465 1164 441 386 511 1623 952
    # c.c. 1 3 8 23 1 1 14
    cycles NO NO YES NO NO YES YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 414 286 104 451 451 354
    vertex coverage 0.9159 0.6327 0.2300 0.9977 0.9977 0.7831
    # edges in common 209 177 21 104 119 173
    edge coverage 0.4494 0.3806 0.0451 0.2559 0.0172 0.3720
    ratio of novel edges 2.0537 0.5677 0.7849 0.8430 3.4731 1.6752

    precision, recall and F-measure

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.1795 0.4013 0.0544 0.2035 0.0733 0.1817
    R 0.4494 0.3806 0.0451 0.2236 0.2559 0.3720
    F 0.2565 0.3907 0.0493 0.2131 0.1139 0.2441

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.4706
    LT3 0.3303
    ntnu 0.0088
    QASSIT 0.5232
    TALN-UPF 0.2202
    USAAR-WLV 0.2249
     
     

    Domain: wn_chemical

    Gold Standard download
    The gold standard relations were extracted from the WordNet [6] taxonomy under the node "chemical".
    Structural measures
     

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 1351 1913 n.a. 1475 1351 1347 1173
    |E| 1387 4611 n.a. 1855 1380 1451 3107
    # c.c. 1 2 n.a. 28 1 1 31
    cycles NO YES n.a. YES NO YES YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 1328 n.a. 806 1349 1347 1172
    vertex coverage 0.9829 n.a. 0.5965 0.9985 0.9970 0.8675
    # edges in common 646 n.a. 776 117 129 532
    edge coverage 0.4657 n.a. 0.5594 0.0843 0.0930 0.3835
    ratio of novel edges 2.8586 n.a. 0.7779 0.9105 0.9531 1.8566

    precision, recall and F-measure

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.1400 n.a. 0.4183 0.0847 0.0889 0.1712
    R 0.4657 n.a. 0.5594 0.0843 0.0930 0.3835
    F 0.2154 n.a. 0.4787 0.0845 0.0909 0.2367

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.0084
    LT3 n.a.
    ntnu 0.0719
    QASSIT 0.3947
    TALN-UPF 0.2787
    USAAR-WLV 0.2103
     
     

    Domain: wn_equipment

    Gold Standard download
    The gold standard relations were extracted from the WordNet [6] taxonomy under the node "equipment".

    Structural measures

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 475 468 462 1081 476 2574 354
    |E| 485 1369 1452 1333 490 3370 547
    # c.c. 1 1 1 12 1 1 43
    cycles NO YES YES YES NO YES YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 465 462 249 475 475 353
    vertex coverage 0.9789 0.9726 0.5242 1.0 1.0 0.7431
    # edges in common 184 460 223 96 97 149
    edge coverage 0.3793 0.9484 0.4597 0.1979 0.0453 0.3072
    ratio of novel edges 2.4432 2.0453 2.2886 0.8123 6.9030 0.8206

    precision, recall and F-measure

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.1344 0.3168 0.1672 0.1959 0.0287 0.2723
    R 0.3793 0.9484 0.4597 0.1979 0.2000 0.3072
    F 0.1984 0.4749 0.2453 0.1969 0.0503 0.2887

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.0700
    LT3 0.6892
    ntnu 0.0935
    QASSIT 0.3637
    TALN-UPF 0.0901
    USAAR-WLV 0.0015
     
     

    Domain: wn_food

    Gold Standard download
    The gold standard relations were extracted from the WordNet [6] taxonomy under the node "food".
    Structural measures
     

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 1486 1458 1471 1834 1478 1486 1200
    |E| 1533 4238 6913 2760 1539 1548 3465
    # c.c. 1 2 1 35 1 1 23
    cycles NO NO YES YES NO YES YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 1446 1471 739 1486 1486 1199
    vertex coverage 0.9730 0.9899 0.4973 1.0 1.0 0.8068
    # edges in common 726 1490 715 91 120 549
    edge coverage 0.4735 0.9719 0.4664 0.0593 0.0782 0.3581
    ratio of novel edges 2.2909 3.5375 1.3339 0.9445 0.9315 1.9021

    precision, recall and F-measure

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.1713 0.2155 0.2590 0.0591 0.0775 0.1584
    R 0.4735 0.9719 0.4664 0.0593 0.0782 0.3581
    F 0.2516 0.3528 0.3331 0.0592 0.0778 0.2196

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.4804
    LT3 0.5899
    ntnu 0.2673
    QASSIT 0.3153
    TALN-UPF 0.3091
    USAAR-WLV 0.0036
     
     

    Domain: wn_science

    Gold Standard download
    The gold standard relations were extracted from the WordNet [6] taxonomy under the node "science".
    Structural measures
     

    Measure gold standard INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    |V| 429 366 370 524 371 370 307
    |E| 441 1102 1573 681 436 393 892
    # c.c. 1 1 1 11 1 1 8
    cycles NO YES YES NO NO NO YES

    Comparison against gold standard
     

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    # vertices in common 366 370 278 370 370 306
    vertex coverage 0.8531 0.8624 0.6480 0.8624 0.8624 0.7132
    # edges in common 240 381 270 104 49 156
    edge coverage 0.5442 0.8639 0.6122 0.2902 0.1111 0.3537
    ratio of novel edges 1.9546 2.7029 0.9319 0.6984 0.7800 1.6689

    precision, recall and F-measure

    Measure INRIASAC LT3 ntnu QASSIT TALN-UPF USAAR-WLV
    P 0.2177 0.2422 0.3964 0.2385 0.1246 0.1748
    R 0.5442 0.8639 0.6122 0.2358 0.1111 0.3537
    F 0.3110 0.3783 0.4812 0.2371 0.1175 0.2340

      Cumulative Fowlkes&Mallows Measure:
    INRIASAC 0.4153
    LT3 0.5391
    ntnu 0.0158
    QASSIT 0.2921
    TALN-UPF 0.2126
    USAAR-WLV 0.1721
     
     

    References

    1. Degtyarenko Kirill, de Matos Paula, Ennis Marcus, Hastings Janna, Zbinden Martin, McNaught Alan, Alcantara Rafael, Darsow Michael, Guedj Mickael and Ashburner Michael. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research, 36:suppl 1, D344-D350, 2008.
    2. Material Handling Equipment taxonomy, http://www.ise.ncsu.edu/kay/mhetax/index.htm.
    3. Tiziano Flati, Daniele Vannella, Tommaso Pasini, Roberto Navigli. Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project. Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, Maryland, USA June 22-27, 2014.
    4. The Google product taxonomy. http://www.google.com/basepages/producttype/taxonomy.en-US.txt
    5. TAXONOMY OF FIELDS AND THEIR SUBFIELDS, http://sites.nationalacademies.org/PGA/Resdoc/PGA_044522
    6. Fellbaum, Christiane (2005). WordNet and wordnets. In: Brown, Keith et al. (eds.), Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, 665-670.
    7. P. Velardi, S. Faralli, R. Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665-707.

    Contact Info

    Organizers

    • Dr. Paul Buitelaar - Insight, Centre for Data Analytics, National University of Ireland, Galway
    • Dr. Georgeta Bordea - Insight, Centre for Data Analytics, National University of Ireland, Galway
    • Prof. Roberto Navigli - Linguistic Computing Laboratory Dept. of Computer Science Sapienza University of Rome, Italy
    • Stefano Faralli - Linguistic Computing Laboratory Dept. of Computer Science Sapienza University of Rome, Italy

    Email:
    • Paul Buitelaar: paul[dot]buitelaar[at]insight-centre[dot]org
    • Georgeta Bordea: georgeta[dot]bordea[at]insight-centre[dot]org
    • Roberto Navigli: navigli[at]di[dot]uniroma1[dot]it
    • Stefano Faralli: faralli[at]di[dot]uniroma1[dot]it

    Other Info

    Announcements

    • Terminologies released on December 06, 2014
    • Target Domains announced on November 06, 2014
    • Important Dates updated
    • Trial data and tools released on May 30, 2014