Task 10: Detecting Minimal Semantic Units and their Meanings

The DiMSUM shared task asks systems to predict, for a given English sentence, a broad-coverage representation of lexical semantics. The representation has two closely connected facets: a segmentation of the sentence into minimal semantic units, and a labeling of some of those units with semantic classes known as supersenses.

For example, given the POS-tagged sentence

I/PRP googled/VBD restaurants/NNS in/IN the/DT area/NN and/CC Fuji/NNP Sushi/NNP came/VBD up/RB and/CC reviews/NNS were/VBD great/JJ so/RB I/PRP made/VBD a/DT carry/VB out/RP order/NN

the goal is to predict the representation

I googled/communication restaurants/GROUP in the area/LOCATION and Fuji_Sushi/GROUP came_up/communication and reviews/COMMUNICATION were/stative great so I made_ a carry_out/possession _order/communication

where lowercase labels are verb supersenses, UPPERCASE labels are noun supersenses, and _ joins tokens within a multiword expression (MWE). (carry_out, labeled possession, and the discontinuous made_order, labeled communication, are separate MWEs.)
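To make the two facets concrete, the example can be written out as plain data. The following is only an illustrative sketch in Python: the names Unit, label_kind, is_gappy, surface and segment are hypothetical and not part of any official task tooling, and the in-memory layout is an assumption, not the task's data format (see the task website for that). Token positions are 0-based.

```python
from typing import List, NamedTuple, Optional, Tuple


class Unit(NamedTuple):
    """One minimal semantic unit (hypothetical helper type)."""
    indices: Tuple[int, ...]      # 0-based token positions, ascending
    label: Optional[str] = None   # supersense, or None if unlabeled


def label_kind(label: str) -> str:
    """Lowercase labels are verb supersenses, UPPERCASE are noun supersenses."""
    if label.islower():
        return "verb"
    if label.isupper():
        return "noun"
    raise ValueError(f"unexpected label casing: {label!r}")


def is_gappy(unit: Unit) -> bool:
    """True if the unit's tokens are not contiguous (a discontinuous MWE)."""
    return unit.indices[-1] - unit.indices[0] + 1 != len(unit.indices)


def surface(tokens: List[str], unit: Unit) -> str:
    """Join the unit's tokens with '_', as in the example above."""
    return "_".join(tokens[i] for i in unit.indices)


def segment(tokens: List[str], annotated: List[Unit]) -> List[Unit]:
    """Complete a segmentation: every token not covered by an annotated
    unit becomes its own unlabeled single-token unit."""
    covered = set()
    for unit in annotated:
        for i in unit.indices:
            if not 0 <= i < len(tokens):
                raise ValueError(f"token index out of range: {i}")
            if i in covered:
                raise ValueError(f"token {i} belongs to two units")
            covered.add(i)
    singles = [Unit((i,)) for i in range(len(tokens)) if i not in covered]
    return sorted(annotated + singles, key=lambda u: u.indices[0])


TOKENS = ("I googled restaurants in the area and Fuji Sushi came up and "
          "reviews were great so I made a carry out order").split()

# MWEs and labeled units of the example sentence.
ANNOTATED = [
    Unit((1,), "communication"),      # googled
    Unit((2,), "GROUP"),              # restaurants
    Unit((5,), "LOCATION"),           # area
    Unit((7, 8), "GROUP"),            # Fuji_Sushi
    Unit((9, 10), "communication"),   # came_up
    Unit((12,), "COMMUNICATION"),     # reviews
    Unit((13,), "stative"),           # were
    Unit((17, 21), "communication"),  # made_ ... _order (discontinuous)
    Unit((19, 20), "possession"),     # carry_out (inside the gap)
]

SEGMENTATION = segment(TOKENS, ANNOTATED)

for unit in SEGMENTATION:
    if unit.label is not None:
        gap = " (gappy)" if is_gappy(unit) else ""
        print(f"{surface(TOKENS, unit)}: {label_kind(unit.label)} "
              f"supersense {unit.label}{gap}")
```

Every token belongs to exactly one unit, so the segmentation facet is the full list of units and the labeling facet is the subset of units that carry a supersense.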

Systems are expected to produce both facets of the representation, though the manner in which they do this (e.g., pipeline vs. joint model) is up to you.

Gold-standard training data labeled with the combined representation will be provided in two domains: online reviews and tweets. Blind test data will cover these two domains as well as a third, surprise domain.

For further details, see the task website: http://dimsum16.github.io/

Contact Info


  • Nathan Schneider, University of Edinburgh
  • Marine Carpuat, University of Maryland
  • Anders Johannsen, University of Copenhagen
  • Dirk Hovy, University of Copenhagen

website: http://dimsum16.github.io

email: nschneid -AT- inf.ed.ac.uk

group: https://groups.google.com/group/dimsum16

Other Info