Task 10: Detecting Minimal Semantic Units and their Meanings
The DiMSUM shared task is concerned with predicting, given an English sentence, a broad-coverage representation of lexical semantics. The representation consists of two closely connected facets: a segmentation into minimal semantic units, and a labeling of some of those units with semantic classes known as supersenses.
For example, given the POS-tagged sentence

I/PRP googled/VBD restaurants/NNS in/IN the/DT area/NN and/CC Fuji/NNP Sushi/NNP came/VBD up/RB and/CC reviews/NNS were/VBD great/JJ so/RB I/PRP made/VBD a/DT carry/VB out/RP order/NN
the goal is to predict the representation
I googled:communication restaurants:GROUP in the area:LOCATION and Fuji_Sushi:GROUP came_up:communication and reviews:COMMUNICATION were:stative great so I made_a carry_out:possession_order:communication

where labels follow a colon, lowercase labels are verb supersenses, UPPERCASE labels are noun supersenses, and _ joins tokens within a multiword expression. (carry_out:possession and made_order:communication are separate MWEs.)
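To make the two facets of the representation concrete, here is a minimal in-memory sketch of the example sentence's analysis. This is purely illustrative and not the official data format (see the task website for that): the `Token` record, the `mwe_parent` back-link scheme for joining MWE tokens, and the `mwes` helper are all assumptions introduced for this sketch.

```python
# Illustrative only -- NOT the official DiMSUM file format.
# Each token carries an optional supersense and, if it continues a
# multiword expression, the index of the previous token in that MWE.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    form: str
    pos: str
    mwe_parent: Optional[int] = None  # index of previous token in the same MWE
    supersense: Optional[str] = None  # lowercase = verb, UPPERCASE = noun

sent = [
    Token("I", "PRP"),
    Token("googled", "VBD", supersense="communication"),
    Token("restaurants", "NNS", supersense="GROUP"),
    Token("in", "IN"),
    Token("the", "DT"),
    Token("area", "NN", supersense="LOCATION"),
    Token("and", "CC"),
    Token("Fuji", "NNP", supersense="GROUP"),
    Token("Sushi", "NNP", mwe_parent=7),    # Fuji_Sushi
    Token("came", "VBD", supersense="communication"),
    Token("up", "RB", mwe_parent=9),        # came_up
    Token("and", "CC"),
    Token("reviews", "NNS", supersense="COMMUNICATION"),
    Token("were", "VBD", supersense="stative"),
    Token("great", "JJ"),
    Token("so", "RB"),
    Token("I", "PRP"),
    Token("made", "VBD", supersense="communication"),
    Token("a", "DT"),
    Token("carry", "VB", supersense="possession"),
    Token("out", "RP", mwe_parent=19),      # carry_out
    Token("order", "NN", mwe_parent=17),    # made_..._order (gappy MWE)
]

def mwes(tokens):
    """Group token indices into MWEs by following mwe_parent links."""
    groups = {}
    for i, _ in enumerate(tokens):
        root = i
        while tokens[root].mwe_parent is not None:
            root = tokens[root].mwe_parent
        groups.setdefault(root, []).append(i)
    return [g for g in groups.values() if len(g) > 1]

for g in mwes(sent):
    print("_".join(sent[i].form for i in g))
```

Note how the back-link scheme handles the gappy MWE naturally: `order` points back to `made` across the intervening `a carry_out`, so the two MWEs remain distinct.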
Systems are expected to produce both facets of the representation, though the manner in which they do this (e.g., pipeline vs. joint model) is up to you.
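As a toy illustration of the pipeline option, the sketch below first segments a token sequence by greedy lookup in a small MWE lexicon, then labels the resulting units from a supersense dictionary. Everything here (the lexicon, the dictionary, the greedy matcher) is a hypothetical baseline invented for illustration, not a competitive system and not part of the task definition.

```python
# Purely illustrative two-stage pipeline: segment, then label.
# The lexicon and dictionary below are toy data, not task resources.

MWE_LEXICON = {("came", "up"), ("carry", "out")}
SUPERSENSES = {"came_up": "communication", "carry_out": "possession",
               "area": "LOCATION"}

def segment(tokens):
    """Greedily join adjacent token pairs found in the MWE lexicon."""
    units, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in MWE_LEXICON:
            units.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            units.append(tokens[i])
            i += 1
    return units

def label(units):
    """Attach a supersense (or None) to each minimal semantic unit."""
    return [(u, SUPERSENSES.get(u)) for u in units]

print(label(segment(["came", "up", "in", "the", "area"])))
```

A joint model would instead predict segmentation and supersenses in a single step, avoiding error propagation from stage 1 to stage 2.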
Gold standard training data labeled with the combined representation will be provided in two domains: online reviews and tweets. Blind test data will be in these two domains as well as a third, surprise domain.
For further details, see the task website: http://dimsum16.github.io/