Task 10: Detecting Minimal Semantic Units and their Meanings
The DiMSUM shared task is concerned with predicting, given an English sentence, a broad-coverage representation of lexical semantics. The representation consists of two closely connected facets: a segmentation into minimal semantic units, and a labeling of some of those units with semantic classes known as supersenses.
For example, given the POS-tagged sentence
I/PRP googled/VBD restaurants/NNS in/IN the/DT area/NN and/CC Fuji/NNP Sushi/NNP came/VBD up/RB and/CC reviews/NNS were/VBD great/JJ so/RB I/PRP made/VBD a/DT carry/VB out/RP order/NN
the goal is to predict the representation
I googled|communication restaurants|GROUP in the area|LOCATION and Fuji_Sushi|GROUP came_up|communication and reviews|COMMUNICATION were|stative great so I made_ a carry_out|possession _order|communication
where lowercase labels are verb supersenses, UPPERCASE labels are noun supersenses, and _ joins tokens within a multiword expression (MWE). (carry_out|possession and made_order|communication are separate MWEs.)
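To make the two facets concrete, here is a minimal Python sketch that encodes the example and recovers its lexical units. The `Token` class, its `parent` field, and the functions `lexical_units` and `supersense_kind` are hypothetical names chosen for illustration; they are not the official data format of the task. The sketch also assumes that a multiword expression carries its supersense on its first token.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    word: str
    pos: str
    # Assumption: 1-based index of the previous token in the same MWE, 0 if none.
    parent: int = 0
    # Assumption: the label, if any, sits on the first token of the unit.
    supersense: Optional[str] = None


def lexical_units(tokens: List[Token]) -> List[Tuple[List[str], Optional[str]]]:
    """Group tokens into units by following parent links back to a root."""
    root_of: Dict[int, int] = {}
    members: Dict[int, List[int]] = {}
    for i, tok in enumerate(tokens, start=1):
        root = root_of[tok.parent] if tok.parent else i
        root_of[i] = root
        members.setdefault(root, []).append(i)
    return [
        ([tokens[j - 1].word for j in idxs], tokens[root - 1].supersense)
        for root, idxs in sorted(members.items())
    ]


def supersense_kind(label: Optional[str]) -> Optional[str]:
    """Lowercase labels are verb supersenses, UPPERCASE labels noun supersenses."""
    if label is None:
        return None
    return "verb" if label.islower() else "noun"


# The example sentence, annotated as in the target representation above.
SENTENCE = [
    Token("I", "PRP"),
    Token("googled", "VBD", supersense="communication"),
    Token("restaurants", "NNS", supersense="GROUP"),
    Token("in", "IN"),
    Token("the", "DT"),
    Token("area", "NN", supersense="LOCATION"),
    Token("and", "CC"),
    Token("Fuji", "NNP", supersense="GROUP"),
    Token("Sushi", "NNP", parent=8),
    Token("came", "VBD", supersense="communication"),
    Token("up", "RB", parent=10),
    Token("and", "CC"),
    Token("reviews", "NNS", supersense="COMMUNICATION"),
    Token("were", "VBD", supersense="stative"),
    Token("great", "JJ"),
    Token("so", "RB"),
    Token("I", "PRP"),
    Token("made", "VBD", supersense="communication"),
    Token("a", "DT"),
    Token("carry", "VB", supersense="possession"),
    Token("out", "RP", parent=20),
    Token("order", "NN", parent=18),  # gappy MWE: made ... order
]

if __name__ == "__main__":
    for words, label in lexical_units(SENTENCE):
        if len(words) > 1 or label:
            print("_".join(words), label, supersense_kind(label))
```

Linking each token to the previous token of its expression, rather than storing contiguous spans, is one simple way to represent a gappy expression such as made ... order together with the carry_out expression nested inside its gap.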
Systems are expected to produce both facets of the representation, though the manner in which they do this (e.g., pipeline vs. joint model) is up to you.
Gold standard training data labeled with the combined representation will be provided in two domains: online reviews and tweets. Blind test data will be in these two domains as well as a third, surprise domain.
For further details, see the task website: http://dimsum16.github.io/