Task Description

Clinical narrative abounds in mentions of clinical conditions, anatomical sites, medications, and procedures, in stark contrast with the newswire domain, where text is dominated by mentions of countries, locations, and people. Many surface forms represent the same concept. Unlike the general domain, biomedicine has rich lexical and ontological resources that can be leveraged when building applications. The Unified Medical Language System (UMLS) represents over 130 lexicons/thesauri with terms from a variety of languages. The UMLS Metathesaurus integrates resources used worldwide in clinical care, public health, and epidemiology, including SNOMED CT, ICD-9, and RxNorm. In addition, the UMLS provides a semantic network in which every concept in the Metathesaurus is represented by its Concept Unique Identifier (CUI) and is semantically typed (Bodenreider and McCray, 2003).

Because the recognition and normalization of named entity mentions is a fundamental task, it will be the focus of this shared task, which comprises two parts.

Task A

This task involves the recognition of mentions of concepts that belong to the UMLS semantic group disorders.

Here are a few examples; more are provided in the annotation guidelines and on the Datasets page.

  1. The rhythm appears to be atrial fibrillation.
  2. The left atrium is moderately dilated.
  3. 53 year old man s/p fall from ladder.

In examples 1 and 3, the phrases atrial fibrillation and fall from ladder belong to the disorder semantic group in the UMLS. Example 2 is a case of a discontiguous mention, represented as left atrium...dilated. This phenomenon, where a discontiguous phrase is the best representation of the disorder, occurs more commonly in the clinical domain than in the general domain, and is therefore annotated as such.
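One way to hold both contiguous and discontiguous mentions in a single structure is to store each mention as a list of character spans. The following is a minimal sketch under that assumption; the class name DisorderMention, the helper find_span, and the use of end-exclusive character offsets are illustrative choices and not the official annotation format of the task.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass(frozen=True)
class DisorderMention:
    """Hypothetical container: a mention is one or more character spans."""
    spans: Tuple[Span, ...]

    @property
    def is_discontiguous(self) -> bool:
        # More than one span means the mention has a gap in the text.
        return len(self.spans) > 1

    def text(self, sentence: str, joiner: str = "...") -> str:
        # Join the covered pieces, marking gaps the way the examples do.
        return joiner.join(sentence[start:end] for start, end in self.spans)


def find_span(sentence: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase (raises ValueError if absent)."""
    start = sentence.index(phrase)
    return (start, start + len(phrase))


s1 = "The rhythm appears to be atrial fibrillation."
s2 = "The left atrium is moderately dilated."

m1 = DisorderMention((find_span(s1, "atrial fibrillation"),))
m2 = DisorderMention((find_span(s2, "left atrium"), find_span(s2, "dilated")))

print(m1.text(s1), m1.is_discontiguous)  # atrial fibrillation False
print(m2.text(s2), m2.is_discontiguous)  # left atrium...dilated True
```

Keeping every mention as a tuple of spans means a contiguous mention is simply the one-span case, so downstream code does not need a separate path for example 2.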

Task B

This task involves the mapping of each disorder mention to a unique UMLS CUI.  This is referred to as the task of normalization and the mapping is limited to UMLS CUIs of SNOMED codes.

The disorder mentions in the examples above map to the following CUIs:

  1. atrial fibrillation - C0004238; UMLS preferred term atrial fibrillation
  2. left atrium...dilated - C0344720; UMLS preferred term left atrial dilatation
  3. fall from ladder - C0337212; UMLS preferred term accidental fall from ladder

Example 1 represents the easiest case, where the mention matches the UMLS preferred term exactly. Example 2 represents instances where a disorder listed in the UMLS is best mapped using a disjoint mention. Example 3 is harder, as one has to infer that the description is a synonym of the UMLS preferred term.
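The three difficulty levels can be illustrated with a simple dictionary lookup. This is a minimal sketch, not a proposed system: the lexicon is a toy table built only from the three examples above, the synonym rows are hand-written assumptions standing in for what a real system would derive from the UMLS Metathesaurus, and the names LEXICON, normalize_key, and map_to_cui are hypothetical.

```python
import re
from typing import Dict, Optional

# Toy lexicon: preferred terms and CUIs come from the examples above.
# Rows marked "assumed synonym" are hand-written for illustration only.
LEXICON: Dict[str, str] = {
    "atrial fibrillation": "C0004238",
    "left atrial dilatation": "C0344720",
    "left atrium dilated": "C0344720",        # assumed synonym (example 2)
    "accidental fall from ladder": "C0337212",
    "fall from ladder": "C0337212",           # assumed synonym (example 3)
}


def normalize_key(mention: str) -> str:
    """Lowercase, replace the '...' gap marker with a space, collapse spaces."""
    text = mention.lower().replace("...", " ")
    return re.sub(r"\s+", " ", text).strip()


def map_to_cui(mention: str, lexicon: Dict[str, str] = LEXICON) -> Optional[str]:
    """Return the CUI for a mention, or None when the lexicon has no entry."""
    return lexicon.get(normalize_key(mention))


for mention in ["atrial fibrillation", "left atrium...dilated", "fall from ladder"]:
    print(mention, "->", map_to_cui(mention))
```

Exact lookup handles example 1 directly, while examples 2 and 3 succeed only because synonym rows were added by hand, which is where the real difficulty of the normalization task lies.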

Participants are free to use any UMLS resources as well as other supplemental content such as WordNet, Wikipedia, etc. In addition, we will make the rest of the MIMIC corpus of clinical notes (from which the annotated notes were sampled) available as a larger corpus for exploring semi-supervised and unsupervised methods.


Bodenreider, O. and McCray, A. 2003. Exploring semantic groups through visual approaches. Journal of Biomedical Informatics, 36(6): pp. 414-432.

Contact Info


  • Sameer S. Pradhan, Harvard University
  • Suresh Manandhar, University of York, UK
  • Wendy W. Chapman, University of Utah
  • Noemie Elhadad, Columbia University
  • Guergana K. Savova, Harvard University
