00: What unlabeled data should we use to inform unsupervised methods?

Participants may use any external resources, for example biomedical terminology dictionaries, Wikipedia, WordNet, etc. However, if you use the MIMIC II corpus to inform your unsupervised methods, you should ONLY use the subset that we are distributing through the SemEval-2015 project on the PhysioNet website.

More specifically, regarding the MIMIC II corpus:
  • do NOT download the MIMIC II corpus that is available separately on the PhysioNetWorks website; and
  • do NOT use the test documents, once they are provided, to inform your unsupervised methods.

02: What does it mean for a disorder to be "CUI-less"?

This means that the disorder does not have a corresponding CUI in the SNOMED-CT portion of the UMLS dictionary. In other words, none of the SNOMED-CT descriptions in UMLS version 2012AB is synonymous with the string in the clinical note that identifies the disorder.

03: How many runs can each participant submit?

We will allow at most three runs from each participant. This allows participants to submit, for example, a purely supervised system alongside variants that also use unsupervised methods. As long as the system description paper describes each variation in enough detail, we plan to provide a cross-system, cross-strategy analysis in the overview paper.

04: Why are some disorder strings marked as CUI-less even though there is a UMLS CUI available in the dictionary?

We define a disorder mention as any span of text that can be mapped to a concept in the SNOMED-CT terminology that belongs to one of the following UMLS semantic types:
  • Congenital Abnormality
  • Acquired Abnormality
  • Injury or Poisoning
  • Pathologic Function
  • Disease or Syndrome
  • Mental or Behavioral Dysfunction
  • Cell or Molecular Dysfunction
  • Experimental Model of Disease
  • Anatomical Abnormality
  • Neoplastic Process
  • Sign or Symptom
The notion of what a disorder is can sometimes be broader than this set of semantic types allows. The annotators were asked to be careful not to let their own preconceptions of what counts as a disorder get in the way of annotating disorder mentions.

However, during the normalization step some mentions will not be mapped to a CUI because their concepts do not belong to any of the above semantic types. For example:

Screen mammogram showed increasing calcifications in right breast.

The disorder mention calcifications is CUI-less, since its concept does not belong to any of the above semantic types (it belongs to the Finding semantic type). It is nevertheless still a disorder, so its span is annotated.

The same applies to the following example:

The patient was admitted with low blood pressure.

The span low blood pressure is a Finding in UMLS. In this case, however, because it does indeed describe a disorder, it is annotated and normalized as CUI-less.

A second category of CUI-less disorder mentions is illustrated by the following example:

Patient diagnosed with monkey disease.

Here, monkey disease has no entry in the SNOMED-CT portion of UMLS, but it still fits our definition of a disorder, so it is a CUI-less disorder mention.
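The cases above can be summarized as a small decision procedure. The Python sketch below is illustrative only and is not the official normalization or scoring code. It assumes the mention has already been annotated as a disorder, and it assumes a hypothetical `snomed_lookup` dictionary that maps a lowercased mention string to a CUI and that concept's UMLS semantic types. Real systems need synonym-aware or fuzzy matching rather than exact string lookup. The CUIs in the example are placeholders, not real UMLS identifiers.

```python
from __future__ import annotations

# UMLS semantic types that qualify a SNOMED-CT concept as a disorder
# (official UMLS semantic type names).
DISORDER_SEMANTIC_TYPES = frozenset({
    "Congenital Abnormality",
    "Acquired Abnormality",
    "Injury or Poisoning",
    "Pathologic Function",
    "Disease or Syndrome",
    "Mental or Behavioral Dysfunction",
    "Cell or Molecular Dysfunction",
    "Experimental Model of Disease",
    "Anatomical Abnormality",
    "Neoplastic Process",
    "Sign or Symptom",
})

CUI_LESS = "CUI-less"


def normalize(mention: str,
              snomed_lookup: dict[str, tuple[str, frozenset[str]]]) -> str:
    """Return a CUI for an annotated disorder mention, or CUI-less.

    snomed_lookup is a hypothetical exact-match table:
    lowercased mention -> (CUI, semantic types of that concept).
    """
    entry = snomed_lookup.get(mention.lower())
    if entry is None:
        # No synonymous SNOMED-CT description (e.g. "monkey disease").
        return CUI_LESS
    cui, semantic_types = entry
    if semantic_types.isdisjoint(DISORDER_SEMANTIC_TYPES):
        # A concept exists, but none of its types is a disorder type
        # (e.g. "calcifications", which is a Finding).
        return CUI_LESS
    return cui


if __name__ == "__main__":
    # Placeholder CUIs for illustration only.
    lookup = {
        "calcifications": ("C_PLACEHOLDER_1", frozenset({"Finding"})),
        "low blood pressure": ("C_PLACEHOLDER_2", frozenset({"Finding"})),
        "diabetes mellitus": ("C_PLACEHOLDER_3",
                              frozenset({"Disease or Syndrome"})),
    }
    for m in ["calcifications", "low blood pressure",
              "monkey disease", "diabetes mellitus"]:
        print(m, "->", normalize(m, lookup))
```

Checking for any overlap between the concept's semantic types and the disorder set, rather than comparing a single type, reflects the fact that a UMLS concept can carry more than one semantic type.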

