Detailed Information

 

Detailed Data Description

We offer the task in two languages, English and Arabic, with some differences in the type of data provided. For English, we will have a question (short title + extended description) and a list of several community answers to that question. For Arabic, we will have a question and a set of possible answers, which will include (i) a highly accurate answer, (ii) potentially useful answers from other questions, and (iii) answers to random questions. Both datasets can be freely downloaded from the Data and Tools page. See below for a more detailed description of the format, sources and annotation of these datasets.

 

English Data (CQA-QL corpus)

The source of the CQA-QL corpus is the Qatar Living Forum (http://www.qatarliving.com/forum). A sample of question and answer threads was automatically selected and then manually filtered and annotated with the categories defined in the task. We provide a split into three files: training, development, and test.

The datasets are XML-formatted and the text encoding is UTF-8. A dataset file is a sequence of examples (Questions):

          <root>
            <Question> ... </Question>
            <Question> ... </Question>
            ...
            <Question> ... </Question>
          </root>

Each Question tag has a list of attributes, as in the following example:

<Question QID="Q1" QCATEGORY="Pets and Animals" QDATE="2009-03-07 19:24:00" QUSERID="U1" QTYPE="YES_NO" QGOLD_YN="Yes">

- QID: internal question identifier
- QCATEGORY: the question category, according to the Qatar Living taxonomy 
- QDATE: date of posting
- QUSERID: internal identifier for the user who posted the question; consistent across questions
- QTYPE: type of question, can be GENERAL or YES_NO
- QGOLD_YN: overall Yes/No summary of the set of good answers for the given YES_NO question (or "Not Applicable" in the case of GENERAL questions); this value is a class label to be predicted at test time

The structure of a Question is the following:

          <Question ...>
            <QSubject> text </QSubject>
            <QBody> text </QBody>
               <Comment> ... </Comment>
               <Comment> ... </Comment>
               ...
               <Comment> ... </Comment>
          </Question>

The text between the <QSubject> and the </QSubject> tags is the short version of the question as provided by user QUSERID. The text between tags <QBody> and </QBody> is the long version of the question as provided by user QUSERID. What follows is a list of Comments, each corresponding to an answer (to the focus question) posted by a particular user.

Every Comment tag has some attributes, as in the following example:

          <Comment CID="Q1_C1" CUSERID="U4" CGOLD="Good" CGOLD_YN="No">

- CID: Internal identifier of the comment: the part before the "_" encodes the question number
- CUSERID: Internal identifier of the user posting the comment
- CGOLD: human assessment of whether the comment is "Good", "Bad", "Potential", "Dialogue", "non-English", or "Other". This is a class label to be predicted at test time.
- CGOLD_YN: human assessment of whether the comment answers the question positively ("Yes"), negatively ("No"), or in an unsure way ("Unsure"); it is "Not Applicable" in the case of GENERAL questions. This label is only available at training time; at test time, participating systems are not required to produce CGOLD_YN, only QGOLD_YN.

Comments are structured as follows:

          <Comment ...>
            <CSubject> text </CSubject>
            <CBody> text </CBody>
          </Comment>

The text between the <CSubject> and the </CSubject> tags is the short version of the comment. The text between the <CBody> and the </CBody> tags is the long version of the comment.
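
Below is a minimal parsing sketch for this format, using Python's standard xml.etree.ElementTree module. The file name and the dictionary field names are illustrative assumptions and are not part of the official distribution.

import xml.etree.ElementTree as ET

def load_cqa_ql(path):
    # Yield (question, comments) pairs from a CQA-QL dataset file.
    root = ET.parse(path).getroot()
    for q in root.findall("Question"):
        question = {
            "id": q.get("QID"),
            "category": q.get("QCATEGORY"),
            "type": q.get("QTYPE"),          # GENERAL or YES_NO
            "gold_yn": q.get("QGOLD_YN"),    # question-level label to predict
            "subject": (q.findtext("QSubject") or "").strip(),
            "body": (q.findtext("QBody") or "").strip(),
        }
        comments = []
        for c in q.findall("Comment"):
            comments.append({
                "id": c.get("CID"),
                "user": c.get("CUSERID"),
                "gold": c.get("CGOLD"),        # Good / Bad / Potential / Dialogue / ...
                "gold_yn": c.get("CGOLD_YN"),  # Yes / No / Unsure / Not Applicable
                "subject": (c.findtext("CSubject") or "").strip(),
                "body": (c.findtext("CBody") or "").strip(),
            })
        yield question, comments

# Example usage (the file name is hypothetical):
# for question, comments in load_cqa_ql("CQA-QL-train.xml"):
#     print(question["id"], question["type"], len(comments))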

 

Annotation of the CQA-QL corpus

 

The manual annotation was a joint effort between the CSAIL-MIT and ALT-QCRI groups (see organizing team).

After a first internal labeling of the TRIAL dataset (50+50 questions) by several independent annotators, we defined the annotation procedure and prepared detailed annotation guidelines.

Amazon's Mechanical Turk was used to collect the human annotations for the large corpus. Nicole Schmidt (CSAIL-MIT) implemented the Mechanical Turk-based annotation. Several HITs were defined to produce all the required annotation:

- HIT 1: Select appropriate example questions and classify them as GENERAL vs. YES_NO.
- HIT 2: Annotate every comment in the GENERAL questions as "Good", "Bad", "Potential", "Dialogue", "non-English", or "Other".
- HIT 3: Annotate the YES_NO questions with the same information at the comment level, plus a label ("Yes"/"No"/"Unsure") indicating whether the comment answers the question with a clear "Yes", a clear "No", or in an undefined way.

In all HITs, we collected the annotations of several annotators for each decision (between 3 and 5 human annotators) and resolved discrepancies using majority voting. Ties led to the elimination of some comments and even of complete examples.


The "Yes"/"No"/"Unsure" labels at the question level (QGOLD_YN) were assigned automatically based on the "Yes"/"No"/"Unsure" labels at the comment level. More concretely, a YES_NO question is labeled as "Unsure" except in the case in which there is a majority of "Yes" or "No" labels among the "Yes"/"No"/"Unsure" labels from the comments that are labeled as "Good". In that case, the majority label is assigned.

Some statistics about the datasets (training & development):

TRAINING:

  • QUESTIONS:
    • TOTAL: 2,600
    • GENERAL: 2,376 (91.38%)
    • YES_NO: 224 (8.62%)
  • COMMENTS:
    • TOTAL: 16,541
    • MIN: 1
    • MAX: 143
    • AVG: 6.36
  • CGOLD VALUES:
    • Good: 8,069 (48.78%)
    • Bad: 2,981 (18.02%)
    • Potential: 1,659 (10.03%)
    • Dialogue: 3,755 (22.70%)
    • Not English: 74 ( 0.45%)
    • Other: 3 ( 0.02%)
  • CGOLD_YN COMMENT VALUES (excluding "Not Applicable"):
    • yes: 346 (43.52%)
    • no: 236 (29.69%)
    • unsure: 213 (26.79%)
  • QGOLD_YN VALUES (excluding "Not Applicable"):
    • yes: 87 (38.84%)
    • no: 47 (20.98%)
    • unsure: 90 (40.18%)


DEVELOPMENT:

  • QUESTIONS:
    • TOTAL: 300
    • GENERAL: 266 (88.67%)
    • YES_NO:  34 (11.33%)
  • COMMENTS:
    • TOTAL: 1645
    • MIN: 1
    • MAX: 32
    • AVG: 5.48
  • CGOLD VALUES:
    • Good: 875 (53.19%)
    • Bad: 269 (16.35%)
    • Potential: 187 (11.37%)
    • Dialogue: 312 (18.97%)
    • Not English: 2 ( 0.12%)
    • Other: 0 ( 0.00%)
  • CGOLD_YN COMMENT VALUES (excluding "Not Applicable"):
    • yes: 62 (53.91%)
    • no: 32 (27.83%)
    • unsure: 21 (18.26%)
  • QGOLD_YN VALUES (excluding "Not Applicable"):
    • yes: 16 (47.06%)
    • no: 8 (23.53%)
    • unsure: 10 (29.41%)
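
The figures above can be recomputed directly from the dataset files. The sketch below shows one way to do it, reusing the hypothetical load_cqa_ql() helper from the parsing example earlier on this page.

from collections import Counter

def dataset_stats(path):
    qtype_counts, cgold_counts = Counter(), Counter()
    comments_per_q = []
    for question, comments in load_cqa_ql(path):
        qtype_counts[question["type"]] += 1
        comments_per_q.append(len(comments))
        cgold_counts.update(c["gold"] for c in comments)
    n_q, n_c = len(comments_per_q), sum(comments_per_q)
    print("QUESTIONS:", n_q, dict(qtype_counts))
    print("COMMENTS:", n_c,
          "MIN:", min(comments_per_q),
          "MAX:", max(comments_per_q),
          "AVG:", round(n_c / n_q, 2))
    print("CGOLD:", dict(cgold_counts))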

 

Arabic Data


For Arabic, we used data from the Fatwa website (http://fatwa.islamweb.net/). A Fatwa is a question about the Islamic religion. This website contains questions by ordinary users and answers by professional scholars in Islamic studies. The user question can be general, for example "How to pray?", or it can be very personal, e.g., the user has a specific problem in his/her life and wants to find out how to deal with it according to Islam.


Each question (Fatwa) is answered carefully by a knowledgeable scholar. The answer is usually very descriptive: it contains an introduction to the topic of the question, then the general rules in Islam on the topic, and finally an actual answer to the specific question and/or guidance on how to deal with the problem. Typically, links to related questions are also provided so that the user can read more about similar situations and look at related questions.


In this task, a question from the website is provided together with a set of five different answers. Each of the five provided answers carries one of the following labels:
- direct: This is a direct answer to the question
- related: The answer does not directly answer the question but contains information related to its topic
- irrelevant: An answer to another question not related to the topic

The participants are provided with a set of 1,300 training examples, and the task is to learn these labels in order to apply them to similar unseen data, tagging the answers to a given question as direct, related, or irrelevant. Along with the training examples, we provide a set of 200 development examples, so that participants can test their approaches. The data for the Arabic task can be found here (alt.qcri.org/semeval2015/task3/index.php?id=data-and-tools).

The datasets are XML-formatted and the text encoding is UTF-8.

A dataset file is a sequence of examples (Questions):

<root>
  <Question> ... </Question>
  <Question> ... </Question>
  ...
  <Question> ... </Question>
</root>

Each Question tag has a list of attributes, as in the following example:

<Question QID = "20831" QCATEGORY = "فقه العبادات > الطهارة" QDATE = "2002-13-08">

- QID: internal question identifier
- QCATEGORY: the question category
- QDATE: date of posting

The structure of a Question is the following:

<Question ...>
  <QSubject> text </QSubject>
  <QBody> text </QBody>
    <Answer> ... </Answer>
    <Answer> ... </Answer>
    ...
    <Answer> ... </Answer>
</Question>

The text between the <QSubject> and the </QSubject> tags is the title of the question that is created by Islamweb.
The text between the <QBody> and the </QBody> tags is the full question that is provided by the user.
What follows is a list of possible answers, which can be the original direct answer on Islamweb or related answers that are linked to that answer, or randomly selected answers that should have no relation to the topic of the question.

Each Answer tag has some attributes, as in the following example:

<Answer CID = "41673" CGOLD = "?">

- CID: the answer ID
- CGOLD: the classification of the answer (direct, related, or irrelevant)

The text between the <Answer> and the </Answer> tags is the answer text. It can contain tags such as the following:
- NE: named entities in the text, usually person names
- Quran: Quran verse
- Hadeeth: A saying by the Islamic prophet
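
As with the English data, the Arabic files can be read with a few lines of standard Python. In the sketch below, itertext() flattens the nested NE/Quran/Hadeeth tags into plain answer text; the function and field names are illustrative assumptions, not part of the official tools.

import xml.etree.ElementTree as ET

def load_arabic(path):
    # Yield (question, answers) pairs from an Arabic dataset file.
    root = ET.parse(path).getroot()
    for q in root.findall("Question"):
        question = {
            "id": q.get("QID"),
            "category": q.get("QCATEGORY"),
            "date": q.get("QDATE"),
            "subject": (q.findtext("QSubject") or "").strip(),
            "body": (q.findtext("QBody") or "").strip(),
        }
        answers = []
        for a in q.findall("Answer"):
            answers.append({
                "id": a.get("CID"),
                "gold": a.get("CGOLD"),                 # direct / related / irrelevant
                "text": "".join(a.itertext()).strip(),  # keeps the text, drops the inner tags
            })
        yield question, answers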

 

 

 

Contact Info

Organizers

  • Lluís Màrquez, Qatar Computing Research Institute
  • James Glass, CSAIL-MIT
  • Walid Magdy, Qatar Computing Research Institute
  • Alessandro Moschitti, Qatar Computing Research Institute
  • Preslav Nakov, Qatar Computing Research Institute
  • Bilal Randeree, Qatar Living

email : semeval-cqa@googlegroups.com

Other Info

Announcements

  • The official results are here
  • Download the test data here
  • Download the English and Arabic training data, format checkers, and scorers here
  • Download the English baseline systems here
  • Download raw text of questions and comments from Qatar Living (English) here
  • Join the Google group: semeval-cqa@googlegroups.com
  • Register to participate here