Due to the widespread use of Web forums, such as Qatar Living,
Yahoo! Answers or Stack Overflow,
there has been renewed interest in Community Question Answering (cQA).
cQA combines traditional question answering with a modern Web scenario, where users pose
questions hoping to get the right answers from other users. If a user posts a new question that is similar (or even semantically equivalent) to a previously posted question, she should not have to wait for answers or for another user to point her to the archived forum thread.
An automatic system can search for relevant previously posted questions,
addressing the current information need immediately.
In this challenge, given a new user question and a set of previously posted forum questions, together with their corresponding answer threads, a machine learning model must rank the forum questions according to their relevance to the new question.
Although this task involves both natural language processing (NLP) and information retrieval (IR), the goal of the challenge is to focus on the machine learning aspect. Therefore, we take care of the NLP and IR components and provide the participants with features derived from the text of the original and forum questions, as well as similarity matrices built by applying kernel functions to their parse trees. A few other features express the relevance of the thread comments associated with the forum question to the original question.
Participants are expected to exploit these data to build a machine learning model that predicts the best possible ranking of forum questions given a new one: the most relevant forum questions must appear at the top of the ranking.
Each new question u is associated with a set of forum questions qi (typically 10). Each qi is labeled as Perfect Match, Relevant, or Irrelevant with respect to u. In the following example, q1 is Relevant, q2 is Irrelevant, and q3 is a Perfect Match.
|u||Which is a good bank as per your experience in Doha|
|q1||Best Credit Card in Doha
I would like to apply for a credit card that gives me Points when i spend and not just a simple card that does not get me anything in return for using it... Which card would you recommend gives the most reward for using the card and which card should i stay away from?? (PS I understand that i would either need to shift my salary to the bank or put a deposit of some sort...) thanks!!|
|q2||PERSONAL LOAN AND WORK TERMINATION
I am currently working here in Qatar; our employer informed us that they will transfer to contractor with less salary; but we have option not to accept; but the problem is I have Personal Loan in one of the BAnk here in DOHA; what if I will not accept the offer; what will happen to me ; I mean did the BAnk will force to pay or not?|
|q3||Best bank in Qatar?
Greetings everybody. I will like to see if someone can help me; I want to know which is the best bank in Qatar for opening a personal bank account. Best regards. Have a nice weekend everyone.|
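In the example above, an ideal ranking places q3 first, q1 second, and q2 last. The following Python sketch illustrates how a ranking can be derived from per-question relevance scores. The score values and the `rank_questions` helper are hypothetical and are not part of the challenge material.

```python
def rank_questions(scores):
    """Return forum question ids sorted by decreasing relevance score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores a model might assign to the three forum questions
# of the example (higher means more relevant to the new question u).
example_scores = {"q1": 0.62, "q2": 0.08, "q3": 0.91}

ranking = rank_questions(example_scores)
print(ranking)
```

With these assumed scores, the Perfect Match q3 is ranked above the Relevant q1, which in turn is ranked above the Irrelevant q2.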
In order to focus on the machine learning aspects of the competition, we provide a set of state-of-the-art features for the task as well as similarity matrices between the parse trees of the questions.
A total of 64 features are provided, divided into three sub-groups:
We provide some pre-computed kernel matrices that contain the tree kernel similarities between the syntactic parse trees of the questions (multiple combinations between u and q are possible, see this link for details). Given two pairs of new and forum questions pi=(ui,qi) and pj=(uj,qj), we provide four different matrices, which store all the possible Tree Kernel (TK) computations, i.e., TK(ui,uj), TK(qi,qj), TK(ui,qj), and TK(qi,uj). Furthermore, we provide a Java program to combine such values into customizable tree kernel combinations.
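As an illustration of what a tree kernel combination may look like, the sketch below computes a weighted sum of the four matrices for a toy set of two question pairs. It is only one possible combination and not necessarily the one implemented in the provided Java program; the `combine_kernels` function, the weights, and the toy values are assumptions, and the matrices are held in memory as nested lists because the file format is not described here.

```python
def combine_kernels(uu, qq, uq, qu, weights=(1.0, 1.0, 0.0, 0.0)):
    """Weighted sum of four square kernel matrices of equal size.

    uu[i][j] = TK(ui, uj), qq[i][j] = TK(qi, qj),
    uq[i][j] = TK(ui, qj), qu[i][j] = TK(qi, uj).
    The default weights (an assumption) use only TK(ui,uj) + TK(qi,qj).
    """
    w_uu, w_qq, w_uq, w_qu = weights
    n = len(uu)
    return [
        [
            w_uu * uu[i][j] + w_qq * qq[i][j] + w_uq * uq[i][j] + w_qu * qu[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy values for two pairs p1 = (u1, q1) and p2 = (u2, q2).
uu = [[1.0, 0.2], [0.2, 1.0]]
qq = [[1.0, 0.5], [0.5, 1.0]]
uq = [[0.3, 0.1], [0.4, 0.6]]
qu = [[0.3, 0.4], [0.1, 0.6]]  # transpose of uq, since TK(qi, uj) = TK(uj, qi)

same_side = combine_kernels(uu, qq, uq, qu)
all_terms = combine_kernels(uu, qq, uq, qu, weights=(1.0, 1.0, 1.0, 1.0))
print(same_side)
print(all_terms)
```

Giving all four terms the same weight keeps the combined matrix symmetric, because the two cross matrices are transposes of each other.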
The baseline system consists of a combination of vectorial features and tree kernels.
Once you have registered and downloaded the data, you can create the kernel matrix used in the baseline by running the following command (on a single line):
java -jar TreeKernelCombinationBuilder.jar trainDevUvsU.txt trainDevQvsQ.txt
trainDevUvsQ.txt trainDevQvsU.txt outputKernelMatrix
The jar file and its source code are provided together with the corpus. This program can be used to create more kernel matrices. The source code of the program is provided for further combinations/modifications: refer to file
and modify line 138.
The baseline is a binary SVM classifier that separates Perfect Match and Relevant questions (positive class) from Irrelevant ones (negative class). Its kernel combines a linear kernel on the features with the tree kernels computed with the Java program above.
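The two ingredients of such a baseline can be sketched in Python as follows: mapping the three labels to two classes, and adding a linear kernel on the feature vectors to a precomputed tree kernel matrix. This is a minimal sketch under assumptions: the helper names, the +1/-1 encoding, the unweighted sum, and the toy values are all hypothetical, and the actual baseline may combine the kernels differently.

```python
POSITIVE_LABELS = {"Perfect Match", "Relevant"}


def to_binary(label):
    """Map the three relevance labels to +1 (relevant) or -1 (irrelevant)."""
    return 1 if label in POSITIVE_LABELS else -1


def linear_kernel(features):
    """Gram matrix of dot products between all pairs of feature vectors."""
    return [
        [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        for x_i in features
    ]


def add_matrices(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Toy data: two examples with two features each (the real data has 64).
features = [[1.0, 0.0], [0.5, 0.5]]
tree_kernel = [[2.0, 0.7], [0.7, 2.0]]  # e.g. output of a tree kernel combination
labels = ["Perfect Match", "Irrelevant"]

gram = add_matrices(linear_kernel(features), tree_kernel)
targets = [to_binary(label) for label in labels]
print(gram)
print(targets)
```

A matrix built this way could then be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.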
The submission format is a three-column, tab-separated file:
<example_id>TAB<relevance_score>TAB<predicted_label> where the first column is the id of the example, the second one is the relevance score produced by the system, and the third column is the predicted class. Given the above example, if the developed system ranked the related questions qi as follows:
|Best Performing system on the test set:||1000 EUR*|
|Best Performing system on the development set:||500 EUR*+|
|May 16th||Release of the training and development sets.|
|May 16th||Opening of the online oracle for submissions on the development set.|
|July 22nd, 12:00:00||Registration deadline. End of submission period on the development set.|
|July 23rd||Release of the test set.|
|July 24th, 23:59:59||Deadline for submission of the system description draft version.|
|July 30th, 12:00:00||End of submission period on the test set.|
|July 30th||Preliminary announcement of the winners.|
|Aug 7th||Deadline for submission of the system description camera-ready version.|
|September 23rd, 11:40||Workshop on the Challenge at ECML|