SAP-RI: A Constrained and Supervised Approach for Aspect-Based Sentiment Analysis

We describe the submission of the SAP Research & Innovation team to the SemEval 2014 Task 4: Aspect-Based Sentiment Analysis (ABSA). Our system follows a constrained and supervised approach for aspect term extraction, categorization and sentiment classification of online reviews. This paper describes the system in detail.


Introduction
The increasing popularity of the internet as a source of information, and of e-commerce as a way of life, has led to a major surge in the number of online reviews for a wide range of products and services. Consequently, more and more consumers consult these online reviews as part of their pre-purchase research, before deciding whether to use the services of a local business or to buy a product from a particular brand. This calls for innovative techniques for the sentiment analysis of online reviews in order to generate accurate and relevant recommendations. Sentiment analysis has been extensively studied and applied in different domains. Predicting the sentiment polarity (positive, negative, neutral) of user opinions by mining user reviews (Hu and Liu, 2004; Liu, 2012; Pang and Lee, 2008; Liu, 2010) has been of high commercial and research interest. In these studies, sentiment analysis is often conducted at one of three levels: document level, sentence level or attribute level.
Through the SemEval 2014 Task 4 on Aspect Based Sentiment Analysis (Pontiki et al., 2014), we explore sentiment analysis at the aspect level. The task consists of four subtasks: in subtask 1 (aspect term extraction), participants had to identify the aspect terms present in a sentence and return a list containing all distinct aspect terms; in subtask 2 (aspect term polarity), participants had to determine the polarity of each aspect term in a sentence; in subtask 3 (aspect category detection), participants had to identify the aspect categories discussed in a given sentence; and in subtask 4 (aspect category polarity), participants had to determine the polarity of each aspect category. The polarity classification subtasks treat sentiment analysis as a three-way classification problem between positive, negative and neutral sentiment. The aspect category detection subtask, on the other hand, is a multi-label classification problem in which one sentence can be labelled with more than one aspect category.
* The work was done during an internship at SAP. This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/
In this paper, we describe the submission of the SAP-RI team to the SemEval 2014 Task 4. We use supervised techniques to extract the aspects of interest (Jakob and Gurevych, 2010), categorize them (Lu et al., 2011) and predict the sentiment of online customer reviews of laptops and restaurants. We developed a constrained system for aspect-based sentiment analysis of these online reviews: it is constrained in the sense that we only use the training data provided by the challenge organizers and no other external data sources. Our system performed reasonably well, achieving an F1 score of 75.61% on the aspect category polarity subtask, 79.04% on the aspect category detection subtask and 66.61% on the aspect term extraction subtask.

Subtask 1: Aspect Term Extraction
Given reviews with annotated entities in the training set, the task was to extract the aspect terms from reviews in the test set. For this subtask, training, development and testing were conducted for both the laptop and the restaurant domains.

Features
Each review was represented as a feature vector made up of the following features:
• Word N-grams: all unigrams, bigrams and trigrams from the review text
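Because a CRF labels tokens rather than whole reviews, the n-grams have to be attached to individual tokens. The paper does not say how this was done; the following is a minimal sketch, assuming each token is described by every unigram, bigram and trigram inside a small window around it. The window size of 2, the function name and the feature-name format are hypothetical choices for illustration.

```python
def token_ngram_features(tokens, window=2, max_n=3):
    """Return one feature dict per token.

    For token i, every n-gram (n = 1..max_n) lying entirely inside the
    window [i - window, i + window] becomes a binary feature. The feature
    name records n, the n-gram's start offset relative to i, and its text.
    """
    features = []
    for i in range(len(tokens)):
        feats = {}
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for n in range(1, max_n + 1):
            for start in range(lo, hi - n + 1):
                gram = "_".join(tok.lower() for tok in tokens[start:start + n])
                feats[f"{n}g[{start - i}]={gram}"] = 1
        features.append(feats)
    return features


feats = token_ngram_features(["The", "sushi", "was", "great"])
print(sorted(feats[1]))
```

With CRF++ specifically, the same information would normally be expressed through its feature template file over token columns rather than as precomputed dictionaries; the dictionary form above is simply easier to inspect.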

Method
We approach the task by casting it as a sequence tagging task in which each token in a candidate sentence is labelled as Beginning, Inside or Outside (BIO) of an aspect term. We then employ conditional random fields (CRF), a discriminative, probabilistic model for sequence data with state-of-the-art performance (Lafferty et al., 2001). A linear-chain CRF estimates the conditional probability of a label sequence $y$ given the observed features $x$, where each label $y_t$ is conditioned on the previous label $y_{t-1}$. In our case, we use CoNLL-style BIO tags (Sang and De Meulder, 2003). During development, we split the training data in the ratio 60:20:20 into training, development (dev) and testing (dev-test) sets. We train the CRF model on the training set, perform feature selection on the dev set, and test the resulting model on the dev-test set. In all experiments, we use the CRF++ implementation of conditional random fields (code.google.com/p/crfpp/) with the parameter c=4.0, a value chosen based on manual observation. We performed a feature ablation study; the results are reported in Table 1. The features listed in section 2.1 are those retained for the final run.
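To make the BIO casting concrete, here is a minimal sketch, assuming whitespace-tokenised sentences and aspect terms given as (start, end) token spans with an exclusive end; all function names are hypothetical. It encodes annotated spans as BIO tags for training, writes them in the column layout that CRF++ reads (whitespace-separated columns, gold tag last, a blank line between sentences), and decodes predicted tags back into the list of distinct aspect terms that subtask 1 asks for.

```python
def spans_to_bio(n_tokens, spans):
    """Encode aspect-term token spans [(start, end_exclusive), ...] as BIO tags."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for k in range(start + 1, end):
            tags[k] = "I"
    return tags


def to_crfpp_lines(tokens, tags):
    """One 'token<TAB>tag' line per token; the trailing '' ends the sentence."""
    return [f"{tok}\t{tag}" for tok, tag in zip(tokens, tags)] + [""]


def bio_to_terms(tokens, tags):
    """Decode BIO tags into the distinct aspect terms, in order of appearance."""
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B" or (tag == "I" and not current):
            # A new term starts; a stray I without a preceding B is treated as B.
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I":
            current.append(token)
        else:  # "O" closes any open term
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    # Subtask 1 asks for distinct terms: drop duplicates, keep first-seen order.
    return list(dict.fromkeys(terms))


tokens = "The hard disk is fast but the battery life is short".split()
tags = spans_to_bio(len(tokens), [(1, 3), (7, 9)])
print(bio_to_terms(tokens, tags))  # ['hard disk', 'battery life']
```

Treating a stray `I` as the start of a new term is one common convention for repairing invalid tag sequences at decoding time; the paper does not say how such cases were handled.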

Subtask 2: Aspect Term Polarity Estimation
For this subtask, training, development and testing were done using reviews of laptops and restaurants. Given the aspect terms in a sentence, the task was to predict their sentiment polarities.

Features
For each review, we used the following features:
• Word N-grams: all lowercased unigrams, bigrams and trigrams from the review text
• Polarity of neighbouring adjectives: word sentiment extracted from the SentiWordNet lexicon (Baccianella et al., 2010)
• Neighbouring POS tags: the POS tags of up to 3 neighbouring words
• Parse dependencies and relations: the dependency relations of the aspects, i.e., the presence or absence of adjectives and adverbs in the dependency parse tree
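The features above are computed per aspect term. The following minimal sketch shows one way to assemble them, assuming the sentence has already been POS-tagged (Penn Treebank tags) and dependency-parsed. The input format, the function name and the feature names are hypothetical, and `TOY_LEXICON` is an invented stand-in for SentiWordNet whose scores are not real SentiWordNet values.

```python
# Hypothetical stand-in for SentiWordNet: word -> (positive, negative) score.
# The numbers are invented for illustration and are NOT SentiWordNet values.
TOY_LEXICON = {"delicious": (0.75, 0.0), "rude": (0.0, 0.625), "slow": (0.0, 0.5)}


def aspect_features(tokens, pos_tags, deps, aspect_index, window=3):
    """Build a feature dict for the aspect term at position aspect_index.

    tokens, pos_tags: parallel lists for one sentence.
    deps: (head_index, relation, dependent_index) triples from a dependency parse.
    """
    feats = {"aspect": tokens[aspect_index].lower(), "amod": "null", "advmod": "null"}

    # Lowercased unigrams of the sentence (bigrams/trigrams omitted for brevity).
    for tok in tokens:
        feats[f"uni_{tok.lower()}"] = 1

    lo = max(0, aspect_index - window)
    hi = min(len(tokens), aspect_index + window + 1)

    # Neighbouring POS tags, keyed by their offset from the aspect.
    for j in range(lo, hi):
        if j != aspect_index:
            feats[f"pos[{j - aspect_index}]"] = pos_tags[j]

    # Polarity of neighbouring adjectives (JJ, JJR, JJS), summed from the lexicon.
    positive = negative = 0.0
    for j in range(lo, hi):
        if pos_tags[j].startswith("JJ"):
            p, n = TOY_LEXICON.get(tokens[j].lower(), (0.0, 0.0))
            positive += p
            negative += n
    feats["adj_pos"] = positive
    feats["adj_neg"] = negative

    # Adjectival / adverbial modifiers attached directly to the aspect in the parse.
    for head, rel, dep in deps:
        if head == aspect_index and rel in ("amod", "advmod"):
            feats[rel] = tokens[dep].lower()
    return feats


tokens = ["The", "delicious", "sushi", "arrived", "slowly"]
tags = ["DT", "JJ", "NN", "VBD", "RB"]
deps = [(2, "det", 0), (2, "amod", 1), (3, "nsubj", 2), (3, "advmod", 4)]
print(aspect_features(tokens, tags, deps, aspect_index=2))
```

The `'null'` defaults for `amod` and `advmod` mirror the example feature vector given in the Method paragraph.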

Method
For each aspect term in a sentence, the aforementioned features were extracted. For example, for the term Sushi in the sentence "Sushi was delicious.", the following feature vector is constructed: {aspect: 'sushi', advmod: 'null', amod: 'delicious', uni_sushi: 1, uni_was: 1, uni_delicious: 1, uni_the: 0, ...}. We then treat aspect sentiment polarity estimation as a multi-class classification task in which each instance is labelled as positive, negative or neutral. For the classification task, we experimented with Naive Bayes and Support Vector Machines (SVM), with both linear and RBF kernels, and observed that the linear SVM performed best; hence, we use a linear SVM for this task. Table 2 summarizes the results of our experiments for various feature combinations. The classifiers used are the implementations from scikit-learn, which is also used for the remaining tasks.
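A minimal scikit-learn sketch of this set-up, assuming feature dictionaries like the example above. The four training instances are invented toy data rather than task data, and `LinearSVC` is used with its default settings because the paper does not report the SVM hyperparameters.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy instances in the style of the example feature vector.
X_train = [
    {"aspect": "sushi", "amod": "delicious", "advmod": "null", "uni_delicious": 1},
    {"aspect": "waiter", "amod": "rude", "advmod": "null", "uni_rude": 1},
    {"aspect": "menu", "amod": "null", "advmod": "null", "uni_menu": 1},
    {"aspect": "pasta", "amod": "tasty", "advmod": "null", "uni_tasty": 1},
]
y_train = ["positive", "negative", "neutral", "positive"]

# DictVectorizer one-hot encodes string values (e.g. "amod=delicious") and
# keeps numeric values as they are; LinearSVC handles the three classes
# with a one-vs-rest scheme.
model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X_train, y_train)

prediction = model.predict(
    [{"aspect": "sushi", "amod": "delicious", "advmod": "null", "uni_delicious": 1}]
)
print(prediction)
```

To re-run the comparison described above in outline, `LinearSVC()` can be swapped for `sklearn.naive_bayes.MultinomialNB()` or `sklearn.svm.SVC(kernel="rbf")`; the one-hot features produced by `DictVectorizer` are non-negative, as `MultinomialNB` requires.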

Subtask 3: Aspect Category Detection
Given a review with annotated entities or aspect terms, the task was to predict the aspect categories it discusses. As one sentence in a review can belong to multiple aspect categories, we model the task as a multi-label classification problem, i.e., given an instance, predict all labels that apply to it.

Features
We experimented with different features, for example unigrams, bigrams, dependency tree relations, POS tags and word sentiment (SentiWordNet), but unigrams alone yielded the best result. The feature vector was simply a bag-of-words vector indicating the presence or absence of each word in an instance.

Method
The training instances were divided into 5 sets based on the aspect categories, and we thereby treated the multi-label classification task as 5 separate binary classification tasks, using an ensemble of binary classifiers for the multi-label classification. One SVM model was trained per class to distinguish that class from all other classes. For these binary classification tasks, directly estimating a linear separating function (such as a linear SVM) gave better results, as shown in Table 3. Finally, the outputs of the 5 binary classifiers were combined to label each test instance.
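The one-classifier-per-category scheme with presence/absence unigram features can be sketched with scikit-learn as follows. This is a minimal sketch, not the submitted system: the sentences and their labels are invented toy data, and the five category names are those of the SemEval 2014 restaurant data. `OneVsRestClassifier` fits one `LinearSVC` per binary label column and stacks their decisions back into a label set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Invented toy sentences; each may carry several aspect categories.
sentences = [
    "The sushi was fresh and cheap",
    "Our waiter was friendly",
    "Great food but the place is too noisy",
    "Prices are high for such small portions",
    "We came here for a birthday",
    "The staff were rude and the room was dark",
]
labels = [
    {"food", "price"},
    {"service"},
    {"food", "ambience"},
    {"price", "food"},
    {"anecdotes/miscellaneous"},
    {"service", "ambience"},
]

# One binary indicator column per category (columns sorted by name).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# binary=True gives presence/absence unigram features (a bag-of-words vector);
# OneVsRestClassifier trains one LinearSVC per category column.
model = make_pipeline(CountVectorizer(binary=True), OneVsRestClassifier(LinearSVC()))
model.fit(sentences, Y)

# Convert the predicted indicator rows back into category names.
predicted = mlb.inverse_transform(model.predict(["The waiter was rude"]))
print(predicted)
```

A test sentence for which no binary classifier fires receives an empty label set; how the original system handled that case is not stated.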
The category Miscellaneous was observed to have the lowest accuracy, probably because it captures all those aspect terms that do not have a clearly defined category.

Subtask 4: Aspect Category Polarity Detection
For each review with pre-labelled aspect categories, the task was to produce a model which predicts the sentiment polarity of each aspect category.

Features
The training data contains reviews labelled with the polarity of each corresponding aspect category. The models performed best when using only unigram and bigram features.
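Subtask 4 trains one polarity classifier per aspect category on these unigram and bigram features. A minimal scikit-learn sketch of that arrangement follows. The (sentence, category, polarity) triples are invented toy data, and `predict_polarity` is a hypothetical helper that routes a sentence to the classifier of its category.

```python
from collections import defaultdict

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy (sentence, aspect category, polarity) triples.
data = [
    ("The sushi was delicious", "food", "positive"),
    ("The pasta was bland and cold", "food", "negative"),
    ("Food was okay", "food", "neutral"),
    ("Our waiter was very friendly", "service", "positive"),
    ("The staff ignored us", "service", "negative"),
    ("Service was as expected", "service", "neutral"),
]

# Split the training instances by aspect category: category -> (texts, polarities).
by_category = defaultdict(lambda: ([], []))
for sentence, category, polarity in data:
    by_category[category][0].append(sentence)
    by_category[category][1].append(polarity)

# Train one unigram+bigram linear SVM per category.
classifiers = {}
for category, (texts, polarities) in by_category.items():
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, polarities)
    classifiers[category] = clf


def predict_polarity(sentence, category):
    """Route the sentence to the classifier trained for its aspect category."""
    return classifiers[category].predict([sentence])[0]


print(predict_polarity("The sushi was delicious", "food"))
```

Keeping a separate model per category lets the same word carry different weight depending on the category it is attached to, at the cost of splitting an already small training set five ways.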

Method
The training instances were split into 5 sets based on the aspect categories. We reuse the sentiment polarity classifier described in section 3.2, training one sentiment polarity classifier for each aspect category. Table 4 reports the performance of different classifiers for this task, using the features discussed in section 5.1.

Table 4: Training-phase experimental results (F1 score) for Subtask 4.

Results
Table 5 gives an overview of the performance of our system in this year's task, based on the official scores from the organizers. Our system performs relatively well on subtasks 1, 3 and 4, while on subtask 2 its F1 score is behind the best system by about 12%. As observed, a sentence can contain more than one aspect, and each of these aspects can carry a different polarity; including features that preserve the context of each aspect could therefore improve performance on subtask 2. In most cases, a simple set of features was enough to achieve a high F1 score; in subtask 3, for example, a bag-of-words feature set yielded a relatively high F1 score. In general, for the classification tasks, we observe that the linear SVM performs best.

Conclusion
In this paper, we have described the submission of the SAP-RI team to the SemEval 2014 Task 4. We model the term extraction task using a CRF and the classification tasks using linear SVMs, resulting in an aspect-based sentiment analysis system that performs reasonably well.