iTac: Aspect Based Sentiment Analysis using Sentiment Trees and Dictionaries

This paper describes our approach for the fourth task of the SemEval 2014 challenge: Aspect Based Sentiment Analysis. Our system is designed to solve all four subtasks: (i) identifying aspect terms, (ii) determining the polarity of an aspect term, (iii) detecting aspect categories, and (iv) determining the polarity of a predefined aspect category. Our system is based on the Stanford sentiment tree.


Introduction
Online reviewing, rating, and recommendation have become quite popular. Based on online reviews and ratings, people may decide whether to buy a certain product or visit a certain place (restaurant, shop, etc.). Due to the increasing number of reviews, an automatic system is needed that can evaluate these reviews as positive, negative, or neutral.
In this paper, we propose a system for the fourth task of the SemEval 2014 challenge (Aspect Based Sentiment Analysis). The target is to identify aspects of given target entities and to determine the sentiment that is expressed towards each aspect in terms of a polarity. The problem has been divided into four different subtasks: (i) extracting aspects from a given sentence, (ii) determining the polarity of each aspect, (iii) matching suitable aspect categories and (iv) identifying the polarity of these categories.

Related Work
There are several different approaches to performing sentiment analysis on a given sentence. Turney (2002) and Pang et al. (2002) started to classify a given sentence as either positive or negative. Dave et al. (2003) continued this work by including the neutral semantic orientation. These approaches perform sentiment analysis on a whole sentence and use phrases such as adjectives and adverbs to get a polarity. They collect all these phrases and determine their polarity (e.g. positive, neutral, or negative). Hence, they differ from our work, which performs sentiment analysis for each aspect term.
This work is licenced under a Creative Commons Attribution 4.0 International License. Page numbers and proceedings footer are added by the organizers. License details: http://creativecommons.org/licenses/by/4.0/
Another approach by Snyder and Barzilay (2007) addresses aspect based sentiment analysis by performing sentiment analysis for various aspects of a given restaurant. Our work differs from their approach and is more closely related to Hu and Liu (2004), where individual parts of the sentence are classified separately since different parts can express different polarities. However, the authors only consider product features instead of aspect terms. Aspect terms can be product features, but they can also include conditions such as ambience that influence an opinion, which have not been addressed in Hu and Liu (2004).

Preliminaries
Our system is based on Natural Language Processing (NLP) libraries such as Stanford CoreNLP. The system relies heavily on the Stanford sentiment tree.

Stanford sentiment tree
The sentiment treebank introduced by Socher et al. (2013) was developed at Stanford University to predict the sentiment of movie reviews. It contains approximately 12,000 sentiment annotated parse trees of movie reviews. The sentiment prediction distinguishes five sentiment classes (very negative, negative, neutral, positive, very positive) using a recursive neural tensor network trained on

Implementation
Our system is divided into four subsystems that are described separately in the following sections. Although described separately, some subtasks depend on each other (e.g. Aspect Category Extraction and Aspect Category Polarity).

Aspect term extraction
The aim of this subtask is to find aspect terms that are discussed in a given sentence. Our approach follows an idea presented by Hu and Liu (2004). A word in a given sentence is considered to be an aspect term if it satisfies the following three conditions.
C1.1 It is tagged as a noun (tagged with NN, NNS, NNP, or NNPS).
C1.2 It is one of the 20% most common nouns of all given sentences.
C1.3 It does not belong to a forbidden word category (listed in Table 1).
Following this extraction, adjacent aspect terms are combined into multi-word aspect terms.
Example 1 "My wife bought it and was very happy, especially with the hard drives and battery life." The result of applying the rules is shown in Table 2. When multi-word aspect terms are considered, battery and life are combined into a single term. The row labelled terms shows the aspect terms extracted from the sentence. In the last row (gold terms), they are compared to the actual aspect terms given by the training data.
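Conditions C1.1 to C1.3 and the merging of adjacent terms can be illustrated with a minimal Python sketch. It is not the system's actual code: it assumes the sentence is already part-of-speech tagged, it reads "20% most common nouns" as the top fifth of distinct nouns by frequency, and the names frequent_nouns and extract_aspect_terms as well as the two word sets in the example are hypothetical.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags (C1.1)


def frequent_nouns(tagged_sentences, top_fraction=0.2):
    """Return the most common `top_fraction` of distinct nouns (C1.2, assumed reading)."""
    counts = Counter(
        word.lower()
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag in NOUN_TAGS
    )
    keep = max(1, int(len(counts) * top_fraction))
    return {word for word, _ in counts.most_common(keep)}


def extract_aspect_terms(tagged_sentence, common_nouns, forbidden):
    """Apply C1.1 to C1.3 per word and merge adjacent hits into multi-word terms."""
    terms, current = [], []
    for word, tag in tagged_sentence:
        lowered = word.lower()
        is_aspect = (
            tag in NOUN_TAGS              # C1.1: tagged as a noun
            and lowered in common_nouns   # C1.2: among the frequent nouns
            and lowered not in forbidden  # C1.3: not a forbidden word
        )
        if is_aspect:
            current.append(word)
        elif current:
            # A non-aspect word ends the current run of adjacent aspect words.
            terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


# Example 1, tagged by hand. Both word sets are made up for illustration only.
sentence = [
    ("My", "PRP$"), ("wife", "NN"), ("bought", "VBD"), ("it", "PRP"),
    ("and", "CC"), ("was", "VBD"), ("very", "RB"), ("happy", "JJ"),
    (",", ","), ("especially", "RB"), ("with", "IN"), ("the", "DT"),
    ("hard", "JJ"), ("drives", "NNS"), ("and", "CC"),
    ("battery", "NN"), ("life", "NN"), (".", "."),
]
common_nouns = {"drives", "battery", "life"}
forbidden = {"wife"}
terms = extract_aspect_terms(sentence, common_nouns, forbidden)
print(terms)  # ['drives', 'battery life']
```

Because only nouns are considered, the adjective hard is not attached to drives in this sketch.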
The results of our system are shown in Table 3. These results could be improved by using typed dependencies. The use of the adjectival modifier (amod) and the noun compound modifier (nn) relations can help to improve finding multi-word aspect terms.

Aspect term polarity
After extracting the aspect terms from the sentence, the next task is to predict their polarity. For this task we use the Stanford sentiment tree. The sentiment tree is designed to predict the sentiment of a whole sentence. Because the sentiment tree contains polarities for every node of the parse tree, it is reasonable to use it for aspect sentiment prediction.
Our algorithm examines the sentiment tree nodes to predict the polarity of an aspect. The following outlines the basic steps for aspect sentiment prediction.
1. Create the sentiment tree for the sentence and fetch the node of the aspect term stem.
2. Traverse the tree from that node up to the root. The first non-neutral polarity on the path from the node to the root node is chosen.
3. If the algorithm reaches the root node without finding a non-neutral polarity, the aspect term is predicted as neutral.
Figure 1: Example of the sentiment tree algorithm for the sentence "The keyboard is too slik.".
Example 2 Figure 1 illustrates the algorithm for the sentence "The keyboard is too slik.". The aspect term keyboard is underlined. The algorithm starts at the keyboard node (denoted with 1) and examines the parent node (2). Since the parent node has a neutral polarity, the root node needs to be examined (3). Due to the negative polarity of the root node, the aspect term keyboard is negative (4).
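The three steps can be sketched as a short upward traversal. This is a simplified illustration and not the Stanford CoreNLP API: the Node class is hypothetical, the five sentiment classes are assumed to be collapsed into negative, neutral, and positive beforehand, and the aspect node itself is assumed to count as the first node on the path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical sentiment tree node with a link to its parent."""
    label: str
    polarity: str                    # "negative", "neutral", or "positive"
    parent: Optional["Node"] = None  # None marks the root


def aspect_polarity(node):
    """Return the first non-neutral polarity on the path from node to root."""
    current = node
    while current is not None:
        if current.polarity != "neutral":
            return current.polarity
        current = current.parent
    # The root was reached without finding a non-neutral polarity.
    return "neutral"


# Example 2: keyboard (1) -> neutral parent (2) -> negative root (3).
root = Node("The keyboard is too slik.", "negative")
phrase = Node("The keyboard", "neutral", parent=root)
keyboard = Node("keyboard", "neutral", parent=phrase)
print(aspect_polarity(keyboard))  # negative
```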
The results of the algorithm on the test data set are shown in Table 4. We obtained quite good results for negative and positive aspect terms. However, neutral aspect terms are difficult to predict, because the sentiment tree rarely predicts neutral polarities. Overall, our accuracy is nearly 10 percentage points above the ABSA baselines.

Aspect category detection
This section describes the approach for the third subtask that identifies aspect categories discussed in a given sentence, using a predefined set of aspect categories, such as food, service, ambience, price, and anecdotes/miscellaneous as a neutral category. Our approach is twofold, depending on whether the sentence contains aspect terms or not.
Sentences with aspect terms. We illustrate our approach with the following example sentence.
Example 3 Consider the sentence "Even though it is good seafood, the prices are too high." with the predefined aspect terms seafood and price.
1. If the aspect term is a category, it can be directly assigned as a category. In this example the category price is present and will be assigned.
2. Dishes are very challenging to detect as aspect terms. To address this problem, we added a list of dishes scraped from Wikipedia. If a noun is not part of the list, we search DuckDuckGo (https://duckduckgo.com) for the description of that noun.
3. For unassigned aspect terms, the similarity between aspect terms and all categories is calculated. For this purpose, RiTa.WordNet similarity has been used. If the path length is smaller than 0.4 (with the help of the training data we experimentally determined the best comparison value), the aspect term is assigned to the category. In our example seafood is similar to food and therefore the category is food.
4. If no aspect category could be found, the category is anecdotes/miscellaneous.
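The four steps can be sketched as follows. This is an illustration under stated assumptions rather than the system's implementation: the distance function is passed in as a parameter instead of calling RiTa.WordNet, the toy distance values are invented, dishes from the list are assumed to map to the category food, a term may be assigned to every category below the threshold, and the DuckDuckGo lookup is omitted.

```python
CATEGORIES = ["food", "service", "ambience", "price"]
DEFAULT_CATEGORY = "anecdotes/miscellaneous"


def detect_categories(terms, distance, dishes=frozenset(), threshold=0.4):
    """Map aspect terms to categories following steps 1 to 4."""
    found = []
    for term in terms:
        word = term.lower()
        if word in CATEGORIES:
            matches = [word]            # step 1: the term is itself a category
        elif word in dishes:
            matches = ["food"]          # step 2: known dish (assumed to mean food)
        else:
            # Step 3: keep every category whose distance is below the threshold.
            matches = [c for c in CATEGORIES if distance(word, c) < threshold]
        for category in matches:
            if category not in found:
                found.append(category)
    # Step 4: fall back to the default category when nothing matched.
    return found or [DEFAULT_CATEGORY]


# Invented distances for illustration; real values would come from WordNet.
TOY_DISTANCES = {("seafood", "food"): 0.2}


def toy_distance(word, category):
    """Hypothetical stand-in for a WordNet path distance."""
    return TOY_DISTANCES.get((word, category), 1.0)


# Example 3: seafood is close to food, price is itself a category.
print(detect_categories(["seafood", "price"], toy_distance))  # ['food', 'price']
```

Keeping the threshold as a parameter makes it easy to rerun the same comparison with a stricter value.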
Sentences without aspect term. The third step from the previous approach is executed for all nouns in the sentence, but the threshold is decreased to 0.19 to reduce the number of recognized categories. If no similarity falls below the threshold, the category is anecdotes/miscellaneous.
The results of the third subtask are presented in Table 5. Although the presented results are moderately good, some issues are worth considering. Using WordNet (Miller, 1995), it is only possible to find the similarity between two concepts and not between groups of concepts. For example, comparing Japanese Tapas with food would not work. Furthermore, WordNet only recognizes the similarity between words of the same part of speech, which means that many possible relations between verbs and nouns, and between adjectives and nouns, are missing. Also, we were not able to calculate the similarity between a term and the default category.

Category polarity
This section describes the last subtask, which aims to find the polarity of a given aspect category (food, service, ambience, price, or anecdotes/miscellaneous) for a given sentence. This subtask is applied only to the restaurant domain. The second and third subtasks must have been solved first, since their results are required to determine which aspect term belongs to which aspect category. In the third subtask all aspect terms are grouped into categories, and in the second one the aspect terms are assigned their polarities; we use both to count how many times a specific polarity occurs under the same aspect category. Then we can assign a polarity to a specific aspect category. To find the polarities of an aspect category, we carefully analyzed the training data and defined a set of rules to cover all possible cases. We discuss these rules in the following.
R4.1 If the aspect term polarities of the same category are equal, then their polarity is tagged as the category polarity.
Example 4 "Prices are higher to dine in and their chicken tikka marsala is quite good." The aspect terms found in this sentence are Prices, which is negative, and chicken tikka marsala, which is positive. The two aspect terms belong to different categories. The category food (chicken tikka marsala) is positive and the category price (Prices) is negative.
R4.2 If one of the aspects of a specified category is neutral, it has no influence on the polarity of a category, as long as at least one other polarity exists. The polarities of all other aspect terms will determine the polarity of a specific category.
Example 5 "Our server checked on us maybe twice during the entire meal." In this sentence the following aspect terms are found: server as negative and meal as neutral. Both aspect terms belong to the same category service, so the category service has the value negative.
R4.3 If the aspect term polarities under the same category include both positive and negative, then the category polarity is tagged as conflict.
Example 6 As an example consider the sentence: "The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable." Here four aspect terms were found: sweet lassi, lamb chettinad, and garlic naan with positive polarities, and rasamalai with a negative polarity. This results in a conflict polarity for the category food.
R4.4 If the found category was annotated as anecdotes/miscellaneous but no aspect term was found in the second subtask, then we use the sentiment tree. It generates a specific polarity for the entire sentence which we define as the category's polarity.
Example 7 The sentence "A guaranteed delight!" has no aspect term. Using the sentiment tree, the polarity for the category anecdotes/miscellaneous is positive.
We applied our approach to the training data. The results are shown in Table 6. We achieved an F-measure of 0.85 for the positive polarity. Our accuracy is 0.56, which is low in comparison to other submissions in this subtask. A possible reason for this result is that the first subtask also did not reach good accuracy.
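Rules R4.1 to R4.4 can be summarized in a small function. This is a sketch, not the submitted system: the function name category_polarity is hypothetical, term polarities are assumed to be limited to positive, negative, and neutral, and the sentence-level polarity needed for R4.4 is assumed to be computed elsewhere with the sentiment tree.

```python
def category_polarity(term_polarities, sentence_polarity=None):
    """Combine the aspect term polarities of one category into a category polarity."""
    if not term_polarities:
        # R4.4: no aspect term found, fall back to the whole-sentence polarity.
        return sentence_polarity
    # R4.2: neutral terms do not influence the result if another polarity exists.
    non_neutral = {p for p in term_polarities if p != "neutral"}
    if {"positive", "negative"} <= non_neutral:
        return "conflict"               # R4.3: both polarities occur
    if non_neutral:
        return next(iter(non_neutral))  # R4.1: all remaining polarities agree
    return "neutral"                    # only neutral terms were found


# Examples 4 to 7 from the text.
print(category_polarity(["positive"]))             # positive (food, Example 4)
print(category_polarity(["negative", "neutral"]))  # negative (service, Example 5)
print(category_polarity(["positive", "positive", "positive", "negative"]))  # conflict
print(category_polarity([], sentence_polarity="positive"))  # positive (Example 7)
```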

Conclusion & future works
This paper describes our system, which solves the individual subtasks using Stanford CoreNLP, RiTa.WordNet (Guerini et al., 2013) and a food database that we developed ourselves. These libraries offer methods to classify sentences and determine the polarities.
Because we rely on library-based methods, we cannot directly influence their results. At this point, other libraries such as NLTK could help to improve them, since they offer the possibility to train several classifiers with custom data. However, such classifiers are not domain independent, because they need to be trained with sentences that belong to a specific domain, e.g. laptop or restaurant, in order to get the right polarity.
Our approach is more domain independent, because we do not need any domain-specific training to calculate the polarities. Therefore, our tool can process sentences of any domain without further changes to the algorithms.
In the future, we expect progress in the following directions. First, we want to improve the identification of aspect terms that consist of more than two consecutive nouns. Second, we want to identify aspect terms that do not appear explicitly in the sentence. Finally, we want to improve polarity determination for sentences with unclear context (i.e. the absence of adjectives).