ÚFAL: Using Hand-crafted Rules in Aspect Based Sentiment Analysis on Parsed Data

This paper describes our submission to SemEval 2014 Task 4 (aspect based sentiment analysis). The current work is based on the assumption that it could be advantageous to connect the subtasks into one workflow, not necessarily following their given order. We took part in all four subtasks (aspect term extraction, aspect term polarity, aspect category detection, aspect category polarity), detecting polarity items via various subjectivity lexicons and employing a rule-based system applied to dependency data. To determine aspect categories, we simply look up their WordNet hypernyms. For such a basic method using no machine learning techniques, we consider the results rather satisfactory.


Introduction
In a real-life scenario, we usually do not have any golden aspects at our disposal. Therefore, it could be practical to be able to extract both aspects and their polarities at once. We therefore first parse the data, bearing in mind that it is very difficult to detect both sources/targets and their aspects in plain-text corpora. This holds especially for pro-drop languages, e.g. Czech (Veselovská et al., 2014), but the proposed method is still language-independent to some extent. Secondly, we detect the polarity items in the parsed text using a union of two different existing subjectivity lexicons (see Section 2). Afterwards, we extract the aspect terms in the dependency structures containing polarity expressions. In this task, we employ several hand-crafted rules detecting aspects based on syntactic features of the evaluative sentences, inspired by the method of Qiu et al. (2011). Finally, we identify aspect term categories with the help of the English WordNet and derive their polarities based on the polarities of individual aspects. The obtained results are discussed in Section 4.
SemEval 2014 Task 4: http://alt.qcri.org/semeval2014/task4/
This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/

Related Work
This work is related to polarity detection based on a list of evaluative items, i.e. subjectivity lexicons, generally described e.g. in Taboada et al. (2011). The English ones we use are described in detail in Wiebe et al. (2005) and in several papers by Bing Liu, starting with Hu and Liu (2004). Inspired by Kobayashi et al. (2007), who make use of evaluative expressions when learning syntactic patterns obtained via pattern mining to extract aspect-evaluation pairs, we use the opinion words to detect evaluative structures in parsed data. The issue of target extraction in sentiment analysis is discussed in articles proposing different methods, mainly tested on product review datasets (Popescu and Etzioni, 2005; Mei et al., 2007; Scaffidi et al., 2007). Some of the authors also take into consideration product aspects (features), defined as product components or product attributes (Liu, 2006). Hu and Liu (2004) take all noun phrases found in the text as feature candidates. Stoyanov and Cardie (2008) see the problem of target extraction as part of a topic modelling problem, similarly to Mei et al. (2007). In this contribution, we follow the work of Qiu et al. (2011), who learn syntactic relations from dependency trees.

Pipeline
Our workflow is illustrated in Figure 1. We first pre-process the data, then mark all aspects seen in the training data (still on plain text). The rest of the pipeline is implemented in Treex (Popel and Žabokrtský, 2010) and consists of linguistic analysis (tagging, dependency parsing), identification of evaluative words, and application of syntactic rules to find the evaluated aspects. Finally, for restaurants, we also identify aspect categories and their polarity.

Data
We used the training and trial data provided by the organizers. During system development, we used the trial section as a held-out set. In the final submission, we used both datasets for training.

Pre-processing
The main phase of pre-processing (apart from parsing the input files and other simple tasks) is running a spell-checker. As the data for this task come from real-world reviews, they contain various typos and other small errors. We therefore implemented a statistical spell-checker which works in two stages:
1. Run Aspell to detect typos and obtain suggestions for them.
2. Select the appropriate suggestions using a language model (LM).
We trained a trigram LM on the English side of CzEng 1.0 (Bojar et al., 2012) using SRILM (Stolcke, 2002). We binarized the LM and used the Lazy decoder (Heafield et al., 2013) to select the suggestions that best fit the current context. Our script is freely available for download. We created a list of exceptions that should not be corrected (domain-specific words such as "netbook", which are unknown to Aspell's dictionary), and we also skip named entities during spell-checking.
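The second stage can be illustrated with a minimal sketch. It is not our released script: the suggestion lists stand in for Aspell output, a tiny add-one smoothed trigram model stands in for the SRILM model, and a greedy left-to-right choice stands in for the Lazy decoder. All function names are hypothetical.

```python
import math
from collections import Counter

def train_trigram_counts(sentences):
    """Collect trigram counts and their bigram-context counts from tokenized sentences."""
    tri, bi, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            bi[tuple(toks[i - 2:i])] += 1
    return tri, bi, len(vocab)

def log_prob(tokens, tri, bi, vocab_size):
    """Add-one smoothed trigram log-probability of a token sequence (toy LM)."""
    toks = ["<s>", "<s>"] + tokens + ["</s>"]
    total = 0.0
    for i in range(2, len(toks)):
        num = tri[tuple(toks[i - 2:i + 1])] + 1
        den = bi[tuple(toks[i - 2:i])] + vocab_size
        total += math.log(num / den)
    return total

def correct(tokens, suggestions, exceptions, lm):
    """Greedily replace each flagged token by the candidate the LM scores best in context.

    `suggestions` maps a flagged token to candidate corrections (assumed to come
    from a spell-checker); tokens in `exceptions` are never changed.
    """
    result = list(tokens)
    for i, tok in enumerate(tokens):
        if tok in exceptions or tok not in suggestions:
            continue
        candidates = [tok] + suggestions[tok]  # keeping the original as a fallback is an assumption
        result[i] = max(
            candidates,
            key=lambda c: log_prob(result[:i] + [c] + result[i + 1:], *lm),
        )
    return result

# Toy data, for illustration only
corpus = [
    ["convenient", "parking", "nearby"],
    ["the", "parking", "is", "convenient"],
    ["very", "convenient", "parking"],
]
lm = train_trigram_counts(corpus)
fixed = correct(["very", "convienent", "parking"],
                {"convienent": ["convenient", "convent"]}, {"netbook"}, lm)
```

Scoring the whole sentence for each candidate is wasteful but keeps the sketch short; a real decoder would search over all flagged positions jointly.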

Marking Known Aspects
Before any linguistic processing, we mark all words (and multiword expressions) that are annotated as aspects in the training data. For our final submission, the list also includes aspects from the provided development sets.
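A minimal sketch of such marking on tokenized plain text follows. The case-insensitive comparison and the preference for the longest match are assumptions made for the illustration, and the function name is hypothetical.

```python
def mark_known_aspects(tokens, known_aspects):
    """Return (start, end) token spans that match a known aspect term.

    Multiword aspects are supported; at each position the longest match wins.
    """
    known = {tuple(a.lower().split()) for a in known_aspects}
    max_len = max((len(a) for a in known), default=0)
    lowered = [t.lower() for t in tokens]
    spans, i = [], 0
    while i < len(lowered):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            if tuple(lowered[i:i + n]) in known:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no known aspect starts here
    return spans

tokens = "I liked the beer selection .".split()
spans = mark_known_aspects(tokens, ["beer", "beer selection", "service"])
```

Because the marking ignores context, it also fires in non-evaluative sentences, which is the source of false positives discussed later in the paper.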

Morphological Analysis and Parsing
Further, we lemmatize the data and parse it using Treex (Popel and Žabokrtský, 2010), a modular framework for natural language processing (NLP). Treex is focused primarily on dependency syntax and includes blocks (wrappers) for taggers, parsers and other NLP tools. Within Treex, we used the Morče tagger (Hajič et al., 2007) and the MST dependency parser (McDonald et al., 2005).

Finding Evaluative Words
In the obtained dependency data, we detect polarity items using the MPQA subjectivity lexicon (Wiebe et al., 2005) and Bing Liu's subjectivity clues.
Table 3: Results of our system on the Laptops dataset as evaluated by the task organizers.
We lemmatize both lexicons and look first for matching surface forms, then for matching lemmas. (English lemmas as output by Morče are sometimes too coarse, eliminating e.g. negation; we can mostly avoid the resulting false matches by looking at surface forms first.)
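The lookup order can be sketched as follows. The lexicon entries, the rule that the first lexicon wins on conflicting entries, and the lemma "happy" for "unhappy" are illustrative assumptions, not actual lexicon contents or tagger output.

```python
def merge_lexicons(*lexicons):
    """Union of several word -> polarity lexicons."""
    merged = {}
    for lexicon in lexicons:
        for word, polarity in lexicon.items():
            # On conflicting entries the earlier lexicon wins (an assumption).
            merged.setdefault(word.lower(), polarity)
    return merged

def find_polarity(form, lemma, lexicon):
    """Look up the surface form first and fall back to the lemma."""
    form = form.lower()
    if form in lexicon:
        return lexicon[form]
    return lexicon.get(lemma.lower())  # None if the word is not evaluative

# Hypothetical entries, for illustration only
lexicon = merge_lexicons(
    {"happy": "positive", "unhappy": "negative"},
    {"like": "positive", "happy": "positive"},
)
```

With a coarse lemma that drops negation, checking the surface form first keeps "unhappy" from being matched as "happy".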

Syntactic Rules
Further, we created six basic rules for finding aspects in sentences containing evaluative items from the lexicons, e.g. "If you find an adjective which is a part of a verbonominal predicate, the subject of its governing verb should be an aspect" (see Table 1). Situational functions are marked with subscripts; PAdj and PNoun stand for adjectival and nominal predicative expressions.
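The quoted rule can be sketched on a toy dependency tree. This is an illustration rather than our Treex implementation: the Node class and the labels "Sb", "Pred" and "Atr" are hypothetical stand-ins for the parser output, and only the one rule quoted above is covered.

```python
from dataclasses import dataclass

@dataclass
class Node:
    idx: int      # 1-based position in the sentence
    form: str     # surface form
    parent: int   # idx of the governing node, 0 for the root
    func: str     # syntactic function label (hypothetical tag set)

def aspects_from_predicative_adjectives(nodes, evaluative_forms):
    """If an evaluative adjective is the nominal part of a predicate (PAdj),
    the subject (Sb) of its governing verb is taken as an aspect."""
    aspects = []
    for node in nodes:
        if node.func == "PAdj" and node.form.lower() in evaluative_forms:
            for other in nodes:
                if other.parent == node.parent and other.func == "Sb":
                    aspects.append(other.form)
    return aspects

# "The food is fantastic" with a hand-built analysis
sentence = [
    Node(1, "The", 2, "Atr"),
    Node(2, "food", 3, "Sb"),
    Node(3, "is", 0, "Pred"),
    Node(4, "fantastic", 3, "PAdj"),
]
found = aspects_from_predicative_adjectives(sentence, {"fantastic"})
```

The rule fires only when an evaluative word is present, so a parsing error that mislabels either the adjective or the subject silently suppresses the aspect.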
Moreover, we applied three more rules concerning coordinations. We assume that if one member of a coordination is an aspect, every other member must be an aspect too.
The excellent mussels, puff pastry, goat cheese and salad.
Concerning but-clauses, if no other aspect is found in the second part of the sentence, we assign the conflict value to the identified aspect.
The food was pretty good, but a little flavorless.
If there are two aspects identified in the but-coordination, they should be marked with opposite polarity.
The place is cramped, but the food is fantastic!
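The three coordination rules can be sketched as below. The sketch assumes that the coordination members and the polarity of the first aspect have already been determined from the parse and the lexicons; the function names are hypothetical.

```python
OPPOSITE = {"positive": "negative", "negative": "positive"}

def expand_coordination(aspect, members):
    """If one member of a coordination is an aspect, all members are aspects."""
    return list(members) if aspect in members else [aspect]

def resolve_but(first_aspect, first_polarity, second_aspect=None):
    """Assign polarities around a but-coordination.

    With a single aspect, the aspect receives the "conflict" value; with two
    aspects, the second one receives the opposite polarity of the first.
    """
    if second_aspect is None:
        return {first_aspect: "conflict"}
    # Polarities without a defined opposite are copied unchanged (an assumption).
    second_polarity = OPPOSITE.get(first_polarity, first_polarity)
    return {first_aspect: first_polarity, second_aspect: second_polarity}

members = ["mussels", "puff pastry", "goat cheese", "salad"]
coordinated = expand_coordination("mussels", members)
single = resolve_but("food", "positive")
double = resolve_but("place", "negative", "food")
```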

Aspect Categories
We collect a list of aspects from the training data and find all their hypernyms in WordNet (Fellbaum, 1998). We hand-craft a list of typical hypernyms for each category (such as "cooking" or "consumption" for the category "food"). Moreover, we look at the most frequent aspects in the training data and add as exceptions those for which our list would fail.
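The hypernym-based category lookup used in this subsection can be sketched as follows. To stay self-contained, a small hand-written table stands in for WordNet hypernym chains. Apart from "cooking" and "consumption" for "food", all hypernyms, the "service" entries and the exception list are hypothetical.

```python
# Toy stand-in for WordNet hypernym chains (hypothetical entries)
HYPERNYMS = {
    "pasta": ["dish", "nutriment", "consumption"],
    "waiter": ["employee", "worker", "person"],
    "decor": ["decoration", "artifact"],
}

# Hand-crafted list of typical hypernyms per category
CATEGORY_HYPERNYMS = {
    "food": {"cooking", "consumption"},
    "service": {"employee", "worker"},
}

# Frequent aspects for which the hypernym list would fail (hypothetical)
EXCEPTIONS = {"menu": "food"}

DEFAULT_CATEGORY = "anecdotes/miscellaneous"

def categorize(aspect):
    """Assign a category via exceptions first, then via known hypernyms."""
    aspect = aspect.lower()
    if aspect in EXCEPTIONS:
        return EXCEPTIONS[aspect]
    for hypernym in HYPERNYMS.get(aspect, []):
        for category, typical in CATEGORY_HYPERNYMS.items():
            if hypernym in typical:
                return category
    return DEFAULT_CATEGORY

categories = [categorize(a) for a in ["pasta", "waiter", "menu", "decor"]]
```

Aspects with no known hypernym fall back to the "anecdotes/miscellaneous" category.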
We rely on the output of aspect identification for this subtask. For each aspect marked in the sentence, we look up all its hypernyms in WordNet and compare them to our list. When we find a known hypernym, we assign its category to the aspect. Otherwise, we put the aspect in the "anecdotes/miscellaneous" category. For category polarity assignment, we combine the polarities of all aspects in that category.

Results and Discussion
Table 2 and Table 3 summarize the results of our submission. We do not achieve the best performance in any particular task; overall, our system ranked in the middle.
We tend to do better in terms of recall than precision. This effect is mainly caused by our decision to also automatically mark all aspects seen in the training data.

Effect of the Spell-checker
We evaluated the performance of our system with and without the spell-checker. Overall, the impact is very small (the F-measure stays within the two-decimal rounding error). In some cases its corrections are useful ("convienent" → "convenient parking"), while in others its limited vocabulary harms our system ("fettucino alfredo" → "fitting Alfred"). This issue could be mitigated by providing a custom lexicon to Aspell.

Sources of Errors
As we always extract aspects that were observed in the training data, our system often marks them in non-evaluative contexts, leading to a considerable number of false positives. However, using this approach improves our F-measure score due to the limited recall of the syntactic rules.
The usefulness of our rules is mainly limited by (i) the sentiment lexicons and (ii) parsing errors.
(i) Since we used the lexicons directly without domain adaptation, many domain-specific terms are missed ("flavorless", "crowded") and some are matched incorrectly.
(ii) Parsing errors often confuse the rules and negatively impact both recall and precision. Often, they prevented the system from taking negation into account, so some of the negated polarity items were assigned incorrectly.
The "conflict" polarity value was rarely correct, because all aspects and their polarity values need to be correctly discovered to assign this value. However, this type of polarity is infrequent in the data, so the overall impact is small.
Since it covers all four subtasks, our system can be readily deployed as a complete solution which handles the whole process from plain text to aspects and aspect categories annotated with polarity. Considering the number of tasks covered and the fact that our system is entirely rule-based, the achieved results seem satisfactory.

Conclusion and Future Work
In our work, we developed a purely rule-based system for aspect based sentiment analysis which can both detect aspect terms (and categories) and assign polarity values to them. We have shown that even such a simple approach can achieve relatively good results.
In the future, our main plan is to involve machine learning in our system. We expect that outputs of our rules can serve as useful indicator features for a discriminative learning model, along with standard features such as bag-of-words (lemmas) or n-grams.