Biocom Usp: Tweet Sentiment Analysis with Adaptive Boosting Ensemble

We describe our approach for the SemEval-2014 task 9: Sentiment Analysis in Twitter. We make use of an ensemble learning method for sentiment classification of tweets that relies on varied features such as feature hashing, part-of-speech, and lexical features. Our system was evaluated in the Twitter message-level task.


Introduction
Sentiment analysis is a field of study that investigates feelings expressed in texts. The field has become important, especially due to the growth of the internet, the content generated by its users, and the emergence of social networks. On social networks such as Twitter, people post their opinions in colloquial and compact language, and these posts are becoming a large dataset that can be used as a source of information for various automatic tools of sentiment inference. There is enormous interest in sentiment analysis of Twitter messages, known as tweets, with applications in several segments, such as (i) directing marketing campaigns and extracting consumer reviews of services and products (Jansen et al., 2009); (ii) identifying manifestations of bullying (Xu et al., 2012); (iii) forecasting box-office revenues for movies (Asur and Huberman, 2010); and (iv) predicting acceptance or rejection of presidential candidates (Diakopoulos and Shamma, 2010; O'Connor et al., 2010).

This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/
One of the problems encountered by researchers in tweet sentiment analysis is the scarcity of public datasets. Although Twitter sentiment datasets have already been created, they are either small, such as the Obama-McCain Debate corpus (Shamma et al., 2009) and the Health Care Reform corpus (Speriosu et al., 2011), or large and proprietary, such as in (Lin and Kolcz, 2012). Others rely on noisy labels obtained from emoticons and hashtags (Go et al., 2009). The SemEval-2014 task 9: Sentiment Analysis in Twitter (Nakov et al., 2013) provides a public dataset that can be used to compare the accuracy of different approaches.
In this paper, we propose to analyse tweet sentiment with Adaptive Boosting (Freund and Schapire, 1997), using the well-known Multinomial Naive Bayes classifier as the base learner. Boosting is an approach to machine learning based on the idea of creating a highly accurate prediction rule by combining many relatively weak and inaccurate rules. The AdaBoost algorithm (Freund and Schapire, 1997) was the first practical boosting algorithm and remains one of the most widely used and studied, with applications in numerous fields. It therefore has the potential to be very useful for tweet sentiment analysis, as we address in this paper.

Related Work
Classifier ensembles for tweet sentiment analysis have been underexplored in the literature; a few exceptions are (Lin and Kolcz, 2012; Clark and Wicentowski, 2013; Rodriguez et al., 2013; Hassan et al., 2013). Lin and Kolcz (2012) used logistic regression classifiers learned from hashed byte 4-grams as features. The feature extractor treats the tweet as a raw byte array, moves a four-byte sliding window along the array, and hashes the contents of the window; the resulting hash value is taken as the feature id. Here, 4-grams refer to four characters (and not to four words). They made no attempt to perform any linguistic processing, not even word tokenization. For each of the (proprietary) datasets, they experimented with ensembles of different sizes. The ensembles were formed by different models, obtained from different training sets but with the same learning algorithm (logistic regression). Their results show that the ensembles lead to more accurate classifiers. Rodriguez et al. (2013) and Clark and Wicentowski (2013) proposed the use of classifier ensembles at the expression level, which is related to Contextual Polarity Disambiguation. In this perspective, the sentiment label (positive, negative, or neutral) is applied to a specific phrase or word within the tweet and does not necessarily match the sentiment of the entire tweet.
Finally, another type of ensemble framework has recently been proposed by Hassan et al. (2013), who deal with class imbalance, sparsity, and representational issues. The authors propose to enrich the corpus using multiple additional datasets related to the task of sentiment classification. Unlike previous works, the authors use a combination of unigrams and bigrams of simple words, part-of-speech, and semantic features.
None of the previous works used AdaBoost (Freund and Schapire, 1996). Also, lexicons and/or part-of-speech features in combination with feature hashing, as in (Lin and Kolcz, 2012), have not been addressed in the literature.

AdaBoost Ensemble
Boosting is a relatively young, yet extremely powerful, machine learning technique. The main idea behind boosting algorithms is to combine multiple weak learners (classification algorithms that perform only slightly better than random guessing) into a powerful composite classifier. Our focus is on the well-known AdaBoost algorithm (Freund and Schapire, 1997) with Multinomial Naive Bayes as the base classifier (Figure 1).
AdaBoost and its variants have been applied to diverse domains with great success, owing to their solid theoretical foundation, accurate prediction, and great simplicity (Freund and Schapire, 1997). For example, Viola and Jones (2001) used AdaBoost for face detection, and Hao and Luo (2006) dealt with image segmentation, recognition of handwritten digits, and outdoor scene classification problems. Text classification is explored in (Bloehdorn and Hotho, 2004).
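The reweighting idea behind boosting can be illustrated with a minimal sketch. It is not the WEKA configuration used in this work: for brevity it assumes a binary problem with labels +1 and -1, one numeric feature, and simple threshold rules (decision stumps) as weak learners in place of Multinomial Naive Bayes. The function names and the toy data are illustrative only.

```python
import math

# Toy one-dimensional data (illustrative only): no single threshold separates it.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, sign):
    """Weak rule: predict `sign` when x >= threshold, otherwise `-sign`."""
    return sign if x >= threshold else -sign


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) of the best threshold rule."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, sign) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best


def adaboost(xs, ys, rounds=3):
    """Discrete AdaBoost: returns a list of (alpha, threshold, sign)."""
    n = len(xs)
    weights = [1.0 / n] * n  # start from a uniform distribution
    ensemble = []
    for _ in range(rounds):
        error, threshold, sign = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # vote of this weak rule
        ensemble.append((alpha, threshold, sign))
        # Increase the weight of misclassified examples, decrease the others.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all weak rules."""
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    model = adaboost(XS, YS, rounds=3)
    print([predict(model, x) for x in XS])
```

On this toy data a single stump misclassifies three examples, while the weighted vote of three stumps reproduces all training labels. The three-class tweet setting needs a multi-class variant of the algorithm, which this sketch does not cover.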

Feature Engineering
The most commonly used text representation in the literature is the Bag of Words (BOW) technique, in which a document is treated as a bag of words and represented by a feature vector with one entry for each word appearing in the corpus. Although BOW is simple and very effective in text classification, a large amount of information from the original document is not considered: word order is lost and syntactic structures are broken. Therefore, more sophisticated feature extraction methods with a deeper understanding of the documents are required for sentiment classification tasks. Instead of using only BOW, alternative ways to represent text, including Part of Speech (PoS) based features, feature hashing, and lexicons, have been addressed in the literature.
We implemented an ensemble of classifiers that receives as input a combination of three feature sets: i) lexicon features that capture the semantic aspect of a tweet; ii) feature hashing that captures surface forms such as abbreviations, slang terms from this type of social network, elongated words (for example, loveeeee), sentences with words without a space between them (for instance, Ilovveapple!), and so on; and iii) specific syntactic features for tweets. Technical details of each feature set are provided in the sequel.

Lexicon Features
We use the sentiment lexicons provided by (Thelwall et al., 2010) and (Hu and Liu, 2004). The former is known as SentiStrength and provides an emotion vocabulary, an emoticons list (with positive, negative, and neutral icons), a negation list, and a booster word list. We use the negation list in cases where the next term in a sentence is an opinion word (either positive or negative). In such cases we have polarity inversion. For example, in the sentence "The house is not beautiful", the negation word "not" inverts the polarity of the opinion word "beautiful". The booster word list is composed of adverbs that suggest more or less emphasis in the sentiment. For example, in the sentence "He was incredibly rude." the term "incredibly" is an adverb that lays emphasis on the opinion word "rude". Besides SentiStrength, we use the lexicon proposed by (Hu and Liu, 2004), which provides a list of words associated with positive and negative sentiments that is very useful for sentiment analysis.
These two lexicons were used to build the first feature set according to Table 1, which presents an example representation for the tweet "The soccer team didn't play extremely bad last Wednesday." The word "bad" exists in the lexicon list of (Hu and Liu, 2004), and it is a negative word. The word "bad" also exists in the negation list provided by (Thelwall et al., 2010). The term "didn't" is a negation word according to SentiStrength (Thelwall et al., 2010), so the polarity of the opinion words ahead is inverted. Finally, the term "extremely" belongs to the booster word list and suggests more emphasis on the opinion words ahead.
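The negation and booster rules described above can be sketched as follows. This is a hypothetical illustration, not the implementation used in our system: the tiny word lists stand in for the SentiStrength and Hu and Liu lexicons, the doubling of a boosted word's weight is an assumption, and the three returned values (positive score, negative score, number of booster words) are an assumed layout rather than the one in Table 1.

```python
import re

# Tiny stand-in word lists (assumptions; the real lexicons are far larger).
POSITIVE = {"good", "beautiful", "great"}
NEGATIVE = {"bad", "rude", "awful"}
NEGATIONS = {"not", "didn't", "don't", "never"}
BOOSTERS = {"extremely", "incredibly", "very"}


def lexicon_features(tweet):
    """Return (positive_score, negative_score, booster_count) for a tweet."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positive = negative = booster_count = 0
    negated = boosted = False
    for token in tokens:
        if token in NEGATIONS:
            negated = True  # inverts the next opinion word
        elif token in BOOSTERS:
            boosted = True  # emphasises the next opinion word
            booster_count += 1
        elif token in POSITIVE or token in NEGATIVE:
            is_positive = token in POSITIVE
            if negated:
                is_positive = not is_positive  # polarity inversion
            weight = 2 if boosted else 1  # assumed emphasis factor
            if is_positive:
                positive += weight
            else:
                negative += weight
            negated = boosted = False  # flags apply to one opinion word only
    return (positive, negative, booster_count)


if __name__ == "__main__":
    example = "The soccer team didn't play extremely bad last Wednesday."
    print(lexicon_features(example))  # (2, 0, 1)
```

In the example tweet, "bad" is negative, "didn't" inverts it to positive, and "extremely" doubles its weight, so the sketch yields a positive score of 2 and a negative score of 0.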

Feature hashing
Feature hashing was introduced for text classification in (Shi et al., 2009), (Weinberger et al., 2009), (Forman and Kirshenbaum, 2008), (Langford et al., 2007), and (Caragea et al., 2011). In the context of tweet classification, feature hashing offers a way to reduce the number of features provided as input to a learning algorithm. The original high-dimensional space is "reduced" by hashing the features into a lower-dimensional space, i.e., by mapping features to hash keys. Multiple features can thus be mapped to the same hash key, thereby "aggregating" their counts.
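A minimal sketch of the hashing trick follows. It assumes whitespace-separated tokens as the raw features and uses CRC32 as the hash function; both choices, and the function name hash_features, are illustrative assumptions rather than a description of our system. The default of 1024 buckets matches the number of hashed features we use.

```python
import zlib


def hash_features(tokens, n_buckets=1024):
    """Map tokens to a fixed-length count vector via hashing."""
    vector = [0] * n_buckets
    for token in tokens:
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % n_buckets
        vector[index] += 1  # colliding tokens aggregate their counts
    return vector


if __name__ == "__main__":
    tokens = "i loveeeee my new phone loveeeee".split()
    vector = hash_features(tokens)
    print(len(vector), sum(vector))  # 1024 6
```

The vector length stays fixed regardless of vocabulary size, so unseen surface forms such as elongated words or slang still receive a feature index without any dictionary.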

Specific syntactic (PoS) features
We tagged the tweets with Part of Speech (PoS) labels using the Twitter NLP tool (Gimpel et al., 2011). Its tagset encompasses 25 tags, including Nominal, Nominal plus Verbal, other open-class words such as adjectives, adverbs, and interjections, and Twitter-specific tags such as hashtag, mention, and discourse marker, to name just a few. Table 3 shows an example of the syntactic feature representation.
Table 3: Example of syntactic feature representation (columns: tag 1, tag 2, tag 3, tag 4, ..., tag 25, class).
A combination of lexicons, feature hashing, and part-of-speech is used to train the ensemble classifiers, resulting in 1024 features from feature hashing, 3 features from lexicons, and 25 features from PoS.
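The assembly of the final feature vector can be sketched as below. The sketch assumes that each PoS feature is the count of one tag in the tweet (it could equally be a binary indicator), that the tagger output is already available as (token, tag) pairs, and that the tag symbols listed follow the Gimpel et al. tagset as best recalled; the function names and the example tagging are hypothetical.

```python
# 25 tag symbols, assumed to follow the Gimpel et al. (2011) tagset.
TAGSET = [
    "N", "O", "^", "S", "Z", "L", "M", "V", "A", "R", "!", "D", "P",
    "&", "T", "X", "Y", "#", "@", "~", "U", "E", "$", ",", "G",
]


def pos_features(tagged_tokens):
    """Count how often each of the 25 tags occurs in a tagged tweet."""
    counts = {tag: 0 for tag in TAGSET}
    for _token, tag in tagged_tokens:
        if tag in counts:  # ignore tags outside the assumed tagset
            counts[tag] += 1
    return [counts[tag] for tag in TAGSET]


def combine(hashed, lexicon, pos):
    """Concatenate the three feature sets into a single vector."""
    return list(hashed) + list(lexicon) + list(pos)


if __name__ == "__main__":
    # Hypothetical tagger output for a short tweet.
    tagged = [("@user", "@"), ("love", "V"), ("this", "D"),
              ("phone", "N"), ("#happy", "#"), ("!", ",")]
    hashed = [0] * 1024   # placeholder for the hashed features
    lexicon = [1, 0, 0]   # placeholder for the three lexicon features
    vector = combine(hashed, lexicon, pos_features(tagged))
    print(len(vector))  # 1052
```

Under these assumptions each tweet is represented by 1024 + 3 + 25 = 1052 values, followed by its class label during training.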

Experimental Setup and Results
We conducted experiments using the WEKA platform. Table 4 shows the class distributions in the training, development, and test sets. Table 5 presents the results for the positive and negative classes with the classifiers used on the training set, and Table 6 presents the results on the test sets for AdaBoost plus Multinomial Naive Bayes, which was the best algorithm in cross-validation. [...] that further investigations are necessary before making strong claims about this result.
Overall, the SemEval tasks have made evident the usual challenges of mining opinions from social media channels: noisy text, irregular grammar and orthography, highly specific lingo, and others. Moreover, temporal dependencies can affect performance if the training and test data have been gathered at different times.