Introduction

 

Financial data is a challenging use case for Sentiment Analysis, as sentiments and opinions have been shown to affect market dynamics [Goonatilake and Herath, 2007, Van de Kauter et al., 2015]. Sentiment is sometimes derived from news, which may convey macroeconomic, company-specific, or political information. Good news tends to lift the markets and increase optimism [Van de Kauter et al., 2015, Schuster, 2003]. Furthermore, news articles frequently contain market-relevant information [Sinha, 2014]. The finance domain is particularly relevant because various Financial Technology (Fintech) projects and companies have emerged, which has affected “strategies, market positioning, and value propositions”.

Given the link between sentiment and market dynamics, analysing public sentiment becomes a powerful method for predicting market reactions. However, the accuracy of machine-learning-based sentiment analysis approaches rarely exceeds seventy percent [Takala et al., 2014, Eagle Alpha, 2016]. Further research is needed to address complex linguistic phenomena such as sarcasm, irony, and poorly structured or colloquial language [Eagle Alpha, 2016]. In addition, short texts such as microblog messages can be highly opinionated, dense in information, and hard to parse because of the varied vocabularies they use [Sinha, 2014]. These concrete opportunities for improvement motivate us to propose a task that assesses both overall market sentiment and sentiment towards specific stocks, so that their predictive power can be exploited.

Improving the quality of sentiment analysis will directly benefit various groups and have an economic impact, which makes it both interesting and valuable. It will also enable the public and private sectors to develop innovative services and products that leverage the large volumes of financial sentiment data constantly published on social media networks and in newspapers.

 

Task Description

 

What?

 

The proposed task aims to catalyse discussion of approaches to the semantic interpretation of financial texts by targeting a concrete sentiment analysis task: identifying bullish (optimistic; believing that the stock price will increase) and bearish (pessimistic; believing that the stock price will decline) sentiment associated with companies and stocks.

 

Why?

  1. Advancing the state of the art in classification methods for sentiment analysis of financial texts.
  2. Incentivising the creation of new lexical resources for the financial domain.
  3. Understanding how state-of-the-art sentiment analysis performs on domain-specific, highly technical corpora.
  4. Improving the understanding of linguistic phenomena and the creation of semantic models for the financial domain.

 

How?

Participating systems must perform the following task: given a text instance (a microblog message in Track 1, a news statement or headline in Track 2), predict a sentiment score for each company or stock mentioned. Sentiment scores must be floating-point values in the range -1 (very negative/bearish) to 1 (very positive/bullish), with 0 denoting neutral sentiment.
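To make the expected output concrete, the sketch below shows one possible shape for a prediction: a mapping from each mentioned company or stock to a float in [-1, 1]. It is a toy baseline under stated assumptions, not a method proposed by the task. The word lists `BULLISH_WORDS` and `BEARISH_WORDS` and all function names are hypothetical, and the score (bullish - bearish) / (bullish + bearish) is simply one formula that is guaranteed to stay within the required range.

```python
from typing import Dict, Iterable

# Hypothetical toy lexicon for illustration only; not a task resource.
BULLISH_WORDS = {"buy", "long", "upgrade", "beat", "rally", "bullish"}
BEARISH_WORDS = {"sell", "short", "downgrade", "miss", "crash", "bearish"}


def lexicon_score(text: str) -> float:
    """Score in [-1, 1] as (bullish - bearish) / (bullish + bearish).

    Returns 0.0 (neutral) when no lexicon word occurs in the text.
    """
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    pos = sum(t in BULLISH_WORDS for t in tokens)
    neg = sum(t in BEARISH_WORDS for t in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def predict(text: str, targets: Iterable[str]) -> Dict[str, float]:
    """Assign one score to each mentioned company/stock.

    This naive baseline reuses the message-level score for every target;
    it does not distinguish sentiment towards different companies.
    """
    score = lexicon_score(text)
    return {target: score for target in targets}


def validate(predictions: Dict[str, float]) -> None:
    """Reject scores that are not floats within [-1, 1]."""
    for target, value in predictions.items():
        if not isinstance(value, float) or not -1.0 <= value <= 1.0:
            raise ValueError(f"invalid score for {target}: {value!r}")


def to_label(value: float) -> str:
    """Read a score as bearish (< 0), neutral (0) or bullish (> 0)."""
    if value > 0:
        return "bullish"
    if value < 0:
        return "bearish"
    return "neutral"
```

For example, `predict("buy buy sell", ["$AAPL", "$MSFT"])` assigns both targets a score of 1/3, which `to_label` reads as bullish. A competitive system would replace the lexicon with a trained model and produce target-specific scores; the range check in `validate` would stay the same.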
 

Track 1 - Microblog Messages

StockTwits Messages

This data consists of microblog messages focusing on stock market events and assessments by investors and traders, exchanged via the StockTwits microblogging platform. Typical stocktwits contain references to company stock symbols (so-called cashtags: a stock symbol preceded by “$”, e.g. “$AAPL” for Apple Inc.), a short supporting text, or references to a link or pictures (typically charts showing stock value analyses).

Twitter Messages

Some stock market discussion also takes place on the Twitter platform. In order to extend and diversify our data sources, we extract Twitter posts containing company stock symbols (cashtags).
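Since both Track 1 sources revolve around cashtags, a minimal sketch of cashtag extraction and of keeping only posts that mention at least one cashtag may help. The regular expression is an assumption, not the task's actual extraction rule: it accepts "$" followed by one to six letters, optionally with a one-letter class suffix such as "$BRK.B", and ignores dollar amounts such as "$100" or "US$5". The names `CASHTAG_RE`, `extract_cashtags` and `filter_cashtag_posts` are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed cashtag pattern: "$" + 1-6 letters, optional ".X" class suffix.
# (?<!\w) skips "$" glued to a preceding word character (e.g. "US$5");
# requiring a letter after "$" excludes plain amounts such as "$100".
CASHTAG_RE = re.compile(r"(?<!\w)\$([A-Za-z]{1,6}(?:\.[A-Za-z])?)\b")


def extract_cashtags(text: str) -> List[str]:
    """Return unique cashtags, upper-cased, in order of first appearance."""
    tags: List[str] = []
    for match in CASHTAG_RE.finditer(text):
        tag = "$" + match.group(1).upper()
        if tag not in tags:
            tags.append(tag)
    return tags


def filter_cashtag_posts(posts: Iterable[str]) -> List[str]:
    """Keep only posts that mention at least one cashtag."""
    return [post for post in posts if extract_cashtags(post)]
```

For instance, `extract_cashtags("Long $aapl and $MSFT, short $AAPL; cash is at $100")` returns `["$AAPL", "$MSFT"]`. The extracted cashtags can then serve as the targets that each message needs a sentiment score for.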

 

Track 2 - News Statements & Headlines

This track consists of sentences taken from news headlines as well as news text. The textual content will be crawled from various sources on the internet, such as Yahoo Finance.

 

Acknowledgements

 

Horizon 2020 ICT Programme Project SSIX: Social Sentiment analysis financial IndeXes, ICT-2014-15.a Big OpenData, Grant agreement no: 645425 for Innovation action (2015-2018)

 

Contact Info

Announcements