## Evaluation

Evaluation of participating systems will be based on cosine similarity, in a spirit similar to [Ghosh et al., 2015]. As the sentiment scores to be predicted by systems lie on a continuous scale between -1 and 1, cosine similarity enables us to compare the degree of agreement between gold standard and predicted results. At the same time, it does not require exact correspondence: the predicted score for a given instance does not need to be identical to the gold score in order to achieve a good evaluation result.

The scores are conceptualised as vectors, where each dimension represents a stock symbol or company within a given microblog message or headline. Note that both vectors will have the same number of dimensions, as the stock symbols and companies for which sentiment needs to be assigned will be given in the input data. Cosine similarity will be calculated according to the following equation, where G is the vector of gold standard scores and P is the vector of scores predicted by the system:

$cosine(G, P) = \frac{\sum\limits_{i=1}^n{G_i \times P_i}}{\sqrt{\sum\limits_{i=1}^n{G_i^2}} \times \sqrt{\sum\limits_{i=1}^n{P_i^2}}}$
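As an illustration, the equation can be sketched in a few lines of Python. This is a minimal sketch, not the official scorer: the function name `cosine` and the decision to return 0.0 when either vector has zero norm are assumptions made here, since the equation itself is undefined in that case.

```python
import math


def cosine(gold, pred):
    """Cosine similarity between two equally long lists of scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of dimensions")
    # Numerator: sum of the element-wise products G_i * P_i.
    dot = sum(g * p for g, p in zip(gold, pred))
    # Denominator: product of the Euclidean norms of G and P.
    norm_g = math.sqrt(sum(g * g for g in gold))
    norm_p = math.sqrt(sum(p * p for p in pred))
    if norm_g == 0 or norm_p == 0:
        # Assumption: treat an all-zero vector as zero similarity.
        return 0.0
    return dot / (norm_g * norm_p)
```

Predictions that point in the same direction as the gold scores yield a value close to 1 even when their magnitudes differ, while predictions with the opposite sign on every dimension yield a value close to -1.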

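In order to reward systems which attempt to answer all problems in the gold standard, the final score is obtained by weighting the cosine similarity defined above with the ratio of answered problems (scored instances), given below (in line with [Ghosh et al., 2015]). Here |P| is the number of instances scored by the system and |G| is the number of instances in the gold standard.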

$cosine\_weight = \frac{\left | P \right |}{\left | G\right |}$

The final score is the product of the cosine similarity and the weight:

$final\_cosine\_score = cosine\_weight \times cosine(G, P)$
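The weighting step can be sketched as follows. This is a hedged illustration under stated assumptions, not the official evaluation script: the function name `final_cosine_score`, the dictionary-based input (instance id mapped to score), and the choice to compute the cosine only over instances the system actually scored are all assumptions introduced here.

```python
import math


def final_cosine_score(gold, pred):
    """Weighted cosine; gold and pred map an instance id to a score in [-1, 1]."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    # Assumption: only gold instances that the system scored enter the cosine.
    answered = [key for key in gold if key in pred]
    if not answered:
        return 0.0
    g = [gold[key] for key in answered]
    p = [pred[key] for key in answered]
    dot = sum(a * b for a, b in zip(g, p))
    norm = math.sqrt(sum(a * a for a in g)) * math.sqrt(sum(b * b for b in p))
    # Assumption: an all-zero vector gives zero similarity.
    cosine = dot / norm if norm else 0.0
    # Ratio of scored instances to gold instances.
    cosine_weight = len(answered) / len(gold)
    return cosine_weight * cosine


# Hypothetical data: the system scores two of four gold instances perfectly.
gold = {"a": 0.5, "b": -0.2, "c": 0.1, "d": 0.9}
pred = {"a": 0.5, "b": -0.2}
score = final_cosine_score(gold, pred)  # approximately 0.5
```

In this example the cosine over the answered instances is approximately 1, but the weight of 2/4 halves the final score, which is how the measure rewards systems that attempt every instance.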