As promised, here is the post-election analysis. Although my predicted vote share for AKP was much closer to the actual result than most of the traditional polls, my predicted value for MHP was far off, which made my overall prediction error larger than that of most conventional polls (see table below).
Source | AKP | CHP | MHP | HDP | Others | Prediction error |
---|---|---|---|---|---|---|
Election results | 49.4 | 25.4 | 11.9 | 10.7 | 2.5 | |
My prediction | 47.3 | 22.4 | 18.8 | 11.68 | 0 | 3.1 |
Traditional polls: | | | | | | |
Andy-Ar | 43.7 | 27.1 | 14.0 | 13.0 | 2.2 | 2.42 |
Konda | 41.7 | 27.9 | 14.2 | 13.8 | 2.3 | 3.16 |
A&G | 47.2 | 25.3 | 13.5 | 12.2 | 1.8 | 1.22 |
Gezici | 43 | 26.1 | 14.9 | 12.2 | 3.8 | 2.58 |
Metropoll | 43.3 | 25.9 | 14.8 | 13.4 | 2.6 | 2.46 |
ORC | 43.3 | 27.4 | 14 | 12.2 | 3.1 | 2.46 |
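
The prediction error column is consistent with a mean absolute error: the average absolute gap, in percentage points, between each predicted share and the actual result across the five columns. That reading is inferred from the numbers in the table rather than stated anywhere, so treat the sketch below as an assumption. The names `ACTUAL`, `FORECASTS` and `mean_absolute_error` are made up for this illustration.

```python
# Vote shares in percent, ordered AKP, CHP, MHP, HDP, Others (taken from the table).
ACTUAL = [49.4, 25.4, 11.9, 10.7, 2.5]

FORECASTS = {
    "My prediction": [47.3, 22.4, 18.8, 11.68, 0],
    "Andy-Ar": [43.7, 27.1, 14.0, 13.0, 2.2],
    "Konda": [41.7, 27.9, 14.2, 13.8, 2.3],
    "A&G": [47.2, 25.3, 13.5, 12.2, 1.8],
    "Gezici": [43, 26.1, 14.9, 12.2, 3.8],
    "Metropoll": [43.3, 25.9, 14.8, 13.4, 2.6],
    "ORC": [43.3, 27.4, 14, 12.2, 3.1],
}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, over all columns."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Assumption: every column (including "Others") carries equal weight.
errors = {name: mean_absolute_error(shares, ACTUAL) for name, shares in FORECASTS.items()}

# Print the forecasts from smallest to largest error.
for name, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name:14s} {err:.2f}")
```

Under this assumption the MHP miss alone (18.8 predicted versus 11.9 actual) contributes 6.9 of the 15.48 points of total absolute error in my prediction, which is why a good AKP estimate still ends up with a high overall error.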
To be honest, I have to conclude that the results of this research do not point towards a clear victory for Twitter Data Analytics, but they are not a clear loss either. On the bright side, this research was done with a few Amazon EC2 instances at a total cost of about three dollars, while the cost of the traditional polls was in the range of a few million (to put it mildly). For those who are interested, this is an interesting article about the current state of the polling industry.
I still believe that the content of Twitter can be representative of an electorate and political sentiment can be modeled from Twitter messages effectively. However, it is clear that further research is needed and challenges lie ahead.
At the moment I cannot give a clear answer to the question of why there is such a large discrepancy between the predicted and actual results. I hope to provide you with a better explanation later on, but for now I can already tell you that this discrepancy is partly caused by ‘Ahmet Kaya’.
There were two MHP politicians named ‘Ahmet Kaya’ participating in the elections (one for the province of Diyarbakir and one for the province of Erzincan). The problem with these two politicians is that Ahmet Kaya was also the name of a very famous Turkish singer (who happened to be born on 28 October, just days before the election). Of course, the Twitter Data Collector is not smart enough to distinguish between Ahmet Kaya the politician and Ahmet Kaya the singer. Since I did not check the content of the Tweets or go through the dictionary containing the names of the ~550 politicians in great detail, MHP got thousands of Tweets more than it should have…
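
One possible way to guard against this kind of name collision is to keep a hand-curated list of ambiguous names and only count a Tweet for them when it also contains some political context. The sketch below is purely illustrative and is not how the Twitter Data Collector worked: the dictionary contents, the keyword list and the function `parties_mentioned` are all hypothetical, and simple substring matching like this will still miss plenty of cases.

```python
# Hypothetical excerpt of the name dictionary (the real one held ~550 names).
CANDIDATE_PARTY = {
    "ahmet kaya": "MHP",
    "devlet bahçeli": "MHP",
}

# Names known to collide with non-political figures (curated by hand).
AMBIGUOUS_NAMES = {"ahmet kaya"}

# Assumed context words: "seçim" = election, "milletvekili" = member of parliament.
CONTEXT_KEYWORDS = {"seçim", "milletvekili"}


def parties_mentioned(tweet_text):
    """Return the parties a Tweet counts for, skipping ambiguous names without context."""
    text = tweet_text.casefold()
    parties = set()
    for name, party in CANDIDATE_PARTY.items():
        if name not in text:
            continue
        if name in AMBIGUOUS_NAMES:
            # Require the party name or a political keyword somewhere in the Tweet.
            has_context = party.casefold() in text or any(k in text for k in CONTEXT_KEYWORDS)
            if not has_context:
                continue
        parties.add(party)
    return parties


# "I am listening to Ahmet Kaya songs": no political context, so it is not counted.
print(parties_mentioned("Ahmet Kaya şarkıları dinliyorum"))
# "MHP candidate Ahmet Kaya in Erzincan": party name present, so it is counted.
print(parties_mentioned("MHP adayı Ahmet Kaya Erzincan'da"))
```

The trade-off is that genuine Tweets about the politicians that carry no context word are dropped as well, so a filter like this lowers the count for ambiguous names rather than making it exact.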
In later posts, I will go into the more technical side of how to collect data from Twitter, for those interested in doing Twitter Data Analytics.