In recent weeks, the world of finance has been shaken by the “Gamestonk” effect, a digital revolution allegedly triggered by a group of independent day-traders fed up with institutional funds’ financial hegemony and carelessness. The latter’s decision to take a short position against GameStop, the iconic video-game retailer, was the straw that broke the camel’s back for a horde of tech-savvy retail investors. Leveraging unconventional trading platforms like the now-famous Robinhood, they mobilized on the social media platform Reddit and, within a few hours, enacted a short squeeze that inflicted considerable losses on powerful actors of the financial establishment such as the Melvin Capital fund.
With its overtones of a David-versus-Goliath fight and an atmosphere reminiscent of the Occupy Wall Street movement, the GME saga soon became legendary, so popular that it was deemed worthy of ad hoc media adaptations, as shown by the interest that producers such as MGM and Netflix expressed in its movie rights.
The GME revolt also hogged the media’s attention, and anyone could read countless articles shedding light on the financial details behind its dynamics, commenting on the case from a rather technical perspective. To date, however, I have barely spotted any investigation of the actual discourses taking place in the online community where everything apparently started: the subreddit r/WallStreetBets. So I thought, why not take a look at what these digital rebels have shared on the platform?
Thus, I bothered Reddit’s API (one of the very few which, in this period of APIcalypse, still allows you to collect a decent amount of data) and gathered with relative ease about 5,000 posts and 200,000 comments published on the subreddit between October 2020 and the beginning of February 2021.
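The post does not say which client was used to query the API, so here is a minimal Python sketch of that collection step, assuming the PRAW library and placeholder credentials (the credentials, the post limit, and the exact field selection are all illustrative assumptions):

```python
# Hypothetical sketch of the data-collection step; replace the placeholder
# credentials with your own registered Reddit API keys before running.
from datetime import datetime, timezone


def to_epoch(day: str) -> int:
    """Convert an ISO date (interpreted as UTC) to the UNIX timestamp Reddit uses."""
    return int(datetime.fromisoformat(day).replace(tzinfo=timezone.utc).timestamp())


# Study window described in the post: October 2020 to early February 2021.
START, END = to_epoch("2020-10-01"), to_epoch("2021-02-05")


def fetch_posts(limit: int = 5000):
    """Pull recent r/WallStreetBets submissions and keep those inside the window."""
    import praw  # pip install praw

    reddit = praw.Reddit(client_id="YOUR_ID", client_secret="YOUR_SECRET",
                         user_agent="wsb-sentiment-study")
    posts = []
    for submission in reddit.subreddit("wallstreetbets").new(limit=limit):
        if START <= submission.created_utc <= END:
            posts.append({"id": submission.id, "title": submission.title,
                          "score": submission.score, "created": submission.created_utc})
    return posts
```

Note that `new()` only walks backwards from the most recent submissions, so covering a fixed historical window like this one in practice requires repeated queries or an archive service.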
As a first step, I performed a “classic” sentiment analysis (both polarity and emotion detection), relying on widely used lexicons such as Harvard’s General Inquirer, Nielsen’s AFINN, NRC-EmoLex and the best-in-class LIWC. Given the financial context of the saga, I then compared them with more finance-focused dictionaries, namely Henry’s and Loughran & McDonald’s lexicons, despite their narrower scope. All in all, the results of the sentiment polarity detection conform to what one might have expected: except for NRC-EmoLex, all sentiment scores intensify exactly around the key dates of the GME revolt. What is more interesting is that these sentiment peaks anticipate by a few days the peak in closing prices reached by GME’s “inflated” shares (i.e. $350 on Jan 27th), suggesting particular excitement and mobilization on the platform.
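The mechanics of dictionary-based polarity scoring are simple enough to show in a few lines. This is a Python toy, not the actual R/quanteda pipeline, and the mini AFINN-style lexicon is an invented fragment for demonstration:

```python
# Toy bag-of-words polarity scoring with an invented AFINN-style fragment.
import re

TOY_LEXICON = {"gain": 2, "win": 4, "rocket": 2, "loss": -3, "crash": -2, "fear": -2}


def polarity(text: str, lexicon: dict = TOY_LEXICON) -> int:
    """Sum the lexicon scores of every matched token (pure bag-of-words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


# Bag-of-words scoring is blind to negation: "no fear" still counts as -2.
print(polarity("GME to the moon, huge gain, no fear of a crash"))  # → -2
```

Aggregating such scores per day (and normalizing by token count) is what produces the time series whose peaks line up with the key dates of the revolt.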
As a matter of fact, looking at emotion detection, it is precisely during the days of preparation of the short squeeze that negative emotions like anger and disgust intensify, while positive ones like joy tend to decline. To validate these measures, I then computed an aggregate sentiment polarity index and compared it with some metadata the platform provides that can serve as proxies of user engagement (namely, comment and post scores). This time, it was pretty interesting to notice not only that these engagement metrics are (weakly) negatively correlated with the overall sentiment of the comments posted in the subreddit, but also that they suddenly plummeted precisely around the 15th–20th of January, when the overall sentiment peaked.
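The validation step boils down to a correlation between two daily series. A minimal Python sketch, with entirely made-up numbers standing in for the real daily aggregates, shows the shape of the computation:

```python
# Correlating a daily aggregate sentiment index with an engagement proxy
# (mean comment score). All values below are illustrative, not real data.
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


daily_sentiment = [0.10, 0.35, 0.15, 0.55, 0.30]  # aggregate polarity per day
mean_scores = [12, 13, 8, 7, 11]                  # mean comment score per day

r = pearson(daily_sentiment, mean_scores)
print(round(r, 2))  # negative: engagement tends to drop as sentiment rises
```

On the real data the same coefficient came out weakly negative, which is what makes the simultaneous sentiment peak and engagement dip in mid-January stand out.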
In conclusion, the evidence from these very preliminary analyses should definitely be taken with a grain of salt. Every online community is characterized by its own specific netiquette and an idiosyncratic jargon, which mainstream lexicons, developed for other uses and other contexts, very likely disregard. Words and lemmas that carry a somewhat taken-for-granted sentiment polarity in everyday language may, in these particular spheres, be used with completely different nuances of meaning, causing trouble in sentiment and emotion detection tasks. For example, r/WallStreetBets members share the habit of calling each other “retards”, which within the community is a friendly label among traders, used to stress their non-belonging to the high spheres of the financial world, but surely carries a strong negative connotation in everyday communication.
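One practical remedy for this mismatch is to override a general-purpose lexicon with community-specific valences before scoring. A Python toy illustrates the idea; the base lexicon and the override scores are invented for the example:

```python
# Patching a general-purpose lexicon with community-specific valences.
# Base scores and overrides are illustrative assumptions, not a real lexicon.
import re

base_lexicon = {"retards": -3, "diamond": 0, "moon": 0, "loss": -2}

# In r/WallStreetBets jargon, "retards" is a friendly in-group label, and
# "diamond hands" / "to the moon" express bullish enthusiasm.
wsb_overrides = {"retards": 1, "diamond": 2, "moon": 2}
wsb_lexicon = {**base_lexicon, **wsb_overrides}


def polarity(text, lexicon):
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(t, 0) for t in tokens)


comment = "fellow retards, diamond hands, GME to the moon"
print(polarity(comment, base_lexicon))  # → -3 (reads as hostile)
print(polarity(comment, wsb_lexicon))   # → 5 (reads as enthusiastic)
```

The same comment flips from strongly negative to positive once the in-group meanings are taken into account, which is exactly the kind of distortion that mainstream lexicons introduce here.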
Nevertheless, sentiment analysis based on bag-of-words approaches and predefined dictionaries remains a fast and useful way to grasp at least the general sentiment narrative of a large corpus of textual data, especially when dealing with massive amounts of user-generated content, as it lets the researcher detect peaks and troughs in sentiment that can then be explored in depth with other techniques. As a next step, I am planning to perform some topic modelling via LDA on the same data, as I am curious to see which specific topics can be associated with each peak in sentiment.
**Methodological clarification: all analyses were performed in R, mainly within the quanteda framework. The other lexicons were retrieved either from dedicated libraries such as syuzhet and qdapDictionaries or directly from their authors’ websites.
 Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903.
 Mohammad, S. M., & Turney, P. D. (2013). NRC emotion lexicon. National Research Council, Canada.
 Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of language and social psychology, 29(1), 24–54.
 Henry, E. (2008). Are investors influenced by how earnings press releases are written? Journal of Business Communication, 45(4), 363–407.