I've mentioned before that I dislike wordclouds (for instance here, or here) and apparently others share that sentiment. In his recent Medium blog post, Daniel McNichol goes so far as to refer to the wordcloud as the pie chart of text data! Among other things, Daniel calls wordclouds disorienting, one-dimensional, arbitrary, and opaque, and he mentions their lack of order, … Continue reading Chatterplots
Aleszu Bajak at Storybench.org published a great demonstration of the power of text mining. He used the R tidytext package to analyse 150,000 wine reviews which Zach Thoutt had scraped from Wine Enthusiast in November of 2017. Aleszu started his analysis on only the French wines, with a simple word count per region: Next, he applied TF-IDF to surface the … Continue reading Become a data-driven Sommelier by text mining wine reviews
Jordan Dworkin, a Biostatistics PhD student at the University of Pennsylvania, is one of the few million fans of Stranger Things, an 80s-themed Netflix series combining drama, fantasy, mystery, and horror. Awaiting the third season, Jordan was curious as to the emotional voyage viewers went through during the series, and he decided to examine this … Continue reading Sentiment Analysis of Stranger Things Seasons 1 and 2
Two weeks ago, I started the Harry Plotter project to celebrate the 20th anniversary of the first Harry Potter book. I could not have imagined that the first blog post would be so well received. It reached over 4000 views in a matter of days thanks to the lovely people in the data science and #rstats community who were kind enough to share it … Continue reading Harry Plotter: Part 2 – Hogwarts Houses and their Stereotypes
This is reposted from DavisVaughan.com with minor modifications. Introduction A while back, I saw a conversation on Twitter about how Hadley uses the word “pleased” very often when introducing a new blog post (I couldn’t seem to find this tweet anymore. Can anyone help?). Out of curiosity, and to flex my R web scraping muscles a bit, … Continue reading Scraping RStudio blogs to establish how “pleased” Hadley Wickham is.
Reposted from Variance Explained with minor modifications. This post follows an earlier post on the same topic. A year ago today, I wrote up a blog post Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half. My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people, posting … Continue reading Variance Explained: Text Mining Trump’s Twitter – Part 2