It has been twenty years since the first Harry Potter novel, the Sorcerer’s/Philosopher’s Stone, was published. To honour the series, I started a text analysis and visualization project, which my other half wittily dubbed Harry Plotter. In several blog posts, I intend to demonstrate how Hadley Wickham’s tidyverse and packages that build on its principles, such as tidytext (free book), have taken programming in R to an … Continue reading Harry Plotter: Celebrating the 20 year anniversary with tidytext and the tidyverse in R→
Yesterday was the second anniversary of my website. I also reflected on this moment last year, and I thought to continue the tradition in 2019. Let me start with a great, big THANK YOU to all my readers for continuing to visit my website! You are the reason I continue to write down what I read. … Continue reading Two years of paulvanderlaken.com→
I’ve mentioned before that I dislike wordclouds (for instance here, or here) and apparently others share that sentiment. In his recent Medium blog, Daniel McNichol goes as far as to refer to the wordcloud as the pie chart of text data! Among other things, Daniel calls wordclouds disorienting, one-dimensional, arbitrary, and opaque, and he mentions their lack of order, … Continue reading Chatterplots→
Aleszu Bajak at Storybench.org published a great demonstration of the power of text mining. He used the R tidytext package to analyse 150,000 wine reviews which Zach Thoutt had scraped from Wine Enthusiast in November of 2017. Aleszu started his analysis on only the French wines, with a simple word count per region. Next, he applied TF-IDF to surface the … Continue reading Become a data-driven Sommelier by text mining wine reviews→
One year ago, I registered the domain paulvanderlaken.com with three reasons in mind: (1) to have an online environment to store and showcase my pet projects, (2) to share and promote some of the great blogs and research others had been writing, and (3) to show others what I was doing on my path to … Continue reading One year of paulvanderlaken.com→
Sentiment analysis is a topic I cover regularly, for instance, with regard to Harry Plotter, Stranger Things, or Facebook. Usually I stick to the three sentiment dictionaries (i.e., lexicons) included in the tidytext R package (Bing, NRC, and AFINN), but there are many more one could use. Heck, I’ve even tried building one myself using a synonym/antonym … Continue reading Sentiment Analysis: Analyzing Lexicon Quality and Estimation Errors→