Aleszu Bajak at Storybench.org published a great demonstration of the power of text mining. He used the R
tidytext package to analyse 150,000 wine reviews which Zach Thoutt had scraped from Wine Enthusiast in November of 2017.
Aleszu began with the French wines only, starting with a simple count of the words used per region.
Next, he applied TF-IDF to surface the words most characteristic of each French wine region: words that appear often in reviews of that region's wines but rarely in reviews of wines from other regions.
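To make the idea concrete, here is a minimal sketch of TF-IDF in plain Python. It is not Aleszu's script (he worked in R with tidytext); the three tiny "reviews" and the `tf_idf` helper are made up for illustration, and the sketch assumes the common definition: term frequency is a word's share of all words in a region, and inverse document frequency is the natural log of the number of regions divided by the number of regions that use the word.

```python
import math
from collections import Counter

# Hypothetical toy data: one blob of review text per region.
# These are NOT taken from the Wine Enthusiast dataset.
reviews = {
    "Bordeaux": "tannins fruit tannins structure",
    "Champagne": "crisp fruit bubbles crisp",
    "Burgundy": "fruit cherry earthy cherry",
}

def tf_idf(docs):
    """Return {region: {word: tf-idf score}} for a dict of region -> text."""
    # Word counts per region (naive whitespace tokenisation).
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    n_docs = len(counts)

    # Number of regions in which each word appears at least once.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
    return scores

scores = tf_idf(reviews)
for region, region_scores in scores.items():
    top_word = max(region_scores, key=region_scores.get)
    print(region, "->", top_word)
```

In this toy example "fruit" appears in every region, so its score is zero everywhere, while words such as "tannins" or "crisp" that are frequent in only one region rise to the top. That is the effect the analysis relies on.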
The data also contained some price information, which Aleszu plotted on a map of France using ggplot2 and the maps package to show which French wine regions are generally more expensive.
On the full dataset, Aleszu also showed that there is a strong relationship between price and points: in general, more expensive wines tend to receive higher scores.
You can find the full script and more details in the original blog post.