Harry Plotter: Network analysis of spell usage

Apparently, I was not the only geek who decided to celebrate the 20th anniversary of the Harry Potter saga with statistical analysis. Students Moritz Haine and Markus Dienstknecht of the Data Science for Decision Making Master at Maastricht University started their own celebratory project as part of a course Information Retrieval and Text Mining. Students in…

Harry Plotter: Shiny App of Spell Usage

In my second Harry Plotter blog (22-Aug-2017), I wrote: I would like to demonstrate how regular expressions can be used to retrieve (sub)strings that follow a specific format. We could use regex to examine, for instance, when, and by whom, which magical spells are cast. Well, Prusinowskik (real name unknown) beat me to it, and how! S/He formed a comprehensive…

Text Mining: Pythonic Heavy Metal

This blog summarized work that has been posted here, here, and here. Iain of degeneratestate.org wrote a three-piece series where he applied text mining to the lyrics of 222,623 songs from 7,364 heavy metal bands spread over 22,314 albums that he scraped from darklyrics.com. He applied a broad range of different analyses in Python, the code of which…

Harry Plotter: Part 2 – Hogwarts Houses and their Stereotypes

Two weeks ago, I started the Harry Plotter project to celebrate the 20th anniversary of the first Harry Potter book. I could not have imagined that the first blog would be so well received. It reached over 4000 views in a matter of days thanks to the lovely people in the data science and #rstats community that were kind enough to share it…

Harry Plotter: Celebrating the 20 year anniversary with tidytext and the tidyverse in R

It has been twenty years since the first Harry Potter novel, the sorcerer’s/philosopher’s stone, was published. To honour the series, I started a text analysis and visualization project, which my other-half wittily dubbed Harry Plotter. In several blogs, I intend to demonstrate how Hadley Wickham’s tidyverse and packages that build on its principles, such as tidytext (free book), have taken programming in R to an…