Analysis of Media Coverage on Refugees

Hannah Yan Han is doing #100dayprojects on data science and visual storytelling, and I recommend taking a look yourself. Below you will find her R text analysis (#41) of UNHCR speeches and TV coverage on refugees. Unsurprisingly, nouns like asylum, repatriation, displacement, persecution, plight, and crisis appear significantly more often in UNHCR speeches on refugees than…

Visualizing #IRMA Tweets

Reddit user LucasCu90 used the R package twitteR to retrieve all tweets sent with #Irma and a geocode of central Miami (25-mile radius) from Saturday, September 9, to Sunday, September 10, 2017 (the period of Irma’s approach and initial landfall on the Florida Keys and the mainland). From the 29,000 tweets he collected, Lucas then…

Quantifying Gastronomy

A statistical analysis of 4000 recipes and their ingredients: Quantifying Gastronomy  

Summarizing our Daily News: Clustering 100,000+ Articles in Python

Andrew Thompson was interested in which 10 topics a computer would identify in our daily news. He gathered over 140,000 news articles from the archives of 10 different sources, as you can see in the figure below. In Python, Andrew converted the text of all these articles into a manageable form (tf-idf document-term matrix…
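
For readers unfamiliar with the term, a tf-idf document-term matrix can be sketched in a few lines of plain Python. This is an illustration only and not Andrew's code: the function names `tokenize` and `tfidf_matrix`, the toy headlines, and the unsmoothed log(N / df) weighting are all assumptions made here. Libraries typically apply smoothing and normalization on top of this basic idea.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one row per document.

    Each cell is term frequency (count / document length) multiplied by
    inverse document frequency, log(N / number of documents with the term).
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        matrix.append([counts[term] / length * idf[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical toy headlines, purely for illustration.
headlines = [
    "Storm hits the coast",
    "Election results shake the markets",
    "Markets recover after the storm",
]
vocabulary, matrix = tfidf_matrix(headlines)
for term, weight in zip(vocabulary, matrix[0]):
    print(f"{term:10s} {weight:.3f}")
```

A word that occurs in every document (here, "the") gets a weight of zero, while words specific to a few documents are weighted up. That property is what makes such a matrix a useful input for clustering articles into topics.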

Harry Plotter: Part 2 – Hogwarts Houses and their Stereotypes

Two weeks ago, I started the Harry Plotter project to celebrate the 20th anniversary of the first Harry Potter book. I could not have imagined that the first blog post would be so well received. It reached over 4000 views in a matter of days thanks to the lovely people in the data science and #rstats community who were kind enough to share it…

Scraping RStudio blogs to establish how “pleased” Hadley Wickham is.

This is reposted from DavisVaughan.com with minor modifications. Introduction: A while back, I saw a conversation on Twitter about how Hadley uses the word “pleased” very often when introducing a new blog post (I couldn’t seem to find this tweet anymore. Can anyone help?). Out of curiosity, and to flex my R web scraping muscles a bit,…

R resources (free courses, books, tutorials, & cheat sheets)

Help yourself to these free books, tutorials, packages, cheat sheets, and many other materials for R programming. There’s a separate overview for handy R programming tricks. If you have additions, please comment below or contact me! Last updated: 2019-10-19. Table of contents: Beginner, Advanced, Cheat sheets, Data manipulation, Data visualization, Dashboards & Shiny, Markdown…

Harry Plotter: Celebrating the 20 year anniversary with tidytext and the tidyverse in R

It has been twenty years since the first Harry Potter novel, the Sorcerer’s/Philosopher’s Stone, was published. To honour the series, I started a text analysis and visualization project, which my other half wittily dubbed Harry Plotter. In several blog posts, I intend to demonstrate how Hadley Wickham’s tidyverse and packages that build on its principles, such as tidytext (free book), have taken programming in R to an…

Text Mining: Shirin’s Twitter Feed

Text mining and analytics, natural language processing, and topic modelling have become something of an obsession of mine. I am amazed by the insights one can retrieve from textual information, and with the ever-increasing amounts of unstructured data on the internet, recreational analysts are coming up with the most amazing text mining…