Scraping RStudio blogs to establish how “pleased” Hadley Wickham is.

This is reposted from DavisVaughan.com with minor modifications. A while back, I saw a conversation on Twitter about how Hadley uses the word “pleased” very often when introducing a new blog post (I can’t seem to find this tweet anymore. Can anyone help?). Out of curiosity, and to flex my R web scraping muscles a bit, …

Variance Explained: Text Mining Trump’s Twitter – Part 2

Reposted from Variance Explained with minor modifications. This post follows an earlier post on the same topic. A year ago today, I wrote up a blog post, “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half.” My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people, posting …

Variance Explained: Text Mining Trump’s Twitter – Part 1: Trump is Angrier on Android

Reposted from Variance Explained with minor modifications. Note that this post was written in 2016; a follow-up was posted in 2017. This weekend I saw a hypothesis about Donald Trump’s Twitter account that simply begged to be investigated with data. Todd Vaziri (@tvaziri): Every non-hyperbolic tweet is from iPhone (his staff). Every hyperbolic tweet is from …

Leaving town at rush hour? Here’s how far you’re likely to get from America’s largest cities.

The Washington Post is known for the lovely visualizations accompanying its stories. In a recent post, they visualized how long it would take you to get out of the downtown areas of various cities. They compared all the major U.S. cities and examined different leaving times. Unfortunately, I cannot copy the visualizations' text here, but …

Networks Among #rstats Twitterers

Reposted from Kasia Kulma's GitHub with minor modifications. Have you ever wondered whether the most active/popular R-twitterers are virtual friends? 🙂 And by friends here I simply mean mutual followers on Twitter. In this post, I score and pick the top 30 #rstats Twitter users and analyse their Twitter network. You’ll see a lot of applications of the rtweet and ggraph packages, as …

t-SNE, the Ultimate Drum Machine and more

This blog post explains t-SNE (t-Distributed Stochastic Neighbor Embedding) through a story of programmers joining forces with musicians to create the ultimate drum machine (if you are here just for the fun, you may start playing right away). Kyle McDonald, Manny Tan, and Yotam Mann experienced difficulties in pinpointing to what extent sounds are similar (ding, dong) …

Harry Plotter: Celebrating the 20 year anniversary with tidytext and the tidyverse in R

It has been twenty years since the first Harry Potter novel, The Sorcerer’s/Philosopher’s Stone, was published. To honour the series, I decided to start a text analysis and visualization project, which my other half wittily dubbed Harry Plotter. In several blogs, I intend to demonstrate how Hadley Wickham’s tidyverse and packages that build on its principles, such as tidytext (free book), have taken programming in R …

Geographical maps using Shazam Recognitions

Shazam is a mobile app that can be asked to identify a song by making it "listen" to a piece of music. Due to its immense popularity, the organization's name quickly turned into a verb used in regular conversation ("Do you know this song? Let's Shazam it."). A successful identification is referred to as a Shazam recognition. Shazam users can opt in …

Digitizing the Tour de France 2017 – II

A few weeks back, I gave some examples of how data, predictive analytics, and visualization are changing the Tour de France experience. Today, I came across another wonderful example visualizing the sequences of geospatial data (i.e., the movement) of the cyclists during the 11th stage of the Tour de France (blue dots). Moreover, the locations of …

Text Mining: Shirin’s Twitter Feed

Text mining and analytics, natural language processing, and topic modelling have definitely become sort of an obsession of mine. I am just amazed by the insights one can retrieve from textual information, and with the ever-increasing amounts of unstructured data on the internet, recreational analysts are coming up with the most amazing text mining …