Scraping RStudio blogs to establish how “pleased” Hadley Wickham is.

Reposted from DavisVaughan.com with minor modifications. Introduction: A while back, I saw a conversation on Twitter about how Hadley uses the word “pleased” very often when introducing a new blog post (I can no longer find the tweet. Can anyone help?). Out of curiosity, and to flex my R web scraping muscles a bit, …

Short ggplot2 tutorial by MiniMaxir

Reposted from minimaxir.com. Quick Introduction to ggplot2: ggplot2 uses a more concise setup toward creating charts as opposed to the more declarative style of Python’s matplotlib and base R. It also includes a few example datasets for practicing ggplot2 functionality; for example, the mpg dataset is a dataset of the performance of popular models of cars …

Variance Explained: Text Mining Trump’s Twitter – Part 2

Reposted from Variance Explained with minor modifications. This post follows an earlier post on the same topic. A year ago today, I wrote up a blog post, “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half.” My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people, posting …

Variance Explained: Text Mining Trump’s Twitter – Part 1: Trump is Angrier on Android

Reposted from Variance Explained with minor modifications. Note that this post was written in 2016; a follow-up was posted in 2017. This weekend I saw a hypothesis about Donald Trump’s Twitter account that simply begged to be investigated with data: Todd Vaziri (@tvaziri): Every non-hyperbolic tweet is from iPhone (his staff). Every hyperbolic tweet is from …

Networks Among #rstats Twitterers

Reposted from Kasia Kulma's GitHub with minor modifications. Have you ever wondered whether the most active/popular R-twitterers are virtual friends? 🙂 And by friends here I simply mean mutual followers on Twitter. In this post, I score and pick the top 30 #rstats Twitter users and analyse their Twitter network. You’ll see a lot of applications of the rtweet and ggraph packages, as …

t-SNE, the Ultimate Drum Machine and more

This post explains t-SNE (t-Distributed Stochastic Neighbor Embedding) through the story of programmers joining forces with musicians to create the ultimate drum machine (if you are here just for the fun, you may start playing right away). Kyle McDonald, Manny Tan, and Yotam Mann experienced difficulties in pinpointing to what extent sounds are similar (ding, dong) …

R Resources (Cheatsheets, Tutorials, & Books)

Last Updated: 18-08-2017. Over the years, I have collected many open-source R resources, and I thought it nice to share them with the public. The list below is ever growing, so if you have additions, please comment below or contact me! R Basics: Cheatsheet: R Base Cheatsheet; Cheatsheet: R Base Functions Reference Card; Cheatsheet: R Advanced …

Harry Plotter: Celebrating the 20 year anniversary with tidytext and the tidyverse in R

It has been twenty years since the first Harry Potter novel, The Sorcerer's/Philosopher’s Stone, was published. To honour the series, I decided to start a text analysis and visualization project, which my other half wittily dubbed Harry Plotter. In several blog posts, I intend to demonstrate how Hadley Wickham’s tidyverse and packages that build on its principles, such as tidytext (free book), have taken programming in R …

Text Mining: Shirin’s Twitter Feed

Text mining and analytics, natural language processing, and topic modelling have definitely become something of an obsession of mine. I am just amazed by the insights one can retrieve from textual information, and with the ever-increasing amounts of unstructured data on the internet, recreational analysts are coming up with the most amazing text mining …

‘Wie is de Mol?’ according to Twitter – Part 2 (s17e2)

This is a repost of my LinkedIn article of 17 January 2017. Unfortunately, for lack of time, I never wrote a follow-up. I did keep scraping the Twitter data, so who knows, it may still come... TL;DR // Summary: Last week I posted a first blog (Dutch & English) in which I use Twitter to …