Overview of built-in colors in R

Most of my data visualizations I create using R programming — as you might have noticed from the content of my website. Though I am colorblind myself, I love to work with colors and color palettes in my visualizations. And I’ve come across quite some neat tricks in my time. For instance, did you it’s…

Causal Random Forests, by Mark White

I stumbled accros this incredibly interesting read by Mark White, who discusses the (academic) theory behind, inner workings, and example (R) applications of causal random forests: EXPLICITLY OPTIMIZING ON CAUSAL EFFECTS VIA THE CAUSAL RANDOM FOREST: A PRACTICAL INTRODUCTION AND TUTORIAL (By Mark White) These so-called “honest” forests seem a great technique to identify opportunities…

Tidy Machine Learning with R’s purrr and tidyr

Jared Wilber posted this great walkthrough where he codes a simple R data pipeline using purrr and tidyr to train a large variety of models and methods on the same base data, all in a non-repetitive, reproducible, clean, and thus tidy fashion. Really impressive workflow!

Comparison between R dplyr and data.table code

Atrebas created this extremely helpful overview page showing how to program standard data manipulation and data transformation routines in R’s famous packages dplyr and data.table. The document has been been inspired by this stackoverflow question and by the data.table cheat sheet published by Karlijn Willems. Resources for data.table can be found on the data.table wiki, in the data.table vignettes,…

ROC, AUC, precision, and recall visually explained

A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting the false positive rate of a model against its true positive rate, for each possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well…

Understanding Data Distributions

Having trouble understanding how to interpret distribution plots? Or struggling with Q-Q plots? Sven Halvorson penned down a visual tutorial explaining distributions using visualisations of their quantiles. Because each slice of the distribution is 5% of the total area and the height of the graph is changing, the slices have different widths. It’s like we’re…

Putting R in Production, by Heather Nolis & Mark Sellors

It is often said that R is hard to put into production. Fortunately, there are numerous talks demonstrating the contrary. Here’s one by Heather Nolis, who productionizes R models at T-Mobile. Her teams even shares open-source version of some of their productionized Tensorflow models on github. Read more about that model here. There’s another great…

5 Quick Tips for Coding in the Classroom, by Kelly Bodwin

Kelly Bodwin is an Assistant Professor of Statistics at Cal Poly (San Luis Obispo) and teaches multiple courses in statistical programming. Based on her experiences, she compiled this great shortlist of five great tips to teach programming. Kelly truly mentions some best practices, so have a look at the original article, which she summarized as…

Recreating graphics from the Fundamentals of Data Visualization

Claus Wilke wrote the Fundamentals of Data Visualization – a great resource that’s definitely high on my list of recommended data visualization books. In a recent post, Claus shared the link to a GitHub repository where he hosts some of the R programming code with which Claus made the graphics for his dataviz book. The…