Generalized Additive Models Tutorial in R, by Noam Ross

Generalized Additive Models — or GAMs in short — have been somewhat of a mystery to me. I’ve known about them, but didn’t know exactly what they did, or when they’re useful. That came to an end when I found out about this tutorial by Noam Ross. In this beautiful, online, interactive course, Noam allows…

GIF visualizations of Type 1 and Type 2 error in relation to sample size

On twitter, I came across the tweet below showing some great GIF visualizations on the dangers of taking small samples. Created by Andrew Stewart, and tweeted by John Holbein, the visuals show samples taken from a normal distributed variable with a mean of 10 and a standard deviation of 2. In the left section, Andrew…

Creating plots with custom icons for data points

Data visualizations that make smart use of icons have a way of conveying information that sticks. Dataviz professionals like Moritz Stefaner know this and use the practice in their daily work. A recent #tidytuesday entry by Georgios Karamanis demonstrates how easy it is to integrate visual icons in your data figures when you write code…

Factors in R: forcats to help

Hugo Toscano wrote a great blog providing an overview of all the helpful functionalities of the R forcats package. The package includes functions that help handling categorical data, by setting a random or fixed order, and by recategorizing or anonymizing. These functions are specifically helpful when visualizing data with R’s ggplot2. A comprehensive overview is…

Pimp my RMD: Tips for R Markdown – by Yan Holtz

R markdown¬†creates interactive reports from¬†R code, including interactive reports, documents, dashboards, presentations, and even books. Have a look at this awesome gallery of R markdown examples. Yan Holtz recently created a neat little overview of handy R Markdown tips and tricks that improve the appearance of output documents. He dubbed this overview Pimp my RMD….

Propensity Score Matching Explained Visually

Propensity score matching (wiki) is a statistical matching technique that attempts to estimate the effect of a treatment (e.g., intervention) by accounting for the factors that predict whether an individual would be eligble for receiving the treatment. The wikipedia page provides a good example setting: Say we are interested in the effects of smoking on…

Learn from the Pros: How media companies visualize data

Past months, multiple companies shared their approaches to data visualization and their lessons learned. Click the companies in the list below to jump to their respective section The Financial Times The Britisch Broadcast Corporation The Economist FiveThirtyEight Financial Times The Financial Times (FT) released a searchable database of the many data visualizations they produced over…

StatQuest: Statistical concepts, clearly explained

Josh Starmer is assistant professor at the genetics department of the University of North Carolina at Chapel Hill. But more importantly: Josh is the mastermind behind StatQuest! StatQuest is a Youtube channel (and website) dedicated to explaining complex statistical concepts — like data distributions, probability, or novel machine learning algorithms — in simple terms. Once…

Free Programming Books (I still need to read)

There are multiple unread e-mails in my inbox. Links to books. Just sitting there. Waiting to be opened, read. For months already. The sender, you ask? Me. Paul van der Laken. A nuisance that guy, I tell you. He keeps sending me reminders, of stuff to do, books to read. Books he’s sure a more…