Generalized Additive Models Tutorial in R, by Noam Ross

Generalized Additive Models — or GAMs in short — have been somewhat of a mystery to me. I’ve known about them, but didn’t know exactly what they did, or when they’re useful. That came to an end when I found out about this tutorial by Noam Ross. In this beautiful, online, interactive course, Noam allows…

Logistic regression is not fucked, by Jake Westfall

Recently, I came across a social science paper that had used linear probability regression. I had never heard of linear probability models (LPM), but it seems just an application of ordinary least squares regression but to a binomial dependent variable. According to some, LPM is a commonly used alternative for logistic regression, which is what…

GIF visualizations of Type 1 and Type 2 error in relation to sample size

On twitter, I came across the tweet below showing some great GIF visualizations on the dangers of taking small samples. Created by Andrew Stewart, and tweeted by John Holbein, the visuals show samples taken from a normal distributed variable with a mean of 10 and a standard deviation of 2. In the left section, Andrew…

Propensity Score Matching Explained Visually

Propensity score matching (wiki) is a statistical matching technique that attempts to estimate the effect of a treatment (e.g., intervention) by accounting for the factors that predict whether an individual would be eligble for receiving the treatment. The wikipedia page provides a good example setting: Say we are interested in the effects of smoking on…

How to find two identical Skittles packs?

In a hilarious experiment the anonymous mathematician behind the website Possibly Wrong estimated that s/he only needed to open “about 400-500” packs of Skittles to find an identifical pack. From January 12th up to April 6th, s/he put it to the test and counted the contents of an astonishing 468 packs, containing over 27.000 individual…

Animating causal inference methods

Some time back the animations below went sort of viral in the statistical programming community. In them, economics professor Nick Huntington-Klein demonstrates step-by-step how statistical tests estimate effect sizes. You will find several other animations in Nick’s original blog, and the associatedtwitter thread. Moreover, if you are interested in the R code to generate these…

StatQuest: Statistical concepts, clearly explained

Josh Starmer is assistant professor at the genetics department of the University of North Carolina at Chapel Hill. But more importantly: Josh is the mastermind behind StatQuest! StatQuest is a Youtube channel (and website) dedicated to explaining complex statistical concepts — like data distributions, probability, or novel machine learning algorithms — in simple terms. Once…

Two years of paulvanderlaken.com

Yesterday was the second anniversary of my website. I also reflected on this moment last year, and I thought to continue the tradition in 2019. Let me start with a great, big THANK YOUto all my readers for continuing to visit my website! You are the reason I continue to write down what I read….

#100DaysOfCode: Machine Learning & Data Visualization

2018 seemed to be the year of challenges going viral on the web. Most of them were plain stupid and/or dangerous. However, one viral challenge I did like: #100DaysOfCode 1. Code minimum an hour every day for the next 100 days. 2. Tweet your progress every day with the #100DaysOfCode hashtag. 3. Each day, reach…