Need to save R's lm() or glm() models? Trim the fat!

I was training a predictive model for work for use in a Shiny App. However, as the training set was quite large (700k+ obs.), the model object to save was also quite large in size (500mb). This slows down your operation significantly! Basically, all you really need are the coefficients (and a link function, in…

Logistic regression is not fucked, by Jake Westfall

Recently, I came across a social science paper that had used linear probability regression. I had never heard of linear probability models (LPM), but it seems just an application of ordinary least squares regression but to a binomial dependent variable. According to some, LPM is a commonly used alternative for logistic regression, which is what…

StatQuest: Statistical concepts, clearly explained

Josh Starmer is assistant professor at the genetics department of the University of North Carolina at Chapel Hill. But more importantly: Josh is the mastermind behind StatQuest! StatQuest is a Youtube channel (and website) dedicated to explaining complex statistical concepts — like data distributions, probability, or novel machine learning algorithms — in simple terms. Once…

Simpson’s Paradox: Two HR examples with R code.

Simpson (1951) demonstrated that a statistical relationship observed within a population—i.e., a group of individuals—could be reversed within all subgroups that make up that population. This phenomenon, where X seems to relate to Y in a certain way, but flips direction when the population is split for W, has since been referred to as Simpson’s…

Must read: Computer Age Statistical Inference (Efron & Hastie, 2016)

Statistics, and statistical inference in specific, are becoming an ever greater part of our daily lives. Models are trying to estimate anything from (future) consumer behaviour to optimal steering behaviours and we need these models to be as accurate as possible. Trevor Hastie is a great contributor to the development of the field, and I…