Sean Owen created this handy cheat sheet that shows the most common probability distributions mapped by their underlying relationships. Probability distributions are fundamental to statistics, just like data structures are to computer science. They’re the place to start studying if you mean to talk like a data scientist. Owen argues that the…

# Tag: statistics

## Simulating data with Bayesian networks, by Daniel Oehm

Daniel Oehm wrote this interesting blog post about how to simulate realistic data using a Bayesian network. Bayesian networks are a type of probabilistic graphical model that uses Bayesian inference for probability computations. They aim to model conditional dependence, and therefore causation, by representing each conditional dependence as an edge in a directed graph. Through these relationships, one…
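To make the idea of simulating data from a Bayesian network a bit more concrete, here is a minimal sketch of ancestral sampling in plain Python: each node is drawn after its parents, using a conditional probability table. This is not the approach from Oehm's post. The three-node network (rain, sprinkler, wet grass), all probabilities, and the function names `sample_row` and `simulate` are made up for illustration.

```python
import random

# Hypothetical network: rain -> sprinkler, and (sprinkler, rain) -> wet_grass.
# Every probability below is an invented illustration value.
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # keyed by (sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def sample_row(rng):
    """Draw one record by sampling each node after its parents."""
    rain = rng.random() < P_RAIN
    sprinkler = rng.random() < P_SPRINKLER_GIVEN_RAIN[rain]
    wet_grass = rng.random() < P_WET_GIVEN[(sprinkler, rain)]
    return {"rain": rain, "sprinkler": sprinkler, "wet_grass": wet_grass}


def simulate(n, seed=0):
    """Simulate n records with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [sample_row(rng) for _ in range(n)]


data = simulate(10_000)
rain_share = sum(row["rain"] for row in data) / len(data)
```

Because every value is drawn conditional on its parents, the simulated records should preserve the dependencies encoded in the graph. For example, no simulated row has wet grass without either rain or the sprinkler, since that probability is set to zero.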

## Learn Julia for Data Science

Most data scientists favor Python as a programming language these days. However, there’s also still a large group of data scientists coming from a statistics, econometrics, or social science background and therefore favoring R, the programming language they learned in university. Now there’s a new kid on the block: Julia. Advantages & Disadvantages According to some,…

## Bayes theorem, and making probability intuitive – by 3Blue1Brown

I’ve been meaning to watch this video for a while now. It’s another great visual explanation of a statistics topic by the 3Blue1Brown YouTube channel (which I’ve covered before, multiple times). This time, it’s all about Bayes’ theorem, and I just love how Grant Sanderson explains the concept so visually. He argues that rather than…
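For readers who prefer numbers next to the visuals, here is a small sketch of Bayes' theorem as a Python function. The function name `posterior` and the example numbers are illustrative assumptions, not taken from the video.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H | E) from P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence at all.
    evidence = prior * likelihood + (1 - prior) * likelihood_given_not
    if evidence == 0:
        raise ValueError("The evidence has zero probability under both hypotheses.")
    return prior * likelihood / evidence


# Illustrative numbers: the hypothesis holds for 1 in 21 people, and the
# evidence shows up in 40% of cases where it holds versus 10% where it does not.
p = posterior(prior=1 / 21, likelihood=0.4, likelihood_given_not=0.1)
```

With these numbers the posterior works out to 1/6, or roughly 17%: the evidence is four times more likely under the hypothesis, but the low prior keeps the updated belief modest.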

## Animated Machine Learning Classifiers

Ryan Holbrook made awesome animated GIFs in R of several classifiers learning a decision rule boundary between two classes. Basically, what you see is a machine learning model in action, learning how to distinguish data of two classes, say cats and dogs, using some X and Y variables. These visuals can be great to understand…

## Python Web Scraping: WordPress Visitor Statistics

I’ve had this WordPress domain for several years now, and in the beginning it was very convenient. WordPress enabled me to set up a fully functional blog in a matter of hours. Everything from HTML markup, external content embedding, databases, and simple analytics was already conveniently set up. However, after a while, I wanted to…

## Visualizing Sampling Distributions in ggplot2: Adding area under the curve

Thank you ggplot2tutor for solving one of my struggles. Apparently this is all it takes: I can’t begin to count how often I have wanted to visualize a (normal) distribution in a plot. For instance, to show how my sample differs from expectations, or to highlight the skewness of the scores on a particular variable….

## Calibrating algorithmic predictions with logistic regression

I found this interesting blog post by Guilherme Duarte Marmerola where he shows how the predictions of algorithmic models (such as gradient boosted machines or random forests) can be calibrated by stacking a logistic regression model on top of them: by using the predicted leaves of the algorithmic model as features / inputs in a subsequent…

## The Causal Inference Book: DAGS and more

Harvard (bio)statisticians Miguel Hernán and Jamie Robins just released their new book, online and accessible for free! The Causal Inference book provides a cohesive presentation of causal inference, its concepts and its methods. The book is divided into three parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from…