I found this interesting blog post by Guilherme Duarte Marmerola, in which he shows how the predictions of algorithmic models (such as gradient boosted machines or random forests) can be calibrated by stacking a logistic regression model on top of them: by using the predicted leaves of the algorithmic model as features / inputs in a subsequent…

# Category: statistics

## A Visual Introduction to Hierarchical Models, by Michael Freeman

I have covered hierarchical models before on this blog. These models are highly relevant in practice. For instance, in HR, employee data is always nested within teams, which are in turn nested within organizational units. In my current field, insurance, claims are likewise always nested within policies, which can in turn be nested within…

## The Causal Inference Book: DAGS and more

Harvard (bio)statisticians Miguel Hernán and Jamie Robins just released their new book, online and accessible for free! The Causal Inference book provides a cohesive presentation of causal inference, its concepts and its methods. The book is divided into 3 parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from…

## Podcasts for Data Science Start-Ups

Christopher of Neurotroph.de compiled this short list of data science podcasts worth listening to. See Chris’ original article for more details on the podcasts, but the links below take you to them directly:

- Data Skeptic
- DataFramed
- Not So Standard Deviations
- Linear Digressions
- Rework

## Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka I am able to share this great GitHub overview page of relevant graph classification techniques, and the scientific papers behind them. The overview divides the algorithms into four groups:

- Factorization
- Spectral and Statistical Fingerprints
- Deep Learning
- Graph Kernels

Moreover, the overview contains links to similar collections on community detection, classification/regression trees and gradient boosting papers…

## Causal Random Forests, by Mark White

I stumbled across this incredibly interesting read by Mark White, who discusses the (academic) theory behind, the inner workings of, and example (R) applications of causal random forests: “Explicitly Optimizing on Causal Effects via the Causal Random Forest: A Practical Introduction and Tutorial” (by Mark White). These so-called “honest” forests seem a great technique to identify opportunities…

## 2019 Shortlist for the Royal Society Prize for Science Books

Since 1988, the Royal Society has celebrated outstanding popular science writing and authors. Each year, a panel of expert judges choose the book that they believe makes popular science writing compelling and accessible to the public. Over the decades, the Prize has celebrated some notable winners including Bill Bryson and Stephen Hawking. The author of the winning…

## ROC, AUC, precision, and recall visually explained

A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting the true positive rate of a model against its false positive rate, for each possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well…
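The construction described above (one point per cutoff, then the area under the resulting curve) can be sketched in plain Python. This is a minimal illustration and not the code from the linked post: the function names `roc_points` and `auc`, the toy labels and scores, and the rule "predict positive when the score is at least the cutoff" are all assumptions made for the example.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    Assumption for this sketch: an observation is classified as positive
    when its score is greater than or equal to the cutoff.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Every distinct score is used as a cutoff, from strictest to loosest.
    cutoffs = sorted(set(scores), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above all scores flags nothing as positive
    for cutoff in cutoffs:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: true classes and model scores for four observations.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
roc_auc = auc(points)
print(points)
print(roc_auc)
```

An AUC of 0.5 corresponds to the diagonal (no better than random ordering), while 1.0 means every positive case is scored above every negative case.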

## Understanding Data Distributions

Having trouble understanding how to interpret distribution plots? Or struggling with Q-Q plots? Sven Halvorson wrote a visual tutorial explaining distributions using visualisations of their quantiles. Because each slice of the distribution is 5% of the total area and the height of the graph is changing, the slices have different widths. It’s like we’re…
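The point about equal-area slices having different widths can be checked numerically. The sketch below is not taken from the tutorial: it assumes a standard normal distribution purely for illustration and uses Python's standard-library `statistics.NormalDist` to find the cut points.

```python
from statistics import NormalDist

# Assumption for this sketch: a standard normal distribution.
dist = NormalDist(mu=0.0, sigma=1.0)

# 19 cut points at 5%, 10%, ..., 95% split the distribution into 20 slices,
# each holding 5% of the total area under the density curve.
probs = [i / 20 for i in range(1, 20)]
cuts = [dist.inv_cdf(p) for p in probs]

# Widths of the 18 slices that lie between two finite cut points
# (the two outermost slices extend to infinity and are left out).
widths = [right - left for left, right in zip(cuts, cuts[1:])]

for left, right, width in zip(cuts, cuts[1:], widths):
    print(f"{left:7.3f} to {right:7.3f}  width {width:.3f}")
```

Slices near the centre, where the density curve is tall, come out narrow, and slices in the tails, where the curve is low, come out wide, even though each one covers the same 5% of the area.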