Thank you, ggplot2tutor, for solving one of my struggles. Apparently this is all it takes: `ggplot(NULL, aes(x = c(-3, 3))) + stat_function(fun = dnorm, geom = "line")`. I can't begin to count how often I have wanted to visualize a (normal) distribution in a plot, for instance to show how my sample differs from expectations, …
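Building on that one-liner, here is a minimal sketch of shading an area under the curve. It assumes a recent ggplot2 version in which `stat_function()` accepts an `xlim` argument. The cutoff at `qnorm(0.975)` (about 1.96) is an illustrative choice of mine, not taken from the original post.

```r
library(ggplot2)

# Illustrative cutoff: the upper 2.5% point of the standard normal
cutoff <- qnorm(0.975)

p <- ggplot(NULL, aes(x = c(-3, 3))) +
  # geom = "area" fills between the density and y = 0, restricted to xlim
  stat_function(fun = dnorm, geom = "area", xlim = c(cutoff, 3),
                fill = "steelblue", alpha = 0.5) +
  # the full density curve, drawn on top of the shaded region
  stat_function(fun = dnorm, geom = "line") +
  labs(x = "z", y = "density")

# Probability mass of the shaded region, P(cutoff < Z < 3);
# slightly below 0.025 because the plot stops at z = 3
shaded_area <- pnorm(3) - pnorm(cutoff)

print(p)
```

Drawing the filled layer first keeps the outline visible. Changing `xlim` (for example `c(-3, -cutoff)`) shades the lower tail instead.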

Tag: statistics

# Calibrating algorithmic predictions with logistic regression

I found this interesting blog post by Guilherme Duarte Marmerola, in which he shows how the predictions of algorithmic models (such as gradient boosted machines or random forests) can be calibrated by stacking a logistic regression model on top of them: by using the predicted leaves of the algorithmic model as features / inputs in a subsequent …
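Below is a minimal base-R sketch of the leaves-as-features idea. It rests on some loud assumptions: the data are simulated, and two shallow `rpart` trees, each grown on a single feature, stand in for the boosted or random-forest ensemble discussed in the post. This is not the blog author's implementation. `rpart` records in `fit$where` the leaf that each training row falls into. Wrapping those leaf IDs in `factor()` lets `glm()` one-hot encode them as inputs to the logistic regression.

```r
library(rpart)  # recommended package shipped with standard R installs

set.seed(42)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
# Hypothetical simulated binary outcome, for illustration only
y  <- rbinom(n, 1, plogis(1.5 * x1 - x2))
dat <- data.frame(y = factor(y), x1 = x1, x2 = x2)

# Two shallow trees stand in for the trees of a real ensemble
ctrl  <- rpart.control(maxdepth = 3, cp = 0)
tree1 <- rpart(y ~ x1, data = dat, method = "class", control = ctrl)
tree2 <- rpart(y ~ x2, data = dat, method = "class", control = ctrl)

# fit$where: for each training row, the leaf (row of fit$frame) it lands in
leaves <- data.frame(
  y     = y,
  leaf1 = factor(tree1$where),
  leaf2 = factor(tree2$where)
)

# Logistic regression on the one-hot encoded leaf memberships
calib <- glm(y ~ leaf1 + leaf2, data = leaves, family = binomial)
p_calibrated <- fitted(calib)
```

In a real setting, the leaf memberships would come from a fitted ensemble (for example a GBM), and the logistic regression would be fit on data the trees were not trained on. Fitting both on the same rows, as here, only keeps the sketch short.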

# The Causal Inference Book: DAGS and more

Harvard (bio)statisticians Miguel Hernán and Jamie Robins just released their new book, online and free to access! The Causal Inference book provides a cohesive presentation of causal inference, its concepts, and its methods. The book is divided into three parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from …

# Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka, I am able to share this great GitHub overview page of relevant graph classification techniques and the scientific papers behind them. The overview divides the algorithms into four groups:

- Factorization
- Spectral and Statistical Fingerprints
- Deep Learning
- Graph Kernels

Moreover, the overview contains links to similar collections on community detection, classification/regression trees, and gradient boosting papers with implementations. As …

# 2019 Shortlist for the Royal Society Prize for Science Books

Since 1988, the Royal Society has celebrated outstanding popular science writing and authors. Each year, a panel of expert judges choose the book that they believe makes popular science writing compelling and accessible to the public. Over the decades, the Prize has celebrated some notable winners including Bill Bryson and Stephen Hawking. The author of the winning …

# ROC, AUC, precision, and recall visually explained

A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting a model's true positive rate against its false positive rate for every possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well …
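The cutoff sweep described above can be sketched in a few lines of base R. This is a minimal illustration on simulated scores and labels (both hypothetical). Every observed score is tried as a cutoff, where a case is predicted positive when its score is at or above the cutoff. At each cutoff, the sketch computes the true positive rate (recall), the false positive rate, and precision. The AUC is then obtained with the trapezoidal rule.

```r
# Hypothetical scores and labels, for illustration only
set.seed(1)
labels <- c(rep(1, 50), rep(0, 50))
scores <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))

P <- sum(labels == 1)
N <- sum(labels == 0)

# Sweep cutoffs from high to low so both rates are non-decreasing
cutoffs <- sort(unique(scores), decreasing = TRUE)
tpr <- sapply(cutoffs, function(t) sum(scores >= t & labels == 1) / P)  # recall
fpr <- sapply(cutoffs, function(t) sum(scores >= t & labels == 0) / N)
precision <- sapply(cutoffs, function(t)
  sum(scores >= t & labels == 1) / sum(scores >= t))

# Start the curve at (0, 0); the lowest cutoff already yields (1, 1)
fpr_curve <- c(0, fpr)
tpr_curve <- c(0, tpr)

# AUC via the trapezoidal rule over consecutive ROC points
auc <- sum(diff(fpr_curve) * (head(tpr_curve, -1) + tail(tpr_curve, -1)) / 2)

plot(fpr_curve, tpr_curve, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # diagonal = random guessing (AUC 0.5)
```

A useful sanity check: without tied scores, this AUC equals the fraction of (positive, negative) pairs in which the positive case receives the higher score.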