Podcasts for Data Science Start-Ups

Christopher of Neurotroph.de compiled this short list of data science podcasts worth listening to. See Chris’ original article for more details on the podcasts, but the links below take you to them directly: Data Skeptic, DataFramed, Not So Standard Deviations, Linear Digressions, and Rework.

Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka I am able to share this great GitHub overview page of relevant graph classification techniques, and the scientific papers behind them. The overview divides the algorithms into four groups: Factorization, Spectral and Statistical Fingerprints, Deep Learning, and Graph Kernels. Moreover, the overview contains links to similar collections on community detection, classification/regression trees and gradient boosting papers…

Causal Random Forests, by Mark White

I stumbled across this incredibly interesting read by Mark White, who discusses the (academic) theory behind, inner workings, and example (R) applications of causal random forests: “Explicitly Optimizing on Causal Effects via the Causal Random Forest: A Practical Introduction and Tutorial” (by Mark White). These so-called “honest” forests seem a great technique to identify opportunities…

2019 Shortlist for the Royal Society Prize for Science Books

Since 1988, the Royal Society has celebrated outstanding popular science writing and authors. Each year, a panel of expert judges choose the book that they believe makes popular science writing compelling and accessible to the public. Over the decades, the Prize has celebrated some notable winners including Bill Bryson and Stephen Hawking. The author of the winning…

ROC, AUC, precision, and recall visually explained

A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting the true positive rate of a model against its false positive rate, for each possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well…
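To make the excerpt concrete, here is a minimal sketch in plain Python of how an ROC curve and its AUC could be computed by hand. The function names (`roc_curve_points`, `trapezoid_auc`) and the toy labels and scores are made up for illustration; they do not come from the linked post.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model scores, where a higher score means "more likely positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # a cutoff above every score predicts no positives
    # Sweep the cutoff from the highest observed score down to the lowest.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

points = roc_curve_points(labels, scores)
roc_auc = trapezoid_auc(points)
print(points)
print(round(roc_auc, 4))
```

On this toy data the AUC equals the share of positive-negative pairs in which the positive case receives the higher score (8 out of 9), which is one common way to read the metric.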

Understanding Data Distributions

Having trouble understanding how to interpret distribution plots? Or struggling with Q-Q plots? Sven Halvorson penned a visual tutorial explaining distributions using visualisations of their quantiles. Because each slice of the distribution is 5% of the total area and the height of the graph is changing, the slices have different widths. It’s like we’re…
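The point about slice widths can be checked numerically. The sketch below assumes a standard normal distribution (the tutorial may use other distributions) and uses Python's standard-library `statistics.NormalDist` to find the cut points of twenty equal-area slices; the variable names are illustrative only.

```python
from statistics import NormalDist

# Assumption for illustration: a standard normal distribution.
dist = NormalDist(mu=0.0, sigma=1.0)

# Cumulative probabilities 0.05, 0.10, ..., 0.95 separate twenty 5% slices.
probs = [i / 20 for i in range(1, 20)]
cuts = [dist.inv_cdf(p) for p in probs]

# Width of each interior slice (the two outermost slices are unbounded).
widths = [right - left for left, right in zip(cuts, cuts[1:])]

for p, left, width in zip(probs, cuts, widths):
    print(f"slice starting at the {p:.0%} quantile: x = {left:+.3f}, width = {width:.3f}")
```

Every slice holds the same 5% of the area, so slices near the peak, where the density is high, come out narrow, while slices in the tails are wide. A Q-Q plot compares exactly these cut points between two distributions.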

Two Tinder Experiments: An Unequal Economy

I’ve seen a fair share of Tinder experiments come by, for instance, someone A/B-testing attractiveness with and without facial hair, but these two new posts on Medium are the best I’ve come across so far. In his first experiment, this self-proclaimed worst online dater went catfishing. He made a Tinder account using stock photos of…

E-Book: Probabilistic Programming & Bayesian Methods for Hackers

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. Nevertheless, mathematical analysis is only one way to “think Bayes”. With cheap computing power, we can now afford to take an alternate route via probabilistic programming. Cam Davidson-Pilon wrote the book Bayesian Methods for…

Northstar: The interactive, drag-and-drop data science platform by MIT

MIT researchers have spent years developing the new drag-and-drop analytics tools they call Northstar. Northstar is an interactive data science platform that rethinks how people interact with data. It empowers users without programming experience, a background in statistics, or machine learning expertise to explore and mine data through an intuitive user interface, and effortlessly build, analyze,…