Animated Machine Learning Classifiers

Ryan Holbrook made awesome animated GIFs in R of several classifiers learning a decision rule boundary between two classes. Basically, what you see is a machine learning model in action, learning how to distinguish data of two classes, say cats and dogs, using some X and Y variables. These visuals can be great to understand…

Calibrating algorithmic predictions with logistic regression

I found this interesting blog by Guilherme Duarte Marmerola where he shows how the predictions of algorithmic models (such as gradient boosted machines, or random forests) can be calibrated by stacking a logistic regression model on top of it: by using the predicted leaves of the algorithmic model as features / inputs in a subsequent…

17 Principles of (Unix) Software Design

I came across this 1999-2003 e-book by Eric Raymond, on the Art of Unix Programming. It contains several relevant overviews of the basic principles behind the Unix philosophy, which are probably useful for anybody working in hardware, software, or other algoritmic design. First up, is a great list of 17 design rules, explained in more…

Helpful resources for A/B testing

Brandon Rohrer — (former) data scientist at Microsoft, iRobot, and Facebook — asked his network on Twitter and LinkedIn to share their favorite resources on A/B testing. It produced a nice list, which I summarized below. The order is somewhat arbitrary, and somewhat based on my personal appreciation of the resources. Course: A/B-testing by Google…

Beating Battleships with Algorithms and AI

Past days, I discovered this series of blogs on how to win the classic game of Battleships (gameplay explanation) using different algorithmic approaches. I thought they might amuse you as well : ) The story starts with this 2012 Datagenetics blog where Nick Berry constrasts four algorithms’ performance in the game of Battleships. The resulting levels…

Visualizing the inner workings of the k-means clustering algorithm

Originally, I wrote this blog to share this interactive visualization of the k-means algorithm (wiki) which I was all enthusiastic about. However, then I imagined that not everybody may be familiar with k-means, hence, I wrote the whole blog below.  Next thing I know, u/dashee87 on r/datascience points me to these two other blogs that had already…

Neural Networks play Super Mario Bros & Mario Kart

Seth Bling calls himself a video game designer, a hacker and an engineer. You might know him from MarI/O: his neural network that got extremely good to at playing Super Mario Bros. The video below shows the genetic approach Seth used to train this neural network. Seth randomly generated a starting population of neural networks where the…

Data Science, Machine Learning, & Statistics resources (free courses, books, tutorials, & cheat sheets)

Welcome to my repository of data science, machine learning, and statistics resources. Software-specific material has to a large extent been listed under their respective overviews: R Resources & Python Resources. I also host a list of SQL Resources and datasets to practice programming. If you have any additions, please comment or contact me! LAST UPDATED: 21-05-2018 Courses: Udacity: Introduction to Descriptive Statistics…

Must read: Computer Age Statistical Inference (Efron & Hastie, 2016)

Statistics, and statistical inference in specific, are becoming an ever greater part of our daily lives. Models are trying to estimate anything from (future) consumer behaviour to optimal steering behaviours and we need these models to be as accurate as possible. Trevor Hastie is a great contributor to the development of the field, and I…