Animated Machine Learning Classifiers

Ryan Holbrook made awesome animated GIFs in R of several classifiers learning a decision rule boundary between two classes. Basically, what you see is a machine learning model in action, learning how to distinguish data of two classes, say cats and dogs, using some X and Y variables. These visuals can be great to understand…

Need to save R's lm() or glm() models? Trim the fat!

I was training a predictive model for work for use in a Shiny App. However, as the training set was quite large (700k+ obs.), the model object to save was also quite large in size (500mb). This slows down your operation significantly! Basically, all you really need are the coefficients (and a link function, in…

Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka I am able to share this great GitHub overview page of relevant graph classification techniques, and the scientific papers behind them. The overview divides the algorithms into four groups: Factorization Spectral and Statistical Fingerprints Deep Learning Graph Kernels Moreover, the overview contains links to similar collections on community detection, classification/regression trees and gradient boosting papers…

ROC, AUC, precision, and recall visually explained

A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting the false positive rate of a model against its true positive rate, for each possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well…

Artificial Stupidity – by Vincent Warmerdam @PyData 2019 London

PyData is famous for it’s great talks on machine learning topics. This 2019 London edition, Vincent Warmerdam again managed to give a super inspiring presentation. This year he covers what he dubs Artificial Stupidity™. You should definitely watch the talk, which includes some great visual aids, but here are my main takeaways: Vincent speaks of…

Logistic regression is not fucked, by Jake Westfall

Recently, I came across a social science paper that had used linear probability regression. I had never heard of linear probability models (LPM), but it seems just an application of ordinary least squares regression but to a binomial dependent variable. According to some, LPM is a commonly used alternative for logistic regression, which is what…

Tensorflow for R Gallery

Tensorflow is a open-source machine learning (ML) framework. It’s primarily used to build neural networks, and thus very often used to conduct so-called deep learning through multi-layered neural nets.  Although there are other ML frameworks — such as Caffe or Torch — Tensorflow is particularly famous because it was developed by researchers of Google’s Brain…

Facial Recognition Challenge: Chad Smith & Will Ferrell

The below summarizes Part 4 of a medium.com series by Adam Geitgey. Check out the original articles: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8! Adam Geitgey likes to write about computers and machine learning. He explains machine learning as “generic algorithms that can tell you something interesting about a set of data without you having to…

Data Science, Machine Learning, & Statistics resources (free courses, books, tutorials, & cheat sheets)

Welcome to my repository of data science, machine learning, and statistics resources. Software-specific material has to a large extent been listed under their respective overviews: R Resources & Python Resources. I also host a list of SQL Resources and datasets to practice programming. If you have any additions, please comment or contact me! LAST UPDATED: 21-05-2018 Courses: Udacity: Introduction to Descriptive Statistics…