Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka I am able to share this great GitHub overview page of relevant graph classification techniques, and the scientific papers behind them. The overview divides the algorithms into four groups: Factorization; Spectral and Statistical Fingerprints; Deep Learning; and Graph Kernels. Moreover, the overview contains links to similar collections on community detection, classification/regression trees and gradient boosting papers…

ROC, AUC, precision, and recall visually explained

A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting the false positive rate of a model against its true positive rate, for each possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well…
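As a rough illustration of the construction described above, the sketch below sweeps every observed score as a cutoff, records the (false positive rate, true positive rate) pair at each one, and approximates the AUC with the trapezoidal rule. The labels, scores, and function names are made up for this example and do not come from the linked post.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per cutoff, from strictest to loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # a cutoff above every score predicts no positives
    for cutoff in sorted(set(scores), reverse=True):
        # Everything scoring at or above the cutoff is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(round(area, 4))
```

In this toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the area comes out at 8/9.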

Artificial Stupidity – by Vincent Warmerdam @PyData 2019 London

PyData is famous for its great talks on machine learning topics. At this 2019 London edition, Vincent Warmerdam again managed to give a super inspiring presentation. This year he covers what he dubs Artificial Stupidity™. You should definitely watch the talk, which includes some great visual aids, but here are my main takeaways: Vincent speaks of…

Logistic regression is not fucked, by Jake Westfall

Recently, I came across a social science paper that had used linear probability regression. I had never heard of linear probability models (LPM), but an LPM seems to be simply ordinary least squares regression applied to a binary dependent variable. According to some, LPM is a commonly used alternative for logistic regression, which is what…
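To make the idea of a linear probability model concrete, here is a minimal sketch that fits ordinary least squares to a 0/1 outcome with a single predictor, using the closed-form slope and intercept. The data and the helper name `fit_ols` are invented for illustration; this is not taken from the paper or from Jake Westfall's post.

```python
def fit_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: a binary outcome regressed on one predictor.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_ols(x, y)

# In an LPM the fitted values are read as probabilities of y == 1.
predictions = [intercept + slope * xi for xi in x]
print(intercept, slope)
print(predictions)
```

Because the fit is a straight line, nothing constrains the fitted "probabilities" to the interval [0, 1]: in this toy data the predictions at the smallest and largest x fall below 0 and above 1.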

Tensorflow for R Gallery

Tensorflow is an open-source machine learning (ML) framework. It’s primarily used to build neural networks, and thus very often used to conduct so-called deep learning through multi-layered neural nets. Although there are other ML frameworks, such as Caffe or Torch, Tensorflow is particularly famous because it was developed by researchers of Google’s Brain…

Facial Recognition Challenge: Chad Smith & Will Ferrell

The text below summarizes Part 4 of a medium.com series by Adam Geitgey. Check out the original articles: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8! Adam Geitgey likes to write about computers and machine learning. He explains machine learning as “generic algorithms that can tell you something interesting about a set of data without you having to…

Data Science, Machine Learning, & Statistics resources (free courses, books, tutorials, & cheat sheets)

Welcome to my repository of data science, machine learning, and statistics resources. Software-specific material is largely listed under its respective overviews: R Resources & Python Resources. I also host a list of SQL Resources and datasets to practice programming. If you have any additions, please comment or contact me! LAST UPDATED: 21-05-2018 Courses: Udacity: Introduction to Descriptive Statistics…