Jared Wilber posted this great walkthrough where he codes a simple R data pipeline using purrr and tidyr to train a large variety of models and methods on the same base data, all in a non-repetitive, reproducible, clean, and thus tidy fashion. Really impressive workflow!
A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting a model's true positive rate against its false positive rate for each possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well … Continue reading ROC, AUC, precision, and recall visually explained
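As a rough illustration of that construction, here is a minimal, dependency-free Python sketch. It assumes the model outputs a score where higher means "more likely positive" and that labels are coded 0/1. The helper names `roc_points` and `trapezoid_auc` are hypothetical, not taken from the linked post or any library. Every distinct score is tried as a cutoff, each cutoff yields one (false positive rate, true positive rate) point, and the AUC is approximated with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, one per distinct cutoff, from strictest to loosest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # A cutoff above every score predicts nothing as positive: the (0, 0) corner.
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        # Predict "positive" whenever the score reaches the cutoff.
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    # The lowest cutoff predicts everything as positive, ending at (1, 1).
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example: three positives and three negatives with made-up scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(trapezoid_auc(roc_points(scores, labels)), 4))  # 0.8889
```

In this toy example the AUC comes out to 8/9, which matches the share of (positive, negative) pairs the scores rank correctly. That pairwise reading is one common way to interpret AUC. Libraries such as scikit-learn ship tested versions of both steps, so a hand-rolled version like this is mainly useful for building intuition.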
PyData is famous for its great talks on machine learning topics. At the 2019 London edition, Vincent Warmerdam once again gave a super inspiring presentation. This year he covers what he dubs Artificial Stupidity™. You should definitely watch the talk, which includes some great visual aids, but here are my main takeaways: Vincent speaks of … Continue reading Artificial Stupidity – by Vincent Warmerdam @PyData 2019 London
MIT researchers have spent years developing a new drag-and-drop analytics tool they call Northstar. Northstar is an interactive data science platform that rethinks how people interact with data. It empowers users without programming experience, a background in statistics, or machine learning expertise to explore and mine data through an intuitive user interface, and effortlessly build, analyze, … Continue reading Northstar: The interactive, drag-and-drop data science platform by MIT
Survival of the Best Fit is a webgame that simulates what happens when companies automate their recruitment and selection processes. You, playing as the CEO of a tech startup, are asked to select your favorite candidates from a line-up based on their résumés. As your simulated company grows, the time pressure increases, … Continue reading Survival of the Best Fit: A webgame on AI in recruitment
Josh Starmer is an assistant professor in the genetics department of the University of North Carolina at Chapel Hill. But more importantly: Josh is the mastermind behind StatQuest! StatQuest is a YouTube channel (and website) dedicated to explaining complex statistical concepts (like data distributions, probability, or novel machine learning algorithms) in simple terms. Once … Continue reading StatQuest: Statistical concepts, clearly explained