Why cancer screening is the last thing you should pick first to work on with AI

I came across this opinionated yet informed commentary by Vinay Prasad on the recent Nature article in which Google’s machine learning experts trained models to predict whether scans of patients’ breasts (mammograms) show cancerous cells. Vinay Prasad [official bio] is a practicing hematologist-oncologist and Associate Professor of Medicine at Oregon Health and Science University….

How to Read Scientific Papers

Cover image via wikihow.com/Read-a-Scientific-Paper. Reddit is a treasure trove of random stuff. However, every now and then, in the better groups, quite valuable topics pop up. Here’s one I came across on r/statistics. The advice by grandzooby in particular seemed worth a like, and he linked to several useful resources, which I’ve summarized for you below….

How Booking.com deals with Selection Bias

I came across this PyData 2018 talk by Lucas Bernardi of Booking.com, in which he talks about the importance of selection bias for practical applications of machine learning. “We can’t just throw data into machines and expect to see any meaning […], we need to think [about this].” I see a strong trend in the practitioners…

treevis.net – A Visual Bibliography of Tree Visualizations

Last week I cohosted a professional learning course on data visualization at JADS. My fellow host was prof. Jack van Wijk, and together we organized an amazing workshop and poster event. Jack gave two lectures on data visualization theory and resources, and mentioned among others treevis.net, a resource I was unfamiliar with up until then….

Anomaly Detection Resources

Carnegie Mellon PhD student Yue Zhao maintains this great GitHub repository of anomaly detection resources: https://github.com/yzhao062/anomaly-detection-resources The repository consists of tools for multiple languages (R, Python, Matlab, Java) and resources in the form of: Books & Academic Papers; Online Courses and Videos; Outlier Datasets; Algorithms and Applications; Open-source and Commercial Libraries/Toolkits; Key Conferences & Journals…

Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka I am able to share this great GitHub overview page of relevant graph classification techniques, and the scientific papers behind them. The overview divides the algorithms into four groups: Factorization; Spectral and Statistical Fingerprints; Deep Learning; Graph Kernels. Moreover, the overview contains links to similar collections on community detection, classification/regression trees and gradient boosting papers…

Glossary of Statistical Terminology

Frank Harrell shared this 16-page glossary of statistical terminology created by the Department of Biostatistics of Vanderbilt University School of Medicine. The overview touches on everything from Bayes’ Theorem to p-values, explaining matters in just the right detail. Various study designs and model types are also discussed so it might just come in handy for…

Avoid bar plots for continuous data! Do this instead:

Tracey Weissgerber, Natasa Milic, Stacey Winham, and Vesna Garovic wrote this interesting 2015 paper on bar graphs. Through a systematic review of physiology research, they demonstrate that we need to reconsider how we present continuous data in small samples. Bar and line plots are commonly used to display continuous data. This is problematic, as many different data…
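To make the point about bar graphs concrete, here is a minimal sketch in Python. The three samples are made up for illustration and are not taken from the paper. They all share the same mean, so a bar plot would draw three identical bars, while a crude text dot plot of the raw values shows how different they are.

```python
from statistics import mean, median

# Hypothetical small samples (illustrative only, not data from the paper).
# Each one has a mean of exactly 10.
samples = {
    "symmetric": [8, 9, 10, 11, 12],
    "outlier": [7, 7, 8, 8, 20],
    "bimodal": [4, 5, 6, 14, 15, 16],
}


def bar_height(values):
    """The single number a bar plot would show: the sample mean."""
    return mean(values)


def dot_plot_row(values, lo=0, hi=20):
    """Render integer values as a one-line text dot plot on the range [lo, hi]."""
    cells = ["."] * (hi - lo + 1)
    for v in values:
        cells[v - lo] = "o"  # repeated values share a single mark
    return "".join(cells)


for name, values in samples.items():
    print(
        f"{name:10s} bar={bar_height(values):5.1f} "
        f"median={median(values):5.1f} {dot_plot_row(values)}"
    )
```

The bar heights are identical across the three samples, yet the dot rows and medians differ, which is the kind of information that showing the individual data points preserves.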

Papers with Code: State-of-the-Art

OK, this is a really great find! The website PapersWithCode.com lists all scientific publications whose code is open-sourced on GitHub. Moreover, you can sort these papers by the stars they accumulated on GitHub over the past days. The authors, @rbstojnic and @rosstaylor90, just made this in their spare time. Thank you, sirs! Papers with Code allows you to quickly…