ROC, AUC, precision, and recall visually explained

A receiver operating characteristic (ROC) curve displays how well a model can classify binary outcomes. An ROC curve is generated by plotting the false positive rate of a model against its true positive rate, for each possible cutoff value. Often, the area under the curve (AUC) is calculated and used as a metric showing how well…

Privacy, Compliance, and Ethical Issues with Predictive People Analytics

November 9th 2018, I defended my dissertation on data-driven human resource management, which you can read and download via this link. On page 149, I discuss several of the issues we face when implementing machine learning and analytics within an HRM context. For the references and more detailed background information, please consult the full dissertation. More interesting reads on ethics in machine learning can be found here….

PyData, London 2018

PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The communities approach data science using many languages, including (but not limited to) Python, Julia, and R. April 2018, a PyData conference was held in London, with three days of super…

Predicting Employee Turnover at SIOP 2018

The 2018 annual Society for Industrial and Organizational Psychology (SIOP) conference featured its first-ever machine learning competition. Teams competed for several months in predicting the enployee turnover (or churn) in a large US company. A more complete introduction as presented at the conference can be found¬†here. All submissions had to be open source and the…

R resources (free courses, books, tutorials, & cheat sheets)

Help yourself to these free books, tutorials, packages, cheat sheets, and many more materials for R programming. There’s a separate overview for handy R programming tricks. If you have additions, please comment below or contact me! LAST UPDATED: 2019-10-19 Table of Contents (clickable) Beginner Advanced Cheat sheets Data manipulation Data visualization Dashboards & Shiny Markdown…

Light GBM vs. XGBOOST in Python & R

XGBOOST stands for eXtreme Gradient Boosting. A big brother of the earlier AdaBoost, XGB is a supervised learning algorithm that uses an ensemble of adaptively boosted decision trees. For those unfamiliar with adaptive boosting algorithms, here’s a 2-minute explanation video¬†and a written tutorial. Although XGBOOST often performs well in predictive tasks, the training process can…