Simulate Datasets with DrawData.xyz

Vincent Warmerdam shared his new tool to quickly simulate artificial datasets: http://www.drawdata.xyz. The drawdata.xyz tool allows you to easily create your own line- and scatter-plot with different groups of datapoints following specific x-y patterns. After drawing your data, you can just click to export your new dataset to csv or json format. x y 106.04…

Comparison between R dplyr and data.table code

Atrebas created this extremely helpful overview page showing how to program standard data manipulation and data transformation routines in R’s famous packages dplyr and data.table. The document has been been inspired by this stackoverflow question and by the data.table cheat sheet published by Karlijn Willems. Resources for data.table can be found on the data.table wiki, in the data.table vignettes,…

Understanding Data Distributions

Having trouble understanding how to interpret distribution plots? Or struggling with Q-Q plots? Sven Halvorson penned down a visual tutorial explaining distributions using visualisations of their quantiles. Because each slice of the distribution is 5% of the total area and the height of the graph is changing, the slices have different widths. It’s like we’re…

What Every Programmer Needs To Know About Encodings

Kunststube wrote this great introduction to text encoding. Ever wondered why your Word document sometimes starts with ÉGÉìÉRÅ[ÉfÉBÉìÉOÇÕìÔǵÇ≠ǻǢ? Well, encoding‘s why. Kunststube introduces you to the wonderful world of ASCII, WLatin, Mac Latin, and UTF-8, -16 and -32. Read the original articla via http://kunststube.net/encoding/

People Analytics: Is nudging goed werkgeverschap of onethisch?

In Dutch only: Voor Privacyweb schreef ik onlangs over people analytics en het mogelijk resulterende nudgen van medewerkers: kleine aanpassingen of duwtjes die mensen in de goede richting zouden moeten sturen. Medewerkers verleiden tot goed gedrag, als het ware. Maar wie bepaalt dan wat goed is, en wanneer zouden werkgevers wel of niet mogen of…

Mathematical aRt

Marcus Volz is a research fellow at the University of Melbourne, studying geometric networks, optimisation and computational geometry. He’s interested in visualisation, and always looking for opportunities to represent complex information in novel ways to accelerate learning and uncover the unexpected. One of Marcus’ hobbies is the visualization of mathematical patterns and statistical algorithms via R….

Privacy, Compliance, and Ethical Issues with Predictive People Analytics

November 9th 2018, I defended my dissertation on data-driven human resource management, which you can read and download via this link. On page 149, I discuss several of the issues we face when implementing machine learning and analytics within an HRM context. For the references and more detailed background information, please consult the full dissertation. More interesting reads on ethics in machine learning can be found here….

Tidy Missing Data Handling

A recent open access paper by Nicholas Tierney and Dianne Cook — professors at Monash University — deals with simpler handling, exploring, and imputation of missing values in data.They present new methodology building upon tidy data principles, with a goal to integrating missing value handling as an integral part of data analysis workflows. New data structures are…

10 Simple Rules for Better Data Visualizations

Nicolas Rougier, Michael Droettboom, Philip Bourne wrote an open access article for the Public Library of Open Science (PLOS) in 2014, proposing ten simple rules for better figures. Below I posted these 10 rules and quote several main sentences extracted from the original article. Rule 1: Know Your Audience It is important to identify, as early…