Google's Dataset Search: Direct access to 25 million interesting datasets

I used to keep a repository of links to interesting datasets to learn data science. However, that page I can retire, as Google has launched its new service Dataset Search. The “world wide web” hosts millions of datasets, on nearly any topic you can think of. Google’s Dataset Search has indexed almost 25 million of these…

Dynamic Programming MIT Course

Cover image by xkcd Over the last months I’ve been working my way through Project Euler in my spare time. I wanted to learn Python programming, and what better way than solving mini-problems and -projects?! Well, Project Euler got a ton of these, listed in increasing order of difficulty. It starts out simple: to solve…

Google Fonts: 915 free font families

Looking for a custom typeface to use in your data visualizations? Google Fonts is an awesome databank of nearly a thousands font families you can access, download, and use for free. If you’re into design, the website includes a blog featuring articles on font design. Google Fonts among others provided the font for my dissertation…

Circular Map Cutouts in R

Katie Jolly wanted to surprise a friend with a nice geeky gift: a custom-made map cutout. Using R and some visual finetuning in Inkscape, she was able to made the below. A detailed write-up of how Katie got to this product is posted here. Basically, the R’s tigris package included all data on roads, and…

GoalKicker: Free Programming Books

This specific link has been on my to-do list for so-long now that I’ve decided to just share it without any further ado. The people behind GoalKicker, for whatever reason, decided to compile nearly 100 books on different programming languages based on among others StackOverflow entries. Their open access library contains books on languages from…

Papers with Code: State-of-the-Art

OK, this is a really great find! The website PapersWithCode.com lists all scientific publications of which the codes are open-sourced on GitHub. Moreover, you can sort these papers by the stars they accumulated on Github over the past days. The authors, @rbstojnic and @rosstaylor90, just made this in their spare time. Thank you, sirs! Papers with Code allows you to quickly…

Tidy Missing Data Handling

A recent open access paper by Nicholas Tierney and Dianne Cook — professors at Monash University — deals with simpler handling, exploring, and imputation of missing values in data.They present new methodology building upon tidy data principles, with a goal to integrating missing value handling as an integral part of data analysis workflows. New data structures are…

10 Simple Rules for Better Data Visualizations

Nicolas Rougier, Michael Droettboom, Philip Bourne wrote an open access article for the Public Library of Open Science (PLOS) in 2014, proposing ten simple rules for better figures. Below I posted these 10 rules and quote several main sentences extracted from the original article. Rule 1: Know Your Audience It is important to identify, as early…