Recommended Books on Data Visualization

Data visualization and the (in)effective communication of information are salient topics on this blog. I just love to read and write about best practices related to data visualization (or bad practices), or to explore novel types of complex graphs. However, I am not always online, and I am equally fond of reading about data visualization offline.

are written by some of the leading experts in the dataviz scene:

If you are also interested in programming and machine learning, have a look at this list of free programming books.

Generating Book Covers By Their Words — My Dissertation Cover

As some of you might know, I am defending my PhD dissertation later this year. It’s titled “Data-Driven Human Resource Management: The rise of people analytics and its application to expatriate management” and, over the past few months, I was tasked with designing its cover.

Now, I didn’t want to buy some random stock photo depicting data, an organization, or overly happy employees. I’d rather build something myself. Something reflecting what I liked about the dissertation project: statistical programming and sharing and creating knowledge with others.

Hence, I came up with the idea to use the collective intelligence of the People Analytics community to generate a unique cover. It required a dataset of people analytics-related concepts, which I asked People Analytics professionals on LinkedIn, Twitter, and other channels to help compile. Via a Google Form, colleagues, connections, acquaintances, and complete strangers contributed hundreds of keywords ranging from the standard (employees, HRM, performance) to the surprising (monetization, quantitative scissors [which I had to Google]). After reviewing the list and adding some concepts of my own, I ended up with 1786 unique words related to either business, HRM, expatriation, data science, or statistics.

I very much dislike wordclouds (these are kind of cool though), but I already had a different idea in mind. I thought of generating a background cover from the words relating to my dissertation topic, over which I could then place my title and other information. I wanted to place these keywords randomly, maybe using a color scheme, or with some random sizes.

The picture below shows the result of one of my first attempts. I programmed everything in R, writing some custom functionality to generate the word-datasets, the cover-plot, and .png, .pdf, and .gif files as output.


Random colors did not produce a pleasing result and I definitely needed more and larger words in order to fill my 17cm by 24cm canvas!

Hence, I started experimenting. Using base R’s expand.grid() and set.seed() together with mapply(), I could quickly explore and generate a large number of covers based on different parameter settings and random fluctuations.

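# Build every combination of the parameter settings to explore.
# The *_scheme objects are defined elsewhere in the script (not shown here).
expand.grid(seed = c(1:3),
            dupl = c(1:4, seq(5, 30, 5)),
            font = c("sans", "League Spartan"),
            colors = c(blue_scheme, red_scheme,
                       rainbow_scheme, random_scheme),
            size_mult = seq(1, 3, 0.3),
            angle_sd = c(5, 10, 12, 15)) -> param

# Generate one cover per row of param.
# generate_cover is a placeholder name for the cover-plotting function.
mapply(generate_cover,
       param$seed, param$dupl,
       param$font, param$colors,
       param$size_mult, param$angle_sd)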

The generation process for each unique cover only took a few seconds, so I would generate a few hundred, quickly browse through them, update the parameters to match my preferences, and then generate a new set. Among others, I varied the color palette used, the size range of the words, their angle, the font used, et cetera. To fill up the canvas, I experimented with repeating the words: two, three, five, heck, even twenty, thirty times. After an evening of generating and rating, I came to the final settings for my cover:

  • Words were repeated twenty times in the dataset.
  • Words were randomly distributed across the canvas.
  • Words were placed in random order onto the canvas, except for a select set of relevant words, which were placed last.
  • Words’ transparency ranged randomly between 0% and 70%.
  • Words’ color was randomly selected out of six colors from this palette of blues.
  • Words’ writing angles were normally distributed around 0 degrees, with a standard deviation of 12 degrees. However, 25% of words were explicitly without angle.
  • Words’ size ranged between 1 and 4 based on a negative binomial distribution (10 * 0.8), resulting in more small than large words. The set of relevant words was explicitly enlarged throughout.
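
The settings above could be sketched in base R roughly as follows. This is an illustration under stated assumptions, not the original code: the function name make_word_data(), its arguments, and the six placeholder blues are invented for this example, and reading "10 * 0.8" as size = 10 and prob = 0.8, rescaling the draws to the 1 to 4 range, and giving relevant words the maximum size are all guesses.

```r
# Sketch: build a word-placement dataset following the settings listed above.
# make_word_data() and all of its arguments are hypothetical names.
make_word_data <- function(words, relevant = character(0), dupl = 20,
                           palette = c("#08306B", "#08519C", "#2171B5",
                                       "#4292C6", "#6BAED6", "#9ECAE1"),
                           angle_sd = 12, width = 17, height = 24, seed = 1) {
  set.seed(seed)
  # Repeat every word `dupl` times and shuffle the plotting order
  all_words <- sample(rep(words, times = dupl))
  # Move the relevant words to the end so they are drawn last (on top)
  all_words <- c(all_words[!all_words %in% relevant],
                 all_words[all_words %in% relevant])
  is_rel <- all_words %in% relevant
  n <- length(all_words)

  # Angles: normal around 0 degrees, with 25% of the words forced to 0
  angle <- rnorm(n, mean = 0, sd = angle_sd)
  angle[sample(n, round(0.25 * n))] <- 0

  # Sizes: negative binomial draws rescaled to the range 1 to 4
  # (size = 10, prob = 0.8 is an assumed reading of "10 * 0.8")
  raw <- rnbinom(n, size = 10, prob = 0.8)
  size <- 1 + 3 * raw / max(raw, 1)
  size[is_rel] <- 4  # assumption: relevant words get the largest size

  data.frame(
    word  = all_words,
    x     = runif(n, 0, width),         # random position on a 17 x 24 canvas
    y     = runif(n, 0, height),
    alpha = 1 - runif(n, 0, 0.7),       # transparency between 0% and 70%
    color = sample(palette, n, replace = TRUE),
    angle = angle,
    size  = size,
    stringsAsFactors = FALSE
  )
}

cover_words <- make_word_data(c("employees", "HRM", "performance"),
                              relevant = "HRM", dupl = 20)
head(cover_words)
```

A data frame like this could then be handed to a text-plotting layer, for instance ggplot2’s geom_text(), mapping the columns to position, angle, size, color, and alpha.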

With League Spartan (#thisisparta) loaded as a beautiful custom font, this was the final cover background which I and my significant other liked most:

cover_wordcloud_20-League Spartan-4.png

While I still need to decide on title placement and other final details, I suspect that the final cover will look something like the one below, with the white stripe in the middle depicting the book’s spine.


Now, for the finale, I wanted to visualize the generation process via a GIF. Thomas Lin Pedersen developed this great gganimate package, which builds on the older animation package. The package greatly simplifies creating your own GIFs, as I already discussed in this earlier blog about animated GIFs in R. Anyway, here is the generation process, where frame number n shows the first n ^ 3.2 words:

cover_wordcloud_20-League Spartan_4.gif
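
To make the frame rule concrete, here is a small base R sketch of how many words each frame would show if frame n displays the first n ^ 3.2 words. The helper name words_per_frame() is hypothetical, and rounding down and capping the last frame at the total word count are assumptions; it only computes the schedule and does not use gganimate itself.

```r
# Sketch: number of words visible in each GIF frame, growing as frame^3.2.
# words_per_frame() is a hypothetical helper, not part of any package.
words_per_frame <- function(n_words, power = 3.2) {
  # Smallest number of frames whose last frame can cover all words
  n_frames <- ceiling(n_words^(1 / power))
  counts <- pmin(floor(seq_len(n_frames)^power), n_words)
  counts[n_frames] <- n_words  # make sure the final frame shows every word
  counts
}

words_per_frame(1000)
```

Because the exponent is larger than 1, the early frames add only a handful of words while the later frames fill the canvas quickly.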

If you are interested in the process, or the R code I’ve written, feel free to reach out!

I will share a digital version of the dissertation online sometime around the defense date: November 9th, 2018. If you’d like a copy, you can still leave your e-mail address in the Google Form here and I’ll make sure you receive your copy in time!

(Time Series) Forecasting: Principles & Practice (in R)

I stumbled across this open access book by Rob Hyndman, the god of time series, and George Athanasopoulos, a fellow statistician / econometrician at Monash University in Melbourne, Australia.

Hyndman and Athanasopoulos provide a comprehensive introduction to forecasting methods that is accessible and relevant to, among others, business professionals without any formal training in the area. All R examples in the book build on the fpp2 R package. fpp2 includes all datasets referred to in the book and depends on other R packages, including forecast and ggplot2.

Here are some examples of the analyses you can expect to recreate. Ignore the agricultural topic for now ; )

Monthly milk production per cow.
One of the example analyses you will recreate by following the book (Figure 3.3)

Forecasts of egg prices using a random walk with drift applied to the logged data.
You will be forecasting price data using different analyses and adjustments (Figure 3.4)
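
For readers curious about what "a random walk with drift applied to the logged data" boils down to, here is a minimal base R sketch of the point forecasts. The book itself works with functions from the forecast package (such as rwf()); the helper rw_drift_log() and the toy price series below are invented for illustration, and prediction intervals are left out.

```r
# Sketch: point forecasts from a random walk with drift on logged data.
# rw_drift_log() is a hypothetical helper; the toy prices are made up.
rw_drift_log <- function(y, h = 10) {
  stopifnot(length(y) >= 2, all(y > 0))
  log_y <- log(y)
  n <- length(log_y)
  # Drift = average change per period on the log scale
  drift <- (log_y[n] - log_y[1]) / (n - 1)
  # Extend the last log value along the drift, then back-transform
  exp(log_y[n] + drift * seq_len(h))
}

prices <- c(100, 96, 97, 91, 88, 90, 85, 82)  # toy data, illustration only
rw_drift_log(prices, h = 3)
```

Forecasting on the log scale keeps the back-transformed forecasts positive, which suits price data. Note that a plain exp() back-transformation gives the median rather than the mean of the forecast distribution.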

I highly recommend this book to any professionals or students looking to learn more about forecasting and time series modelling. There is also a DataCamp course based on this book. If you got value out of this free book, be sure to buy a hardcopy as well.

HR Analytics: Een 7e zintuig voor de moderne HR-professional

What is happening in the Netherlands in the field of HR Analytics? This new book shows what several Dutch organizations have actually undertaken in recent years. The various authors, among whom I may count myself, offer a glimpse into the practice of underpinning HR decisions with diverse data sources and analysis techniques. They do not treat HR Analytics as sacred, but anyone who wants to add value to the business as an HR professional can benefit greatly from it. The credo is: know what you need to do, be alert to the pitfalls, and regard HR Analytics as a seventh sense alongside your other senses. With this extra sense, you as an HR professional can perceive more sharply what the real HR problem is, and what the solution might be.

The book ‘HR Analytics’ is for the modern HR professional who is curious about what analytics can contribute to his or her professionalism. The examples and stories from practice offer various lessons and insights that help with a more analytical approach to HR policy themes such as recruitment, careers, terms of employment, training and education, and engagement. It is a nudge toward HR Analytics as an addition to the HR profession. Not as a replacement.

Wiemer Renkema, reviewer at managementboeken.nl, has since read the book and summarizes its contents nicely:

The ten chapters of the book cover the most important HR analytics, such as those for recruitment, career development, employee satisfaction, and compensation. Readers can determine the relevance of each topic for themselves and look specifically for the information that matters to them. For each topic, the authors address all the key questions, which gives the book a clear and easily readable structure.


You don’t need much stamina to read HR analytics. Een 7e zintuig voor de moderne HR-professional. What a practical, complete, and well-written book this is!

Wiemer Renkema, reviewer

Here you can view part of the introductory chapter to see whether the book is for you.

Machine Learning & Deep Learning book

The Deep Learning textbook helps students and practitioners enter the field of machine learning in general and deep learning in particular. The online version is available for free, whereas a hardcover copy can be ordered here on Amazon. You can click on the topics below to be redirected to the book chapter:

Part I: Applied Math and Machine Learning Basics

Part II: Modern Practical Deep Networks

Part III: Deep Learning Research


Python resources (free courses, books, & cheat sheets)

Find more comprehensive Python repositories:
Vinta’s awesome Python GitHub repository, the easy Python docs, the Python Wiki Beginners Guide, or CourseDuck’s overview of free Python courses!

My list of Python resources is still quite short, so if you have additions, please comment below or contact me! There are separate overviews for Data Science, Machine Learning, & Statistics resources in general, and for R resources and SQL resources specifically.

LAST UPDATED: 11-11-2018

Cheat sheets: