Category: programming

Creating plots with custom icons for data points

Data visualizations that make smart use of icons have a way of conveying information that sticks. Dataviz professionals like Moritz Stefaner know this and use the practice in their daily work.

A recent #tidytuesday entry by Georgios Karamanis demonstrates how easy it is to integrate visual icons in your data figures when you write code in R. You can simply store the URL location of an icon as a data column, and map it to an aesthetic using the ggplot2::geom_image function.

Via https://github.com/gkaramanis/tidytuesday/tree/master/week-21

Do have a closer look at Georgios’ github repository for week 21 of tidytuesday. You will probably have to alter the code a bit to get it to work. though!

For those who haven’t moved away from base R plotting functions yet, here’s a good StackOverflow item showing how to use icons in both base R and tidyverse.

Now that I think of it, the above probably uses the same methods that were used to make this amazing Game of Thrones map in R.

Free Python Tutorials & Courses, by CourseDuck

CourseDuck founder Michael Kuhlman was nice enough to point me to their overview of curated (&free!) Python courses and tutorials.

This overview is curated in the sense that all resources are rated by CourseDuck’s users. These ratings seem quite reliable, at least, I personally enjoyed their top-3 resources sometime the past years:

Note that all these courses, as well as the curated overview, come free of charge! A great resource for starting data scientists or upcoming pythonistas!

Via https://courseduck.com/programming/python/?k=1&m=c

Factors in R: forcats to help

Hugo Toscano wrote a great blog providing an overview of all the helpful functionalities of the R forcats package. The package includes functions that help handling categorical data, by setting a random or fixed order, and by recategorizing or anonymizing. These functions are specifically helpful when visualizing data with R’s ggplot2.

A comprehensive overview is provided in the form of the RStudio forcats cheat sheet, but, on his blog, Hugo demonstrates some of its functionalities using a dataset on suicides and people’s ages:

Via https://toscano84.github.io/2019/05/factors-in-r-forcats-to-help/

For instance, you might want to reorder the categories using forcats::fct_reorder.

Other functions can be used to automatically surpress infrequent categories, to reverse the order of categories, to shuffle or shift categories, to quickly relabel or anonimize categories, and many more…

Have a look at Hugo’s original blog: https://toscano84.github.io/2019/05/factors-in-r-forcats-to-help/

Alternatively, the tidyverse helppage is also a good starting point: https://forcats.tidyverse.org/

Pimp my RMD: Tips for R Markdown – by Yan Holtz

R markdown creates interactive reports from R code, including interactive reports, documents, dashboards, presentations, and even books. Have a look at this awesome gallery of R markdown examples.

Yan Holtz recently created a neat little overview of handy R Markdown tips and tricks that improve the appearance of output documents. He dubbed this overview Pimp my RMD. Have a look, it’s worth it!

Via https://rmarkdown.rstudio.com/authoring_quick_tour.html

Propensity Score Matching Explained Visually

Propensity score matching (wiki) is a statistical matching technique that attempts to estimate the effect of a treatment (e.g., intervention) by accounting for the factors that predict whether an individual would be eligble for receiving the treatment. The wikipedia page provides a good example setting:

Say we are interested in the effects of smoking on health. Here, smoking would be considered the treatment, and the ‘treated’ are simply those who smoke. In order to find a cause-effect relationship, we would need to run an experiment and randomly assign people to smoking and non-smoking conditions. Of course such experiments would be unfeasible and/or unethical, as we can’t ask/force people to smoke when we suspect it may do harm.
We will need to work with observational data instead. Here, we estimate the treatment effect by simply comparing health outcomes (e.g., rate of cancer) between those who smoked and did not smoke. However, this estimation would be biased by any factors that predict smoking (e.g., social economic status). Propensity score matching attempts to control for these differences (i.e., biases) by making the comparison groups (i.e., smoking and non-smoking) more comparable.

Lucy D’Agostino McGowan is a post-doc at Johns Hopkins Bloomberg School of Public Health and co-founder of R-Ladies Nashville. She wrote a very nice blog explaining what propensity score matching is and showing how to apply it to your dataset in R. Lucy demonstrates how you can use propensity scores to weight your observations in such a way that accounts for the factors that correlate with receiving a treatment. Moreover, her explainations are strenghtened by nice visuals that intuitively demonstrate what the weighting does to the “pseudo-populations” used to estimate the treatment effect.

Have a look yourself: https://livefreeordichotomize.com/2019/01/17/understanding-propensity-score-weighting/

Recommended Books on Data Visualization

Disclaimer: This page contains one or more links to Amazon.
Any purchases made through those links provide us with a small commission that helps to host this blog.

Data visualization and the (in)effective communication of information are salient topics on this blog. I just love to read and write about best practices related to data visualization (or bad practices), or to explore novel types of complex graphs. However, I am not always online, and I am equally fond of reading about data visualization offline.

These amazing books about data visualization
are written by some of the leading experts in the dataviz scene: