Sharon’s list contains many neat tricks, some of which are lesser-known base functions, others features of more niche packages. Here are the ones I am definitely adding to my R tricks overview and want to highlight here as well:
Categorize values into intervals with cut()
Convert numbers that came in as strings with commas to R numbers with readr::parse_number(mydf$mycol)
Create a searchable, sortable HTML table in 1 line of code with DT::datatable(mydf, filter = 'top')
Display a fraction between 0 and 1 as a percentage with scales::percent(myfraction)
Generate a vector of 1:length(myvec) with seq_along(myvec)
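To make two of these tricks concrete, here is a minimal base-R sketch of cut() and seq_along(). The vectors and the break points are made up for illustration, and the package-based tricks (readr, DT, scales) are left out so the snippet runs without extra installs.

```r
# cut(): bin numeric values into labelled intervals (right-closed by default),
# so these breaks give (0,18], (18,65] and (65,Inf].
ages <- c(5, 17, 30, 64, 80)  # illustrative values
age_group <- cut(ages,
                 breaks = c(0, 18, 65, Inf),
                 labels = c("child", "adult", "senior"))

# seq_along(): an index vector that stays correct when the input is empty.
empty <- character(0)
bad_idx  <- 1:length(empty)   # counts down from 1 to 0, so a loop runs twice
good_idx <- seq_along(empty)  # zero-length, so a loop runs zero times
```

The empty-vector case is the main reason to prefer seq_along() over 1:length() inside for loops.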
Univers, the magazine of Tilburg University, recently interviewed me about my PhD research on People Analytics and data-driven Human Resource management. You can find the write-up by interviewer Ron Vaessen here, but it is unfortunately available in Dutch only.
I have also dedicated several blog posts to more background information. I posted a small extract on the ethics of people analytics and machine learning in HR here. Those interested in visualizing survival curves like I did can see this post. Curious about the cover design? Read this post.
Timo Grossenbacher works as reporter/coder for SRF Data, the data journalism unit of Swiss Radio and TV. He analyzes and visualizes data and investigates data-driven stories. On his website, he hosts a growing list of cool projects. One of his recent blogs covers categorical spatial interpolation in R. The end result of that blog looks amazing:
This map was built with data Timo crowdsourced for one of his projects. With this data, Timo took the following steps, which are covered in his tutorial:
Read in the data: first the geometries (Germany's political boundaries), then the point data on which the interpolation will be based.
Preprocess the data (simplify geometries, convert CSV point data into an sf object, reproject the geodata into the ETRS CRS, clip the point data to Germany, so data outside of Germany is discarded).
Then, a regular grid (a raster without “data”) is created. Each grid point in this raster will later be interpolated from the point data.
Run the spatial interpolation with the kknn package. Since this is quite computationally and memory intensive, the resulting raster is split up into 20 batches, and each batch is computed by a single CPU core in parallel.
Visualize the resulting raster with ggplot2.
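The core idea of the grid and interpolation steps can be sketched in a few lines. Note that this is not Timo's code: his tutorial uses the kknn package on real geodata, whereas the sketch below substitutes a plain, unweighted k-nearest-neighbour majority vote in base R on made-up points, and skips the sf preprocessing, the batching across CPU cores, and the ggplot2 map. The names pts, grid, and knn_vote are hypothetical.

```r
# Toy "survey" points on a lattice (made up): answer A in the west, B in the east.
pts <- expand.grid(x = c(1, 2, 3, 4, 6, 7, 8, 9), y = 1:9)
pts$answer <- factor(ifelse(pts$x < 5, "A", "B"))

# Regular grid without data; each cell centre gets an interpolated category.
grid <- expand.grid(x = seq(0.5, 9.5, by = 1), y = seq(0.5, 9.5, by = 1))

# Unweighted k-nearest-neighbour vote: the most frequent answer among the
# k closest points wins (a simplified stand-in for the kknn step).
knn_vote <- function(gx, gy, pts, k = 7) {
  d <- sqrt((pts$x - gx)^2 + (pts$y - gy)^2)  # Euclidean distance to every point
  nearest <- order(d)[seq_len(k)]             # indices of the k closest points
  votes <- table(pts$answer[nearest])
  names(votes)[which.max(votes)]
}

# Interpolate every grid cell.
grid$answer <- mapply(knn_vote, grid$x, grid$y,
                      MoreArgs = list(pts = pts, k = 7))
```

Because every grid cell is computed independently of the others, the grid can be cut into batches and handed to separate cores, which is the reason the batching described above works.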
All code for the above process can be accessed on Timo’s GitHub. The georeferenced points underlying the interpolation are shown below, where each point represents the location of a person who selected a certain pronunciation in an online survey. More details on the crowdsourced pronunciation project can be found here.