Category: visualization

Evolving Floorplans – by Joel Simon

Joel Simon is the genius behind an experimental project exploring optimized school blueprints. Joel used graph-contraction and ant-colony pathing algorithms as growth processes, which could generate elementary school designs optimized for all kinds of characteristics: walking time, hallway usage, outdoor views, and escape routes just to name a few.

Two generated designs, minimizing the traffic flow (left) as well as escape routes (right) [original]

Other designs tried to maximize the number of windows, resulting in seemingly random open courtyards [original]

Definitely check out the original write-up if you are interested in the details behind the generation process! Or have a look at some of Joel’s other projects.

Become a data-driven Sommelier by text mining wine reviews

Aleszu Bajak at Storybench.org published a great demonstration of the power of text mining. He used the R tidytext package to analyse 150,000 wine reviews which Zach Thoutt had scraped from Wine Enthusiast in November of 2017.

Aleszu started his analysis with only the French wines, using a simple word count per region:

Next, he applied TF-IDF (term frequency-inverse document frequency) to surface the words that are most characteristic of specific French wine regions: words used often in reviews of that region, but rarely in reviews of other regions.
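Aleszu used the tidytext package for this. The sketch below does the same arithmetic in base R so each step is visible. It treats each region as one "document", and the review snippets are made up for illustration (they are not the Wine Enthusiast data). As far as I know, tidytext::bind_tf_idf() uses the same definitions: tf is a word's count divided by the total words in its document, and idf is the natural log of (number of documents / number of documents containing the word).

```r
# Hypothetical toy reviews, NOT the real Wine Enthusiast data
reviews <- data.frame(
  region = c("Bordeaux", "Bordeaux", "Burgundy", "Alsace"),
  text   = c("tannins cassis oak", "cassis tannins structure",
             "cherry earthy oak", "petrol lime oak"),
  stringsAsFactors = FALSE
)

# 1. Tokenize: one row per (region, word), similar to tidytext::unnest_tokens()
tokens <- do.call(rbind, lapply(seq_len(nrow(reviews)), function(i) {
  words <- strsplit(tolower(reviews$text[i]), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  data.frame(region = reviews$region[i], word = words, stringsAsFactors = FALSE)
}))

# 2. Word count per region
counts <- aggregate(n ~ region + word, data = transform(tokens, n = 1L), FUN = sum)

# 3. Term frequency: share of a region's words taken up by each word
counts$tf <- counts$n / ave(counts$n, counts$region, FUN = sum)

# 4. Inverse document frequency: log(regions / regions that use the word)
n_regions <- length(unique(counts$region))
regions_per_word <- table(counts$word)  # one row per (region, word) after step 2
counts$idf <- log(n_regions / as.numeric(regions_per_word[counts$word]))

# 5. TF-IDF: high for words frequent in one region but rare elsewhere
counts$tf_idf <- counts$tf * counts$idf
head(counts[order(-counts$tf_idf), c("region", "word", "tf_idf")])
```

Note how "oak", which appears in every region, gets an idf of log(1) = 0 and therefore a TF-IDF of zero, no matter how often it is used.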

The data also contained price information, which Aleszu plotted on a map of France using ggplot2 and the maps package to show which French wine regions are generally more expensive.

On the full dataset, Aleszu also demonstrated that there is a strong positive relationship between price and points, meaning that, in general, more expensive wines seem to get better reviews:

You can find the full script and more details in the original blog.

Multimapping in R, by Ilya Kashnitsky

Nothing beats an aesthetically pleasing data visualization in the form of a map (see evidence here, here, here, or here).

Moreover, we’ve already witnessed some great R tutorials by Ilya Kashnitsky before (see Animated Snow in R).

These two come together in Ilya’s recent post on subplots in ggplot2 maps, which completely amazed me. The creation process is actually easier than the end result makes it look: make several smaller plots and add each one to your main ggplot2 map with ggplot2::annotation_custom(), just as you would add a logo to your plot. Enjoy:

Here you can find Ilya’s original blog and the associated R script.
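A minimal sketch of the annotation_custom() approach, assuming ggplot2 is installed. To keep it self-contained, the main plot is a scatter plot of the built-in mtcars data rather than a map, and the inset bar chart and its position are hypothetical. The xmin/xmax/ymin/ymax values are in the main plot's data units, so on a real map they would be longitudes and latitudes.

```r
library(ggplot2)

# Main plot (stands in for the main map)
main <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme_minimal()

# Small inset plot to place on top of the main plot
inset <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar(fill = "grey40") +
  labs(x = "cylinders", y = NULL) +
  theme_minimal(base_size = 8)

# Turn the inset into a grob and pin it to a rectangle in data coordinates
combined <- main +
  annotation_custom(
    grob = ggplotGrob(inset),
    xmin = 4, xmax = 5.5,   # horizontal extent, in units of wt
    ymin = 25, ymax = 34    # vertical extent, in units of mpg
  )

print(combined)
```

For several subplots, repeat the annotation_custom() call once per inset, each with its own rectangle.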

R tips and tricks

Below are dozens of very specific R tips and tricks. Some are valuable, useful, or boost your productivity. Others are just geeky fun.

More general helpful R packages and resources can be found in this list.

If you have additions, please comment below or contact me!

Completely new to R? → Start here!

RStudio tricks
General tips
Base R tricks
R Markdown tricks
Data manipulation tricks
Data visualization tricks

Funny tricks
Easter eggs


RStudio

RStudio Addins
RStudio Keyboard Shortcuts
RStudio easy tricks: tearable panes, command history, renaming in scope, outlining, snippets, and more
Working with R projects and here
Working with code snippets
Working with code snippets (video)
Stop RStudio from asking to save workspace
Automatically save workspace in case of a crash / errors
Edit several lines of code at once
Press ALT + left mouse button to select and write on multiple lines simultaneously.
Press ALT + - to insert a <- operator
Press CTRL + SHIFT + M to insert a %>% operator
Press CTRL + SHIFT + F to search all files in the directory or project
Press CTRL + UP to navigate your console history
Rename all variables with same name (rename in scope)
Press CMD + ALT + SHIFT + M to rename variable within scope: to rename all/multiple occurrences of a variable in a script
Press TAB inside “” (quotation marks / an empty string) to select a filename from your current directory, or to autocomplete a filename you started typing

Many more shortcuts are available online here, and in RStudio under Tools → Keyboard Shortcuts Help.

General

Disclaimer: This page contains one or more links to Amazon.
Any purchases made through those links provide us with a small commission that helps to host this blog.

Useful base functions

str() – explore structure of R object
trimws() – trim trailing and/or leading whitespaces
dput() – dump an R object in form of R code
cut()– categorize values into intervals
intersect() – returns the elements common to both vectors
union() – returns all unique elements found in either vector
setdiff() – returns the elements of the first vector that are not in the second
interaction() – computes a factor which represents the interaction of the given factors
formatC() can be used to round numbers and force trailing zeros
formatC() and sprintf() can be used to add leading/trailing characters
expand.grid() – create a data frame from all combinations of the supplied vectors or factors
seq_along(myvec) – generates the sequence 1:length(myvec), but safely returns an empty vector when myvec is empty
Initialize an empty data frame with column names
Functional programming tricks:
- switch() can replace elaborate ifelse statements (see also)
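- match.arg() checks that an argument matches one of a set of allowed values
- The null-default operator (%||%) returns its left-hand side unless that is NULL, in which case it returns the right-hand side (available in rlang, and in base R since version 4.4.0)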
Convert a vector of strings to title case
Quickly map a new set of values to an existing vector
Calculate the derivative of a function expression
Specify options() in your script:
- Prevent scientific notation using options(scipen = 999)
- Prevent automatic factor columns using options(stringsAsFactors = FALSE) (already the default since R 4.0.0)
- Use options(width = 60) to change the default width of console output
- Use options(max.print = 100) to change the default number of values printed in the console
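A quick tour of several of the base functions listed above, in one script. The input vectors, the describe() helper, and the lookup vector are hypothetical examples; only base R (plus the tools package that ships with it) is used. The %||% fallback definition is only there for R versions older than 4.4.0.

```r
# --- Set operations ---
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5)
intersect(a, b)   # in both:              3 4
union(a, b)       # all unique elements:  1 2 3 4 5
setdiff(a, b)     # in a but not in b:    1 2

# --- Binning and formatting ---
cut(c(1, 5, 9), breaks = c(0, 3, 6, 10), labels = c("low", "mid", "high"))
formatC(3.1, format = "f", digits = 2)   # "3.10" (trailing zero kept)
formatC(7L, width = 3, flag = "0")       # "007"  (leading zeros)
sprintf("%03d", 7L)                      # "007"
trimws("  padded  ")                     # "padded"

# --- Combinations and safe sequences ---
combos <- expand.grid(size = c("S", "L"), color = c("red", "blue"))  # 4 rows
seq_along(character(0))   # integer(0), whereas 1:length(character(0)) gives 1 0

# --- Empty data frame with column names ---
empty_df <- data.frame(id = integer(), name = character())

# --- switch() and match.arg() (describe() is a hypothetical helper) ---
describe <- function(type = c("mean", "median", "max")) {
  type <- match.arg(type)   # errors on anything not in the allowed list
  switch(type,
         mean   = "average value",
         median = "middle value",
         max    = "largest value")
}
describe("median")   # "middle value"
describe()           # no argument: first choice is used, "average value"

# --- Null-default operator: define a fallback only for R < 4.4.0 ---
if (!exists("%||%")) `%||%` <- function(x, y) if (is.null(x)) y else x
NULL %||% "fallback"   # "fallback"

# --- Title case, mapping values, derivatives ---
tools::toTitleCase("a tale of two cities")
lookup <- c(a = "apple", b = "banana")
unname(lookup[c("b", "a", "b")])   # "banana" "apple" "banana"
D(quote(x^2 + 3 * x), "x")         # mathematically 2x + 3

# --- options(): switch off scientific notation, then restore ---
format(1e6)                  # "1e+06"
op <- options(scipen = 999)
format(1e6)                  # "1000000"
options(op)                  # restore the previous settings
```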


R Markdown

Pimp my RMD: Overview of many R markdown tricks by Yan Holtz
Save compiled images in folder with markdown
Add caption to compiled tables with markdown
Tabsets in markdown
Foldable html content in markdown
Reuse code chunks in markdown
Generate Word documents with markdown
Open URLs in a new window with [text](url){target="_blank"} in markdown
Use #<< to highlight code
Move to next xaringan slide upon click (or Enter)
Convert an R Markdown file (.Rmd) into an R script (.R) with knitr::purl(input, output, documentation = 2)
Use CTRL + SHIFT + 1:4 to zoom in on any single RStudio pane. Use ALT + CTRL + SHIFT + 0 to show all panes again.
knitr::read_chunk("your_script_name.R") reads labelled code chunks from a script outside your current markdown file, so you can reuse them in your markdown chunks
Use animations in your markdown files with the gganimate package and "header-includes: - \usepackage{animate}" in your YAML preamble
Create a searchable, sortable HTML table in 1 line of code with DT::datatable(mydf, filter = 'top')

Data manipulation

readr::parse_number extracts the numbers from raw / scraped text
stringr::str_pad can be used to add leading or trailing characters (like zeros)
dplyr tricks
dplyr::case_when replaces elaborate ifelse statements (Video)
dplyr::everything in combination with dplyr::select to reorder columns
Quickly count / tally observations within groups with dplyr::count, dplyr::tally, and dplyr::add_count and dplyr::add_tally
Quickly keep the top categories / groups based on a variable, lumping all others into "Other", with forcats::fct_lump
Apply the same filter to multiple columns with dplyr::filter_all or dplyr::filter_if in combination with dplyr::all_vars and dplyr::any_vars (in newer dplyr versions, if_any() and if_all() supersede these)
dplyr::group_by_if performs quick conditional grouping
Perform rowwise mutations / calculations using dplyr::rowwise
purrr tricks
purrr::map_df to read in and merge all data files in a folder
Combine purrr::map_df and fs::dir_ls to read in and merge all data files following a specific pattern in a folder
Combine list.files and purrr::map_df to read in and merge all data files in a folder
broom::tidy puts your model results in a tidy data frame
Simpler correlation analysis with corrr
df %>% .$column_name, df %$% column_name (magrittr), or dplyr::pull(df, column_name) can retrieve a column from a tibble
dplyr::coalesce returns, element by element, the first non-missing value across several columns
Display a fraction between 0 and 1 as a percentage with scales::percent(myfraction)
Convert numbers that came in as strings with commas to R numbers with readr::parse_number(mydf$mycol)
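A small sketch combining a few of the dplyr tricks above (case_when, coalesce, select with everything, count, and add_count), assuming dplyr 1.0 or later is installed. The sales data are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: `backup` fills gaps in `units`
sales <- tibble(
  region = c("north", "north", "south", "east"),
  units  = c(5, 12, 30, NA),
  backup = c(NA, NA, NA, 8)
)

result <- sales %>%
  mutate(
    # case_when(): conditions are checked in order, the first match wins
    size = case_when(
      is.na(units) ~ NA_character_,
      units >= 20  ~ "large",
      units >= 10  ~ "medium",
      TRUE         ~ "small"
    ),
    # coalesce(): first non-missing value, element by element
    units_filled = coalesce(units, backup)
  ) %>%
  select(size, everything())   # move `size` to the front, keep all other columns

result

region_counts <- count(sales, region)   # one row per region, with column n
region_counts
add_count(sales, region)                # keeps all rows, adds the group size as n
```

The explicit is.na() branch matters: without it, the missing row would fall through to the TRUE catch-all and be labelled "small".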

Data visualization

colors() to see the names of all built-in colors
GGally::ggpairs for beautiful pair-wise correlation plots
tidyr::complete to get barplot spacing right
Quickly visualize your whole dataset
Create custom, corporate, reproducible color palettes and custom discrete color scales
Standardize the colors of groups in your visualizations using named vectors
theme_set to set a default ggplot2 theme
Create your own ggplot2 theme:
Reorder values and axes within ggplot2 facets
Add line labels at the end of geom_lines by Simon Jackson
Add + NULL to the end of your ggplot2 chain during development
Add clip = "off" to coord_cartesian() to draw outside the plot panel
Remove point borders with stroke = 0
Multicolored annotated text in ggplot2 by Andrew Whitby & Visuelle Data
Combine plots using patchwork or cowplot
Add a (corporate) logo to your plot using magick
Use animations in your markdown files with the gganimate package and "header-includes: - \usepackage{animate}" in your YAML preamble
If you pass a function to the data argument of a geom_*(), ggplot2 applies that function to the plot's data before drawing that layer!
Generate distributions in ggplot2 using the stat_function function. Normal distributions, student t-distributions, beta distributions, anything. See also here.
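A compact sketch of several visualization tricks from this list, assuming ggplot2 is installed: a named color vector, theme_set(), stroke = 0, clip = "off", the + NULL habit, and stat_function(). The iris data ship with R; the hex colors are arbitrary choices.

```r
library(ggplot2)

# Named vector: each species always gets the same color, whatever the data order
group_colors <- c(setosa = "#1b9e77", versicolor = "#d95f02", virginica = "#7570b3")

theme_set(theme_minimal())   # default theme for every plot that follows

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 2, stroke = 0) +     # no point borders
  scale_color_manual(values = group_colors) +
  coord_cartesian(clip = "off") +        # allow drawing outside the panel
  NULL                                   # lets you comment out any line above

# Distribution curves computed from functions rather than from data
dens <- ggplot(data.frame(x = c(-4, 4)), aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1)) +
  stat_function(fun = dt, args = list(df = 3), linetype = "dashed")

print(p)
print(dens)
```

The trailing + NULL adds nothing to the plot; it only means the last real layer also ends in +, so any line can be commented out without a syntax error.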


Fun

Easter eggs

Run ????"", via Reddit
Run example(readLine), via DecisionStats
Run ?.Internal, via DecisionStats



Generating Pusheen with AI

Zack Nado wrote the best machine learning application I’ve seen so far: a neural network architecture that generates new Pusheen pictures.

This is an original Pusheen picture.

In his blog, Zack describes his generative adversarial network (GAN), a special type of machine learning architecture in which two neural networks try to fool each other. Zack first gave the discriminator network some real Pusheen images, so it would get an idea of what Pusheen looks like. Next, the generator network receives a bunch of random numbers from which it generates completely new (fake) images. These generated images are then fed into the discriminator, so it also learns what generated images look like. Zack repeated this process several hundred thousand times, until he obtained a generator network that is great at making new Pusheen images which the discriminator can (nearly) no longer distinguish from the real ones. Below is the learning process of the generator network visualized:

Samples output by the generator network. It learns distinctive features of “real” Pusheen (e.g., tail, eyes, ears) over time [original]

In the end, the generated images look very much like the real Pusheen. Zack added an interactive module (built with TensorFlow.js) to the blog so you can generate some Pusheens yourself (it didn’t work for me, though…). On a final note, Zack wrote the original blog both in plain English, for non-experts, and in jargon, for more experienced data scientists. I highly recommend you read either version!

Some of the Pusheens generated by Zack’s GAN [original]
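The tug-of-war between the two networks described above comes down to two opposing loss functions. I don't know the exact losses Zack used; the sketch below writes down the standard binary cross-entropy formulation, with the commonly used "non-saturating" generator loss, in a few lines of R. The inputs are hypothetical discriminator outputs (probabilities that an image is real), not anything from Zack's model.

```r
# d_real: discriminator's probabilities that the real images are real
# d_fake: discriminator's probabilities that the generated images are real

discriminator_loss <- function(d_real, d_fake) {
  # The discriminator wants d_real -> 1 and d_fake -> 0
  -mean(log(d_real)) - mean(log(1 - d_fake))
}

generator_loss <- function(d_fake) {
  # The generator wants its fakes to be judged real (d_fake -> 1)
  -mean(log(d_fake))
}

# At the theoretical equilibrium, the discriminator outputs 0.5 for every image
discriminator_loss(d_real = c(0.5, 0.5), d_fake = c(0.5, 0.5))  # 2 * log(2)
generator_loss(d_fake = c(0.5, 0.5))                            # log(2)

# A confident, correct discriminator: its own loss is low,
# while the generator's loss is high, pushing the generator to improve
discriminator_loss(d_real = c(0.99, 0.95), d_fake = c(0.02, 0.05))
generator_loss(d_fake = c(0.02, 0.05))
```

During training, the two networks take turns: one gradient step lowers discriminator_loss, the next lowers generator_loss, and this alternation is repeated many thousands of times.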

Interactive Explanation of Network and Graph Principles

Why do groups of people act smart, dumb, kind, or cruel? People behave in strange ways, particularly when they are able to influence one another, and both good and bad things can happen when people interact within networks. On the bright side, you are probably familiar with the wisdom of the crowd, where the aggregated judgment of a group is more accurate than that of most of its individual members. Ensemble algorithms, like random forests, rely on this positive principle.

On the dark side, are you familiar with the phenomenon called the tragedy of the commons, where shared resource systems collapse because individuals act in their own self-interest? Or with psychological phenomena such as groupthink, where groups of people make irrational decisions because of social pressure to conform? The recent spread of fake news and misinformation is also fueled by network interactions. In these cases, we could speak of the madness of the crowd.

Nicky Case made a great interactive walkthrough explaining why and when networks of people become wise or mad. You are tasked to change and simulate network interactions while Nicky explains concepts such as (complex) contagion, the majority illusion paradox, bonding and bridging, and small world networks. In the references, Nicky provides links to scientific papers explaining these concepts in more detail. I highly suggest you check out her website here.
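To make (complex) contagion a bit more concrete, here is a minimal sketch of a threshold model in R. It is not Nicky's code: the ring-lattice network, the fixed adoption threshold, and the helper names ring_lattice() and spread() are my own assumptions. A node adopts once the share of its neighbours that have adopted reaches the threshold; with a low threshold one neighbour is enough (simple contagion), while with a higher threshold a node needs several exposures (complex contagion).

```r
# Ring lattice: each of n nodes links to its k nearest neighbours on each side
ring_lattice <- function(n, k) {
  adj <- matrix(0L, n, n)
  for (i in seq_len(n)) {
    for (step in seq_len(k)) {
      j <- ((i - 1 + step) %% n) + 1   # neighbour `step` positions clockwise
      adj[i, j] <- 1L
      adj[j, i] <- 1L
    }
  }
  adj
}

# Threshold model: repeat until nobody new adopts
spread <- function(adj, seeds, threshold) {
  adopted <- seq_len(nrow(adj)) %in% seeds
  repeat {
    share <- as.vector(adj %*% adopted) / rowSums(adj)  # share of adopting neighbours
    updated <- adopted | share >= threshold
    if (identical(updated, adopted)) break
    adopted <- updated
  }
  which(adopted)
}

net <- ring_lattice(n = 10, k = 2)   # every node has 4 neighbours

length(spread(net, seeds = 1, threshold = 0.25))       # simple contagion: reaches all 10
length(spread(net, seeds = 1, threshold = 0.5))        # complex contagion: stuck at 1
length(spread(net, seeds = c(1, 2), threshold = 0.5))  # clustered seeds: reaches all 10
```

The contrast between the last two calls mirrors the idea behind complex contagion: a single seed is not enough when people need several friends to adopt first, but a tight cluster of seeds can still tip the whole network.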