Tag: animation

Your own personalized motivational GIF

One of the R OpenSci ozunconference 2018 projects was all about gganimate. The guys and gals dubbed this project learngganimate and dedicated an expansive GitHub repository to it. As part of this project, they created some — let’s call it — very creative GIFS.

One of their GIFs I particularly liked, copied below. Using the OpenSci syn package they looked up synonyms for cool and printed those in some nice colors.

On GitHub, you can find the original code for this project. However, I didn’t get it working on my machine — due to recent updates to the gganimate package — so I had to create my own version, which you find below.

devtools::install_github("ropenscilabs/syn") # only needed for first-time install
devtools::install_github('thomasp85/gganimate') # install the most recent version of gganimate

library(syn)
library(ggplot2)
library(gganimate)
library(dplyr)

set.seed(1) # for reproducibility purposes

synonyms <- syn("great") # store synonyms for your word of chosing

n = 15 # number of synonyms to sample
time = 3 # their position in the plot as well as the duration of their display

# generate dataframe with random synonyms sentences and assigned locations
sentences_df <- data_frame(
  sentence = paste("#rstats ==", sample(synonyms, n), "!!")
  , x = time
  , y = seq(time, time * n, time) 
  )

# generate the actual plot
ggplot(sentences_df,
       aes(x, -y, 
           label = sentence, 
           group = sentence, 
           fill = sentence)) +
  geom_label(size = 10, colour = "white", label.size = 0.3) +
  transition_components(id = sentence, time = y,
                        enter_length = n * time + time ,
                        exit_length = n * time + time) +
  scale_fill_viridis_d() +
  theme_void() +
  theme(legend.position = "none") ->
  plot1

# animate the plot
animate(plot1, nframes = n * time + time)

This code renders the following GIF:

Try to play around with the code to change the GIF:

Change the set.seed argument to get different synonyms in there,
Change the n to include more or less words,
Change the x and y variables to position the labels differently,
Change the size, colour, and fill of the geom_label function to change the label design,
Or change the transition_components arguments to change the display timing.

Moreover, you could change the sentence variable to something to motivate yourself. For instnace, in the following code, I changed it to include my name, and synonyms for the word good. Moreover, I picked a different gganimate function — transition_time — to display the labels according to a different pattern.

set.seed(2) # for reproducibility purposes

# generate dataframe with random synonyms sentences and assigned locations
sentences_df <- data_frame(
  sentence = paste0("Paul is ", sample(syn("good"), n), "!")
  , x = time
  , y = seq(time, time * n, time) 
)

# generate the actual plot
ggplot(sentences_df,
       aes(x, -y, 
           label = sentence, 
           group = sentence, 
           fill = sentence)) +
  geom_label(size = 10, colour = "white", label.size = 0.3) +
  transition_time(time = y) +
  scale_fill_viridis_d() +
  theme_void() +
  theme(legend.position = "none") ->
  plot2

# animate the plot
animate(plot2, nframes = n * time + time)

I think the result is very pleasing, comforting, and positive! Except maybe for the dinkum bit, but fortunately neither I or thesaurus.com know what that means, so it might as well be positive : )

If you go about creating your own animations, you can save them using the save_animation function of the gganimate package. Good luck!

PS. The code to generate the GIF at the top of this blog is posted below. It uses another gganimate function called transition_states:

set.seed(3) # for reproducibility purposes

time = 5
n = 5

# generate dataframe with random synonyms sentences and assigned locations
sentences_df <- data_frame(
  sentence = paste0("You are ", sample(syn("amazing"), n), "!")
  , x = runif(n)
  , y = seq(time, time * n, time) 
)

# generate the actual plot
ggplot(sentences_df,
       aes(x, -y, 
           label = sentence, 
           group = sentence, 
           fill = sentence)) +
  geom_label(size = 12, colour = "white", label.size = 0.5) +
  transition_states(states = sentence, transition_length = time, state_length = time) +
  theme_void() +
  theme(legend.position = "none") +
  coord_cartesian(xlim = c(-0.5, 1.5)) ->
  plot3

# animate the plot
animate(plot3, nframes = n * time + time)

Animated Citation Gates turned into Selection Gates

Bret Beheim — senior researcher at the Max Planck Institute for Evolutionary Anthropology — posted a great GIF animation of the response to his research survey. He calls the figure citation gates, relating the year of scientific publication to the likelihood that the research materials are published open-source or accessible.

To generate the visualization, Bret used R’s base plotting functionality combined with Thomas Lin Pedersen‘s R package tweenrto animate it.

I've been experimenting with R animations using the tweenR package for visualizing the results of our reproducibility survey, and I think it turned out pretty nice. pic.twitter.com/MRerAWHNYT
— Bret Beheim (@babeheim) November 17, 2018

Bret shared his R code for the above GIF of his citation gates on GitHub. With the open source code, this amazing visual display inspired others to make similar GIFs for their own projects. For example, Anne-Wil Kruijt’s dance of the confidence intervals:

Two wks ago I built a shiny 'CI demo' app for a job interview. Yet
I wasn't quite content with it. Then 2 days ago @babeheim posted an amazing gif (srsly, go check it!). Super inspired, & borrowing heavily from his code: my rendition of 'the Dance of the Confidence Intervals' pic.twitter.com/ORheOBBzDm
— Anne-Wil Kruijt (@t_awkr) November 20, 2018

A spin-off of the citation gates: A gif showing confidence intervals of sample means.

Applied to a Human Resource Management context, we could use this similar animation setup to explore, for instance, recruitment, selection, or talent management processes.

Unfortunately, I couldn’t get the below figure to animate properly yet, but I am working on it (damn ggplot2 facets). It’s a quick simulation of how this type of visualization could help to get insights into the recruitment and selection process for open vacancies.

The figure shows how nearly 200 applicants — sorted by their age — go through several selection barriers. A closer look demonstrates that some applicants actually skip the screening and assessment steps and join via a fast lane in the first interview round, which could happen, for instance, when there are known or preferred internal candidates. When animated, such insights would become more clearly visible.

Animated Snow in R

Due to the recent updates to the gganimate package, the code below no longer produces the desired animation.
A working, updated version can be found here.

After hearing R play the Jingle Bells tune, I really got into the holiday vibe. It made me think of Ilya Kashnitsky (homepage, twitter) his snowy image in R.

– Papa, what are you doing?
…
How I ended up generating #rstats snow for my 3yo daughter Sophia#ggplot2 #dataviz pic.twitter.com/29sk1HpROJ
— Ilya Kashnitsky (@ikashnitsky) 4 december 2017

if(!"tidyverse" %in% installed.packages()) install.packages("tidyverse")

library("tidyverse")

n <- 100 
tibble(x = runif(n),  
y = runif(n),  
s = runif(n, min = 4, max = 20)) %>%
ggplot(aes(x, y, size = s)) +
geom_point(color = "white", pch = 42) +
scale_size_identity() +
coord_cartesian(c(0,1), c(0,1)) +
theme_void() +
theme(panel.background = element_rect("black"))

This greatly fits the Christmas theme we have going here. Inspired by Ilya’s script, I decided to make an animated snowy GIF! Sure R is able to make something like the lively visualizations Daniel Shiffman (Coding Train) usually makes in Processing/JavaScript? It seems so:

### ANIMATED SNOW === BY PAULVANDERLAKEN.COM
### PUT THIS FILE IN AN RPROJECT FOLDER

# load in packages
pkg <- c("here", "tidyverse", "gganimate", "animation")
sapply(pkg, function(x){
if (!x %in% installed.packages()){install.packages(x)}
library(x, character.only = TRUE)
})

# parameters
n <- 100 # number of flakes
times <- 100 # number of loops
xstart <- runif(n, max = 1) # random flake start x position
ystart <- runif(n, max = 1.1) # random flake start y position
size <- runif(n, min = 4, max = 20) # random flake size
xspeed <- seq(-0.02, 0.02, length.out = 100) # flake shift speeds to randomly pick from
yspeed <- runif(n, min = 0.005, max = 0.025) # random flake fall speed

# create storage vectors
xpos <- rep(NA, n * times)
ypos <- rep(NA, n * times)

# loop through simulations
for(i in seq(times)){
if(i == 1){
# initiate values
xpos[1:n] <- xstart
ypos[1:n] <- ystart
} else {
# specify datapoints to update
first_obs <- (n*i - n + 1)
last_obs <- (n*i)
# update x position
# random shift
xpos[first_obs:last_obs] <- xpos[(first_obs-n):(last_obs-n)] - sample(xspeed, n, TRUE)
# update y position
# lower by yspeed
ypos[first_obs:last_obs] <- ypos[(first_obs-n):(last_obs-n)] - yspeed
# reset if passed bottom screen
xpos <- ifelse(ypos < -0.1, runif(n), xpos) # restart at random x
ypos <- ifelse(ypos < -0.1, 1.1, ypos) # restart just above top
}
}

# store in dataframe
data_fluid <- cbind.data.frame(x = xpos,
y = ypos,
s = size,
t = rep(1:times, each = n))

# create animation
snow <- data_fluid %>%
ggplot(aes(x, y, size = s, frame = t)) +
geom_point(color = "white", pch = 42) +
scale_size_identity() +
coord_cartesian(c(0, 1), c(0, 1)) +
theme_void() +
theme(panel.background = element_rect("black"))

# save animation
gganimate(snow, filename = here("snow.gif"), title_frame = FALSE, interval = .1)

Updates:

21/12/2017: Keith combined sound and image to create this very merry video.
22/12/2017: Ioannis Kosmidis generated snow in base R
25/12/2017: Daniel Shiffman dedicated a coding challenge to the topic.
25/12/2017: Cynthia Siew combined sound and image in this Shiny Christmas card.
17/12/2018: Due to the update to gganimate, I updated the code and general setup to run still in 2018.

Sentiment Analysis of Stranger Things Seasons 1 and 2

Jordan Dworkin, a Biostatistics PhD student at the University of Pennsylvania, is one of the few million fans of Stranger Things, a 80s-themed Netflix series combining drama, fantasy, mystery, and horror. Awaiting the third season, Jordan was curious as to the emotional voyage viewers went through during the series, and he decided to examine this using a statistical approach. Like I did for the seven Harry Plotter books, Jordan downloaded the scripts of all the Stranger Things episodes and conducted a sentiment analysis in R, of course using the tidyverse and tidytext. Jordan measured the positive or negative sentiment of the words in them using the AFINN dictionary and a first exploration led Jordan to visualize these average sentiment scores per episode:

The average positive/negative sentiment during the 17 episodes of the first two seasons of Stranger Things (from Medium.com)

Jordan jokingly explains that you might expect such overly negative sentiment in show about missing children and inter-dimensional monsters. The less-than-well-received episode 15 stands out, Jordan feels this may be due to a combination of its dark plot and the lack of any comedic relief from the main characters.

Reflecting on the visual above, Jordan felt that a lot of the granularity of the actual sentiment was missing. For a next analysis, he thus calculated a rolling average sentiment during the course of the separate episodes, which he animated using the animation package:

GIF displaying the rolling average (40 words) sentiment per Stranger Things episode (from Medium.com)

Jordan has two new takeaways: (1) only 3 of the 17 episodes have a positive ending – the Season 1 finale, the Season 2 premiere, and the Season 2 finale – (2) the episodes do not follow a clear emotional pattern. Based on this second finding, Jordan subsequently compared the average emotional trajectories of the two seasons, but the difference was not significant:

Smoothed (loess, I guess) trajectories of the sentiment during the episodes in seasons one and two of Stranger Things (from Medium.com)

Potentially, it’s better to classify the episodes based on their emotional trajectory than on the season they below too, Jordan thought next. Hence, he constructed a network based on the similarity (temporal correlation) between episodes’ temporal sentiment scores. In this network, the episodes are the nodes whereas the edges are weighted for the similarity of their emotional trajectories. In that sense, more distant episodes are less similar in terms of their emotional trajectory. The network below, made using igraph (see also here), demonstrates that consecutive episodes (1 → 2, 2 → 3, 3 → 4) are not that much alike:

The network of Stranger Things episodes, where the relations between the episodes are weighted for the similarity of their emotional trajectories (from Medium.com).

A community detection algorithm Jordan ran in MATLAB identified three main trajectories among the episodes:

Three different emotional trajectories were identified among the 17 Stranger Things episodes in Season 1 and 2 (from Medium.com).

Looking at the average patterns, we can see that group 1 contains episodes that begin and end with neutral emotion and have slow fluctuations in the middle, group 2 contains episodes that begin with negative emotion and gradually climb towards a positive ending, and group 3 contains episodes that begin on a positive note and oscillate downwards towards a darker ending.

– Jordan on Medium.com

Jordan final suggestion is that producers and scriptwriters may consciously introduce these variations in emotional trajectories among consecutive episodes in order to get viewers hooked. If you want to redo the analysis or reuse some of the code used to create the visuals above, you can access Jordan’s R scripts here. I, for one, look forward to his analysis of Season 3!

Advanced GIFs in R

Rafa Irizarry is a biostatistics professor and one of the three people behind SimplyStatistics.org (the others are Jeff Leek, Roger Peng). They post ideas that they find interesting and their blog contributes greatly to discussion of science/popular writing.

Rafa is the creator of many data visualization GIFs that have recently trended on the web, and in a recent post he provides all the source code behind the beautiful imagery. I sincerely recommend you check out the orginal blog if you want to find out more, but here are the GIFS:

Simpson’s paradox is a statistical phenomenon where an observed relationship within a population reverses within all subgroups that make up that population. Rafa visualized it wonderfully in a GIF that took only twenty-some lines of R code:

A different statistical phenomenon is discussed at the end of the original blog: namely the ecological fallacy. It occurs when correlations that occur on the group-level are erroneously extrapolated to the individual-level. Rafa used the gapminder data included in the dslabs package to illustrate the fallacy: there is a very high correlation at the region level and a lower correlation at the individual country level:

The gapminder data is also used in the next GIF. This mimics Hans Rosling’s famous animation during his talk on New Insights on Poverty, but then made with R and gganimate by Rafa:

A next visualization demonstrates how the UN voting data (of Erik Voeten and Anton Strezhnev) can be used to examine different voting behaviors. It seems to reduce the voting data to a two-dimensional factor structure, and seemingly there are three distinct groups of voters these days, with particularly the USA and Israel far removed from other members:

The next GIFs are more statistical. The one below demonstrates how the local regression (LOESS) works. Simply speaking, LOESS determines the relationship for a local subset of the population and when you iteratively repeat this for all local subsets in a population you get a nicely fitting LOESS curve, the red line in Rafa’s GIF:

Not quite sure how to interpret the next one, but Rafa explains it visualized a random forest’s predictions using only one predictor variable. I think that different trees would then provide different predictions because they leverage different training samples, and an ensemble of those trees would then improve predictive accuracy?

The next one is my favorite I think. This animation illustrates how a highly accurate test would function in a population with low prevalence of true values (e.g., disease, applicant success). More details are in the original blog or here.

The blog ends with a rather funny animation of the only good use of pie charts, according to Rafa:

Animated GIFs in R

Sometimes, it can be of interest to examine how two variables correlate over time. For example, how people in a social network (e.g., an organization) behave or move over the course of time. However, it can be hard to display multi-dimensional data in a single plot. Instead of including time as an additional dimension and providing stakeholders with complicated 3-D plots, ggplot2 now has a support package called gganimate, which allows you to create custom GIFs. Particularly helpful when you seek to demonstrate trends over time.

See this recent post by Analytics Vidhya for a tutorial on the implementation.