Tag: GIF

Daily Art by Saskia Freeke

Daily Art by Saskia Freeke

Saskia Freeke (twitter) is a Dutch artist, creative coder, interaction designer, visual designer, and educator working from Amsterdam. She has been creating an awesome digital art piece for every day since January 1st 2015. Her ever-growing collection includes some animated, visual masterpieces.

My personal favorites are Saskia’s moving works, her GIFs:

Saskia uses Processing to create her art. Processing is a Java-based language, also used often by Daniel Shiffmann whom we know from the Coding Train.

Beating Battleships with Algorithms and AI

Past days, I discovered this series of blogs on how to win the classic game of Battleships (gameplay explanation) using different algorithmic approaches. I thought they might amuse you as well : )

The story starts with this 2012 Datagenetics blog where Nick Berry constrasts four algorithms’ performance in the game of Battleships. The resulting levels of artificial intelligence (AI) seem to compare respectively to a distracted baby, two sensible adults, and a mathematical progidy.

The first, stupidest approach is to just take Random shots. The AI resulting from such an algorithm would just pick a random tile to shoot at each turn. Nick simulated 100 million games with this random apporach and computed that the algorithm would require 96 turns to win 50% of games, given that it would not be defeated before that time. At best, the expertise level of this AI would be comparable to that of a distracted baby. Basically, it would lose from the average toddler, given that the toddler would survive the boredom of playing such a stupid AI.

A first major improvement results in what is dubbed the Hunt algorithm. This improved algorithm includes an instruction to explore nearby spaces whenever a prior shot hit. Every human who has every played Battleships will do this intuitively. A great improvement indeed as Nick’s simulations demonstrated that this Hunt algorithm completes 50% of games within ~65 turns, as long as it is not defeated beforehand. Your little toddler nephew will certainly lose, and you might experience some difficulty as well from time to time.

A visual representation of the “Hunting” of the algorithm on a hit [via]

Another minor improvement comes from adding the so-called Parity principle to this Hunt algorithm (i.e., Nick’s Hunt + Parity algorithm). This principle instructs the algorithm to take into account that ships will always cover odd as well as even numbered tiles on the board. This information can be taken into account to provide for some more sensible shooting options. For instance, in the below visual, you should avoid shooting the upper left white tile when you have already shot its blue neighbors. You might have intuitively applied this tactic yourself in the past, shooting tiles in a “checkboard” formation. With the parity principle incorporated, the median completion rate of our algorithm improves to ~62 turns, Nick’s simulations showed.

The Parity “checkerboard” principle [via]

Now, Nick’s final proposed algorithm is much more computationally intensive. It makes use of Probability Density Functions. At the start of every turn, it works out all possible locations that every remaining ship could fit in. As you can imagine, many different combinations are possible with five ships. These different combinations are all added up, and every tile on the board is thus assigned a probability that it includes a ship part, based on the tiles that are already uncovered.

Computing the probability that a tile contains a ship based on all possible board layouts [via]

At the start of the game, no tiles are uncovered, so all spaces will have about the same likelihood to contain a ship. However, as more and more shots are fired, some locations become less likely, some become impossible, and some become near certain to contain a ship. For instance, the below visual reflects seven misses by the X’s and the darker tiles which thus have a relatively high probability of containing a ship part. 

An example distribution with seven misses on the grid. [via]

Nick simulated 100 million games of Battleship for this probabilistic apporach as well as the prior algorithms. The below graph summarizes the results, and highlight that this new probabilistic algorithm greatly outperforms the simpler approaches. It completes 50% of games within ~42 turns! This algorithm will have you crying at the boardgame table.

Relative performance of the algorithms in the Datagenetics blog, where “New Algorithm” refers to the probabilistic approach and “No Parity” refers to the original “Hunt” approach.

Reddit user /u/DataSnaek reworked this probablistic algorithm in Python and turned its inner calculations into a neat GIF. Below, on the left, you see the probability of each square containing a ship part. The brighter the color (white <- yellow <- red <- black), the more likely a ship resides at that location. It takes into account that ships occupy multiple consecutive spots. On the right, every turn the algorithm shoots the space with the highest probability. Blue is unknown, misses are in red, sunk ships in brownish, hit “unsunk” ships in light blue (sorry, I am terribly color blind).


The probability matrix as a heatmap for every square after each move in the game.  [via]

This latter attempt by DataSnaek was inspired by Jonathan Landy‘s attempt to train a reinforcement learning (RL) algorithm to win at Battleships. Although the associated GitHub repository doesn’t go into much detail, the approach is elaborately explained in this blog. However, it seems that this specific code concerns the training of a neural network to perform well on a very small Battleships board, seemingly containing only a single ship of size 3 on a board with only a single row of 10 tiles.

Fortunately, Sue He wrote about her reinforcement learning approach to Battleships in 2017. Building on the open source phoenix-battleship project, she created a Battleship app on Heroku, and asked co-workers to play. This produced data on 83 real, two-person games, showing, for instance, that Sue’s coworkers often tried to hide their size 2 ships in the corners of the Battleships board.

Probability heatmaps of ship placement in Sue He’s reinforcement learning Battleships project [via]

Next, Sue scripted a reinforcement learning agent in PyTorch to train and learn where to shoot effectively on the 10 by 10 board. It became effective quite quickly, requiring only 52 turns (on average over the past 25 games) to win, after training for only a couple hundreds games.

The performance of the RL agent at Battleships during the training process [via]

However, as Sue herself notes in her blog, disappointly, this RL agent still does not outperform the probabilistic approach presented earlier in this current blog.

Reddit user /u/christawful faced similar issues. Christ (I presume he is called) trained a convolutional neural network (CNN) with the below architecture on a dataset of Battleships boards. Based on the current board state (10 tiles * 10 tiles * 3 options [miss/hit/unknown]) as input data, the intermediate convolutional layers result in a final output layer containing 100 values (10 * 10) depicting the probabilities for each tile to result in a hit. Again, the algorithm can simply shoot the tile with the highest probability.

NN diagram
Christ’s convolutional neural network architecture for Battleships [via]

Christ was nice enough to include GIFs of the process as well [via]. The first GIF shows the current state of the board as it is input in the CNN — purple represents unknown tiles, black a hit, and white a miss (i.e., sea). The next GIF represent the calculated probabilities for each tile to contain a ship part — the darker the color the more likely it contains a ship. Finally, the third picture reflects the actual board, with ship pieces in black and sea (i.e., miss) as white.

As cool as this novel approach was, Chris ran into the same issue as Sue, his approach did not perform better than the purely probablistic one. The below graph demonstrates that while Christ’s CNN (“My Algorithm”) performed quite well — finishing a simulated 9000 games in a median of 52 turns — it did not outperform the original probabilistic approach of Nick Berry — which came in at 42 turns. Nevertheless, Chris claims to have programmed this CNN in a couple of hours, so very well done still.

cdf
The performance of Christ’s Battleship CNN compared to Nick Berry’s original algorithms [via]

Interested by all the above, I searched the web quite a while for any potential improvement or other algorithmic approaches. Unfortunately, in vain, as I did not find a better attempt than that early 2012 Datagenics probability algorithm by Nick.

Surely, with today’s mass cloud computing power, someone must be able to train a deep reinforcement learner to become the Battleship master? It’s not all probability right, there must be some patterns in generic playing styles, like Sue found among her colleagues. Or maybe even the ability of an algorithm to adapt to the opponent’s playin style, as we see in Libratus, the poker AI. Maybe the guys at AlphaGo could give it a shot?

For starters, Christ’s provided some interesting improvements on his CNN approach. Moreover, while the probabilistic approach seems the best performing, it might not the most computationally efficient. All in all, I am curious to see whether this story will continue.

Join 320 other followers

Animated Snow in R, 2.0: gganimate API update

Animated Snow in R, 2.0: gganimate API update

Last year, inspired by a tweet from Ilya Kashnitsky, I wrote a snow animation which you can read all about here

Now, this year, the old code no longer worked due to an update to the gganimate API. Hence, I was about to only refactor the code, but decided to give the whole thing a minor update. Below, you find the 2.0 version of my R snow animation.

# PACKAGES ####
pkg <- c("here", "tidyverse", "gganimate", "animation")
sapply(pkg, function(x){
if (!x %in% installed.packages()){install.packages(x)}
library(x, character.only = TRUE)
})

# CUSTOM FUNCTIONS ####
map_to_range <- function(x, from, to) {
# Shifting the vector so that min(x) == 0
x <- x - min(x)
# Scaling to the range of [0, 1]
x <- x / max(x)
# Scaling to the needed amplitude
x <- x * (to - from)
# Shifting to the needed level
x + from
}

# CONSTANTS ####
N <- 500 # number of flakes
TIMES <- 100 # number of loops
XPOS_DELTA <- 0.01
YSPEED_MIN = 0.005
YSPEED_MAX = 0.03
FLAKE_SIZE_COINFLIP = 5
FLAKE_SIZE_COINFLIP_PROB = 0.1
FLAKE_SIZE_MIN = 4
FLAKE_SIZE_MAX = 20

# INITIALIZE DATA ####
set.seed(1)

size <- runif(N) + rbinom(N, FLAKE_SIZE_COINFLIP, FLAKE_SIZE_COINFLIP_PROB) # random flake size
yspeed <- map_to_range(size, YSPEED_MIN, YSPEED_MAX)

# create storage vectors
xpos <- rep(NA, N * TIMES)
ypos <- rep(NA, N * TIMES)

# loop through simulations
for(i in seq(TIMES)){
if(i == 1){
# initiate values
xpos[1:N] <- runif(N, min = -0.1, max = 1.1)
ypos[1:N] <- runif(N, min = 1.1, max = 2)
} else {
# specify datapoints to update
first_obs <- (N * i - N + 1)
last_obs <- (N * i)
# update x position
# random shift
xpos[first_obs:last_obs] <- xpos[(first_obs-N):(last_obs-N)] - runif(N, min = -XPOS_DELTA, max = XPOS_DELTA)
# update y position
# lower by yspeed
ypos[first_obs:last_obs] <- ypos[(first_obs-N):(last_obs-N)] - yspeed
# reset if passed bottom screen
xpos <- ifelse(ypos < -0.1, runif(N), xpos) # restart at random x
ypos <- ifelse(ypos < -0.1, 1.1, ypos) # restart just above top
}
}


# VISUALIZE DATA ####
cbind.data.frame(ID = rep(1:N, TIMES)
,x = xpos
,y = ypos
,s = size
,t = rep(1:TIMES, each = N)) %>%
# create animation
ggplot() +
geom_point(aes(x, y, size = s, alpha = s), color = "white", pch = 42) +
scale_size_continuous(range = c(FLAKE_SIZE_MIN, FLAKE_SIZE_MAX)) +
scale_alpha_continuous(range = c(0.2, 0.8)) +
coord_cartesian(c(0, 1), c(0, 1)) +
theme_void() +
theme(legend.position = "none",
panel.background = element_rect("black")) +
transition_time(t) +
ease_aes('linear') ->
snow_plot

snow_anim <- animate(snow_plot, nframes = TIMES, width = 600, height = 600)

If you want to see some spin-offs of last years code:

Animated vs. Static Data Visualizations

Animated vs. Static Data Visualizations

GIFs or animations are rising quickly in the data visualization world (see for instance here).

However, in my personal experience, they are not as widely used in business settings. You might even say animations are frowned by, for instance, LinkedIn, which removed the option to even post GIFs on their platform!

Nevertheless, animations can be pretty useful sometimes. For instance, they can display what happens during a process, like a analytical model converging, which can be useful for didactic purposes. Alternatively, they can be great for showing or highlighting trends over time.  

I am curious what you think are the pro’s and con’s of animations. Below, I posted two visualizations of the same data. The data consists of the simulated workforce trends, including new hires and employee attrition over the course of twelve months. 

versus

Would you prefer the static, or the animated version? Please do share your thoughts in the comments below, or on the respective LinkedIn and Twitter posts!


Want to reproduce these plots? Or play with the data? Here’s the R code:

# LOAD IN PACKAGES ####
# install.packages('devtools')
# devtools::install_github('thomasp85/gganimate')
library(tidyverse)
library(gganimate)
library(here)


# SET CONSTANTS ####
# data
HEADCOUNT = 270
HIRE_RATE = 0.12
HIRE_ADDED_SEASONALITY = rep(floor(seq(14, 0, length.out = 6)), 2)
LEAVER_RATE = 0.16
LEAVER_ADDED_SEASONALITY = c(rep(0, 3), 10, rep(0, 6), 7, 12)

# plot
TEXT_SIZE = 12
LINE_SIZE1 = 2
LINE_SIZE2 = 1.1
COLORS = c("darkgreen", "red", "blue")

# saving
PLOT_WIDTH = 8
PLOT_HEIGHT = 6
FRAMES_PER_POINT = 5


# HELPER FUNCTIONS ####
capitalize_string = function(text_string){
paste0(toupper(substring(text_string, 1, 1)), substring(text_string, 2, nchar(text_string)))
}


# SIMULATE WORKFORCE DATA ####
set.seed(1)

# generate random leavers and some seasonality
leavers <- rbinom(length(month.abb), HEADCOUNT, TURNOVER_RATE / length(month.abb)) + LEAVER_ADDED_SEASONALITY

# generate random hires and some seasonality
joiners <- rbinom(length(month.abb), HEADCOUNT, HIRE_RATE / length(month.abb)) + HIRE_ADDED_SEASONALITY

# combine in dataframe
data.frame(
month = factor(month.abb, levels = month.abb, ordered = TRUE)
, workforce = HEADCOUNT - cumsum(leavers) + cumsum(joiners)
, left = leavers
, hires = joiners
) ->
wf

# transform to long format
wf_long <- gather(wf, key = "variable", value = "value", -month)
capitalize the name of variables
wf_long$variable <- capitalize_string(wf_long$variable)


# VISUALIZE & ANIMATE ####
# draw workforce plot
ggplot(wf_long, aes(x = month, y = value, group = variable)) +
geom_line(aes(col = variable, size = variable == "workforce")) +
scale_color_manual(values = COLORS) +
scale_size_manual(values = c(LINE_SIZE2, LINE_SIZE1), guide = FALSE) +
guides(color = guide_legend(override.aes = list(size = c(rep(LINE_SIZE2, 2), LINE_SIZE1)))) +
# theme_PVDL() +
labs(x = NULL, y = NULL, color = "KPI", caption = "paulvanderlaken.com") +
ggtitle("Workforce size over the course of a year") +
NULL ->
workforce_plot

# ggsave(here("workforce_plot.png"), workforce_plot, dpi = 300, width = PLOT_WIDTH, height = PLOT_HEIGHT)

# animate the plot
workforce_plot +
geom_segment(aes(xend = 12, yend = value), linetype = 2, colour = 'grey') +
geom_label(aes(x = 12.5, label = paste(variable, value), col = variable),
hjust = 0, size = 5) +
transition_reveal(variable, along = as.numeric(month)) +
enter_grow() +
coord_cartesian(clip = 'off') +
theme(
plot.margin = margin(5.5, 100, 11, 5.5)
, legend.position = "none"
) ->
animated_workforce

anim_save(here("workforce_animation.gif"),
animate(animated_workforce, nframes = nrow(wf) * FRAMES_PER_POINT,
width = PLOT_WIDTH, height = PLOT_HEIGHT, units = "in", res = 300))
Your own personalized motivational GIF

Your own personalized motivational GIF

One of the R OpenSci ozunconference 2018 projects was all about gganimate. The guys and gals dubbed this project learngganimate and dedicated an expansive GitHub repository to it. As part of this project, they created some — let’s call it — very creative GIFS.

One of their GIFs I particularly liked, copied below. Using the OpenSci syn package they looked up synonyms for cool and printed those in some nice colors.

On GitHub, you can find the original code for this project. However, I didn’t get it working on my machine — due to recent updates to the gganimate package — so I had to create my own version, which you find below.

devtools::install_github("ropenscilabs/syn") # only needed for first-time install
devtools::install_github('thomasp85/gganimate') # install the most recent version of gganimate

library(syn)
library(ggplot2)
library(gganimate)
library(dplyr)

set.seed(1) # for reproducibility purposes

synonyms <- syn("great") # store synonyms for your word of chosing

n = 15 # number of synonyms to sample
time = 3 # their position in the plot as well as the duration of their display

# generate dataframe with random synonyms sentences and assigned locations
sentences_df <- data_frame(
sentence = paste("#rstats ==", sample(synonyms, n), "!!")
, x = time
, y = seq(time, time * n, time)
)

# generate the actual plot
ggplot(sentences_df,
aes(x, -y,
label = sentence,
group = sentence,
fill = sentence)) +
geom_label(size = 10, colour = "white", label.size = 0.3) +
transition_components(id = sentence, time = y,
enter_length = n * time + time ,
exit_length = n * time + time) +
scale_fill_viridis_d() +
theme_void() +
theme(legend.position = "none") ->
plot1

# animate the plot
animate(plot1, nframes = n * time + time)

This code renders the following GIF:

Try to play around with the code to change the GIF:

  • Change the set.seed argument to get different synonyms in there,
  • Change the n to include more or less words,
  • Change the x and y variables to position the labels differently,
  • Change the size, colour, and fill of the geom_label function to change the label design,
  • Or change the transition_components arguments to change the display timing.

Moreover, you could change the sentence variable to something to motivate yourself. For instnace, in the following code, I changed it to include my name, and synonyms for the word good. Moreover, I picked a different gganimate function — transition_time — to display the labels according to a different pattern. 

set.seed(2) # for reproducibility purposes

# generate dataframe with random synonyms sentences and assigned locations
sentences_df <- data_frame(
sentence = paste0("Paul is ", sample(syn("good"), n), "!")
, x = time
, y = seq(time, time * n, time)
)

# generate the actual plot
ggplot(sentences_df,
aes(x, -y,
label = sentence,
group = sentence,
fill = sentence)) +
geom_label(size = 10, colour = "white", label.size = 0.3) +
transition_time(time = y) +
scale_fill_viridis_d() +
theme_void() +
theme(legend.position = "none") ->
plot2

# animate the plot
animate(plot2, nframes = n * time + time)

I think the result is very pleasing, comforting, and positive! Except maybe for the dinkum bit, but fortunately neither I or thesaurus.com know what that means, so it might as well be positive : )

If you go about creating your own animations, you can save them using the save_animation function of the gganimate package. Good luck!


PS. The code to generate the GIF at the top of this blog is posted below. It uses another gganimate function called transition_states:

set.seed(3) # for reproducibility purposes

time = 5
n = 5

# generate dataframe with random synonyms sentences and assigned locations
sentences_df <- data_frame(
sentence = paste0("You are ", sample(syn("amazing"), n), "!")
, x = runif(n)
, y = seq(time, time * n, time)
)

# generate the actual plot
ggplot(sentences_df,
aes(x, -y,
label = sentence,
group = sentence,
fill = sentence)) +
geom_label(size = 12, colour = "white", label.size = 0.5) +
transition_states(states = sentence, transition_length = time, state_length = time) +
theme_void() +
theme(legend.position = "none") +
coord_cartesian(xlim = c(-0.5, 1.5)) ->
plot3

# animate the plot
animate(plot3, nframes = n * time + time)

Animated Citation Gates turned into Selection Gates

Bret Beheim — senior researcher at the Max Planck Institute for Evolutionary Anthropology — posted a great GIF animation of the response to his research survey. He calls the figure citation gates, relating the year of scientific publication to the likelihood that the research materials are published open-source or accessible.

To generate the visualization, Bret used R’s base plotting functionality combined with Thomas Lin Pedersen‘s R package tweenrto animate it.

Bret shared his R code for the above GIF of his citation gates on GitHub. With the open source code, this amazing visual display inspired others to make similar GIFs for their own projects. For example, Anne-Wil Kruijt’s dance of the confidence intervals:

A spin-off of the citation gates: A gif showing confidence intervals of sample means.

Applied to a Human Resource Management context, we could use this similar animation setup to explore, for instance, recruitment, selection, or talent management processes.

Unfortunately, I couldn’t get the below figure to animate properly yet, but I am working on it (damn ggplot2 facets). It’s a quick simulation of how this type of visualization could help to get insights into the recruitment and selection process for open vacancies.

The figure shows how nearly 200 applicants — sorted by their age — go through several selection barriers. A closer look demonstrates that some applicants actually skip the screening and assessment steps and join via a fast lane in the first interview round, which could happen, for instance, when there are known or preferred internal candidates. When animated, such insights would become more clearly visible.