Last week, Analytics in HR published a guest blog about one of my People Analytics projects, which you can read here. In the blog, I explain why and how I examined the turnover of management trainees in light of the international work assignments they go on.
For the analyses, I used a statistical technique called survival analysis, also referred to as event history analysis, reliability analysis, duration analysis, or time-to-event analysis; proportional hazards models are one of its most widely used forms. It models the time until an event occurs, for example by estimating the probability that the event has not yet occurred by time t, potentially as a function of other variables (covariates).
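To make "the probability that the event has not yet occurred by time t" concrete: the non-parametric Kaplan-Meier estimator, which is what `survfit()` in the survival package computes by default for right-censored data, multiplies (1 - d_i / n_i) over every event time t_i up to t, where d_i is the number of events at t_i and n_i the number of people still at risk just before it. The sketch below uses a small, made-up set of tenures (hypothetical, for illustration only), computes the curve by hand, and checks it against `survfit()`.

```r
library(survival)

# Hypothetical toy data: tenure and whether the person left (TRUE)
# or was still employed when observation ended (FALSE, i.e. right-censored)
tenure   <- c(3, 5, 5, 8, 10, 12, 12, 15)
turnover <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)

# Kaplan-Meier by hand: at each distinct event time, multiply by (1 - d / n)
event_times <- sort(unique(tenure[turnover]))
n_risk  <- sapply(event_times, function(t) sum(tenure >= t))
n_event <- sapply(event_times, function(t) sum(tenure == t & turnover))
km_manual <- cumprod(1 - n_event / n_risk)

# The same estimate from the survival package
fit <- survfit(Surv(tenure, turnover) ~ 1)
km_package <- summary(fit, times = event_times)$surv

print(data.frame(time = event_times, n_risk, n_event, km_manual, km_package))
```

Note how the person censored at tenure 5 still counts as "at risk" at time 5 but drops out of the risk set afterwards; that is how survival analysis uses the information from people who have not (yet) left.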
The basic version of survival analysis is a relatively simple model that requires very little data. You can get a long way with only the time of observation (here, tenure) and whether or not the event (here, turnover) occurred. For my own project, I had two organizations, so I added a source column as well (see below).
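As a minimal sketch of that data layout, here are a few hypothetical rows (invented values) with a tenure, a turnover flag, and a source column, plus the `Surv()` object the survival package builds from the first two. Trainees who had not left when the data were collected are right-censored, which `Surv()` marks with a trailing `+` when printed.

```r
library(survival)

# Hypothetical example rows: one row per trainee
trainees <- data.frame(
  Tenure   = c(14, 30, 7, 52),                # time observed
  Turnover = c(TRUE, FALSE, TRUE, FALSE),     # TRUE = left, FALSE = still employed (censored)
  Source   = c("Organization Red", "Organization Blue",
               "Organization Red", "Organization Blue")
)

# Surv() pairs each tenure with its event indicator; censored tenures print with "+"
surv_obj <- with(trainees, Surv(Tenure, Turnover))
print(surv_obj)
```

Internally the object is a two-column matrix of `time` and `status` (1 = event, 0 = censored), which is all `survfit()` needs apart from any grouping column such as `Source`.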
```r
# LOAD REQUIRED PACKAGES ####
library(tidyverse)  # tibble(), bind_rows(), ggplot2
library(ggfortify)  # autoplot() method for survfit objects
library(survival)   # Surv(), survfit()

# SET PARAMETERS ####
set.seed(2)
sources <- c("Organization Red", "Organization Blue")
prob_leave <- c(0.5, 0.5)  # leavers are split evenly across the two organizations
prob_stay <- c(0.8, 0.2)   # most stayers come from Organization Red
n <- 60

# SIMULATE DATASETS ####
# Leavers: turnover observed at their tenure
# Stayers: still employed, so their tenure is right-censored
data_surv <- bind_rows(
  tibble(
    Tenure = sample(1:80, n * 2, replace = TRUE),
    Source = sample(sources, n * 2, replace = TRUE, prob = prob_leave),
    Turnover = TRUE
  ),
  tibble(
    Tenure = sample(1:85, n * 25, replace = TRUE),
    Source = sample(sources, n * 25, replace = TRUE, prob = prob_stay),
    Turnover = FALSE
  )
)

# RUN SURVIVAL MODEL ####
# Kaplan-Meier estimate of the share still employed, one curve per organization
sfit <- survfit(Surv(Tenure, Turnover) ~ Source, data = data_surv)

# PLOT SURVIVAL ####
autoplot(sfit, censor = FALSE, surv.geom = "line", surv.size = 1.5, conf.int.alpha = 0.2) +
  scale_x_continuous(breaks = seq(0, max(data_surv$Tenure), 12)) +
  coord_cartesian(xlim = c(0, 72), ylim = c(0.4, 1)) +
  # strata are ordered alphabetically: Organization Blue first, then Organization Red
  scale_color_manual(values = c("blue", "red")) +
  scale_fill_manual(values = c("blue", "red")) +
  theme_light() +
  theme(legend.background = element_rect(fill = "transparent"),
        legend.justification = c(0, 0),
        legend.position = c(0, 0),  # bottom-left corner inside the plot area
        legend.text = element_text(size = 12)) +
  labs(x = "Length of service",
       y = "Percentage employed",
       title = "Survival model applied to the retention of new trainees",
       fill = "",
       color = "")
```
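The plot compares the two retention curves visually. If you also want a formal comparison, the survival package provides `survdiff()` for a log-rank test and `coxph()` for a Cox proportional hazards model with the source organization as a covariate. The sketch below re-simulates data in the same way as the code above (using base R instead of the tidyverse, so it runs on its own); any hazard ratio it prints describes this simulated data only, not the results of my project.

```r
library(survival)

# Re-create simulated data shaped like data_surv above (illustration only)
set.seed(2)
sources <- c("Organization Red", "Organization Blue")
n <- 60
leavers <- data.frame(
  Tenure   = sample(1:80, n * 2, replace = TRUE),
  Source   = sample(sources, n * 2, replace = TRUE, prob = c(0.5, 0.5)),
  Turnover = TRUE
)
stayers <- data.frame(
  Tenure   = sample(1:85, n * 25, replace = TRUE),
  Source   = sample(sources, n * 25, replace = TRUE, prob = c(0.8, 0.2)),
  Turnover = FALSE
)
data_surv <- rbind(leavers, stayers)

# Log-rank test: do the survival curves of the two organizations differ?
lr <- survdiff(Surv(Tenure, Turnover) ~ Source, data = data_surv)
p_logrank <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
print(lr)

# Cox proportional hazards model with Source as the only covariate;
# exp(coef) is the hazard ratio relative to the reference level
# (alphabetically first, here Organization Blue)
cox <- coxph(Surv(Tenure, Turnover) ~ Source, data = data_surv)
hr <- exp(coef(cox))
print(summary(cox))

# Check the proportional hazards assumption via scaled Schoenfeld residuals
print(cox.zph(cox))
```

A hazard ratio above 1 would mean trainees from the non-reference organization leave at a higher rate at any given tenure; the `cox.zph()` check matters because that interpretation only holds if the ratio is roughly constant over time.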

Using the code above, you should be able to conduct a survival analysis and visualize the results for your own projects. Please do share your results!


