Last week, Analytics in HR published a guest blog about one of my People Analytics projects, which you can read here. In the blog, I explain why and how I examined the turnover of management trainees in light of the international work assignments they go on.
For the analyses, I used a statistical technique called survival analysis, also referred to as event history analysis, reliability analysis, duration analysis, or time-to-event analysis. Proportional hazards models are a well-known member of this family. Survival analysis estimates the likelihood of an event occurring at time t, potentially as a function of certain data.
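To make "the likelihood of an event at time t" concrete, here is a sketch in standard textbook notation (the symbols are not taken from the guest blog). With T the time until the event, the survival function S(t) is the probability of not yet having experienced the event at time t. The Kaplan-Meier estimator, which is what `survfit()` from R's survival package computes by default, estimates it from the data:

```latex
S(t) = \Pr(T > t), \qquad
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Here the t_i are the distinct times at which events occur, d_i is the number of events at t_i, and n_i is the number of people still at risk (employed and under observation) just before t_i.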
The basic version of survival analysis is a relatively easy model that requires very little data. You can come a long way if you only have the time of observation (in this case tenure) and whether or not an event (in this case turnover) occurred. For my own project, I had two organizations, so I added a source column as well (see below).
# LOAD REQUIRED PACKAGES ####
library(tidyverse)
library(ggfortify)
library(survival)

# SET PARAMETERS ####
set.seed(2)
sources <- c("Organization Red", "Organization Blue")
prob_leave <- c(0.5, 0.5)  # sampling weights per organization among leavers
prob_stay <- c(0.8, 0.2)   # sampling weights per organization among stayers
n <- 60

# SIMULATE DATASETS ####
data_surv <- bind_rows(
  # employees who left (event observed)
  tibble(
    Tenure = sample(1:80, n * 2, TRUE),
    Source = sample(sources, n * 2, TRUE, prob_leave),
    Turnover = TRUE
  ),
  # employees still employed (censored)
  tibble(
    Tenure = sample(1:85, n * 25, TRUE),
    Source = sample(sources, n * 25, TRUE, prob_stay),
    Turnover = FALSE
  )
)

# RUN SURVIVAL MODEL ####
sfit <- survfit(Surv(Tenure, event = Turnover) ~ Source, data = data_surv)

# PLOT SURVIVAL ####
autoplot(sfit, censor = FALSE, surv.geom = 'line', surv.size = 1.5, conf.int.alpha = 0.2) +
  scale_x_continuous(breaks = seq(0, max(data_surv$Tenure), 12)) +
  coord_cartesian(xlim = c(0, 72), ylim = c(0.4, 1)) +
  scale_color_manual(values = c("blue", "red")) +
  scale_fill_manual(values = c("blue", "red")) +
  theme_light() +
  theme(
    legend.background = element_rect(fill = "transparent"),
    legend.justification = c(0, 0),
    legend.position = c(0, 0),
    legend.text = element_text(size = 12)
  ) +
  labs(
    x = "Length of service",
    y = "Percentage employed",
    title = "Survival model applied to the retention of new trainees",
    fill = "",
    color = ""
  )
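To make the minimal data requirement tangible, here is a small sketch on a hand-made dataset. The six rows and the names `toy`, `km` and `km_table` are made up for illustration and are not part of my project. It fits a single Kaplan-Meier curve and checks the result against the product of (1 - events / at risk) over the observed times.

```r
library(survival)

# Hypothetical toy data: one row per employee, values are made up
toy <- data.frame(
  Tenure   = c(3, 6, 6, 9, 12, 12),                      # months observed
  Turnover = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)    # TRUE = left, FALSE = still employed (censored)
)

# Fit one overall Kaplan-Meier curve (no grouping variable)
km <- survfit(Surv(Tenure, Turnover) ~ 1, data = toy)

# Collect the pieces of the fit in a readable table
km_table <- data.frame(
  time     = km$time,     # distinct observation times
  at_risk  = km$n.risk,   # employees still under observation just before each time
  left     = km$n.event,  # employees who left at each time
  survival = km$surv      # estimated share still employed
)
print(km_table)

# The same estimate by hand: cumulative product of (1 - left / at_risk)
manual <- cumprod(1 - km$n.event / km$n.risk)
```

Censored employees (Turnover = FALSE) lower the number at risk at later times without counting as events, which is why the survival estimate stays flat at month 9 in this toy example.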

Using the code above, you should be able to conduct a survival analysis and visualize the results for your own projects. Please do share your results!