
Propensity Score Matching Explained Visually

Propensity score matching (wiki) is a statistical matching technique that attempts to estimate the effect of a treatment (e.g., an intervention) by accounting for the factors that predict whether an individual would be eligible to receive the treatment. The Wikipedia page provides a good example setting:

Say we are interested in the effects of smoking on health. Here, smoking would be considered the treatment, and the ‘treated’ are simply those who smoke. To find a cause-effect relationship, we would need to run an experiment and randomly assign people to smoking and non-smoking conditions. Of course, such an experiment would be infeasible and/or unethical, as we cannot ask or force people to smoke when we suspect it may do them harm.
We will need to work with observational data instead. Here, we estimate the treatment effect by simply comparing health outcomes (e.g., rates of cancer) between those who smoked and those who did not. However, this estimate would be biased by any factors that predict smoking (e.g., socioeconomic status). Propensity score matching attempts to control for these differences (i.e., biases) by making the comparison groups (i.e., smoking and non-smoking) more comparable.

Lucy D’Agostino McGowan is a post-doc at Johns Hopkins Bloomberg School of Public Health and co-founder of R-Ladies Nashville. She wrote a very nice blog post explaining what propensity score matching is and showing how to apply it to your dataset in R. Lucy demonstrates how you can use propensity scores to weight your observations in a way that accounts for the factors that correlate with receiving a treatment. Moreover, her explanations are strengthened by nice visuals that intuitively demonstrate what the weighting does to the “pseudo-populations” used to estimate the treatment effect.

Have a look yourself: https://livefreeordichotomize.com/2019/01/17/understanding-propensity-score-weighting/
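
To build some intuition for what such weighting does, here is a minimal sketch in Python (Lucy's post itself works in R); the confounder, the treatment mechanism, and the effect size below are simulated purely for illustration:

```python
# A minimal sketch of inverse propensity weighting on simulated data
# (not Lucy's code): a confounder drives both treatment and outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5000

# Confounder (think of something like socioeconomic status in the smoking example)
confounder = rng.normal(size=n)

# Treatment assignment depends on the confounder
p_treat = 1 / (1 + np.exp(-(confounder - 0.5)))
treated = rng.binomial(1, p_treat)

# Outcome depends on both the treatment (true effect = 2) and the confounder
outcome = 2 * treated + 3 * confounder + rng.normal(size=n)

# Naive comparison of group means is biased by the confounder
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Estimate propensity scores and build inverse probability weights
X = confounder.reshape(-1, 1)
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))

# The weighted difference in means recovers the treatment effect much better
w_treated = np.average(outcome[treated == 1], weights=weights[treated == 1])
w_control = np.average(outcome[treated == 0], weights=weights[treated == 0])
print(f"naive: {naive:.2f}, IPW: {w_treated - w_control:.2f} (truth: 2)")
```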

How to find two identical Skittles packs?

In a hilarious experiment, the anonymous mathematician behind the website Possibly Wrong estimated that s/he would only need to open “about 400-500” packs of Skittles to find an identical pack.

From January 12th up to April 6th, s/he put it to the test and counted the contents of an astonishing 468 packs, containing over 27,000 individual Skittles! Read all about the experiment here.
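
To get a feel for the math, here is a rough, birthday-problem-style simulation in Python. It is my own simplification (a fixed pack size and five equally likely flavours), not the more careful model from the original post:

```python
# Rough simulation: how many packs until two packs have the exact same
# flavour counts? Assumes every pack holds the same number of Skittles
# and five equally likely flavours (a simplification for illustration).
import numpy as np

rng = np.random.default_rng(0)
PACK_SIZE = 59        # roughly 27,740 Skittles / 468 packs
N_FLAVOURS = 5

def packs_until_duplicate():
    seen = set()
    while True:
        counts = tuple(rng.multinomial(PACK_SIZE, [1 / N_FLAVOURS] * N_FLAVOURS))
        if counts in seen:
            return len(seen) + 1
        seen.add(counts)

runs = [packs_until_duplicate() for _ in range(200)]
print(f"median packs needed across simulations: {np.median(runs):.0f}")
```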

Overview of the contents of the Skittles packs, with the duplicates encircled (via https://possiblywrong.wordpress.com/2019/04/06/follow-up-i-found-two-identical-packs-of-skittles-among-468-packs-with-a-total-of-27740-skittles/).
Contents of the two duplicate Skittles packs (via the same post).
A/B Testing a New Look

This WordPress blogger I came across (let’s call him “John” for now) has a very peculiar way of testing out his looks. Using dating apps like Tinder, John conducted A/B tests to find out whether people would prefer him romantically with or without a beard.

Via a proper experimental setup, John found that bearded John received much more attention in the form of Tinder matches. However, this did not hold for girls whom John characterized as Asian; that group seemed to prefer shaven John.

While the sample sizes were not too large (Nbearded = 500; Nshaven = 500) and the numbers of responses even smaller (Nbearded = 64; Nshaven = 30), this seems like a fun way to make your look more data-driven!
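
For what it’s worth, the reported match counts can be checked with a standard two-proportion test; the sketch below is my own back-of-the-envelope calculation using the figures quoted above, not anything from John’s post:

```python
# Two-proportion z-test on the reported Tinder numbers
# (a back-of-the-envelope check, not part of the original post).
from statsmodels.stats.proportion import proportions_ztest

matches = [64, 30]            # bearded vs. shaven matches
profiles_shown = [500, 500]   # profiles shown per variant

z_stat, p_value = proportions_ztest(matches, profiles_shown)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```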

Read more on “John”’s original blog below:

How Do You Test Out A New Look? Dating Apps!

12 Guidelines for Effective A/B Testing

I wrote about Emily Robinson and her A/B testing work at Etsy before, but now she’s back with a great new blog post full of practical advice. Emily provides 12 guidelines for A/B testing that help you set up effective experiments and avoid data-driven but erroneous conclusions:

  1. Have one key metric for your experiment.
  2. Use that key metric to do a power calculation (see the sketch after this list).
  3. Run your experiment for the length you’ve planned on.
  4. Pay more attention to confidence intervals than p-values.
  5. Don’t run tons of variants.
  6. Don’t try to look for differences for every possible segment.
  7. Check that there’s not bucketing skew.
  8. Don’t overcomplicate your methods.
  9. Be careful of launching things because they “don’t hurt”.
  10. Have a data scientist/analyst involved in the whole process.
  11. Only include people in your analysis who could have been affected by the change.
  12. Focus on smaller, incremental tests that change one thing at a time.
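
To make guideline 2 concrete, here is a minimal power-calculation sketch in Python; the baseline rate, lift, power, and alpha below are made-up illustrative numbers, not anything from Emily’s post:

```python
# How many users per variant are needed to detect a lift from a 10% to an 11%
# conversion rate with 80% power at alpha = 0.05? (Illustrative numbers only.)
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.10, 0.11)   # Cohen's h for the two rates
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"about {n_per_variant:.0f} users per variant")
```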

You can read more details on each guideline in Emily’s original blog post.

In her blog, Emily also refers to a great article by Stephen Holiday discussing five online experiments that had (almost) gone wrong and a presentation by Dan McKinley on continuous experimentation.

Evolving Floorplans – by Joel Simon

Joel Simon is the genius behind an experimental project exploring optimized school blueprints. Joel used graph-contraction and ant-colony pathing algorithms as growth processes, which could generate elementary school designs optimized for all kinds of characteristics: walking time, hallway usage, outdoor views, and escape routes, to name just a few.
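
As a very loose illustration of the “ant-colony pathing” idea (Joel’s actual growth process is far more sophisticated, and the graph below is entirely made up), here is a toy sketch in Python where simulated ants converge on a short route through a small hallway graph:

```python
# Toy ant-colony pathing on a hypothetical hallway graph: ants walk from a
# start to a goal, shorter paths receive more pheromone, and the colony
# gradually converges on a short route.
import random

graph = {  # node -> {neighbour: distance}
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.0, "D": 5.0},
    "C": {"A": 4.0, "B": 1.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}

def walk(start="A", goal="D"):
    """One ant walks randomly, biased by pheromone / distance."""
    path, node = [start], start
    while node != goal:
        options = [n for n in graph[node] if n not in path] or list(graph[node])
        weights = [pheromone[(node, n)] / graph[node][n] for n in options]
        node = random.choices(options, weights=weights)[0]
        path.append(node)
    return path

def length(path):
    return sum(graph[u][v] for u, v in zip(path, path[1:]))

for _ in range(200):                  # a few colony iterations
    path = walk()
    for key in pheromone:
        pheromone[key] *= 0.95        # evaporation
    deposit = 1.0 / length(path)      # shorter paths deposit more pheromone
    for u, v in zip(path, path[1:]):
        pheromone[(u, v)] += deposit
        pheromone[(v, u)] += deposit

best = walk()
print(best, length(best))
```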

Two generated designs, minimizing the traffic flow (left) as well as escape routes (right) [original]
Other designs tried to maximize the number of windows, resulting in seemingly random open courtyards [original]

The original floor plan [original]
Definitely check out the original write-up if you are interested in the details behind the generation process! Or have a look at some of Joel’s other projects.