Tag: business

Determine optimal sample sizes for business value in A/B testing, by Chris Said

Determine optimal sample sizes for business value in A/B testing, by Chris Said

A/B testing is a method of comparing two versions of some thing against each other to determine which is better. A/B tests are often mentioned in e-commerce contexts, where the things we are comparing are web pages.

ab-testing
via optimizely.com/nl/optimization-glossary/ab-testing/

Business leaders and data scientists alike face a difficult trade-off when running A/B tests: How big should the A/B test be? Or in other words, After collecting how many data points, or running for how many days, should we make a decision whether A or B is the best way to go?

This is a tradeoff because the sample size of an A/B test determines its statistical power. This statistical power, in simple terms, determines the probability of a A/B test showing an effect if there is actually really an effect. In general, the more data you collect, the higher the odds of you finding the real effect and making the right decision.

By default, researchers often aim for 80% power, with a 5% significance cutoff. But is this general guideline really optimal for the tradeoff between costs and benefits in your specific business context? Chris thinks not.

Chris said wrote a great three-piece blog in which he explains how you can mathematically determine the optimal duration of A/B-testing in your own company setting:

Part I: General Overview. Starts with a mostly non-technical overview and ends with a section called “Three lessons for practitioners”.

Part II: Expected lift. A more technical section that quantifies the benefits of experimentation as a function of sample size.

Part III: Aggregate time-discounted lift. A more technical section that quantifies the costs of experimentation as a function of sample size. It then combines costs and benefits into a closed-form expression that can be optimized. Ends with an FAQ.

Chris Said (via)

Moreover, Chris provides three practical advices that show underline 80% statistical power is not always the best option:

  1. You should run “underpowered” experiments if you have a very high discount rate
  2. You should run “underpowered” experiments if you have a small user base
  3. Neverheless, it’s far better to run your experiment too long than too short
Simulations shows that for Chris’ hypothetical company and A/B test, 38 days would be the optimal period of time to gather data
via chris-said.io/2020/01/10/optimizing-sample-sizes-in-ab-testing-part-I/

Chris ran all his simulations in Python and shared the notebooks.

Free Springer Books during COVID19

Free Springer Books during COVID19

Book publisher Springer just released over 400 book titles that can be downloaded free of charge following the corona-virus outbreak.

Here’s fhe full overview: https://link.springer.com/search?facet-content-type=%22Book%22&package=mat-covid19_textbooks&facet-language=%22En%22&sortOrder=newestFirst&showAll=true

Most of these books will normally set you back about $50 to $150, so this is a great deal!

There are many titles on computer science, programming, business, psychology, and here are some specific titles that might interest my readership:

Note that I only got to page 8 of 21, so there are many more free interesting titles out there!

Join 244 other followers

A/B testing and Statistics at Etsy, by Emily Robinson

A/B testing and Statistics at Etsy, by Emily Robinson

Generating numbers is easy; generating numbers you should trust is hard!

Emily Robinson is a data scientist at Etsy, an e-commerce website for handmade and vintage products. In the #rstats community, Emily is nearly as famous as her brother David Robinson, whom we know from the tidytext R-package.

Like any large tech company, Etsy relies heavily on statistics to improve their way of doing business. In their case, data from real-life experiments provide the business intelligence that allow effective decision-making. For instance, they experiment with the layout of their buttons, with the text shown near products, or with the suggestions made after a search query. To detect whether such changes have (ever so) small effects on Etsy’s KPI’s (e.g., conversion), data scientists such as Emily rely on traditional A/B testing.

In a 40-minute presentation, Emily explains how statistical issues such as skewed distributions, outliers, and power are dealt with at Etsy, among others using bootstrapping and simulations. Moreover, 30 minutes in Emily shares her lessons when it comes to working with (less stats-savvy) business stakeholders. For instance, how to help identify and transform business questions into data questions back into business solutions, or how to deal with the desire to peek at the results of experiments early.

Overall, I can the presentation below, the slides of which you find on Emily’s GitHub.