Tag: ecommerce

Determine optimal sample sizes for business value in A/B testing, by Chris Said

Determine optimal sample sizes for business value in A/B testing, by Chris Said

A/B testing is a method of comparing two versions of some thing against each other to determine which is better. A/B tests are often mentioned in e-commerce contexts, where the things we are comparing are web pages.

via optimizely.com/nl/optimization-glossary/ab-testing/

Business leaders and data scientists alike face a difficult trade-off when running A/B tests: How big should the A/B test be? Or in other words, After collecting how many data points, or running for how many days, should we make a decision whether A or B is the best way to go?

This is a tradeoff because the sample size of an A/B test determines its statistical power. This statistical power, in simple terms, determines the probability of a A/B test showing an effect if there is actually really an effect. In general, the more data you collect, the higher the odds of you finding the real effect and making the right decision.

By default, researchers often aim for 80% power, with a 5% significance cutoff. But is this general guideline really optimal for the tradeoff between costs and benefits in your specific business context? Chris thinks not.

Chris said wrote a great three-piece blog in which he explains how you can mathematically determine the optimal duration of A/B-testing in your own company setting:

Part I: General Overview. Starts with a mostly non-technical overview and ends with a section called “Three lessons for practitioners”.

Part II: Expected lift. A more technical section that quantifies the benefits of experimentation as a function of sample size.

Part III: Aggregate time-discounted lift. A more technical section that quantifies the costs of experimentation as a function of sample size. It then combines costs and benefits into a closed-form expression that can be optimized. Ends with an FAQ.

Chris Said (via)

Moreover, Chris provides three practical advices that show underline 80% statistical power is not always the best option:

  1. You should run “underpowered” experiments if you have a very high discount rate
  2. You should run “underpowered” experiments if you have a small user base
  3. Neverheless, it’s far better to run your experiment too long than too short
Simulations shows that for Chris’ hypothetical company and A/B test, 38 days would be the optimal period of time to gather data
via chris-said.io/2020/01/10/optimizing-sample-sizes-in-ab-testing-part-I/

Chris ran all his simulations in Python and shared the notebooks.

Free course: SQL for Data Science

Free course: SQL for Data Science

Kirsten Kehrer from datamovesme.com does all kinds of super valuable stuff in SQL and end of 2019 she decided to share it with the world via a free SQL course.

Screenshot from youtube.com/watch?v=W5AiLYR02l8

The course is focused on data science and has 5 modules with video lectures:

Kirsten advises to take the course with dual monitors, as she also provides an online SQL query builder environment, where you can write your queries during the videos.


Moreover, Kirsten also published the slides and the code to go with the course, so you can really learn along:

A nice touch is that Kirsten simulated some data for a fictitious e-commerce company, that really allows you to get a feel for the type of data you’d be working with in practice:

Screenshot from youtube.com/watch?v=itpvD0Eb_s4

A/B testing and Statistics at Etsy, by Emily Robinson

A/B testing and Statistics at Etsy, by Emily Robinson

Generating numbers is easy; generating numbers you should trust is hard!

Emily Robinson is a data scientist at Etsy, an e-commerce website for handmade and vintage products. In the #rstats community, Emily is nearly as famous as her brother David Robinson, whom we know from the tidytext R-package.

Like any large tech company, Etsy relies heavily on statistics to improve their way of doing business. In their case, data from real-life experiments provide the business intelligence that allow effective decision-making. For instance, they experiment with the layout of their buttons, with the text shown near products, or with the suggestions made after a search query. To detect whether such changes have (ever so) small effects on Etsy’s KPI’s (e.g., conversion), data scientists such as Emily rely on traditional A/B testing.

In a 40-minute presentation, Emily explains how statistical issues such as skewed distributions, outliers, and power are dealt with at Etsy, among others using bootstrapping and simulations. Moreover, 30 minutes in Emily shares her lessons when it comes to working with (less stats-savvy) business stakeholders. For instance, how to help identify and transform business questions into data questions back into business solutions, or how to deal with the desire to peek at the results of experiments early.

Overall, I can the presentation below, the slides of which you find on Emily’s GitHub.