Tag: power

Determine optimal sample sizes for business value in A/B testing, by Chris Said

A/B testing is a method of comparing two versions of some thing against each other to determine which is better. A/B tests are often mentioned in e-commerce contexts, where the things we are comparing are web pages.

Business leaders and data scientists alike face a difficult trade-off when running A/B tests: How big should the A/B test be? Or in other words, After collecting how many data points, or running for how many days, should we make a decision whether A or B is the best way to go?

This is a tradeoff because the sample size of an A/B test determines its statistical power. This statistical power, in simple terms, determines the probability of a A/B test showing an effect if there is actually really an effect. In general, the more data you collect, the higher the odds of you finding the real effect and making the right decision.

By default, researchers often aim for 80% power, with a 5% significance cutoff. But is this general guideline really optimal for the tradeoff between costs and benefits in your specific business context? Chris thinks not.

Chris said wrote a great three-piece blog in which he explains how you can mathematically determine the optimal duration of A/B-testing in your own company setting:

Part I: General Overview. Starts with a mostly non-technical overview and ends with a section called “Three lessons for practitioners”.
Part II: Expected lift. A more technical section that quantifies the benefits of experimentation as a function of sample size.
Part III: Aggregate time-discounted lift. A more technical section that quantifies the costs of experimentation as a function of sample size. It then combines costs and benefits into a closed-form expression that can be optimized. Ends with an FAQ.
Chris Said (via)

Moreover, Chris provides three practical advices that show underline 80% statistical power is not always the best option:

You should run “underpowered” experiments if you have a very high discount rate
You should run “underpowered” experiments if you have a small user base
Neverheless, it’s far better to run your experiment too long than too short

Simulations shows that for Chris’ hypothetical company and A/B test, 38 days would be the optimal period of time to gather data
via chris-said.io/2020/01/10/optimizing-sample-sizes-in-ab-testing-part-I/

Chris ran all his simulations in Python and shared the notebooks.

A/B testing and Statistics at Etsy, by Emily Robinson

Generating numbers is easy; generating numbers you should trust is hard!

Emily Robinson is a data scientist at Etsy, an e-commerce website for handmade and vintage products. In the #rstats community, Emily is nearly as famous as her brother David Robinson, whom we know from the tidytext R-package.

Like any large tech company, Etsy relies heavily on statistics to improve their way of doing business. In their case, data from real-life experiments provide the business intelligence that allow effective decision-making. For instance, they experiment with the layout of their buttons, with the text shown near products, or with the suggestions made after a search query. To detect whether such changes have (ever so) small effects on Etsy’s KPI’s (e.g., conversion), data scientists such as Emily rely on traditional A/B testing.

In a 40-minute presentation, Emily explains how statistical issues such as skewed distributions, outliers, and power are dealt with at Etsy, among others using bootstrapping and simulations. Moreover, 30 minutes in Emily shares her lessons when it comes to working with (less stats-savvy) business stakeholders. For instance, how to help identify and transform business questions into data questions back into business solutions, or how to deal with the desire to peek at the results of experiments early.

Overall, I can the presentation below, the slides of which you find on Emily’s GitHub.

Sorting Algorithms 101: Visualized

Sorting is one of the central topic in most Computer Science degrees. In general, sorting refers to the process of rearranging data according to a defined pattern with the end goal of transforming the original unsorted sequence into a sorted sequence. It lies at the heart of successful businesses ventures — such as Google and Amazon — but is also present in many applications we use daily — such as Excel or Facebook.

Many different algorithms have been developed to sort data. Wikipedia lists as many as 45 and there are probably many more. Some work by exchanging data points in a sequence, others insert and/or merge parts of the sequence. More importantly, some algorithms are quite effective in terms of the time they take to sort data — taking only $n$ time to sort $n$ datapoints — whereas others are very slow — taking as much as $n^2$ . Moreover, some algorithms are stable — in the sense that they always take the same amount of time to process $n$ datapoints — whereas others may fluctuate in terms of processing time based on the original order of the data.

I really enjoyed this video by TED-Ed on how to best sort your book collection. It provides a very intuitive introduction into sorting strategies (i.e., algorithms). Moreover, Algorithms to Live By (Christian & Griffiths, 2016) provided the amazing suggestion to get friends and pizza in whenever you need to sort something, next to the great explanation of various algorithms and their computational demand.

The main reason for this blog is that I stumbled across some nice video’s and GIFs of sorting algorithms in action. These visualizations are not only wonderfully intriguing to look at, but also help so much in understanding how the sorting algorithms process the data under the hood. You might want to start with the 4-minute YouTube video below, demonstrating how nine different sorting algorithms (Selection Sort, Shell Sort, Insertion Sort, Merge Sort, Quick Sort, Heap Sort, Bubble Sort, Comb Sort, & Cocktail Sort) process a variety of datasets.

This interactive website toptal.com allows you to play around with the most well-known sorting algorithms, putting them to work on different datasets. For the grande finale, I found these GIFs and short video’s of several sorting algorithms on imgur. In the visualizations below, each row of the image represents an independent list being sorted. You can see that Bubble Sort is quite slow:

Cocktail Shaker Sort already seems somewhat faster, but still takes quite a while.

For some algorithms, the visualization clearly shows that the settings you pick matter. For instance, Heap Sort is much quicker if you choose to shift down instead of up.

In contrast, for Merge Sort it doesn’t matter whether you sort by breadth first or depth first.

The imgur overview includes many more visualized sorting algorithms but I don’t want to overload WordPress or your computer, so I’ll leave you with two types of Radix Sort, the rest you can look up yourself!