Tag: application

Determine optimal sample sizes for business value in A/B testing, by Chris Said

Determine optimal sample sizes for business value in A/B testing, by Chris Said

A/B testing is a method of comparing two versions of some thing against each other to determine which is better. A/B tests are often mentioned in e-commerce contexts, where the things we are comparing are web pages.

ab-testing
via optimizely.com/nl/optimization-glossary/ab-testing/

Business leaders and data scientists alike face a difficult trade-off when running A/B tests: How big should the A/B test be? Or in other words, After collecting how many data points, or running for how many days, should we make a decision whether A or B is the best way to go?

This is a tradeoff because the sample size of an A/B test determines its statistical power. This statistical power, in simple terms, determines the probability of a A/B test showing an effect if there is actually really an effect. In general, the more data you collect, the higher the odds of you finding the real effect and making the right decision.

By default, researchers often aim for 80% power, with a 5% significance cutoff. But is this general guideline really optimal for the tradeoff between costs and benefits in your specific business context? Chris thinks not.

Chris said wrote a great three-piece blog in which he explains how you can mathematically determine the optimal duration of A/B-testing in your own company setting:

Part I: General Overview. Starts with a mostly non-technical overview and ends with a section called “Three lessons for practitioners”.

Part II: Expected lift. A more technical section that quantifies the benefits of experimentation as a function of sample size.

Part III: Aggregate time-discounted lift. A more technical section that quantifies the costs of experimentation as a function of sample size. It then combines costs and benefits into a closed-form expression that can be optimized. Ends with an FAQ.

Chris Said (via)

Moreover, Chris provides three practical advices that show underline 80% statistical power is not always the best option:

  1. You should run “underpowered” experiments if you have a very high discount rate
  2. You should run “underpowered” experiments if you have a small user base
  3. Neverheless, it’s far better to run your experiment too long than too short
Simulations shows that for Chris’ hypothetical company and A/B test, 38 days would be the optimal period of time to gather data
via chris-said.io/2020/01/10/optimizing-sample-sizes-in-ab-testing-part-I/

Chris ran all his simulations in Python and shared the notebooks.

The 12 Truths of Machine Learning – by Delip Rao

The 12 Truths of Machine Learning – by Delip Rao

In this original blog, with equally original title, Delip Rao poses twelve (+1) harsh truths about the real world practice of machine learning. I found it quite enlightning to read a non-hyped article about ML for once. Particularly because Delip’s experiences seem to overlap quite nicely with the principles of software design and Agile working.

Delip’s 12 truths I’ve copied in headers below. If they spark your interest, read more here:

  1. It has to work
  2. No matter how hard you push and no matter what the priority, you can’t increase the speed of light
  3. With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea
  4. Some things in life can never be fully appreciated nor understood unless experienced firsthand
  5. It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases, this is a bad idea
  6. It is easier to ignore or move a problem around than it is to solve it
  7. You always have to tradeoff something
  8. Everything is more complicated than you think
  9. You will always under-provision resources
  10. One size never fits all. Your model will make embarrassing errors all the time despite your best intentions
  11. Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works
  12. Perfection has been reached not when there is nothing left to add, but when there is nothing left to take away

Delip added in a +1, with his zero-indexed truth: You are Not a Scientist.

Yes, that’s all of you building stuff with machine learning with a “scientist” in the title, including all of you with PhDs, has-been-academics, and academics with one foot in the industry. Machine learning (and other AI application areas, like NLP, Vision, Speech, …) is an engineering research discipline (as opposed to science research).

Delip Rao via deliprao.com/archives/227

Delip [bio] is the VP of Research at AI Foundation where he leads speech, language, and vision research efforts for generating and detecting artificial content. You can find his personal webblog here.

Cover image via the-vital-edge.com/lie-detector

Causal Random Forests, by Mark White

Causal Random Forests, by Mark White

I stumbled accros this incredibly interesting read by Mark White, who discusses the (academic) theory behind, inner workings, and example (R) applications of causal random forests:

EXPLICITLY OPTIMIZING ON CAUSAL EFFECTS VIA THE CAUSAL RANDOM FOREST: A PRACTICAL INTRODUCTION AND TUTORIAL (By Mark White)

These so-called “honest” forests seem a great technique to identify opportunities for personalized actions: think of marketing, HR, medicine, healthcare, and other personalized recommendations. Note that an experimental setup for data collection is still necessary to gather the right data for these techniques.

https://www.markhw.com/blog/causalforestintro

Putting R in Production, by Heather Nolis & Mark Sellors

Putting R in Production, by Heather Nolis & Mark Sellors

It is often said that R is hard to put into production. Fortunately, there are numerous talks demonstrating the contrary.

Here’s one by Heather Nolis, who productionizes R models at T-Mobile. Her teams even shares open-source version of some of their productionized Tensorflow models on github. Read more about that model here.

There’s another great talk on the RStudio website. In this talk, Mark Sellors discusses some of the misinformation around the idea of what “putting something into production” actually means, and provides some tips on overcoming obstacles.

Cover image via Fotolia.

13 Data-Driven Insights to Improve Your Job Search

Talent.Works is back, elaborating on the applicant characteristics that relate to landing an interview. While the majority of applicants has a meager ~2% chance of getting invited to an interview, some applicants do way better! What accounts for their success?

job-applicants-interview-rate-histogram.png
Original can be found in the Talent Works blog

Analyzing 4000+ applicants, Talent.Works found 13 factors that related to getting an interview.

There are some things outside of the applicants’ control:

  • Young applicants have higher chances (+25%).
  • Women applicants have better chances (+48%).
  • Applicants with a second degree have better chances (+22%).

Fortunately, applicants can boost their interview invitation rate using the following tricks:

  • Apply on Monday (+46%), between 6 AM and 10 AM (+89%), and within the first four days (+65%).
  • Start sentences with action-related verbs (+140%).
  • Use numbers to demonstrate impact (+40%).
  • Use occasional buzzwords / jargon (+29%) and skills (+59%).
  • Use leadership-related words (+51%) and avoid overusing words related to teamwork and collaboration (-51%) or personal pronouns (-55%).

Here are some of these effects visualized:

apply-on-mondays-job-search-tip.png
Original can be found in the Talent Works blog

dont-be-team-player-resume-tip.png
Original can be found in the Talent Works blog

The Magic Sudoku App

The Magic Sudoku App

A few weeks ago, Magic Sudoku was released for iOS11. This app by a company named Hatchlings automatically solves sudoku puzzles using a combination of Computer Vision, Machine Learning, and Augmented Reality. The app works on iPad Pro’s and iPhone 6s or above and can be downloaded from the App Store.

Magic Sudoku App in action.

Magic Sudoku gives a magical experience when users point their phone at a Sudoku puzzle: the puzzle is instantaneously solved and displayed on their screen. In several seconds, the following occurs behind the scenes:

What happens in the ARKit app behind the scenes.

One of the original reasons I chose a Sudoku solver as our first AR app was that I knew classifying digits is basically the “hello world” of Machine Learning. I wanted to dip my toe in the water of Machine Learning while working on a real-world problem. This seemed like a realistic app to tackle.” – Brad Dwyer, Founder at Hatchlings

Particularly the training process of the app interested me. In his blog, Brad explains how they bought out the entire stock of Sudoku books of a specific bookstore and, with the help of his team, ripped each book apart to scan each small square with a number and upload in to a server. In the end, this server contained about 600,000 images, but all were completely unlabeled. Via a simple game, they asked Hatchlings users to classify these images by pressing the number keys on their keyboard. Within 24 hours, all 600,000 images were classified!

Nevertheless, some users had misunderstood the task (or just plainly ignored it) and as a consequence there were still a significant number of misidentified images. So Brad created a second tool that displayed 100 images of a single class to users, who where consequently asked to click the ones that didn’t match. These were subsequently thrown back into the first tool to be reclassified.

Quickly, the developers had enough verified data to add an automatic accuracy checker into both tools for future data runs. Funnily enough, they programmed it in such a way that users were periodically shown already known/classified images in order to check the validity of their inputs and determine how much to trust their answers going forward. This whole process reminds me on a blog I wrote recently, regarding human-computer interactions in reinforcement learning.

For several more weeks, users classified more scanned data so that, by the time the app was launched, it had been trained on over a million images of Sudoku squares. The results were amazing as the application had a 98.6% accuracy on launch (currently above 99% accuracy). One minor deficit was that the app was trained on paper Sudoku’s. However, when it aired, many users wanted to quickly test it and searched for Sudoku images on Google, which the app wouldn’t process that well.

“Problem number one was that our machine learning model was only trained on paper puzzles; it didn’t know what to think about pixels on a screen. I pulled an all nighter that first week and re-trained our model with puzzles on computer screens.

Problem number two was that ARKit only supports horizontal planes like tables and floors (not vertical planes like computer monitors). Solving this was a trickier problem but I did come up with a hacky workaround. I used a combination of some heuristics and FeaturePoint detection to place puzzles on non-horizontal planes.” – Brad Dwyer, Founder at Hatchlings

Brad and his colleagues at Hatchlings still need to work out the business model behind the ARKit Magic Sudoku app, but that’s in the meantime, download the app and let me and them know what you think: subscribe to his medium blog or follow Brad on twitter.