Tag: machinelearning

ML Model Degradation, and why work only just starts when you reach production

ML Model Degradation, and why work only just starts when you reach production

The assumption that a Machine Learning (ML) project is done when a trained model is put into production is quite faulty. Neverthless, according to Alexandre Gonfalonieri — artificial intelligence (AI) strategist at Philips — this assumption is among the most common mistakes of companies taking their AI products to market.

Actually, in the real world, we see pretty much the opposite of this assumption. People like Alexandre therefore strongly recommend companies keep their best data scientists and engineers on a ML project, especially after it reaches production!

Why?

If you’ve ever productionized a model and really started using it, you know that, over time, your model will start performing worse.

In order to maintain the original accuracy of a ML model which is interacting with real world customers or processes, you will need to continuously monitor and/or tweak it!

In the best case, algorithms are retrained with each new data delivery. This offers a maintenance burden that is not fully automatable. According to Alexandre, tending to machine learning models demands the close scrutiny, critical thinking, and manual effort that only highly trained data scientists can provide.

This means that there’s a higher marginal cost to operating ML products compared to traditional software. Whereas the whole reason we are implementing these products is often to decrease (the) costs (of human labor)!

What causes this?

Your models’ accuracy will often be at its best when it just leaves the training grounds.

Building a model on relevant and available data and coming up with accurate predictions is a great start. However, for how long do you expect those data — that age by the day — continue to provide accurate predictions?

Chances are that each day, the model’s latent performance will go down.

This phenomenon is called concept drift, and is heavily studied in academia but less often considered in business settings. Concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways.

In simpler terms, your model is no longer modelling the outcome that it used to model. This causes problems because the predictions become less accurate as time passes.

Particularly, models of human behavior seem to suffer from this pitfall.

The key is that, unlike a simple calculator, your ML model interacts with the real world. And the data it generates and that reaches it is going to change over time. A key part of any ML project should be predicting how your data is going to change over time.

Read more about concept drift here.

Via

How do we know when our models fail?

You need to create a monitoring strategy before reaching production!

According to Alexandre, as soon as you feel confident with your project after the proof-of-concept stage, you should start planning a strategy for keeping your models up to date.

How often will you check in?

On the whole model, or just some features?

What features?

In general, sensible model surveillance combined with a well thought out schedule of model checks is crucial to keeping a production model accurate. Prioritizing checks on the key variables and setting up warnings for when a change has taken place will ensure that you are never caught by a surprise by a change to the environment that robs your model of its efficacy.

Alexandre via

Your strategy will strongly differ based on your model and your business context.

Moreover, there are many different types of concept drift that can affect your models, so it should be a key element to think of the right strategy for you specific case!

Image result for concept drift
Different types of model drift (via)

Let’s solve it!

Once you observe degraded model performance, you will need to redesign your model (pipeline).

One solution is referred to as manual learning. Here, we provide the newly gathered data to our model and re-train and re-deploy it just like the first time we build the model. If you think this sounds time-consuming, you are right. Moreover, the tricky part is not refreshing and retraining a model, but rather thinking of new features that might deal with the concept drift.

A second solution could be to weight your data. Some algorithms allow for this very easily. For others you will need to custom build it in yourself. One recommended weighting schema is to use the inversely proportional age of the data. This way, more attention will be paid to the most recent data (higher weight) and less attention to the oldest of data (smaller weight) in your training set. In this sense, if there is drift, your model will pick it up and correct accordingly.

According to Alexandre and many others, the third and best solution is to build your productionized system in such a way that you continuously evaluate and retrain your models. The benefit of such a continuous learning system is that it can be automated to a large extent, thus reducing (the human labor) maintance costs.

Although Alexandre doesn’t expand on how to do these, he does formulate the three steps below:

Via the original blog

In my personal experience, if you have your model retrained (automatically) every now and then, using a smart weighting schema, and keep monitoring the changes in the parameters and for several “unit-test” cases, you will come a long way.

If you’re feeling more adventureous, you could improve on matters by having your model perform some exploration (at random or rule-wise) of potential new relationships in your data (see for instance multi-armed bandits). This will definitely take you a long way!

Solving concept drift (via)
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

Google Brain researchers published this amazing paper, with accompanying GIF where they show the true power of AutoML.

AutoML stands for automated machine learning, and basically refers to an algorithm autonomously building the best machine learning model for a given problem.

This task of selecting the best ML model is difficult as it is. There are many different ML algorithms to choose from, and each of these has many different settings ([hyper]parameters) you can change to optimalize the model’s predictions.

For instance, let’s look at one specific ML algorithm: the neural network. Not only can we try out millions of different neural network architectures (ways in which the nodes and lyers of a network are connected), but each of these we can test with different loss functions, learning rates, dropout rates, et cetera. And this is only one algorithm!

In their new paper, the Google Brain scholars display how they managed to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. Using evolutionary principles, they have developed an AutoML framework that tailors its own algorithms and architectures to best fit the data and problem at hand.

This is AI research at its finest, and the results are truly remarkable!

GIF for the interpretation of the best evolved algorithm

You can read the full paper open access here: https://arxiv.org/abs/2003.03384 (quick download link)

The original code is posted here on github: github.com/google-research/google-research/tree/master/automl_zero#automl-zero

GIF for the experiment progress
Building a $86 million car theft AI in 57 lines of JavaScript

Building a $86 million car theft AI in 57 lines of JavaScript

Tait Brown was annoyed at the Victoria Police who had spent $86 million Australian dollars on developing the BlueNet system which basically consists of an license-plate OCR which crosschecks against a car theft database.

Tait was so disgruntled as he thought he could easily replicate this system without spending millions and millions of tax dollars. And so he did. In only 57 lines of JavaScript, though, to be honest, there are many more lines of code hidden away in abstraction and APIs…

Anyway, he built a system that can identify license plates, read them, and should be able to cross check them with a criminal database.

Via Medium

I really liked reading about this project, so please do so if you’re curious via the links below:

Part 1: How I replicated an $86 million project in 57 lines of code

Part 2: Remember the $86 million license plate scanner I replicated?

Part X: the code on Github

Cover image via Medium via Freepik

The 12 Truths of Machine Learning – by Delip Rao

The 12 Truths of Machine Learning – by Delip Rao

In this original blog, with equally original title, Delip Rao poses twelve (+1) harsh truths about the real world practice of machine learning. I found it quite enlightning to read a non-hyped article about ML for once. Particularly because Delip’s experiences seem to overlap quite nicely with the principles of software design and Agile working.

Delip’s 12 truths I’ve copied in headers below. If they spark your interest, read more here:

  1. It has to work
  2. No matter how hard you push and no matter what the priority, you can’t increase the speed of light
  3. With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea
  4. Some things in life can never be fully appreciated nor understood unless experienced firsthand
  5. It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases, this is a bad idea
  6. It is easier to ignore or move a problem around than it is to solve it
  7. You always have to tradeoff something
  8. Everything is more complicated than you think
  9. You will always under-provision resources
  10. One size never fits all. Your model will make embarrassing errors all the time despite your best intentions
  11. Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works
  12. Perfection has been reached not when there is nothing left to add, but when there is nothing left to take away

Delip added in a +1, with his zero-indexed truth: You are Not a Scientist.

Yes, that’s all of you building stuff with machine learning with a “scientist” in the title, including all of you with PhDs, has-been-academics, and academics with one foot in the industry. Machine learning (and other AI application areas, like NLP, Vision, Speech, …) is an engineering research discipline (as opposed to science research).

Delip Rao via deliprao.com/archives/227

Delip [bio] is the VP of Research at AI Foundation where he leads speech, language, and vision research efforts for generating and detecting artificial content. You can find his personal webblog here.

Cover image via the-vital-edge.com/lie-detector

Why cancer screening is the last thing you should pick first to work on with AI

Why cancer screening is the last thing you should pick first to work on with AI

I came across this opinionated though informed commentary by Vinay Prasad on the recent Nature article where Google’s machine learning experts trained models to predict whether scans of patients’ breasts (mammogram’s) show cancerous cells or not.

Vinay Prasad [official bio] is a practicing hematologist-oncologist and Associate Professor of Medicine at Oregon Health and Science University. So he knows what he’s talking about.

He argues that “cancer screening is the LAST thing you should pick FIRST to work on with AI”. Which is an interesting statement in and of itself.

Regardless of my personal opinion on the topic, I found the paper, Vinay’s commentary, and the broader discussion on twitter very interesting and educational to read. I feel it shows how important it is to know the context in which you are applying machine learning. What tremendous value it provides to have domain experts in the same team as the data and machine learning experts.

I cannot explain this better than Vinay himself, so please have a read of the original twitter thread here:

If you’re interested in this kind of topics, I wrote about IBM’s Watson adventures in health analytics a few years back: https://paulvanderlaken.com/2017/09/12/ibms-watson-for-oncology-a-biased-and-unproven-recommendation-system-in-cancer-treatment/

An excerpt from the twitter thread
Animated Machine Learning Classifiers

Animated Machine Learning Classifiers

Ryan Holbrook made awesome animated GIFs in R of several classifiers learning a decision rule boundary between two classes. Basically, what you see is a machine learning model in action, learning how to distinguish data of two classes, say cats and dogs, using some X and Y variables.

These visuals can be great to understand these algorithms, the models, and their learning process a bit better.

Here’s the original tweet, with the logistic regression animation. If you follow it, you will find a whole thread of classifier GIFs. These I extracted, pasted, and explained below.

Below is the GIF which I extracted using EZgif.com.

What you see is observations from two classes, say cats and dogs, each represented using colored dots. The dots are placed along X and Y axes, which represent variables about the observations. Their tail lengths and their hairyness, for instance.

Now there’s an optimal way to seperate these classes, which is the dashed line. That line best seperates the cats from the dogs based on these two variables X and Y. As this is an optimal boundary given this data, it is stable, it does not change.

However, there’s also a solid black line, which does change. This line represents the learned boundary by the machine learning model, in this case using logistic regression. As the model is shown more data, it learns, and the boundary is updated. This learned boundary represents the best line with which the model has learned to seperate cats from dogs.

Anything above the boundary is predicted to be class 1, a dog. Everything below predicted to be class 2, a cat. As logistic regression results in a linear model, the seperation boundary is very much linear/straight.

Logistic regression gif by Ryan Holbrook

These animations are great to get a sense of how the models come to their boundaries in the back-end.

For instance, other machine learning models are able to use non-linear boundaries to dinstinguish classes, such as this quadratic discriminant analysis (qda). This “learned” boundary is much closer to the optimal boundary:

Quadratic discriminant analysis gif by Ryan Holbrook

Models using multivariate adaptive regression splines (or MARS) seem to result in multiple linear boundaries pasted together:

Multivariate adaptive regression splines gif by Ryan Holbrook

Next, we have the k-nearest neighbors algorithm, which predicts for each point (animal) the class (cat/dog) based on the “k” points closest to it. As you see, this results in a highly fluctuating, localized boundary.

K-nearest neighbors gif by Ryan Holbrook

Now, Ryan decided to push the challenge, and simulate new data for two classes with a more difficult decision boundary. The new data and optimal boundaries look like this:

The optimal decision boundary.
Via https://mathformachines.com/posts/decision/

On these data, Ryan put a whole range of non-linear models to work.

Like this support-vector machine, which tries to create optimal boundaries built of support vectors around all the cats and all the dohs (this is definitely not a technical, error-free explanation of what’s happening here).

Support vector machine gif by Ryan Holbrook

Generalized additive models are also cool to see in action. Why Ryan’s versions render so slowly, I don’t know. To learn more about GAMs, I strongly advise this tutorial here.

Generalized additive model gif by Ryan Holbrook

Let’s jump into some tree-based algorithms and the resulting models. A decision tree classifies data based on multiple, sequential, binary splits. Here, Ryan trained a simple decision tree:

Decision tree gif by Ryan Holbrook

As well as it’s big brother, a random forest, which uses hundreds of trees in the back end and thus results in a more flexible boundary:

Random forest gif by Ryan Holbrook

Extreme gradient boosting is also a tree-based algorithm, which leverages many machine learning techniques to optimize the bias-variance tradeoff. Here’s an earlier blog on how to get started with Xgboost in Python or R:

Extreme gradient boosting gif by Ryan Holbrook

Finally, a machine learning project is not complete without an artificial neural network. Learn more on these here:

Artificial neural network gif by Ryan Holbrook

If you want to know more about this project of Ryan Holbrook, do have a look at his accompanying blog here. You can also find Ryan’s code here on github.

Overviews of Graph Classification and Network Clustering methods

Thanks to Sebastian Raschka I am able to share this great GitHub overview page of relevant graph classification techniques, and the scientific papers behind them. The overview divides the algorithms into four groups:

  1. Factorization
  2. Spectral and Statistical Fingerprints
  3. Deep Learning
  4. Graph Kernels

Moreover, the overview contains links to similar collections on community detectionclassification/regression trees and gradient boosting papers with implementations.

As well as a link to relevant graph classification benchmark datasets.