Category: machine learning

Google’s Guidebook for Developing AI Product Development

Google’s Guidebook for Developing AI Product Development

I came across another great set of curated resources by one of the teams at Google:

The People + AI Guidebook.

The People + AI Guidebook was written to help user experience (UX) professionals and product managers follow a human-centered approach to AI.

The Guidebookโ€™s recommendations are based on data and insights from over a hundred individuals across Google product teams, industry experts, and academic research.

These six chapters follow the product development flow, and each one has a related worksheet to help turn guidance into action.

The People & AI guidebook is one of the products of the major PAIR project team (People & AI Research).

Here are the direct links to the six guidebook chapters:

Links to the related worksheets you can find here.

Repository of Production Machine Learning

Repository of Production Machine Learning

The Institute for Ethical Machine Learning compiled this amazing curated list of open source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.

๐Ÿ” Explaining predictions & models๐Ÿ” Privacy preserving ML๐Ÿ“œ Model & data versioning
๐Ÿ Model Training Orchestration๐Ÿ’ช Model Serving and Monitoring๐Ÿค– Neural Architecture Search
๐Ÿ““ Reproducible Notebooks๐Ÿ“Š Visualisation frameworks๐Ÿ”  Industry-strength NLP
๐Ÿงต Data pipelines & ETL๐Ÿท๏ธ Data Labelling๐Ÿ—ž๏ธ Data storage
๐Ÿ“ก Functions as a service๐Ÿ—บ๏ธ Computation distribution๐Ÿ“ฅ Model serialisation
๐Ÿงฎ Optimized calculation frameworks๐Ÿ’ธ Data Stream Processing๐Ÿ”ด Outlier and Anomaly Detection
๐ŸŒ€ Feature engineering๐ŸŽ Feature Storesโš” Adversarial Robustness
๐Ÿ’ฐ Commercial Platforms
Direct links to the sections of the Github repo

The Institute for Ethical Machine Learning is a think-tank that brings together with technology leaders, policymakers & academics to develop standards for ML.

Tutorial: Demystifying Deep Learning for Data Scientists

Tutorial: Demystifying Deep Learning for Data Scientists

In this great tutorial for PyCon 2020, Eric Ma proposes a very simple framework for machine learning, consisting of only three elements:

  1. Model
  2. Loss function
  3. Optimizer

By adjusting the three elements in this simple framework, you can build any type of machine learning program.

In the tutorial, Eric shows you how to implement this same framework in Python (using jax) and implement linear regression, logistic regression, and artificial neural networks all in the same way (using gradient descent).

I can’t even begin to explain it as well as Eric does himself, so I highly recommend you watch and code along with the Youtube tutorial (~1 hour):

If you want to code along, here’s the github repository: github.com/ericmjl/dl-workshop

Have you ever wondered what goes on behind the scenes of a deep learning framework? Or what is going on behind that pre-trained model that you took from Kaggle? Then this tutorial is for you! In this tutorial, we will demystify the internals of deep learning frameworks – in the process equipping us with foundational knowledge that lets us understand what is going on when we train and fit a deep learning model. By learning the foundations without a deep learning framework as a pedagogical crutch, you will walk away with foundational knowledge that will give you the confidence to implement any model you want in any framework you choose.

https://www.youtube.com/watch?v=gGu3pPC_fBM
An ABC of Artificial Intelligence Concepts

An ABC of Artificial Intelligence Concepts

Yet another great resource by one of the teams at Google in collaboration with Oxford:

An ABC of Artificial Intelligence-related concepts!

The G is for GANs: Generative Adverserial Networks.

Want to know what GANs are all about?

Just read along with Google’s laymen explanation! Here’s an excerpt:

The P is for Predictions.

Currently the ABC is only available in English, but other language translations come available soon.

Check it out yourself!

How 457 data scientists failed to predict life outcomes

How 457 data scientists failed to predict life outcomes

This blog highlights a recent PNAS paper in which 457 data scientists and academic scholars were challenged use machine learning to predict life outcomes using a rich dataset.

Yet, I can not summarize the result better than this tweet by the author of the paper:

Over 750 scientific papers have used the Fragile Families dataset.

The dataset is famous for its richness of cohort (survey) data on the included families’ lives and their childrens’ upbringings. It includes a whopping 12.942 variables!!

Some of these variables reflect interesting life outcomes of the included families.

For instance, the childrens’ grade point averages (GPA) and grit, but also whether the family was ever evicted or experienced hardship, or whether their primary caregiver had received job training or was laid off at work.

You can read more about the exact data contents in the paper’s appendix.

A visual representation of the data
via pnas.org/content/pnas/117/15/8398/F1.medium.gif

Now Matthew and his co-authors shared this enormous dataset with over 160 teams consisting of 457 academics researchers and data scientists alike. Each of them well versed in statistics and predictive modelling.

These data scientists were challenged with this task: by all means possible, make the most predictive model for the six life outcomes (i.e., GPA, conviction, etc).

The scientists could use all the Fragile Families data, and any algorithm they liked, and their final model and its predictions would be compared against the actual life outcomes in a holdout sample.

According to the paper, many of these teams used machine-learning methods that are not typically used in social science research and that explicitly seek to maximize predictive accuracy.

Now, here’s the summary again:

If hundreds of [data] scientists created predictive algorithms with high-quality data, how well would the best predict life outcomes?

Not very well.

@msalganik

Even the best among the 160 teams’ predictions showed disappointing resemblance of the actual life outcomes. None of the trained models/algorithms achieved an R-squared of over 0.25.

Afbeelding
Via twitter.com/msalganik/status/1263886779603705856/photo/1

Here’s that same plot again, but from the original publication and with more detail:

Via pnas.org/content/117/15/8398

Wondering what these best R-squared of around 0.20 look like? Here’s the disappointg reality of plot C enlarged: the actual TRUE GPA’s on the x-axis, plotted against the best team’s predicted GPA’s on the y-axis.

Afbeelding
Via twitter.com/msalganik/status/1263886781449191424/photo/1

Sure, there’s some relationship, with higher actual scores getting higher (average) predictions. But it ain’t much.

Moreover, there’s very little variation in the predictions. They all clump together between the range of about 2.1 and 3.8… that’s not really setting apart the geniuses from the less bright!

Matthew sums up the implications quite nicely in one of his tweets:

For policymakers deploying predictive algorithms in high-stakes decisions, our result is a reminder of a basic fact: one should not assume that algorithms predict well. That must be demonstrated with transparent, empirical evidence.

@msalganik

According to Matthew this “collective failure of 160 teams” is hard to ignore. And it failure highlights the understanding vs. predicting paradox: these data have been used to generate knowledge on how the world works in over 750 papers, yet few checked to see whether these same data and the scientific models would be useful to predict the life outcomes we’re trying to understand.

I was super excited to read this paper and I love the approach. It is actually quite closely linked to a series of papers I have been working on with Brian Spisak and Brian Doornenbal on trying to predict which people will emerge as organizational leaders. (hint: we could not really, at least not based on their personality)

Apparently, others were as excited as I am about this paper, as Filiz Garip already published a commentary paper on this research piece. Unfortunately, it’s behind a paywall so I haven’t read it yet.

Moreover, if you want to learn more about the approaches the 160 data science teams took in modelling these life outcomes, here are twelve papers in which some teams share their attempts.

Very curious to hear what you think of the paper and its implications. You can access it here, and I’d love to read your comments below.

Making Pictures 3D using Context-aware Layered Depth Inpainting

Making Pictures 3D using Context-aware Layered Depth Inpainting

Several Chinese Ph.D. students wrote a PyTorch program that can turn your holiday pictures into 3D sceneries. They call it 3D photo inpainting. Here are some examples

And here’s the new method compares to previous techniques:

Here are several links to more detailed resources: [Paper] [Project Website] [Google Colab] [GitHub]

We propose a method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. We use a Layered Depth Image with explicit pixel connectivity as underlying representation, and present a learning-based inpainting model that iteratively synthesizes new local color-and-depth content into the occluded region in a spatial context-aware manner. The resulting 3D photos can be efficiently rendered with motion parallax using standard graphics engines. We validate the effectiveness of our method on a wide range of challenging everyday scenes and show fewer artifacts when compared with the state-of-the-arts.

Via github.com/vt-vl-lab/3d-photo-inpainting
3D Photography Inpainting: Exploring Art with AI. - Towards Data ...
I loved this one as well, but it could be a different technique used, via Medium