Category: machine learning

Generating images from scratch: Parallel Multiscale Autoregressive Density Estimation

A while ago, I blogged about pix2code, an algorithm that takes in pictures of graphical user interfaces and outputs the underlying code. Today, I discovered another fantastic algorithm, by Scott Reed and his colleagues at Google DeepMind. txt2pix would be a catchy name for this algorithm, as it can take in a fairly complex sentence (e.g., “a grey bird with a black head, orange eyes, and a yellow beak”) and generate a completely new and unique image based on its content. In their recently published paper, they elaborate on the algorithm’s inner workings.

An example of the training and generation process reported in the paper

Scott and his team have been working on this project for quite some time. The early version of the algorithm generated an image one pixel at a time, but it had difficulties generating large or high-quality images. After picking a starting pixel, every subsequent pixel the algorithm generates needs to align with its neighbours. For example, if pixel A is the first pixel in the generation of the yellow beak of a bird, any pixels that are created in the neighbourhood of that pixel should take into account that pixel A is trying to visualize a yellow beak, and behave accordingly: either continuing the beak, or ending the beak and starting on another element of the image.

The problem with such an iterative approach (i.e., pixel by pixel) is that it can take a very long time for a computer to generate an image. Consider that even a fairly small image, say 256 by 256 pixels, already contains 65,536 pixels, each of which needs to be generated while considering all its neighbours and keeping in mind the bigger picture. In the most recent, updated version of the algorithm, Scott and his team allow multiple unrelated pixels to be generated simultaneously in different ‘zones’ of the image, hence the Parallel in Parallel Multiscale Autoregressive Density Estimation. With this parallel approach, the algorithm can now generate the pixels representing the yellow beak in one area of the image, while simultaneously generating pixels for the bird’s wings and the branch it’s sitting on in other sections of the image. This speeds up the process considerably, demanding less computation time and thus allowing for quicker image generation.
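To put rough numbers on the difference, the sketch below counts sequential generation steps. The pixel-by-pixel count follows directly from the image size. The parallel count uses an assumed, simplified scheme (a small base image generated pixel by pixel, followed by a fixed number of parallel passes per doubling of the resolution). The parameters `base` and `groups_per_doubling` are illustrative choices of mine, not figures taken from the paper.

```python
import math


def sequential_steps(size):
    """Pixel-by-pixel generation: one sequential step per pixel."""
    return size * size


def parallel_multiscale_steps(size, base=4, groups_per_doubling=3):
    """Sequential steps under an ASSUMED simplified multiscale scheme.

    A base x base image is generated pixel by pixel. Each doubling of
    the resolution then fills in the missing pixels in a fixed number
    of passes, and all pixels within a pass are generated at once.
    """
    doublings = int(math.log2(size // base))
    return base * base + groups_per_doubling * doublings


print(sequential_steps(256))           # 65536
print(parallel_multiscale_steps(256))  # 34
```

Under these assumptions the number of sequential steps grows with the logarithm of the image size rather than with the number of pixels, which is the intuition behind the speed-up described above.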

I can definitely recommend that you check out Scott Reed’s Twitter feed for some amazing animated GIFs of the generation process:

Sampling animations for Parallel Multiscale Autoregressive Density Estimation. pic.twitter.com/rNabVgzPGa

Scott Reed (@scott_e_reed), March 13, 2017

Some more animations: pic.twitter.com/EptRR6iIZ4

Scott Reed (@scott_e_reed), March 13, 2017

One more animation: pic.twitter.com/fdssYx18PY

Scott Reed (@scott_e_reed), March 13, 2017

If you want to know more details behind the algorithm but do not fancy reading the entire paper, I recommend this short explanation video by Károly Zsolnai-Fehér (what a name!) of Two Minute Papers:

pix2code: teaching AI to build apps

Last May, Tony Beltramelli of UIzard Technologies presented his latest algorithm pix2code at the NIPS conference. Put simply, the algorithm looks at a picture of a graphical user interface (i.e., the layout of an app), and determines via an iterative process what the underlying code likely looks like.

Graphical user interface examples (Google Images)

Please watch UIzard’s pix2code demo video or the third-party summary at the bottom of this blog. My understanding is that pix2code is based on convolutional and recurrent neural networks (long explanation video) in combination with long short-term memory (short explanation video). Based on a single input image, pix2code can generate code that is 77% accurate, and it works for three of the larger platforms (i.e., iOS, Android, and web-based technologies).

Obviously, this is groundbreaking technology. When further developed, pix2code not only increases the speed with which society is automated/robotized but it also further expands the automation to more complex and highly needed tasks, such as programming and web/app development.

Here you can read the full academic paper on pix2code.

Below is the official demo reviewed by another data enthusiast with commentary and some additional food for thought.

Read here some of my other blogs on neural networks and robotization:

R learning: Neural Networks

Artificial neural networks (ANNs) are computing systems inspired by the human brain. They can teach themselves to do tasks, simply by considering examples of the tasks’ outcome. For example, they can learn to identify images that contain cats by analyzing example images that have been tagged “cat” or “no cat”. When given enough examples, the neural network can autonomously determine whether “untagged” images include cats or not (Wikipedia). If you want to learn more and have 20 minutes to spare, I can recommend this YouTube video by Brandon Rohrer.
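As a minimal sketch of learning from tagged examples, the snippet below trains a single artificial neuron (the smallest building block of a neural network) in plain Python. The two input features ("whiskers" and "wheels" scores) and the toy data are hypothetical stand-ins of mine for real image data. An actual image classifier would use many layers and far more examples.

```python
import math


def train_neuron(examples, labels, epochs=200, learning_rate=0.5):
    """Fit one sigmoid neuron to tagged examples with gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            z = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = predicted - label  # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def is_cat(weights, bias, features):
    """Classify an untagged example with the trained neuron."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return z > 0


# Hypothetical features per image: (whiskers score, wheels score).
examples = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = [1, 1, 0, 0]  # 1 = tagged "cat", 0 = tagged "no cat"

weights, bias = train_neuron(examples, labels)
print(is_cat(weights, bias, (0.8, 0.2)))  # untagged, cat-like example
print(is_cat(weights, bias, (0.2, 0.8)))  # untagged, car-like example
```

The neuron is never told what a cat is. It only adjusts its weights until its outputs match the tags, and then applies those weights to examples it has not seen.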

Neural networks are commonly used for those machine learning problems where there is a vast amount of (complex) data available. Some toy examples include fingerprint recognition, language translation, car steering behaviours, object detection, text generation, and doodle recognition (by Google). Chances are pretty high that any system that makes complex recommendations these days (e.g., “Is this John in the picture?”, “Did you mean “South End Taco’s” instead of “Sout En dTacos”?”) has a neural net running in the background.

http://www.r-exercises.com designs tutorials for beginning programmers in R. On their website they host a learning series on neural networks, consisting of three sets of exercises: Part 1, Part 2, and Part 3. Afterwards, you can check your performance with the solutions: Solutions 1, Solutions 2, and Solutions 3.

Keep on learning!

P.S. Afterwards, you might want to check out this package and API for deep learning in R and Python.

Light GBM vs. XGBOOST in Python & R

XGBOOST stands for eXtreme Gradient Boosting. A big brother of the earlier AdaBoost, XGB is a supervised learning algorithm that builds an ensemble of boosted decision trees: where AdaBoost reweights the examples that earlier trees got wrong, gradient boosting fits each new tree to the remaining errors of the trees before it. For those unfamiliar with adaptive boosting algorithms, here’s a 2-minute explanation video and a written tutorial. Although XGBOOST often performs well in predictive tasks, the training process can be quite time-consuming, as with other ensemble methods such as random forests.
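The core idea of gradient boosting can be sketched in a few lines of plain Python. The snippet below is a toy illustration under simplifying assumptions (one feature, squared-error loss, one-split "stumps" instead of full trees). It is not how XGBOOST is implemented, and the data and function names are made up for the example.

```python
def fit_stump(xs, residuals):
    """Find the single split on xs that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave data on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the errors left by the previous rounds."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = boost(xs, ys)

baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Because every round has to scan candidate splits over the data, training cost grows with both the number of rounds and the number of candidate splits, which is one reason training can become slow.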

In a recent blog, Analytics Vidhya compares the inner workings as well as the predictive accuracy of the XGBOOST algorithm to an up-and-coming boosting algorithm: Light GBM. The blog demonstrates a stepwise implementation of both algorithms in Python. The table below reflects the main conclusion of the comparison: although the algorithms are comparable in terms of their predictive performance, Light GBM is much faster to train. With continuously increasing data volumes, Light GBM therefore seems the way forward.

Laurae also benchmarked lightGBM against xgboost on a Bosch dataset and her results show that, on average, LightGBM (binning) is between 11x and 15x faster than xgboost (without binning):

View interactively online: https://plot.ly/~Laurae/9/

However, the differences get smaller as more threads are used, due to thread inefficiencies: idle time increases because threads are not assigned their next task quickly enough.

Light GBM is also available in R:

devtools::install_github("Microsoft/LightGBM", subdir = "R-package")

Neil Schneider tested the three algorithms for gradient boosting in R (GBM, xgboost, and lightGBM) and sums up their (dis)advantages:

GBM has no specific advantages, but its disadvantages include no early stopping, slower training, and decreased accuracy.
xgboost has proven successful on Kaggle and, though traditionally slower than lightGBM, tree_method = 'hist' (histogram binning) provides a significant improvement.
lightGBM has the advantages of training efficiency, low memory usage, high accuracy, parallel learning, corporate support, and scalability. However, its newness is its main disadvantage because there is little community support.
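The binning mentioned above can be illustrated with a small sketch. The idea is to bucket a continuous feature into a limited number of bins, so that a tree only has to evaluate split points between bins instead of between every pair of distinct values. The equal-width bins and the helper name `bin_feature` are my own simplifications for illustration. The libraries themselves likely use more refined binning strategies.

```python
def bin_feature(values, n_bins):
    """Map each value to the index of an equal-width bin (0 .. n_bins - 1)."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    if width == 0:
        return [0] * len(values)  # constant feature: everything in one bin
    return [min(int((v - lowest) / width), n_bins - 1) for v in values]


# Hypothetical continuous feature with 1,000 distinct values.
values = [i * 0.37 for i in range(1000)]
bins = bin_feature(values, n_bins=16)

# Candidate split points lie between adjacent distinct values (or bins).
raw_candidates = len(set(values)) - 1
binned_candidates = len(set(bins)) - 1
print(raw_candidates, binned_candidates)  # 999 15
```

Fewer candidate splits means less work per tree, at the cost of slightly coarser split points.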

Keras: Deep Learning in R or Python within 30 seconds

Keras is a high-level neural networks API that was developed to enable fast experimentation with Deep Learning in both Python and R. According to its author Taylor Arnold: “Being able to go from idea to result with the least possible delay is key to doing good research. The ideas behind deep learning are simple, so why should their implementation be painful?”

Keras comes with the following key features:

Allows the same code to run on CPU or on GPU, seamlessly.
User-friendly API which makes it easy to quickly prototype deep learning models.
Built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both.
Supports arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, etc. This means that Keras is appropriate for building essentially any deep learning model, from a memory network to a neural Turing machine.
Fast implementation of dense neural networks, convolution neural networks (CNN) and recurrent neural networks (RNN) in R or Python, on top of TensorFlow or Theano.

R

R: Installation

The R interface to Keras uses TensorFlow™ as its underlying computation engine. First, you have to install the keras R package from GitHub:

devtools::install_github("rstudio/keras")

Using the install_tensorflow() function you can then install TensorFlow:

library(keras)
install_tensorflow()

This will provide you with a default installation of TensorFlow suitable for use with the keras R package. See the article on TensorFlow installation to learn about more advanced options, including installing a version of TensorFlow that takes advantage of Nvidia GPUs if you have the correct CUDA libraries installed.

R: Getting started in 30 seconds

Keras uses models to organize layers. Sequential models are the simplest structure, simply stacking layers. More complex architectures require the Keras functional API, which allows you to build arbitrary graphs of layers.

Here is an example of a sequential model (hosted on this website):

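library(keras)

# Start with an empty sequential model
model <- keras_model_sequential()

# Stack the layers: 100 inputs -> 64 hidden units (ReLU) -> 10 outputs (softmax)
model %>%
  layer_dense(units = 64, input_shape = 100) %>%
  layer_activation(activation = 'relu') %>%
  layer_dense(units = 10) %>%
  layer_activation(activation = 'softmax')

# Configure the loss, optimizer, and metric used during training
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_sgd(lr = 0.02),
  metrics = c('accuracy')
)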

The above demonstrates the little effort needed to define your model. Now, you can iteratively train your model on batches of training data:

model %>% fit(x_train, y_train, epochs = 5, batch_size = 32)
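To make the epochs and batch_size arguments concrete, here is a small Python sketch of how a training set is walked through in batches. The training-set size of 1,000 is a hypothetical number chosen for the example, and the helper names are mine, not part of Keras.

```python
import math


def iter_batches(n_samples, batch_size):
    """Yield (start, end) index pairs that cover the data once."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def count_updates(n_samples, epochs, batch_size):
    """One weight update per batch, repeated for every epoch."""
    return epochs * math.ceil(n_samples / batch_size)


print(list(iter_batches(70, 32)))                    # [(0, 32), (32, 64), (64, 70)]
print(count_updates(1000, epochs=5, batch_size=32))  # 160
```

With 1,000 training examples, epochs = 5 and batch_size = 32 would therefore amount to 160 weight updates, the last batch of each epoch being smaller than the rest.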

Next, performance evaluation can be prompted in a single line of code:

loss_and_metrics <- model %>% evaluate(x_test, y_test, batch_size = 128)

Similarly, generating predictions on new data is easily done:

classes <- model %>% predict(x_test, batch_size = 128)

Building more complex models, for example, to answer questions or classify images, is just as fast.
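For readers curious what the sequential model defined earlier actually computes, the plain-Python sketch below reproduces its forward pass: a dense layer with 64 units, a ReLU activation, a dense layer with 10 units, and a softmax. The weights are small random placeholders rather than trained values, and the helper names are my own, so this only illustrates the structure and is no substitute for Keras.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to one."""
    highest = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_inputs, n_units, rng):
    """Placeholder (untrained) weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_units)]
    return weights, [0.0] * n_units


rng = random.Random(0)
hidden_layer = random_layer(100, 64, rng)  # mirrors layer_dense(units = 64, input_shape = 100)
output_layer = random_layer(64, 10, rng)   # mirrors layer_dense(units = 10)


def forward(inputs):
    hidden = relu(dense(inputs, *hidden_layer))
    return softmax(dense(hidden, *output_layer))


sample = [rng.random() for _ in range(100)]
probabilities = forward(sample)
print(len(probabilities), round(sum(probabilities), 6))  # 10 1.0
```

Training consists of adjusting the weights in the two layers so that the highest of the ten probabilities lands on the correct class more often.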

Python

A step-by-step implementation of several Neural Network architectures with Keras in Python can be found on DataCamp. Similarly, one may use this quick cheatsheet to deploy the most basic models.


Uber: Translating Behavioural Science to the Work Floor with Gamification and Experimentation

Yesterday, I read the most interesting article on how Uber uses academic research from the field of behavioral psychology to persuade their drivers to display desired behaviors. The tone of the article is quite negative and I most definitely agree there are several ethical issues at hand here. However, as a data scientist, I was fascinated by the way in which Uber has translated academic insights and statistical methodology into applications within their own organization that actually seem to pay off. Well, at least in the short term, as this does not seem a viable long-term strategy.

The full article is quite a long read (~20 min), and although I definitely recommend you read it yourself, here are my summary notes, for convenience quoted from the original article:

“Employing hundreds of social scientists and data scientists, Uber has experimented with video game techniques, graphics and noncash rewards of little value that can prod drivers into working longer and harder — and sometimes at hours and locations that are less lucrative for them.”
“To keep drivers on the road, the company has exploited some people’s tendency to set earnings goals — alerting them that they are ever so close to hitting a precious target when they try to log off.”
“Uber exists in a kind of legal and ethical purgatory […] because its drivers are independent contractors, they lack most of the protections associated with employment.”
“[…] much of Uber’s communication with drivers over the years has aimed at combating shortages by advising drivers to move to areas where they exist, or where they might arise. Uber encouraged its local managers to experiment with ways of achieving this. […] Some local managers who were men went so far as to adopt a female persona for texting drivers, having found that the uptake was higher when they did.”
“[…] Uber was increasingly concerned that many new drivers were leaving the platform before completing the 25 rides that would earn them a signing bonus. To stem that tide, Uber officials in some cities began experimenting with simple encouragement: You’re almost halfway there, congratulations! While the experiment seemed warm and innocuous, it had in fact been exquisitely calibrated. The company’s data scientists had previously discovered that once drivers reached the 25-ride threshold, their rate of attrition fell sharply.”
“For months, when drivers tried to log out, the app would frequently tell them they were only a certain amount away from making a seemingly arbitrary sum for the day, or from matching their earnings from that point one week earlier. The messages were intended to exploit another relatively widespread behavioral tic — people’s preoccupation with goals — to nudge them into driving longer. […] “Are you sure you want to go offline?” Below were two prompts: “Go offline” and “Keep driving.” The latter was already highlighted.”
“Sometimes the so-called gamification is quite literal. Like players on video game platforms such as Xbox, PlayStation and Pogo, Uber drivers can earn badges for achievements like Above and Beyond (denoted on the app by a cartoon of a rocket blasting off), Excellent Service (marked by a picture of a sparkling diamond) and Entertaining Drive (a pair of Groucho Marx glasses with nose and eyebrows).”
“More important, some of the psychological levers that Uber pulls to increase the supply of drivers have quite powerful effects. Consider an algorithm called forward dispatch […] that dispatches a new ride to a driver before the current one ends. Forward dispatch shortens waiting times for passengers, who may no longer have to wait for a driver 10 minutes away when a second driver is dropping off a passenger two minutes away. Perhaps no less important, forward dispatch causes drivers to stay on the road substantially longer during busy periods […]
[But] there is another way to think of the logic of forward dispatch: It overrides self-control. Perhaps the most prominent example is that such automatic queuing appears to have fostered the rise of binge-watching on Netflix. “When one program is nearing the end of its running time, Netflix will automatically cue up the next episode in that series for you,” wrote the scholars Matthew Pittman and Kim Sheehan in a 2015 study of the phenomenon. “It requires very little effort to binge on Netflix; in fact, it takes more effort to stop than to keep going.””
“Kevin Werbach, a business professor who has written extensively on the subject, said that while gamification could be a force for good in the gig economy — for example, by creating bonds among workers who do not share a physical space — there was a danger of abuse.”
“There is also the possibility that as the online gig economy matures, companies like Uber may adopt a set of norms that limit their ability to manipulate workers through cleverly designed apps. For example, the company has access to a variety of metrics, like braking and acceleration speed, that indicate whether someone is driving erratically and may need to rest. “The next step may be individualized targeting and nudging in the moment,” Ms. Peters said. “‘Hey, you just got three passengers in a row who said they felt unsafe. Go home.’” Uber has already rolled out efforts in this vein in numerous cities.”
“That moment of maturity does not appear to have arrived yet, however. Consider a prompt that Uber rolled out this year, inviting drivers to press a large box if they want the app to navigate them to an area where they have a “higher chance” of finding passengers. The accompanying graphic resembles the one that indicates that an area’s fares are “surging,” except in this case fares are not necessarily higher.”