
Super Resolution: Increasing image quality CSI-like

Super-resolution imaging is a class of techniques that enhance the resolution of an imaging system (Wikipedia). The entertainment series CSI has been ridiculed for relying on exaggerated and unrealistic applications of it:

Until recently, such upscaling of images was thought to be near impossible. However, we have witnessed some pretty amazing breakthroughs in the deep learning space recently. Artificial intelligence can think ahead, learn physics, and beat experts at their own games (DOTA; Poker), mostly through inventive applications of neural networks.

As a result, there are now several applications where machines have learned to literally fill in the blanks in imagery. Most notable seems the method developed by Google: Rapid and Accurate Image Super-Resolution, or RAISR for short. In contrast to other approaches, RAISR does not rely on (adversarial) neural networks and is thus not as resource-demanding to train. Moreover, its performance is quite remarkable:

Google's RAISR demo on a surfer image.

You can read more details in the paper by Romano, Isidoro, and Milanfar (2016) or watch the research summary below by, unsurprisingly, Two Minute Papers:

I guess you’re eager to test this super resolution out yourself?! letsenhance.io lets you enhance the resolution of five images for free, after which it charges you $5 per twenty pictures processed. The website feeds the input image to a neural net and outputs an image whose resolution has been increased fourfold! I tested it with this random blurry picture I retrieved from Google/Pinterest.

The original picture (500×500) and the enhanced version (2000×2000).

Do you see how much more detailed (though still blurry) the second image is? Nevertheless, upscaling four times seems to be about the limit, as that is the default factor for both RAISR and Let’s Enhance. I am very curious to see how super resolution will develop in the future, how it will be used to decrease storage or bandwidth demands, whether it will be integrated with video platforms like YouTube or Netflix, and which algorithm will ultimately take the crown!

Several algorithms and their upscaled results.
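For reference, the "dumb" baseline that these learned methods compete against is plain interpolation. A minimal sketch with Pillow (the file name is a placeholder of mine) shows what a naive fourfold upscale looks like; it enlarges the image but adds no new detail:

```python
# A naive baseline: plain bicubic 4x upscaling with Pillow.
# The file name is a placeholder; RAISR and Let's Enhance do far better than this.
from PIL import Image

img = Image.open("blurry_picture.jpg")                       # e.g. the 500x500 original
upscaled = img.resize((img.width * 4, img.height * 4), Image.BICUBIC)
upscaled.save("blurry_picture_4x_bicubic.jpg")               # 2000x2000, but no new detail
```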

 

Facial Recognition Challenge: Chad Smith & Will Ferrell

The below summarizes Part 4 of a medium.com series by Adam Geitgey.
Check out the original articles: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, and Part 8!

Adam Geitgey likes to write about computers and machine learning. He explains machine learning as “generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.” (Part 1)

 

Adam’s visual explanation of two machine learning applications (original from Part 1)

In the fourth part of his series on machine learning, Adam touches on facial recognition. Facebook is one of the companies using such algorithms in real time, allowing them to recognize your friends’ faces after you’ve tagged them only a few times. Facebook reports that they recognize faces with 97% accuracy, which is comparable to our own human facial recognition abilities!

Facebook’s algorithms recognizing and automatically tagging Adam’s family. Helpful or creepy? (original from Part 4)

 

Adam decided to put up a challenge: would a facial recognition algorithm be able to distinguish Will Ferrell (famous actor) from Chad Smith (famous rock musician)? Indeed, these two celebrities look very much alike:

Chad Smith (left) and Will Ferrell (right) on www.rollingstone.com

If you want to train such an algorithm, Adam explains, you need to overcome a series of related problems:

  1. First, look at a picture and find all the faces in it
  2. Second, focus on each face and be able to understand that even if a face is turned in a weird direction or in bad lighting, it is still the same person.
  3. Third, be able to pick out unique features of the face that you can use to tell it apart from other people— like how big the eyes are, how long the face is, etc.
  4. Finally, compare the unique features of that face to all the people you already know to determine the person’s name.

(Adam Geitgey, Part 4)

 

How the facial recognition algorithm steps might work (original from Part 4)

To detect the faces, Adam used Histograms of Oriented Gradients (HOG). All input pictures were converted to black and white (because color is not needed) and then every single pixel in the image was examined, one at a time. For every pixel, the algorithm also examined the pixels directly surrounding it:

Illustration of the algorithm as it would take in a black-and-white photo of Will Ferrell (original from Part 4)

The algorithm then checks, for every pixel, in which direction the picture is getting darker and draws an arrow (a gradient) in that direction.

Illustration of how the algorithm would reduce a black-and-white photo of Will Ferrell to gradients (original from Part 4)

However, doing this for every single pixel would require too much processing power, so Adam broke the pictures up into squares of 16 by 16 pixels. The result is a very simple representation that still captures the basic structure of the original face, based on which we can now spot faces in pictures. Moreover, because gradients are used, the result will be similar regardless of the lighting of the picture.

The original image turned into a HOG representation (original from Part 4)
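As a rough illustration of this step, here is a minimal sketch of computing a HOG representation with scikit-image, using a stock sample image and parameter choices of my own; Adam's post relies on dlib's built-in HOG face detector rather than this exact code.

```python
# A minimal sketch of a HOG representation with scikit-image.
# The sample image (skimage.data.astronaut) and parameters are my own choices.
from skimage import data, color
from skimage.feature import hog

image = color.rgb2gray(data.astronaut())   # grayscale, as color is not needed

features, hog_image = hog(
    image,
    orientations=9,                        # number of gradient directions per cell
    pixels_per_cell=(16, 16),              # 16x16-pixel squares, as mentioned above
    cells_per_block=(1, 1),
    visualize=True,                        # also return an image of the dominant gradients
)

print(features.shape)                      # flattened HOG feature vector
```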

Now that the computer can spot faces, we need to make sure that it knows that two perspectives of the same face represent the same person. Adam uses landmarks for this: 68 specific points that exist on every face. An algorithm can then be trained to find these points on any face:

The 68 points on the image of Will Ferrell (original from Part 4)
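For the landmark step, a minimal sketch using Adam's own face_recognition library (which wraps dlib's 68-point landmark predictor) might look like this; the file name is a placeholder:

```python
# A minimal sketch of locating face landmarks with the face_recognition library.
# "will_ferrell.jpg" is a placeholder file name.
import face_recognition

image = face_recognition.load_image_file("will_ferrell.jpg")
landmarks = face_recognition.face_landmarks(image)   # one dict per detected face

for face in landmarks:
    # keys include 'chin', 'left_eye', 'right_eye', 'nose_tip', 'top_lip', ...
    for feature, points in face.items():
        print(feature, len(points), "points")
```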

Now that the computer knows where the chin, the mouth, and the eyes are, the image can be scaled and rotated to center the face as well as possible:

The image of Will Ferrell transformed (original from Part 4)

Adam trained a Deep Convolutional Neural Network to generate 128 measurements for each face that best distinguish it from faces of other people. This network needs to train for several hours, going through thousands and thousands of face pictures. If you want to try this step yourself, Adam explains how to run OpenFace’s lua script. This study at Google provides more details, but it basically looks like this:

The training process visualized (original from Part 4)

After hours of training, the neural net will output 128 numbers that accurately represent the specific face it was fed. Now, all you need to do is check which face in your database most closely resembles those 128 numbers, and you have your match! Many algorithms can do this final check, and Adam trained a simple linear SVM classifier on twenty pictures each of Chad Smith, Will Ferrell, and Jimmy Fallon (the host of a talk show they both visited).
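A minimal sketch of these last two steps, embedding each face as 128 numbers and training a linear SVM on the embeddings, could look like the following; the image paths are placeholders and this is not Adam's exact code:

```python
# Embed faces as 128-number vectors with face_recognition, then train a linear SVM.
# The image paths and the number of pictures are placeholders.
import face_recognition
from sklearn.svm import SVC

training_images = {
    "chad_smith_01.jpg": "Chad Smith",
    "will_ferrell_01.jpg": "Will Ferrell",
    "jimmy_fallon_01.jpg": "Jimmy Fallon",
    # ... roughly twenty pictures per person, as in Adam's post
}

encodings, labels = [], []
for path, name in training_images.items():
    image = face_recognition.load_image_file(path)
    faces = face_recognition.face_encodings(image)     # one 128-d vector per face
    if faces:
        encodings.append(faces[0])
        labels.append(name)

classifier = SVC(kernel="linear", probability=True)
classifier.fit(encodings, labels)

# Classify a new face
test = face_recognition.face_encodings(face_recognition.load_image_file("test.jpg"))[0]
print(classifier.predict([test]))
```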

In the end, Adam’s machine had learned to distinguish these three people – two of whom are nearly indistinguishable to the human eye – in real time:

Adam Geitgey’s facial recognition algorithm in action: providing real-time classifications of the faces of lookalikes Chad Smith and Will Ferrell at Jimmy Fallon’s talk show (original from Part 4)

You can find Adam on LinkedIn, or on Twitter at @ageitgey, and I strongly recommend you examine his series on machine learning on Medium.com (Part 1). Moreover, Adam released a Python library called face_recognition, arguably easier to install and use than OpenFace, as well as a pre-configured virtual machine with face_recognition, OpenCV, TensorFlow and lots of other deep learning tools pre-installed.

 

Neural Networks play Super Mario Bros & Mario Kart

Seth Bling calls himself a video game designer, a hacker, and an engineer. You might know him from MarI/O: his neural network that got extremely good at playing Super Mario Bros. The video below shows the genetic approach Seth used to train this neural network. Seth randomly generated a starting population of neural networks in which the inputs – the current frame of the Mario video game – were randomly connected to the outputs – the eight buttons to press (jump, duck, up, down, right, left, etc.). By giving the neural nets that made it furthest into the game a larger chance to pass on their genes (their input-output relations) to the next generation, with slight mutations, Seth automatically generated neural networks that became more and more proficient at completing the game. In short, through evolution, Seth’s neural network learned the most effective responses to the changing video game environment.
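To make the genetic idea concrete, here is a toy neuroevolution sketch of my own (not Seth's MarI/O code): a population of tiny networks is evolved on the XOR problem, with the task's error standing in for "distance reached in the level" as the fitness signal.

```python
# A toy sketch of the genetic approach described above, not Seth's MarI/O code:
# the fittest networks pass on their weights with slight mutations each generation.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def forward(weights, x):
    w1, w2 = weights                       # input->hidden and hidden->output weights
    hidden = np.tanh(x @ w1)
    return 1 / (1 + np.exp(-(hidden @ w2)))

def fitness(weights):
    pred = forward(weights, X).ravel()
    return -np.mean((pred - y) ** 2)       # higher is better (stands in for distance reached)

def random_net():
    return (rng.normal(size=(2, 4)), rng.normal(size=(4, 1)))

population = [random_net() for _ in range(50)]

for generation in range(300):
    parents = sorted(population, key=fitness, reverse=True)[:10]   # the fittest survive
    mutants = [
        tuple(w + rng.normal(scale=0.1, size=w.shape) for w in parents[rng.integers(10)])
        for _ in range(40)
    ]
    population = parents + mutants         # elitism plus mutated offspring

best = max(population, key=fitness)
print(forward(best, X).ravel().round(2))   # should approach [0, 1, 1, 0]
```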

After MarI/O, Seth this week posted his newest creation: MariFlow. Here, Seth trained a neural network on 15 hours of training data consisting of Seth himself playing Super Mario Kart. The neural network thus learned which buttons (output) Seth would most likely push when he encountered a certain piece of Mario Kart track (input). However, due to random chance, the neural net would often get itself stuck in situations that Seth had not encountered in his training sessions (e.g., facing backwards, or pressed against a wall). The neural net would fail miserably in such situations because it had not learned how to behave in them. Accordingly, Seth had to generate new training data for these situations, and he did so using human-computer interaction in machine learning: Seth and the neural net would take turns playing for a while, thus generating training data for situations that Seth on his own would never have gotten into. After the neural net was trained on these additional data, it became quite proficient at playing Mario Kart (like Seth), often even winning matches! If you want to know more, you can read the manual here or watch Seth’s video below. If you want to replicate or just play with the data, Seth made everything available here.
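The alternating play Seth describes resembles interactive imitation learning (DAgger-style data aggregation). Below is a very rough sketch of that loop under my own assumptions; the environment, the human player, and the policy model are hypothetical stand-ins, not parts of MariFlow.

```python
# A rough sketch of the interactive training loop described above, in the spirit
# of DAgger-style imitation learning; this is NOT Seth's MariFlow code.
# `env`, `human`, and `policy` are hypothetical stand-ins: in MariFlow the
# states are game frames and the actions are controller button presses.

def collect_demonstrations(env, human, policy=None, steps=1000):
    """Roll out in the environment. The human labels every state with the
    'correct' action, but (if given) the policy is the one driving, so we
    gather labels for states the human alone would never have gotten into."""
    data = []
    state = env.reset()
    for _ in range(steps):
        data.append((state, human.act(state)))      # human-provided label
        driver = policy if policy is not None else human
        state = env.step(driver.act(state))         # the driver chooses the move
    return data

def train(model, data):
    states, actions = zip(*data)
    model.fit(states, actions)                      # plain supervised learning
    return model

# Sketch of the overall procedure:
# dataset = collect_demonstrations(mario_kart, seth)              # ~15 hours of human play
# policy = train(policy, dataset)
# for _ in range(n_rounds):                                       # alternate play and retrain
#     dataset += collect_demonstrations(mario_kart, seth, policy)
#     policy = train(policy, dataset)
```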

Seth has active YouTube, Twitch and Twitter channels and I recommend you check them out!

GAN: Generative Adversarial Networks

A Generative Adversarial Network, or GAN for short, is a machine learning architecture in which two neural networks compete against each other. One of them functions as a discriminator, seeking to optimally classify data (e.g., determine whether or not there is a cat in a picture). The other functions as a generator, seeking to generate new data that fools the discriminator (e.g., create realistic fake images of cats). Over time, the generator network becomes increasingly good at producing realistic data, learning to mimic real life.
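As a toy illustration of this two-player setup, here is a minimal GAN sketch in PyTorch of my own making (not from any of the linked work): the generator learns to turn noise into samples from a simple 1-D Gaussian, while the discriminator learns to tell real samples from fakes.

```python
# A minimal toy GAN in PyTorch: the generator learns to mimic samples from N(4, 1.5)
# while the discriminator learns to tell real samples from generated ones.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # sample -> P(real)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(5000):
    real = torch.randn(64, 1) * 1.5 + 4.0       # "real" data drawn from N(4, 1.5)
    fake = G(torch.randn(64, 8))                # generated ("fake") data

    # Discriminator: push real samples toward label 1, fakes toward 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator into outputting 1 for fakes
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

samples = G(torch.randn(1000, 8)).detach()
print(samples.mean().item(), samples.std().item())   # should drift toward ~4 and ~1.5
```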

The concept of GANs was introduced by Ian Goodfellow in 2014, whom we know from the Machine Learning & Deep Learning book. Although GANs are computationally heavy and still undergoing major development, their potential implications are widespread. We may well see these architectures take over all sorts of creative work where generating new “data” is the main task. Think, for instance, of designing clothes, creating video footage, writing novels, animating movies, or even building whole video games. One of my favorite YouTube channels discusses several of their recent applications, and here are a few of my favorites:

If you want to know more about GANs, Analytics Vidhya hosts a short introduction, but I personally prefer this one by Rob Miles via Computerphile:

If you want to try out these GANs yourself but do not have the programming experience: Reiichiro Nakano made a GAN playground in (what seems) JavaScript, where you can play around with the discriminator and the generator to create an adversarial network that identifies and generates images of numbers.

Screenshot of the GAN Playground.

The Magic Sudoku App

A few weeks ago, Magic Sudoku was released for iOS 11. This app by a company named Hatchlings automatically solves Sudoku puzzles using a combination of computer vision, machine learning, and augmented reality. The app works on the iPad Pro and iPhone 6s or above and can be downloaded from the App Store.

Magic Sudoku App in action.

Magic Sudoku provides a magical experience when users point their phone at a Sudoku puzzle: the puzzle is instantaneously solved and displayed on their screen. Within seconds, the following occurs behind the scenes:

What happens in the ARKit app behind the scenes.

“One of the original reasons I chose a Sudoku solver as our first AR app was that I knew classifying digits is basically the “hello world” of Machine Learning. I wanted to dip my toe in the water of Machine Learning while working on a real-world problem. This seemed like a realistic app to tackle.” – Brad Dwyer, Founder at Hatchlings
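For a sense of what that “hello world” looks like in practice, here is a minimal digit classifier in Keras; MNIST stands in for the scanned Sudoku squares, and this is of course not Hatchlings’ actual model.

```python
# A minimal digit classifier, the "hello world" of machine learning.
# MNIST stands in for the scanned Sudoku squares; not Hatchlings' actual model.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0        # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),      # one output per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3)
print(model.evaluate(x_test, y_test))                     # loss and accuracy on held-out digits
```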

The training process of the app particularly interested me. In his blog, Brad explains how they bought out the entire Sudoku-book stock of a specific bookstore and, with the help of his team, ripped each book apart to scan each small numbered square and upload it to a server. In the end, this server contained about 600,000 images, but all of them were completely unlabeled. Via a simple game, they asked Hatchlings users to classify these images by pressing the number keys on their keyboard. Within 24 hours, all 600,000 images were classified!

Nevertheless, some users had misunderstood the task (or just plainly ignored it), and as a consequence there was still a significant number of misidentified images. So Brad created a second tool that displayed 100 images of a single class to users, who were then asked to click the ones that didn’t match. These were subsequently thrown back into the first tool to be reclassified.

Quickly, the developers had enough verified data to add an automatic accuracy checker to both tools for future data runs. Cleverly, they programmed it in such a way that users were periodically shown already known/classified images in order to check the validity of their inputs and determine how much to trust their answers going forward. This whole process reminds me of a blog post I wrote recently about human-computer interaction in reinforcement learning.
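My guess at how such a check might work in code, as a small hedged sketch (not Hatchlings’ implementation): gold-standard squares with known labels are mixed into a user’s queue, and their agreement on those items determines how much weight their other answers receive.

```python
# A hedged sketch of a labeler-trust check, not Hatchlings' implementation:
# mix gold-standard items into a user's queue and score their agreement.
def trust_score(user_answers, gold_labels):
    """Fraction of gold-standard items the user labeled correctly."""
    checked = [(item, label) for item, label in user_answers.items() if item in gold_labels]
    if not checked:
        return 0.0
    correct = sum(label == gold_labels[item] for item, label in checked)
    return correct / len(checked)

# Hypothetical example: this user gets 2 of the 3 known squares right
gold = {"sq_001": 7, "sq_002": 3, "sq_003": 9}
answers = {"sq_001": 7, "sq_002": 3, "sq_003": 4, "sq_104": 5}
weight = trust_score(answers, gold)
print(round(weight, 2))   # 0.67: this user's votes on unknown squares get less weight
```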

For several more weeks, users classified more scanned data so that, by the time the app launched, it had been trained on over a million images of Sudoku squares. The results were amazing: the application had 98.6% accuracy at launch (and is currently above 99% accuracy). One minor shortcoming was that the app had been trained on paper Sudokus only. However, when it launched, many users wanted to test it quickly and searched for Sudoku images on Google, which the app couldn’t process that well.

“Problem number one was that our machine learning model was only trained on paper puzzles; it didn’t know what to think about pixels on a screen. I pulled an all nighter that first week and re-trained our model with puzzles on computer screens.

Problem number two was that ARKit only supports horizontal planes like tables and floors (not vertical planes like computer monitors). Solving this was a trickier problem but I did come up with a hacky workaround. I used a combination of some heuristics and FeaturePoint detection to place puzzles on non-horizontal planes.” – Brad Dwyer, Founder at Hatchlings

Brad and his colleagues at Hatchlings still need to work out the business model behind the ARKit Magic Sudoku app, but in the meantime, download the app and let them (and me) know what you think: subscribe to his Medium blog or follow Brad on Twitter.

Neural Networks 101

Last month, a video by 3Blue1Brown was trending on YouTube, already accumulating over a quarter of a million views. It lasts only 10 minutes but provides a very good and intuitive explanation of the inner workings of neural networks (NNs):
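To make those inner workings concrete, here is a bare-bones two-layer network in NumPy, a toy example of my own rather than anything from the video: a forward pass, a mean-squared-error loss, and manual gradient-descent updates.

```python
# A bare-bones two-layer neural network in NumPy: forward pass, MSE loss,
# and hand-derived gradient-descent updates. Toy data, for illustration only.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 3))                      # 4 examples, 3 input features
y = rng.normal(size=(4, 1))                      # 4 target values

W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)    # input -> hidden
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)    # hidden -> output

lr = 0.01
for step in range(500):
    # Forward pass
    h = np.maximum(0, x @ W1 + b1)               # ReLU activation
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # Backward pass (chain rule, by hand)
    d_pred = 2 * (pred - y) / len(y)
    d_W2, d_b2 = h.T @ d_pred, d_pred.sum(axis=0)
    d_h = d_pred @ W2.T
    d_h[h <= 0] = 0                              # gradient of ReLU
    d_W1, d_b1 = x.T @ d_h, d_h.sum(axis=0)

    # Gradient-descent update
    W1 -= lr * d_W1
    b1 -= lr * d_b1
    W2 -= lr * d_W2
    b2 -= lr * d_b2

print(loss)                                      # should be much smaller than at step 0
```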

The Machine Learning & Deep Learning book I wrote about recently provides a more substantial explanation of the different NNs and their inner workings. Neural nets come in various flavors, and my list of Data Science, Machine Learning, & Statistics Resources includes useful cheat sheets and other information, such as the architecture map below.

If you still haven’t had enough, Daniel Shiffman demonstrates how to code neural networks in Processing (Java), and the video shows precisely what happens behind the scenes. Finally, MIT has made its AI course material open source, and it includes two 45-minute lectures on NNs. The lecturing professor – Patrick Winston – isn’t much of a fan of these “bulldozer” algorithms; he has a stronger preference for “more sophisticated” mathematical learning through, for instance, Support Vector Machines.