Tag: computervision

Using OpenCV to win Mobile games

OpenCV is open-source library with tools and functionalities that support computer vision. It allows your computer to use complex mathematics to detect lines, shapes, colors, text and what not.

OpenCV was originally developed by Intel in 2000 and sometime later someone had the bright idea to build a Python module on top of it.

Using a simple…

pip install opencv-python

…you can now use OpenCV in Python to build advanced computer vision programs.

And this is exactly what many professional and hobby programmers are doing. Specifically, to get their computer to play (and win) mobile app games.

ZigZag

In ZigZag, you are a ball speeding down a narrow pathway and your only mission is to avoid falling off.

Using OpenCV, you can get your computer to detect objects, shapes, and lines.

This guy set up an emulator on his computer, so the computer can pretend to be a mobile device. Then he build a program using Python’s OpenCV module to get a top score

You can find the associated code here, but note that will need to set up an emulator yourself before being able to run this code.

Kick Ya Chop

In Kick Ya Chop, you need to stomp away parts of a tree as fast as you can, without hitting any of the branches.

This guy uses OpenCV to perform image pattern matching to allow his computer to identify and avoid the trees braches. Find the code here.

Whack ‘Em All

We all know how to play Whack a Mole, and now this computer knows how to too. Code here.

Pong

This last game also doesn’t need an introduction, and you can find the code here.

Is this machine learning or AI?

If you’d ask me, the videos above provide nice examples of advanced automation. But there’s no real machine learning or AI involved.

Yes, sure, the OpenCV package uses pre-trained neural networks under the hood, and you can definitely call those machine learning. But the programmers who now use the opencv library just leverage the knowledge stored in those network to create very basal decision rules.

IF pixel pattern of mole
THEN whack!
ELSE no whack.

To me, it’s only machine learning when there’s really some learning going on. A feedback loop with performance improvement. And you may call it AI, IMO, when the feedback loop is more or less autonomous.

Fortunately, programmers have also been taking a machine learning/AI approach to beating games. Specifically using reinforcement learning. Think of famous applications like AlphaGo and AlphaStar. But there are also hobby programmers who use similar techniques. For example, to get their computer to obtain highscores on Trackmania.

In a later post, I’ll dive into those in more detail.

The wondrous state of Computer Vision, and what the algorithms actually “see”

The field of computer vision tries to replicate our human visual capabilities, allowing computers to perceive their environment in a same way as you and I do. The recent breakthroughs in this field are super exciting and I couldn’t but share them with you.

In the TED talk below by Joseph Redmon (PhD at the University of Washington) showcases the latest progressions in computer vision resulting, among others, from his open-source research on Darknet – neural network applications in C. Most impressive is the insane speed with which contemporary algorithms are able to classify objects. Joseph demonstrates this by detecting all kinds of random stuff practically in real-time on his phone! Moreover, you’ve got to love how well the system works: even the ties worn in the audience are classified correctly!

PS. please have a look at Joseph’s amazing My Little Pony-themed resumé.

The second talk, below, is more scientific and maybe even a bit dry at the start. Blaise Aguera y Arcas (engineer at Google) starts with a historic overview brain research but, fortunately, this serves a cause, as ~6 minutes in Blaise provides one of the best explanations I have yet heard of how a neural network processes images and learns to perceive and classify the underlying patterns. Blaise continues with a similarly great explanation of how this process can be reversed to generate weird, Asher-like images, one could consider creative art:

An example of a reversed neural network thus “*estimating*” an image of a bird [via Youtube]

Blaise’s colleagues at Google took this a step further and used t-SNE to visualize the continuous space of animal concepts as perceived by their neural network, here a zoomed in part on the Armadillo part of the map, apparently closely located to fish, salamanders, and monkeys?

A zoomed view of part of a t-SNE map of latent animal concepts generated by reversing a neural network [via Youtube]

We’ve seen these latent spaces/continua before. This example Andrej Karpathy shared immediately comes to mind:

0_o pic.twitter.com/M3yB6rzTiO

— Andrej Karpathy (@karpathy) 28 oktober 2017

Blaise’s presentaton you can find here:

If you want to learn more about this process of image synthesis through deep learning, I can recommend the scientific papers discussed by one of my favorite Youtube-channels, Two-Minute Papers. Karoly’s videos, such as the ones below, discuss many of the latest developments:

Let me know if you have any other video’s, papers, or materials you think are worthwhile!

The Magic Sudoku App

A few weeks ago, Magic Sudoku was released for iOS11. This app by a company named Hatchlings automatically solves sudoku puzzles using a combination of Computer Vision, Machine Learning, and Augmented Reality. The app works on iPad Pro’s and iPhone 6s or above and can be downloaded from the App Store.

Magic Sudoku gives a magical experience when users point their phone at a Sudoku puzzle: the puzzle is instantaneously solved and displayed on their screen. In several seconds, the following occurs behind the scenes:

ARKit gets a new frame from the camera.
iOS11’s Vision Library detects the rectangles in the image.
If rectangles are found, it is determined whether they are a Sudoku.
If a puzzle is found, it is split into 81 square images.
Each square is run through a neural network to determine what number (if any) it represents.
Once enough numbers are gathered, a traditional recursive algorithm solves the puzzle.
Finally, a 3D-model of the solved puzzle is fed back to ARKit and displayed on top of the original image from the camera.

What happens in the ARKit app behind the scenes.

“One of the original reasons I chose a Sudoku solver as our first AR app was that I knew classifying digits is basically the “hello world” of Machine Learning. I wanted to dip my toe in the water of Machine Learning while working on a real-world problem. This seemed like a realistic app to tackle.” – Brad Dwyer, Founder at Hatchlings

Particularly the training process of the app interested me. In his blog, Brad explains how they bought out the entire stock of Sudoku books of a specific bookstore and, with the help of his team, ripped each book apart to scan each small square with a number and upload in to a server. In the end, this server contained about 600,000 images, but all were completely unlabeled. Via a simple game, they asked Hatchlings users to classify these images by pressing the number keys on their keyboard. Within 24 hours, all 600,000 images were classified!

Nevertheless, some users had misunderstood the task (or just plainly ignored it) and as a consequence there were still a significant number of misidentified images. So Brad created a second tool that displayed 100 images of a single class to users, who where consequently asked to click the ones that didn’t match. These were subsequently thrown back into the first tool to be reclassified.

Quickly, the developers had enough verified data to add an automatic accuracy checker into both tools for future data runs. Funnily enough, they programmed it in such a way that users were periodically shown already known/classified images in order to check the validity of their inputs and determine how much to trust their answers going forward. This whole process reminds me on a blog I wrote recently, regarding human-computer interactions in reinforcement learning.

For several more weeks, users classified more scanned data so that, by the time the app was launched, it had been trained on over a million images of Sudoku squares. The results were amazing as the application had a 98.6% accuracy on launch (currently above 99% accuracy). One minor deficit was that the app was trained on paper Sudoku’s. However, when it aired, many users wanted to quickly test it and searched for Sudoku images on Google, which the app wouldn’t process that well.

“Problem number one was that our machine learning model was only trained on paper puzzles; it didn’t know what to think about pixels on a screen. I pulled an all nighter that first week and re-trained our model with puzzles on computer screens.

Problem number two was that ARKit only supports horizontal planes like tables and floors (not vertical planes like computer monitors). Solving this was a trickier problem but I did come up with a hacky workaround. I used a combination of some heuristics and FeaturePoint detection to place puzzles on non-horizontal planes.” – Brad Dwyer, Founder at Hatchlings

Brad and his colleagues at Hatchlings still need to work out the business model behind the ARKit Magic Sudoku app, but that’s in the meantime, download the app and let me and them know what you think: subscribe to his medium blog or follow Brad on twitter.