Reinforcement learning is an area of machine learning inspired by behavioral psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward (Wikipedia, 2017). Normally, reinforcement learning runs autonomously: the algorithm tries to maximize (or minimize) a score that is computed from predefined constraints. By repeatedly experimenting and assessing strategies, it learns which actions are most effective, that is, which ones yield the best score.
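
To make the trial-and-error idea concrete, here is a minimal sketch of an agent that learns which of three actions gives the best predefined score. It is only an illustration: the function name `run_bandit`, the score values, and the epsilon-greedy strategy are assumptions chosen for this example, not something taken from the video or the paper.

```python
import random

def run_bandit(true_scores, episodes=500, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the highest score."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_scores)  # running average score per action
    counts = [0] * len(true_scores)       # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(true_scores))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(true_scores)), key=lambda a: estimates[a])
        # The predefined score (hypothetical values), observed with a little noise.
        score = true_scores[action] + rng.gauss(0.0, 0.05)
        counts[action] += 1
        # Incrementally update the average score for this action.
        estimates[action] += (score - estimates[action]) / counts[action]
    return estimates

estimates = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, estimates)
```

The important point is that the score comes from a formula fixed in advance. The agent never needs a human in the loop, but someone has to be able to write that formula down.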

The approach in the video below is radically different. Instead of relying on a predefined score, it uses human-computer interaction to score the algorithm's behavior. This is particularly useful for complex behaviors, such as a backflip, for which it is hard to predefine the constraints and actions that lead to the “most effective” backflip. For us humans, however, it is relatively easy to recognize a good backflip when we see one. The video shows how the researchers integrated this human judgment into their reinforcement learning algorithm. After observing the algorithm perform short sequences of actions, a human indicates how well the goal (i.e., a backflip) is achieved; in the paper, this takes the form of choosing the better of two short video clips. A reward model fitted to these judgments then functions as the score that the algorithm tries to maximize.
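
The human assessment described above can be turned into a score in several ways. Below is a minimal sketch, assuming the pairwise-comparison setup used in the paper: a human sees two clips, picks the better one, and a reward model is fitted so that preferred clips get a higher predicted reward. Everything specific here is an assumption for illustration: each clip is reduced to two made-up numeric features, the reward model is linear, and the "human" is simulated by a hidden rule (`TRUE_WEIGHTS`). The paper itself uses neural networks and real human raters.

```python
import math
import random

# Hypothetical hidden taste of the simulated human (stands in for a real rater).
TRUE_WEIGHTS = [1.0, -0.5]

def reward(weights, clip):
    """Linear reward: weighted sum of the clip's features."""
    return sum(w * x for w, x in zip(weights, clip))

def human_prefers_first(clip_a, clip_b):
    """Simulated human feedback: True if clip_a looks better than clip_b."""
    return reward(TRUE_WEIGHTS, clip_a) > reward(TRUE_WEIGHTS, clip_b)

def train_reward_model(n_pairs=2000, lr=0.1, seed=0):
    """Fit reward weights from pairwise preferences (Bradley-Terry style)."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(n_pairs):
        clip_a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        clip_b = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        label = 1.0 if human_prefers_first(clip_a, clip_b) else 0.0
        diff = [a - b for a, b in zip(clip_a, clip_b)]
        # Model: P(first clip preferred) = sigmoid(reward(a) - reward(b)).
        p_first = 1.0 / (1.0 + math.exp(-reward(weights, diff)))
        # One gradient-ascent step on the log-likelihood of the human's choice.
        weights = [w + lr * (label - p_first) * d for w, d in zip(weights, diff)]
    return weights

weights = train_reward_model()
print(weights)
```

The learned `weights` can then play the role of the predefined score in an ordinary reinforcement learning loop. The human only ever answers "which of these two is better?", which is far easier than writing down a formula for a good backflip.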

This approach can be really valuable for organizations seeking to improve their machine learning applications. The paper on the principle (Deep Reinforcement Learning from Human Preferences) can be found here. The scholars conclude that this human-supervised approach based on preferences achieves very good training results, whereas the costs are similar to those of the simple bulldozer approach of training a neural net from scratch using GPU servers.