Eder Santana http://edersantana.github.io Wed, 16 Mar 2016 22:25:00 -0400 Wintersmith - https://github.com/jnordberg/wintersmith en Keras plays catch, a single file Reinforcement Learning example http://edersantana.github.io/articles/keras_rl/ Wed, 16 Mar 2016 22:25:00 -0400 http://edersantana.github.io/articles/keras_rl/ <p>Get started with reinforcement learning in less than 200 lines of code with Keras (Theano or TensorFlow, it’s your choice).</p> <p><span class="more"></span></p> <p><img src="/articles/keras_rl/catch.gif" alt="Keras plays catch"></p> <p>So you are a (Supervised) Machine Learning practitioner who was also sold on the hype of weaker labels and the possibility of getting neural networks to play your favorite games. You want to do Reinforcement Learning (RL), but you find it hard to read through all those full-featured libraries just to get a feeling for what is actually going on.</p> <p>Here we’ve got your back: we took the game engine complexities out of the way and show a minimal Reinforcement Learning example in less than 200 lines of code. And yes, the example does use Keras, your favorite deep learning library!</p> <p>Before I give you a link to the code, make sure you read Nervana’s blog post <a href="http://www.nervanasys.com/demystifying-deep-reinforcement-learning/">Demystifying Deep Reinforcement Learning</a>. There you will learn about Q-learning, which is one of the many ways of doing RL. Also, at this point you already know that neural nets love mini-batches; there you will see what Experience Replay is and how to use it to get you them batches - even in problems where an agent only sees one sample of the environment state at a time.</p> <p>So here is the <a href="https://gist.github.com/EderSantana/c7222daa328f0e885093">link to our code</a>. In that code Keras plays the catch game, where it should catch a single-pixel “fruit” using a three-pixel “basket”. 
The fruit falls one pixel per step and the Keras network gets a reward of +1 if it catches the fruit and -1 otherwise. The network sees the entire <a href="https://gist.github.com/EderSantana/c7222daa328f0e885093#file-qlearn-py-L34-L40">10x10 pixel grid</a> as input and outputs <a href="https://gist.github.com/EderSantana/c7222daa328f0e885093#file-qlearn-py-L122">three values</a>, each corresponding to an action (move left, stay, move right). Since these values represent the expected accumulated future reward, we just go greedy and pick the <a href="https://gist.github.com/EderSantana/c7222daa328f0e885093#file-qlearn-py-L147-L148">action corresponding to the largest value</a>.</p> <p>One thing to note, though: this network is not quite like you in exotic restaurants. It doesn’t always take the action that exploits what it already knows; once in a while we force the system to take a <a href="https://gist.github.com/EderSantana/c7222daa328f0e885093#file-qlearn-py-L144-L145">random action</a>. This would be the equivalent of you learning, by trial and error, that life is more than just Penang Curry with fried Tempeh.</p> <p>In the link you will also find scripts that play the game with no random actions and generate the pictures for the animation above.</p> <p>Enjoy!</p> <h2 id="faq">FAQ</h2> <p><strong>1) How does this Q-learning thing even work?</strong></p> <p>C’mon, read the blog post I just mentioned above… Anyway, think of it like this: the fruit is almost hitting the ground and your model is just one pixel away from a “catching” position. The model will face similar cases many, many times. If it decides to stay or move left, it will be punished (imagine it smelling a bunch of rotten fruit on the ground because it was lazy). Thus, it learns to assign a small Q-value (sounds much better than just “output of neural net”, huh?) to those two actions whenever it sees that picture as input. 
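</p> <p>In code, this decision loop boils down to two small pieces: picking actions epsilon-greedily and regressing Q-values toward a bootstrapped target. Here is a minimal numpy sketch of the idea (our illustration, not the gist’s exact code; <code>gamma</code> is the discount factor and the 0.1 exploration rate is just an example):</p> <pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)

def q_target(reward, q_next, game_over, gamma=0.9):
    # Terminal step (fruit caught or missed): the target is just the reward.
    if game_over:
        return float(reward)
    # Otherwise bootstrap: reward plus the discounted best Q-value of the next state.
    return float(reward) + gamma * float(np.max(q_next))

def epsilon_greedy(q_values, epsilon=0.1):
    # Exploration: once in a while, take a random action.
    if epsilon > rng.random():
        return int(rng.integers(len(q_values)))
    # Exploitation: pick the action (left / stay / right) with the largest Q-value.
    return int(np.argmax(q_values))
</code></pre> <p>Training then just fits the network’s output for the taken action to <code>q_target</code>.</p> <p>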
But, since catching the fruit also gives a juicy +1 reward, the model will learn to assign a larger Q-value to the “move right” action in that case. This is what minimizing the <a href="https://gist.github.com/EderSantana/c7222daa328f0e885093#file-qlearn-py-L98-L106">reward - Q-value error</a> does.</p> <p>One step before that, there will be no reward in the next step.</p> <p>I liked how the previous phrase sounded, so I decided to give it its own paragraph. But, although in that case there is no juicy reward right after, the model can be trained using the maximum Q-value of the future state in the next step. Think about it. If you’re in the kitchen, you know that you can just open the fridge to get food. But now you’re in your bedroom writing bad jokes and feeling hungry. Still, you have this vague memory that going to the kitchen could help with that. You just go to the kitchen and there you figure out how to help yourself. You have to learn all that by living the game. I know, being Markovian is hard! But then the rest is just propagating these reward expectations further and further into the past, assigning high values to good choices and low values to bad choices (don’t forget that sometimes you hit those random choices in college, so you learn the parts of life they don’t talk about in school). For everything else, if you believe in Stochastic Gradient Descent then it is easy to see this actually making sense… I hope…</p> <p><strong>2) How different is that from AlphaGo?</strong></p> <p>Not much… But instead of learning Q-values, AlphaGo thought it was smarter to use REINFORCE and learn to output action probabilities directly. After that, she played several games against herself, so many that she could later learn the probability of winning from each position. Using all that information, during play time she uses a search technique to look for possible actions that would take her to positions with a higher probability of winning. 
But she told me to mention here that she doesn’t search as many possibilities in the future as her older cousin Deep Blue did. She also said that she can play pretty well using just one GPU; the other 99 were running high-resolution Netflix series so she could catch up with human culture.</p> <p>That being said, you should be able to modify this script in 2 or 3 days to get a reimplementation of AlphaGo, and Skynet should be 4 weeks away?</p> <p>JK</p> <p><strong>3) Your code sucks; why don’t you write something better?</strong></p> <p><a href="https://github.com/EderSantana/X/blob/master/examples/catcher.py">I’m trying…</a></p> <p><strong>4) Did you learn that by yourself?</strong></p> <p>The bad parts, yes. The good things were taught to me by my friends Evan Kriminger and Matthew Emigh.</p> March journal club http://edersantana.github.io/articles/mar_papers/ Thu, 24 Dec 2015 13:00:00 -0500 http://edersantana.github.io/articles/mar_papers/ <p>WORK IN PROGRESS…</p> <p><span class="more"></span></p> <h3 id="detecting-temporally-consistent-objects-in-videos-through-object-class-label-propagation">Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation</h3> <p>link: <a href="http://arxiv.org/pdf/1601.05447v1.pdf">http://arxiv.org/pdf/1601.05447v1.pdf</a></p> <h3 id="weakly-supervised-disentangling-with-recurrent-transformations-for-3d-view-synthesis">Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis</h3> <p>link: <a href="http://arxiv.org/pdf/1601.00706v1.pdf">http://arxiv.org/pdf/1601.00706v1.pdf</a></p> <h3 id="autoencoding-beyond-pixels-using-a-learned-similarity-metric">Autoencoding beyond pixels using a learned similarity metric</h3> <p>link: <a href="http://arxiv.org/pdf/1512.09300v1.pdf">http://arxiv.org/pdf/1512.09300v1.pdf</a></p> <h3 id="reading-car-license-plates-using-deep-convolutional-neural-networks-and-lstms">Reading Car License Plates Using Deep Convolutional Neural Networks and 
LSTMs</h3> <p>link: <a href="http://arxiv.org/pdf/1601.05610v1.pdf">http://arxiv.org/pdf/1601.05610v1.pdf</a></p> <h3 id="prioritized-experience-replay">Prioritized Experience Replay</h3> <p>link: <a href="http://arxiv.org/pdf/1511.05952v3.pdf">http://arxiv.org/pdf/1511.05952v3.pdf</a></p> <h3 id="policy-distillation">Policy Distillation</h3> <p>link: <a href="http://arxiv.org/pdf/1511.06295v2.pdf">http://arxiv.org/pdf/1511.06295v2.pdf</a></p> <h3 id="webnav-a-new-large-scale-task-for-natural-language-based-sequential-decision-making">WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making</h3> <p>link: <a href="http://arxiv.org/abs/1602.02261">http://arxiv.org/abs/1602.02261</a></p> <h3 id="learning-to-communicate-to-solve-riddles-with-deep-distributed-recurrent-q-networks">Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks</h3> <p>link: <a href="http://arxiv.org/abs/1602.02672">http://arxiv.org/abs/1602.02672</a></p> Agnez, analytics for deep learning research http://edersantana.github.io/articles/agnez/ Thu, 24 Dec 2015 13:00:00 -0500 http://edersantana.github.io/articles/agnez/ <p>Machine learning is about writing programs with parameters that are learned from data. But writing the base architecture that will be learned requires intuition, inspection, trial, and error, all elements that can be enhanced with high-quality visualization and analytics tools.</p> <p><span class="more"></span></p> <p>Building visualization and analytics tools to assist deep learning (and machine learning in general) development was what motivated my brother <a href="http://edersantana.github.io/articles/agnez/tiag0santana.github.io">Tiago Santana</a> and me to start <a href="http://github.com/AgnezIO">Agnez</a>, a collection of visualization tools for deep learning. 
Models used for deep learning can be seen as a business, where the architecture and hyperparameters are the business choices and the accuracy or error on the test set is the measure of business success. Keeping that metaphor in mind, we looked to companies such as Keen IO and projects such as the <a href="http://edersantana.github.io/articles/agnez/www.automaticstatistician.com">Automatic Statistician</a> for inspiration to build research analytics and visualization tools. </p> <p>Here we will describe our approach to serving the visualizations as a web app, using <a href="http://edersantana.github.io/articles/agnez/feathersjs.com">Feathers.js</a> in the backend and <a href="http://edersantana.github.io/articles/agnez/github.com/keenlabs/dashboards">Keen Dashboards</a> in the frontend. For generating the graphs we are using a temporary solution based on <a href="http://edersantana.github.io/articles/agnez/mpld3.github.io">mpld3</a> that converts Matplotlib graphs to D3. The full code is in the <a href="http://edersantana.github.io/articles/agnez/github.com/AgnezIO/minimal-app">minimal-app</a> repository. A schematic diagram of our architecture is shown in the figure below.</p> <p><img src="/articles/agnez/drawing2.png" alt="Figure 1"></p> <p>We wanted to generate beautifully organized dashboards, and we noticed that Keen Dashboards already lifted most of the design weight. But, as originally a Python developer, I suggested keeping Matplotlib’s subplot arrangement and flexibility without needing to rewrite HTML ourselves. An elegant solution to this problem would be to generate the dashboards dynamically using a REST API. We chose to develop the API with Feathers.js, a thin wrapper around Express.js for building real-time REST APIs with Node.js. This is what we needed to start a simple-to-use, general API for handling, storing, and plotting model analytics. 
In CoffeeScript, and using NeDB as the database, our Feathers app is simply:</p> <pre><code class="lang-coffeescript">feathers = require 'feathers'
mongodb = require('feathers-mongodb')
memory = mongodb {
  db: 'edermempy'
  collection: 'values'
}
bodyParser = require 'body-parser'

app = feathers()
app.configure feathers.rest()
  .configure feathers.socketio()
  .use bodyParser.json()
  .use '/values', memory
  .use '/', feathers.static(__dirname)
  .listen 3000

console.log 'App listening on port 3000'
console.log 'Index at', __dirname + '/static/'
</code></pre> <p>All the hard work is handled by the <a href="http://edersantana.github.io/articles/agnez/">feathers-nedb</a> CRUD service and <a href="http://edersantana.github.io/articles/agnez/">feathers-client</a>, which uses socket.io to update the browser client in real time. The machine learning client training our model with Python sends POST requests to the server. These requests trigger events in the server that update the browser page. For this simple demo, our Python client will send HTML strings generated with mpld3 and a GIF. When training a deep learning model, the HTML strings would be graphs of cost functions, accuracy, weight norms, and other useful analytics. As we mentioned, this is a simple temporary solution for illustration purposes; it would be more general to use a native D3 chart, patch the graphs on the browser side, and only send numbers from the machine learning side. 
Nevertheless, deep learning epochs, or passes through the training dataset, usually take a few seconds (or even minutes or hours, depending on how large the training dataset is), so sending HTML strings does not add considerable overhead.</p> <p>The index.html is pretty minimal, since everything will be generated dynamically when we send data using the API. We start with a simple <code>&lt;div id=dashboard&gt;</code> and add new Bootstrap rows later. <code>script.coffee</code> in the server has a basic Keen Dashboard cell as follows:</p> <pre><code class="lang-coffeescript">String.prototype.format = -&gt;
  args = arguments
  return this.replace /{(\d+)}/g, (match, number) -&gt;
    return if typeof args[number] isnt 'undefined' then args[number] else match

cellstr = """
  &lt;div class="col-sm-6"&gt;
    &lt;div class="chart-wrapper"&gt;
      &lt;div class="chart-title" id=title{1}&gt;
        {0}
      &lt;/div&gt;
      &lt;div class="chart-stage" id="grid{1}"&gt;
        {2}
      &lt;/div&gt;
      &lt;div class="chart-notes" id="description{1}"&gt;
        {3}
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  """
</code></pre> <p>With this code we can fill the placeholders using a Python-inspired syntax: <code>&quot;dat {0} is {1}&quot;.format &quot;string&quot;, &quot;cool&quot;</code> returns <code>&quot;dat string is cool&quot;</code>. When creating a new cell, we simply append the filled string to <code>#dashboard</code>’s HTML. 
When the HTML string or an image URL is patched, we update that dashboard cell using the snippet below:</p> <pre><code class="lang-coffeescript">values.on 'patched', (val) -&gt;
  console.log 'patching', val.name
  $grid = $ "#grid#{val.pos}"
  $title = $ "#title#{val.pos}"
  $description = $ "#description#{val.pos}"
  $title.html val.name
  $description.html val.description
  if val.type is "html"
    $grid.html val.value
  if val.type is "img"
    $grid.html "&lt;img src='#{val.value}'&gt;"
</code></pre> <p>To test the app, we use the Python script below.</p> <pre><code class="lang-python">import json
import random
import time

import matplotlib.pyplot as plt
import mpld3
import requests

# Remember that we are using the feathers database CRUD.
# Allocate cell space in the dashboard by calling the CREATE method.
url = "./images/main_img.gif"
r = requests.post("http://localhost:3000/values",
                  json={'name': '', 'type': 'html', 'value': [],
                        'pos': 0, 'description': ''})
id0 = json.loads(r.text)["_id"]
r = requests.post("http://localhost:3000/values",
                  json={'name': '', 'type': 'img', 'value': [],
                        'pos': 1, 'description': ''})
id1 = json.loads(r.text)["_id"]

fig = plt.figure()
numbers = []
# Update the cell contents by calling the PATCH method.
for i in range(100):
    time.sleep(2)  # simulate the wait time of an epoch
    plt.clf()
    numbers.append(random.random())  # new value
    plt.plot(numbers)
    if len(numbers) &gt; 20:
        del numbers[0]  # delete old values
    html = mpld3.fig_to_html(fig)  # convert matplotlib to d3
    # PATCH requests
    r = requests.patch("http://localhost:3000/values/" + str(id0),
                       json={'name': 'test1', 'type': 'html', 'value': html,
                             'pos': 0, 'description': 'simple test'})
    r = requests.patch("http://localhost:3000/values/" + str(id1),
                       json={'name': 'test2', 'type': 'img', 'value': url,
                             'pos': 1, 'description': 'simple image'})
    print(r)
</code></pre> 
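<p>Since the script above repeats the same request-building boilerplate for every cell, it can be condensed into a couple of helpers. This is just a sketch of ours, not part of the repository (the helper names are made up; only the <code>/values</code> endpoint and the JSON fields come from the app above):</p> <pre><code class="lang-python">import requests

def make_cell(name, type_, value, pos, description=''):
    # Build the JSON body that the /values service expects.
    return {'name': name, 'type': type_, 'value': value,
            'pos': pos, 'description': description}

def create_cell(base_url, cell):
    # CREATE: allocate a dashboard cell and return its database _id.
    r = requests.post(base_url + '/values', json=cell)
    return r.json()['_id']

def update_cell(base_url, cell_id, cell):
    # PATCH: triggers the 'patched' event that redraws the cell in the browser.
    return requests.patch(base_url + '/values/' + str(cell_id), json=cell)
</code></pre> <p>With these, the loop body only builds a dict with <code>make_cell</code> and calls <code>update_cell</code> once per plot.</p>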
<p>Note that since REST APIs are universal, we could send pictures and graphs from any other language. In the next iteration of this app, using native D3 charts generated on the browser side, it will be even easier to serve visualizations in a way that is language-agnostic on the machine learning side (Lua and C++ are also popular for deep learning). </p> <p>For those interested in playing with this code, from the source root directory run</p> <pre><code class="lang-shell">coffee app.coffee </code></pre> <p>to start up the app and run</p> <pre><code class="lang-shell">python test.py </code></pre> <p>to send data using the API. We can see the results at <a href="http://localhost:3000">http://localhost:3000</a> and <a href="http://localhost:3000/values">http://localhost:3000/values</a>.</p> <p>If you are training a deep learning model with Keras, you can run the app and use the Keras callbacks we provide, as in this <a href="https://github.com/AgnezIO/agnez/blob/master/examples/MNIST.ipynb">example</a>.</p> <p>Since Agnez is a young project, we expect it to evolve quickly. Help, suggestions, and feedback are welcome.</p> Does AI stand for Alchemical Intelligence? http://edersantana.github.io/articles/ai_alchemy/ Mon, 14 Dec 2015 16:00:00 -0500 http://edersantana.github.io/articles/ai_alchemy/ <p>AI stands for artificial intelligence, but it currently has a lot in common with chemistry in the ages when it was named Alchemy. </p> <p><span class="more"></span></p> <p><img src="/articles/ai_alchemy/alchemy.png" alt="alchemy symbols"></p> <p>AlexNet recipe: </p> <ol> <li>5 convolutional layers </li> <li>2 fully-connected layers </li> <li>Softmax layer with 1000 outputs on top </li> <li>Use the ImageNet dataset for training </li> <li>Train it carefully with SGD for 2 weeks if you don’t have proper equipment, or for 16 hours if you do. </li> </ol> <p>In high school we learn that knowledge evolves from myths. 
The demystification is carried out by careful investigation, experimentation, reproducibility and analysis, all guided by the scientific method. </p> <p>The scientific method is a fantastic tool, but I have always been equally fascinated by what was accomplished with other knowledge tools such as art and mythical approaches. My main question was “What did the human mind feel like in pre-scientific method ages?”. For a PhD candidate in deep learning, it was humbling to observe that we somewhat live in such an age with respect to artificial intelligence.</p> <p>Deep learning is one of the computational approaches to unlocking the mysteries of intelligence. Our understanding of intelligence today resembles the understanding of chemistry in the times of the alchemists. Those chemical experimentalists of the old ages developed practical recipes for manipulating the dowries of Nature, all without the guide of today’s chemistry and physics. Similarly, we can achieve arguably intelligent-like behavior with deep learning. There is no broadly accepted definition in chemistry, physics, mathematics, nor in philosophy or poetry of what intelligence is. Yet over and over again we have been able to write programs to tackle tasks that years ago would have required a human to solve. These solutions are developed with intuition and arduous repetition, just like the alchemists’ workings.</p> <p>I’ve noticed a few other parallels between the fields of artificial and alchemical intelligence that I list next.</p> <h3 id="philosopher-s-stone">Philosopher’s stone</h3> <pre><code>Our mission SOLVE INTELLIGENCE - Google DeepMind </code></pre><p>Alchemists’ ultimate goal was to turn any base metal into gold and to find the Elixir of Life. Artificial Intelligence seeks general-purpose intelligence, universal program solvers, and machines with human-like intelligence. These ambitious goals are successfully motivating investigators and patrons alike. 
</p> <p>On the other hand, although those involved in the research know how far they are from their Philosopher’s Stone, the possible consequences of their declared ultimate goals can be fearsome to outsiders.</p> <h3 id="fear">Fear</h3> <pre><code>Our greatest existential threat - Elon Musk on AI </code></pre><p>Control over gold and life would be nothing but disruptive. It would break economies, health, politics, religion, and any other power systems. Equally world-changing would be a machine with human intelligence and without human weaknesses, built as a single-owner homunculus. Couldn’t that owner use his homunculus as the ultimate worker and accumulate unequal wealth, or use it as the perfect soldier to conquer less technologically gifted societies?</p> <p>The unknown is scary, and not everybody understands how AI or Alchemy works, especially when practitioners are so fond of their mystic terminologies.</p> <h3 id="mysticism">Mysticism</h3> <pre><code>This state has a 2-dimensional structure: it consists of w × h vectors of m numbers, i.e., it is a 3-dimensional tensor of shape [w, h, m]. This &quot;mental image&quot; evolves in time in a way defined by a convolutional gated recurrent unit. - Łukasz Kaiser and Ilya Sutskever on Neural GPUs Learn Algorithms. (Quotation marks are ours) </code></pre><p>Arthur C. Clarke noticed that “Any sufficiently advanced technology is indistinguishable from magic”. This resonates so deeply with our minds that we sometimes avoid attributing mundane descriptions to what we have recently discovered possible with our magic. For example, alchemists used to name their elements after Roman gods to represent their powers. Deep learners like to attribute cognitive properties such as attention and dreaming to their algorithms, even if they are just calculating first-order moments or sampling from a parametric probability distribution. 
Esoteric terminology has also been widely accepted in names such as Hidden Layers, Dark Knowledge, Skip-Thought Vectors and algorithms that Learn to Think.</p> <p>After all, we have always calculated first-order moments, but only now do we use them to filter context-relevant values. And random number generators had never before consistently generated realistic (or scary) images. </p> <p>It takes a few generations for a technology to feel common enough for practitioners to retract their mystic terminologies.</p> <h3 id="enlightenment">Enlightenment</h3> <pre><code>&quot;in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions&quot; - First Nobel Prize in Chemistry, awarded to Jacobus H. van&#39;t Hoff </code></pre><p>No one can deny the importance of chemistry to our daily life. Although we can’t easily transform metals into gold, we know the Quantum laws that would explain how that might work. On average we now live about twice as long as those who sought the Elixir of Life. For that, we should thank our understanding of the human body and medicine, all of which started with claims of gods’ workings in alchemists’ labs. </p> <p>One day, AI might lose its mystic veil and we will no longer say that Neural Networks can dream or pay attention. But when that time comes, we will also be practicing much more than smoke, self-driving cars, image recognition and mirrors.</p> <p>Obviously, we will also rationally regulate AI the same way we regulate chemical weapons: by focusing on the people, companies and governments that might exploit it with bad will, and by teaching the technology alongside ethics classes.</p> <h3 id="predicting-the-future">Predicting the future</h3> <p>I feel happy with a better understanding of how the minds of our predecessors might have worked. Looking back also helps me understand the nature of the fear and wonder surrounding today’s advanced technologies. 
But these same technologies don’t evolve in unexplainable jumps. By the time we were able to fly, blow up a mountain, transform metals, or extend our life spans, we no longer felt it was appropriate to call it magic or threatening to human existence.</p> <p>Right now we are still waiting for an AI theory that may enlighten the field in the same way that Quantum Theory unveiled the real magic of Chemistry. A theory that will not only explain what we did but also make even more things possible. At that point, robots will be able to talk, draw, dance, work, navigate using visible light and sound, and behave indistinguishably from humans. When AI becomes less alchemical, I doubt that the majority of us will be scared, expecting the protection of an AI Inquisition or worrying about what a machine is thinking.</p>