One of the key sticking points in discussions comparing machine learning and the brain is how the notion of “learning” differs between computational and biological systems. In section 2 of their paper, Marblestone et al. grapple with this issue in detail. For our introduction post on this paper, go here.

Deep neural networks are trained with an algorithm known as backpropagation, which is the most efficient training algorithm we have invented so far (in terms of the number of parameter updates needed to reach good performance). If neural networks are analogous to neural circuits in the brain, then either:

  1. Neural circuits are also trained with backpropagation
  2. Neural circuits are trained with a better and previously undiscovered algorithm

When describing the biological plausibility of backpropagation, the authors implicitly assume that the brain doesn’t use a less efficient algorithm. That is, they believe that the performance achieved by the brain is not possible with something less efficient than backpropagation.

Much of the research at this intersection has focused on demonstrating that item 1 from above is at least plausible, and we will dig into it in detail below.

However, there is a different notion of efficiency that machine learners must contend with: data efficiency. In typical deep learning settings, we provide extremely large sets of data all at once for training. Brains, on the other hand, learn through experience and interaction with an environment, a notion broadly termed “reinforcement learning.”

Reinforcement learning algorithms have gained popularity in machine learning with the invention of Deep Reinforcement Learning (Deep RL). But as we will discuss below, there are still large gaps in data efficiency and performance between Deep RL and humans.

So, how similar is machine learning to learning in the brain? Let’s find out.

Backpropagation and the Brain

Neural networks are trained with a method called backpropagation (backprop), which works out how much each connection between neurons contributed to the error, so that an optimization algorithm such as stochastic gradient descent can change the strength of those connections accordingly.  In a multi-layer network, the update for a particular connection in one layer is computed using information from the layers above it in the network (those closer to the output).

Remember that when we train a neural network, we show the network some data (e.g. an image) and the network produces a prediction based on that data (e.g. a prediction for whether the image contains a cat).  The changes in the strength of neuronal connections depend on how far the network’s prediction is from the true value.  For example, if the network was very sure that the picture was of a cat, when in fact it was of a banana, some pretty serious updates need to occur to make sure the network doesn’t make a silly mistake like that again.

From http://www.catshaming.co.uk/

So, intuitively, if the brain is similar to a neural network, it should be doing something like this during learning.  The brain perceives some input (e.g. it experiences the world through vision) and makes some predictions about the world based on what it sees.  Think of a child learning to name farm animals.  Horses and chickens look pretty different, but maybe it’s harder to distinguish a spotted horse from a cow.  The child learns the defining features of each animal by trying to name animals, and being wrong some of the time.  When the child is wrong, she changes something about the way she thinks about farm animals so that she’ll be more likely to correctly identify similar animals in the future.

There are several ways that learning in artificial neural nets (ANNs) diverges from what is possible in the brain.  The first concerns “credit assignment”, the process of assigning blame for a poor prediction to particular neuronal connections.  This is easy to do in a computer because, essentially, the training algorithm can communicate directly with every neuron about its connections and how they should change.  In practice, the update for each neuron’s connections can be computed using information available in the layers above it in the network (those closer to the output).
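To make this concrete, here is a minimal sketch of backprop-style credit assignment in a tiny two-layer network. The layer sizes, data, and learning rate below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))        # input features (stand-in for an image)
y_true = np.array([[1.0]])         # true label: "this really is a cat"

W1 = rng.normal(size=(3, 4))       # input -> hidden connection strengths
W2 = rng.normal(size=(1, 3))       # hidden -> output connection strengths
lr = 0.1                           # learning rate

for step in range(100):
    # Forward pass: the network's prediction.
    h = np.tanh(W1 @ x)
    y_pred = 1 / (1 + np.exp(-(W2 @ h)))     # predicted probability of "cat"

    error = y_pred - y_true                  # how wrong was the prediction?

    # Backward pass: send the output error backwards (the chain rule) to
    # assign blame to each individual connection.
    delta2 = error * y_pred * (1 - y_pred)   # error signal at the output layer
    delta1 = (W2.T @ delta2) * (1 - h ** 2)  # error signal at the hidden layer
    W2 -= lr * delta2 @ h.T                  # gradient-descent weight updates
    W1 -= lr * delta1 @ x.T
```

Note that the hidden-layer update needs W2.T, the exact strengths of the connections one layer up, which is precisely the information a biological neuron doesn’t obviously have.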

In the brain, it is not obvious how this credit assignment could take place.  There is no global process that could communicate what is needed to perform credit assignment at a neuron-by-neuron level.  Furthermore, there are typically no backwards connections between neurons.  That is, synapses are not symmetric.  Information flows from neuron A to neuron B, but not necessarily backwards from B to A, unless through a third neuron, C.  So neurons in the brain don’t have access to how much individual neurons in each layer of the network contributed to a bad prediction.

So what’s a brain to do?  This is an active area of research (as Marblestone et al. point out), and people aren’t really sure how the brain creates or strengthens its connections in response to errors.  There’s some knowledge of how neurons might strengthen or weaken connections in response to coincident firing (Hebbian learning), but not in response to some global signal about connection “goodness”.

One recent advance has shown that neurons might not need to know about the connections of the neurons around them, so long as they have access to some information about the global error signal (i.e. how bad the final prediction was).  It’s often true that, while there are no symmetric connections between individual neurons, for every brain area A that sends information to brain area B, B also has connections to send information back to A.  Thus, learning could occur with just this simple piece of feedback about the top-level error.
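One concrete version of this idea is “feedback alignment”, in which the backward pathway uses a fixed random matrix instead of an exact copy of the forward weights. Here is a minimal sketch, reusing the toy network from above (the sizes and data are again made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 1))
y_true = np.array([[1.0]])

W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(1, 3))
B = rng.normal(size=(3, 1))   # fixed random feedback weights, never learned
lr = 0.1

for step in range(100):
    h = np.tanh(W1 @ x)
    y_pred = 1 / (1 + np.exp(-(W2 @ h)))
    error = y_pred - y_true

    delta2 = error * y_pred * (1 - y_pred)
    # The only change from backprop: random B replaces the exact weights W2.T,
    # so the hidden layer never needs to know W2.
    delta1 = (B @ delta2) * (1 - h ** 2)
    W2 -= lr * delta2 @ h.T
    W1 -= lr * delta1 @ x.T
```

Surprisingly, training still works reasonably well in many settings, because the forward weights gradually come to align with the random feedback.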

Here’s a great video with some more detailed information.

Reinforcement Learning

Implementation aside, what does it mean to train a brain?

The broad theory encompassing learning in the brain is known as reinforcement learning (here are some fun slides targeted at MLers). Probably the most famous example is classical (Pavlovian) conditioning, showcased by Pavlov’s dog.

Image courtesy of Wikimedia Commons

A dog can learn to associate an unrelated stimulus (in this case a bell) with a reward (food) and adjust its behavior accordingly. In general, reinforcement learning is about learning policies (mappings from states of the world to actions) that maximize rewards and minimize punishments.

Machine learners are not new to the idea of reinforcement, and reinforcement learning has become a subfield of machine learning in its own right. A favorite algorithm is Q-learning, in which we learn a function Q that takes a state of the world (e.g. “I hear a bell”) and a candidate action (“drool now”) and estimates the future reward that action will lead to (“food is coming soon”).  Here’s a video of Q-learning in action. The idea is to choose the action with the highest Q value, i.e. the one expected to maximize future reward.
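A toy sketch of tabular Q-learning makes the update rule concrete. The two-state “bell then food” world, the rewards, and the hyperparameters below are all made up for illustration:

```python
import random

states = ["hear_bell", "food_arrives"]
actions = ["drool", "ignore"]
Q = {(s, a): 0.0 for s in states for a in actions}

alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def reward(state, action):
    # Drooling when the food actually arrives pays off; everything else is neutral.
    return 1.0 if (state == "food_arrives" and action == "drool") else 0.0

for episode in range(500):
    state = "hear_bell"
    while state is not None:
        # Epsilon-greedy: mostly pick the action with the highest Q value.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])

        r = reward(state, action)
        next_state = "food_arrives" if state == "hear_bell" else None

        # Q-learning update: nudge Q(state, action) toward the reward plus the
        # discounted value of the best action in the next state.
        future = max(Q[(next_state, a)] for a in actions) if next_state else 0.0
        Q[(state, action)] += alpha * (r + gamma * future - Q[(state, action)])

        state = next_state

print(Q)   # the "hear_bell" entries pick up value because food follows the bell
```

After a few hundred episodes, the entries for the “hear_bell” state approach the discounted food reward: the bell itself has become valuable because it predicts food.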

At the time that Marblestone et al. published, the Deep Q Network (DQN) was the state-of-the-art for deep reinforcement learning. The idea is simple: train a deep network using the Q-learning algorithm. Because a deep network can approximate almost any function, it is as good a candidate as any for estimating a Q function.  The DQN was able to achieve human (and even superhuman) performance on quite a few Atari games, an impressive feat.
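The training target is the same Q-learning target as above, just regressed onto by a deep network. Here is a condensed sketch; the q_network and q_target_network functions below are placeholders standing in for the deep convolutional network and its slowly updated copy:

```python
import numpy as np

gamma = 0.99            # discount factor
num_actions = 4         # e.g. the joystick moves available in an Atari game

def q_network(state):
    # Placeholder: a real DQN is a deep convolutional network over game frames.
    return np.zeros(num_actions)

def q_target_network(state):
    # Placeholder: a frozen copy of q_network, updated only occasionally.
    return np.zeros(num_actions)

def td_target(reward, next_state, done):
    # The value we regress Q(state, action) toward.
    return reward if done else reward + gamma * np.max(q_target_network(next_state))

def td_loss(state, action, reward, next_state, done):
    # Squared gap between the network's estimate and the bootstrapped target;
    # this is what gets minimized by gradient descent (via backprop, as above).
    return (q_network(state)[action] - td_target(reward, next_state, done)) ** 2
```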

However, there were two problems:

  1. Data efficiency. Humans can learn with fewer attempts than the DQN.
  2. Long timescales. Humans are able to estimate rewards much farther in the future.

A great example of how the DQN struggles with problem 2 is the Atari game Montezuma’s Revenge. In this game, you must make it to the top of a pyramid. The catch is that there are no intermediate rewards, only punishments. Eventually the DQN learns to just never leave the starting platform – not dying is better than dying, after all! Here is a cute video illustrating this.

A recent improvement that tries to tackle both problems at once is again inspired by the brain: adding an explicit memory system – specifically, an episodic memory system. Episodic memories are memories of things that happened to you, like what you ate for breakfast this morning (general facts about the world, like when your mom’s birthday is, are usually filed under semantic memory instead). An example of non-episodic memory is knowing how to ride a bike.

The Neural Episodic Control (NEC) algorithm, as they call it, is a modification of the DQN in which “memories” (estimates of the Q function for particular states and actions) are stored in an external buffer and reused during training. This effectively supplements the slow network training and does in fact make NEC more data efficient, getting better performance for less game time. Problem 1 has been somewhat ameliorated, although NEC still does not achieve human-level data efficiency.
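A very rough sketch of the episodic-memory idea: keep a per-action buffer of state embeddings and the returns that followed them, and estimate Q by a weighted average over the nearest stored neighbors. The embedding, kernel, and sizes below are illustrative assumptions; the real NEC uses a learned encoder and a differentiable neural dictionary.

```python
import numpy as np

class EpisodicMemory:
    """Per-action store of (state embedding, observed return) pairs."""

    def __init__(self, num_neighbors=5):
        self.keys = []      # state embeddings we have experienced
        self.values = []    # the (discounted) returns that followed them
        self.k = num_neighbors

    def write(self, embedding, value):
        self.keys.append(np.asarray(embedding, dtype=float))
        self.values.append(float(value))

    def lookup(self, embedding):
        """Estimate Q(embedding, this action) from the nearest stored memories."""
        if not self.keys:
            return 0.0
        keys = np.stack(self.keys)
        dists = np.linalg.norm(keys - np.asarray(embedding, dtype=float), axis=1)
        nearest = np.argsort(dists)[: self.k]
        weights = 1.0 / (dists[nearest] + 1e-3)   # closer memories count for more
        values = np.asarray(self.values)[nearest]
        return float(np.sum(weights * values) / np.sum(weights))

# One memory per action; Q(s, a) is read straight out of memory.
memories = {"left": EpisodicMemory(), "right": EpisodicMemory()}
memories["right"].write([0.0, 1.0], value=1.0)   # "going right here paid off"
print(memories["right"].lookup([0.1, 0.9]))      # high estimate for a similar state
```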

But did this solve problem 2? Yes and no. NEC does better on games where the DQN struggled because the DQN failed to take advantage of things that pay off in the long term (e.g. special items in Ms. Pac-Man). However, NEC still can’t conquer Montezuma’s Revenge. I’ll be interested to see what they try next to tackle the issue.

Concluding Thoughts

Nicole: Honestly I’m not sure if we should think of neural network nodes as corresponding to single neurons – I wonder if we’re barking up the wrong tree with that comparison. I personally am much more interested in the intelligent agents approach. In deep reinforcement learning we see a real union and interplay of neuroscientific concepts and machine learning.

Alona: I feel like the two (RL and neural networks) are coming at the problem of learning from two very different perspectives, and both have something to add to our understanding of biological learning.  I’m interested to see how the study of artificial neural networks can help us understand the brain, and vice versa.  

I’m also interested in the idea of cost functions — that is, what exactly is the brain optimizing?  The cost function for playing Montezuma’s Revenge is very different from the cost function for learning to name farm animals.  These cost functions can operate at very different time scales (e.g. many levels of a video game vs responding to a particular picture stimulus) and take very different information into account (visual, verbal, touch, movement…).  I’m interested in how the brain might define its own set of cost functions, and how those functions might interact with particular learning tasks.

In our next post we’ll handle that question. What are the cost functions in the brain?