When DNNs Become More Human: DNNs vs. Transfer Learning vs. Continual Learning

As new approaches to deep neural networks (DNNs) and deep learning
continue to emerge, it’s important to understand the differences between them. Not
all DNNs are created equal, and small tweaks in their architecture can
have profound implications for their applicability to real-world scenarios. For
practical purposes, the choice among different ‘shades’ of neural networks, from
traditional DNNs to Transfer Learning to newer, more brain-like approaches such as
Continual Learning, can make the difference between a failed experiment and a
successful deployment, especially in real-world scenarios close to the ones
in which humans operate.

How Today’s DNNs Work

Neural network algorithms are the backbone of AI. They derive their
power from the ability to learn from data, as opposed to being preprogrammed to
perform a given function. Most neural networks are trained with a learning formalism
called Backpropagation. Introduced in the 1970s and widely rediscovered and
adopted in recent years, Backpropagation lets networks match and sometimes even
surpass human-level performance in an ever-increasing list of tasks, from
playing chess to detecting an intruder in a security camera feed.

However, this super-performance comes at a heavy price. Backpropagation networks are
very sensitive to new information and are susceptible to catastrophic
interference: when something new is learned, it can wipe out old information. To
mitigate this problem, researchers made the training process a lot slower and
froze learning after the target performance was reached, to avoid
compromising previously learned information when new information is added. And to
retrain a DNN on new data, one needs to keep all of the original training data stored as well.
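
The interference described above can be reproduced in a few lines. The following is a minimal Python sketch, not taken from any production system: a two-weight linear model trained by gradient descent (a stand-in for Backpropagation in a full DNN) learns a hypothetical task A and then a hypothetical task B. The tasks, learning rate, and function names are all illustrative assumptions.

```python
# Toy illustration of catastrophic interference (illustrative assumptions only).

def predict(w, x):
    """Linear model: weighted sum of the inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd(w, samples, epochs=200, lr=0.1):
    """Plain stochastic gradient descent on squared error; returns new weights."""
    w = list(w)
    for _ in range(epochs):
        for x, y in samples:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def mse(w, samples):
    """Mean squared error of the model on a set of samples."""
    return sum((predict(w, x) - y) ** 2 for x, y in samples) / len(samples)

# Two hypothetical tasks that share the same two weights.
task_a = [((1.0, 0.5), 1.0)]
task_b = [((0.5, 1.0), -1.0)]

w_a = sgd([0.0, 0.0], task_a)    # learn task A
w_ab = sgd(w_a, task_b)          # then learn task B only, without task A data
w_joint = sgd([0.0, 0.0], task_a + task_b, epochs=2000)  # retrain on all data

print(f"task A error after learning A:      {mse(w_a, task_a):.4f}")
print(f"task A error after then learning B: {mse(w_ab, task_a):.4f}")
print(f"task A error after joint training:  {mse(w_joint, task_a):.4f}")
print(f"task B error after joint training:  {mse(w_joint, task_b):.4f}")
```

In this toy setup, the error on task A climbs from near zero to roughly 2 once the model is trained on task B alone, while training on both tasks together keeps both errors near zero. That second result is only possible because the task A data was kept around.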

In short, today’s DNNs are slow to train, static after
training (because updating them is glacially slow), and sometimes impractical
in real applications.

What About Transfer Learning?

Transfer Learning is a popular approach in which a previously developed DNN
is ‘recycled’ as the starting point from which a new DNN learns a second task.
Essentially, with Transfer Learning nothing really changes with respect to the traditional DNN methodology, except that
you can train on somewhat less data.

For example, take a DNN
model that was trained on ImageNet and can recognize 1,000 object classes
(cars, dogs, cats, etc.). If you want to train a new model to tell
two specific breeds of dog apart, Transfer Learning would try to
learn these breeds by initializing a new network with the weights from the
bigger 1,000-class network, which already knows a bit about what dogs look like.
And while this may make the training of the new network faster and more reliable,
there is a catch: the newer network can now only recognize these two specific
breeds of dog. Instead of recognizing the 1,000 objects the original network learned,
it will be able to identify only two.
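
The weight reuse in this example can be sketched in plain Python. This is a hedged toy illustration, not a real ImageNet model: the layer sizes, the random "pretrained" weights, and names such as make_layer and logits are assumptions made up for this sketch. It shows only the step where the feature weights are copied and the 1,000-class output layer is swapped for a 2-class one; the subsequent training on breed images is omitted.

```python
import random

def make_layer(n_out, n_in, rng):
    """Create a small random weight matrix (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, x):
    """Apply one fully connected layer (no bias) to input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in layer]

def relu(v):
    return [max(0.0, a) for a in v]

def logits(model, x):
    """Feature layer followed by the classification head."""
    return forward(model["head"], relu(forward(model["features"], x)))

rng = random.Random(0)

# Hypothetical stand-in for a network pretrained on 1,000 classes.
pretrained = {
    "features": make_layer(8, 4, rng),   # shared feature extractor
    "head": make_layer(1000, 8, rng),    # 1,000-class output layer
}

# Transfer: copy the feature weights, discard the old head,
# and attach a fresh 2-class head for the two dog breeds.
new_model = {
    "features": [row[:] for row in pretrained["features"]],
    "head": make_layer(2, 8, rng),
}

x = [0.5, -1.0, 0.25, 2.0]  # made-up input
print(len(logits(pretrained, x)))  # number of classes the original can output
print(len(logits(new_model, x)))   # number of classes the new model can output
```

The new model starts from the same feature weights as the original, but its output layer has room for only two classes, which is the catch described above.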

So, even though this
training would be faster than a training session initialized from scratch,
it could still take anywhere from hours to days, depending on the size of the
dataset, and the resulting network will know much less.

Then, How About Continual Learning?

There is another category
of DNNs that is gaining traction, belonging to the camp called Continual (or
Lifelong) Learning. One implementation, called Lifelong-DNN (L-DNN) and
inspired by brain neurophysiology, is able to add new information on the fly. Unlike
traditional DNNs and Transfer Learning, it uses a completely different methodology in which
the iterative processes typical of Backpropagation are mathematically approximated
by instantaneous ones, in an architecture that introduces new processes,
layers, and dynamics with respect to traditional DNNs.

When it comes to training,
each piece of data you encounter is trained on only once. This translates into
massive gains in training speed: on the same hardware, L-DNN can train
between 10K and 50K times faster than a traditional DNN.
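
To make the idea of learning from each sample exactly once more concrete, here is a minimal Python sketch. It is not the L-DNN algorithm, whose internals are not described here; it swaps in a much simpler, well-known technique, a nearest-class-mean classifier with running averages, purely to illustrate single-pass updates and adding a new class on the fly. The class name, labels, and data points are all made up for this sketch.

```python
class PrototypeClassifier:
    """Nearest-class-mean classifier updated one sample at a time.

    Illustrative stand-in only; this is NOT how L-DNN works internally.
    """

    def __init__(self):
        self.means = {}   # label -> running mean of the samples seen so far
        self.counts = {}  # label -> number of samples seen so far

    def learn(self, x, label):
        """Update the model with a single sample, seen exactly once."""
        if label not in self.means:
            # A brand-new class is added without touching the existing ones.
            self.means[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        # Incremental mean update: no stored data and no iterative retraining.
        self.means[label] = [m + (xi - m) / n
                             for m, xi in zip(self.means[label], x)]

    def predict(self, x):
        """Return the label whose running mean is closest to x."""
        def dist2(mean):
            return sum((a - b) ** 2 for a, b in zip(mean, x))
        return min(self.means, key=lambda label: dist2(self.means[label]))


clf = PrototypeClassifier()
clf.learn([0.0, 0.0], "cat")
clf.learn([0.2, 0.1], "cat")
clf.learn([5.0, 5.0], "dog")
print(clf.predict([0.1, 0.0]))

# Later, a new class arrives; earlier classes are left intact.
clf.learn([-5.0, 5.0], "bird")
print(clf.predict([-4.5, 5.2]), clf.predict([0.1, 0.0]), clf.predict([4.8, 5.1]))
```

Because each class keeps its own running mean, adding "bird" cannot overwrite what was learned about "cat" or "dog", and no earlier data needs to be stored or replayed. A real continual-learning system has to achieve the same property with far richer representations.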

So Which Way is the Best Way?

When it comes to training
algorithms, there are several approaches to choose from, and no, they are not
all the same. If you have infinite compute power, data, and time, then using a
traditional DNN makes sense. If you do not, L-DNN may be the only way to cope with
use cases where data is scarce, computation must be fast and local, and the
model needs frequent updates.

As AI continues to be integrated into every facet of our lives, including tough real-world scenarios, we’ll see novel approaches to neural networks such as L-DNN emerge that more closely approximate the continual learning and flexibility of humans. After all, AI is built in our image!

About the Author

Dr. Massimiliano Versace is the CEO and Co-Founder of Neurala. Max continues to lead the world of intelligent devices after his pioneering breakthroughs in brain-inspired computing. He has spoken at numerous events including a keynote at Mobile World Congress Drone Summit, TEDx, NASA, the Pentagon, GTC, InterDrone, GE, Air Force Research Labs, HP, iRobot, Samsung, LG, Qualcomm, Ericsson, BAE Systems, AI World, ABB and Accenture among many others. His work has been featured in TIME, IEEE Spectrum, CNN, MSNBC, The Boston Globe, The Chicago Tribune, Fortune, TechCrunch, VentureBeat, Nasdaq, Associated Press and hundreds more. He holds several patents and two PhDs: Cognitive and Neural Systems, Boston University; Experimental Psychology, University of Trieste, Italy.
