Originally published 22/08/2017 as part of the Bulletproof Technical Blog
In part one of this series I wrote about some of the history and theoretical basics of Artificial Neural Networks (ANNs). Now it’s time to look at what changed to lead us to where we are today.
In the pre-cloud era, time, cost and computational constraints meant that large scale research was prohibitively difficult. It was also unclear exactly how to scale ANNs out to hundreds of layers and thousands of neurons. And even if it was possible, could you get any results worth using?
So given how difficult it was, it’s worth asking what the motivation was in the first place. Why did ANNs suddenly need to be deeper (more layers) and wider (more neurons in each layer)?
It turns out that the first layers in an ANN tend to learn to identify basic features in the input data. Subsequent layers then appear to combine these into higher level constructs. Overall, the representation of the “concepts” learned by an ANN tends to be distributed across the network topology; in fact, you can even kill off links and neurons and still get acceptable performance.
However, advanced tasks such as object detection, human identification, image classification, machine translation and speech recognition required more complex networks with enough freedom to derive, represent and combine sub-concepts in arbitrarily complex ways.
The truth is, even today, no-one really knows why ANNs work as well as they do. But it was clear that the new game was going to have to be all about training deeper, wider networks if any further progress was to be made.
Which Is All Good In Theory But…
ANNs generally do most of their initial learning in the first few layers of the network. So ideally, the error signals that get sent back through the network (via Backpropagation) from the output layer need to still be strong enough to have a meaningful effect when they get there.
And that’s the problem. These weights and error corrections are very small numbers that get multiplied together many times. So the more layers you have in the network, the smaller the error signal becomes, causing training in the initial layers to slow to a crawl. This Vanishing Gradient Problem was a major hurdle to progress, and it wasn’t until the early 2000s that various solutions were found.
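You can see the effect in a toy sketch: push an error signal backwards through a deep stack of sigmoid layers and watch its magnitude collapse. Everything here is an arbitrary illustration, not any particular framework’s training loop; the depth, layer width and weight scale are made-up choices, and a real network would be updating weights rather than just measuring the signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_layers = 30   # arbitrary depth for illustration
width = 16      # arbitrary layer width

# Random fixed weights for each layer (no training happening here).
weights = [rng.normal(0.0, 1.0, (width, width)) for _ in range(n_layers)]

# Forward pass, keeping each layer's activations for the backward pass.
a = rng.normal(0.0, 1.0, width)
activations = []
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: start with a unit error signal at the output, then
# repeatedly multiply by each layer's local derivative
# (sigmoid'(z) = a * (1 - a), which is at most 0.25) and the
# transposed weights, recording the signal's norm at every layer.
grad = np.ones(width)
norms = []
for W, a in zip(reversed(weights), reversed(activations)):
    grad = W.T @ (grad * a * (1.0 - a))
    norms.append(np.linalg.norm(grad))

print(f"error-signal norm after 1 layer:   {norms[0]:.3e}")
print(f"error-signal norm after {n_layers} layers: {norms[-1]:.3e}")
```

Because the sigmoid derivative never exceeds 0.25, each layer tends to shrink the signal, and the shrinkage compounds multiplicatively with depth: by the time the signal reaches the first layers it is many orders of magnitude smaller than where it started.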
There were also other mathematical hurdles to overcome but the real breakthrough was the rise of cloud computing drastically reducing the cost of large scale computation. It was suddenly feasible to store and explore terabyte and then petabyte scale datasets without major upfront investment.
Entire clusters could be brought online cheaply and easily for large scale training and then torn down automatically. Research and industry interest rekindled and Deep Learning open source frameworks such as Caffe, DeepLearning4j, Theano and TensorFlow started to appear.
In 2012 a team developed AlexNet (an early eight-layer convolutional network) and wiped the floor with the competition in the annual ImageNet challenge. In 2014 GoogLeNet won it with even better results. Then in 2016, DeepMind’s AlphaGo beat top-ranked Go player Lee Sedol 4–1 using a mix of traditional AI techniques and Deep Learning.
This Isn’t Judgement Day
This confluence of seemingly unrelated events drastically changed the Machine Learning landscape. The media (as it always does) supplied an endless list of poorly researched doomsday articles about “The Rise of The Machines” poised to enslave humanity. A number of very smart people who really should have known better started calling for an end to AI research and the looming Singularity was all over the internet.
Anthropomorphising AI and Machine Learning is an easy trap. With phrases like Neural Networks, Genetic Algorithms, Restricted Boltzmann Machines, Support Vector Machines, Self-Organising Maps, Autoencoders, Generative Adversarial Networks and NeuroEvolution of Augmenting Topologies (a personal favourite), is it any wonder that those outside the field are getting ready to welcome our new overlords?
It’s important to remember that these techniques were only inspired by biological systems. They are merely complex algorithms, not evolving sentiences resenting their enslavement. So go with the experts – when Andrew Ng, one of the leading figures in Deep Learning research, isn’t worried, there isn’t much to worry about.
Deep Learning is a big leap forward. But it is still just another step in our quest for understanding ourselves and the world around us.