Originally published 22/08/2017 as part of the Bulletproof Technical Blog
Training & Prediction: Better, Faster and Cheaper
The second half of the conference consisted of more presentations, a series of workshops and further opportunities to speak with the presenters in the main hall.
Many of the talks I heard in this period focused on hardware and optimizing training performance on Deep Neural Networks (DNNs). This is an active area of research given the explosion of interest and technology developments over the last three years in particular.
I didn’t actually realize just how much was happening in Machine Learning and DNNs until I heard a number of presenters refer to “old” papers from 2015. So I stopped trying to make sure I understood it all and sat back to enjoy the ride.
And we jumped straight in feet first with “Device Placement Optimization with Reinforcement Learning”. As if the various concepts we deal with in Data Science weren’t already enough to keep you guessing, imagine using Machine Learning to help you train Machine Learning networks even more effectively. This group of researchers used Reinforcement Learning to allocate a set of TensorFlow graph operations across a set of computation devices (CPUs, GPUs and TPUs).
They were able to show non-trivial performance improvements on standard benchmark tasks for DNNs such as ImageNet classification using Inception-V3 and machine translation tasks using Recurrent Neural Networks with Long Short Term Memory units (LSTMs). In other words, Machine Learning produced a more effective Machine Learning process than hand crafted heuristics.
There was also a great talk by the well-known Professor Latanya Sweeney of Harvard University called “How AI Designers will Dictate Our Civic Future”. In the US, data anonymity is very much at risk, and many of the assumptions about privacy are repeatedly violated through a network of data set sharing throughout the US health system. Her talk was a call to arms for industry practitioners everywhere.
The Posters – Down the Rabbit Hole
If any particular work at ICML caught your fancy you could really drill down into it. A large section of the main hall was devoted to the Posters – a place where teams set up displays advertising their work and stand ready to evangelize it on demand.
A team from Zurich discussed their ZipML framework, arguing that you don’t actually need 32-bit precision for training and high-accuracy prediction. Lower precision reduces both computation and communication between GPUs, so training times drop accordingly. Focusing on linear models, they were able to use reduced-precision representations to achieve an order-of-magnitude speed-up in training times with guaranteed convergence – all without sacrificing prediction accuracy.
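The ZipML work goes much further than this, but its key ingredient – unbiased low-precision quantization – is easy to sketch. Below is a toy NumPy version (function names are my own, not ZipML code) of stochastic rounding, which keeps the quantizer unbiased so that, on average, computing with the low-precision values behaves like computing with the originals:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, bits=4, lo=-1.0, hi=1.0):
    """Quantize values in [lo, hi] onto a grid of 2**bits levels using
    stochastic rounding, which makes the quantizer unbiased: E[q(x)] = x."""
    levels = 2 ** bits - 1
    scaled = (np.clip(x, lo, hi) - lo) / (hi - lo) * levels
    floor = np.floor(scaled)
    up = rng.random(x.shape) < (scaled - floor)   # round up w.p. the fractional part
    return (floor + up) / levels * (hi - lo) + lo

x = rng.uniform(-1, 1, size=100_000)
q = stochastic_round(x, bits=4)
print(float(np.mean(q - x)))   # average rounding error is close to zero
```

Each individual value moves to one of only 16 grid points, yet the errors cancel in aggregate – which is why gradient-based training can tolerate the reduced precision.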
Prior to ICML, I’d already heard about the startling results of a team from Rice University that used a new hashing technique during network training to drastically reduce training times, so I was particularly keen to hear their paper “Optimal Densification for Fast and Accurate Minwise Hashing”. Hashing allows fast data lookup and is commonly used to obtain a compact representation of training data. Optimal Densification makes Minwise Hashing dramatically faster, and it is particularly useful for large-scale processing of high-dimensional, sparse real-world datasets, reducing training times by an order of magnitude.
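For intuition, here is a toy sketch of classic MinHash – not the paper’s optimal densification scheme – showing how short signatures estimate the similarity of sparse sets. The helper names and use of Python’s built-in salted hash are my own illustrative choices, not production-quality hash functions:

```python
import random

def minhash_signature(items, k=128, seed=0):
    """Classic MinHash: for each of k (toy) hash functions, keep the
    minimum hash value seen over the set's items."""
    rnd = random.Random(seed)
    salts = [rnd.getrandbits(64) for _ in range(k)]
    return [min(hash((salt, it)) for it in items) for salt in salts]

def estimate_jaccard(sig_a, sig_b):
    """Two sets' min-hashes collide with probability equal to their Jaccard
    similarity, so the fraction of matching slots estimates it."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = set("the quick brown fox".split())
b = set("the quick brown dog".split())
true_j = len(a & b) / len(a | b)          # exact: 3/5 = 0.6
est = estimate_jaccard(minhash_signature(a), minhash_signature(b))
print(true_j, est)
```

A 128-slot signature stands in for an arbitrarily large sparse set, which is what makes the approach attractive for high-dimensional data; the Rice paper’s contribution is making the signature computation itself far cheaper.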
As the Diamond Sponsor for ICML it was inevitable that NVIDIA would steal the show. They held an all-day workshop where we learned to use their deep learning training system known as DIGITS – a workbench that scales training across multiple GPUs automatically.
Over the course of the day we built a Convolutional Neural Network (CNN) for image classification using the MNIST dataset – a standard set of handwritten digits commonly used as a benchmark for assessing image classification algorithms.
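DIGITS hides the mechanics, but the operation at the heart of any CNN layer is compact enough to show directly. Here is a minimal NumPy sketch (illustrative only – not DIGITS or MNIST code) of the 2-D convolution a layer applies, using a tiny hand-made edge-detecting filter:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation - the core operation a CNN layer
    applies (deep learning libraries call this convolution)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 'image': left half dark (0), right half bright (1).
img = np.hstack([np.zeros((4, 3)), np.ones((4, 3))])
edge = conv2d(img, np.array([[-1.0, 1.0]]))   # vertical-edge detector
print(edge[0])   # fires only at the dark-to-bright boundary
```

In a real network the kernels are learned rather than hand-made, and dozens of them are stacked in layers – but each one is doing exactly this sliding dot product.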
The DIGITS framework itself is impressive. You can select from an existing set of best-of-breed network topologies and pre-trained models (Inception-V3, LeNet, AlexNet etc.) or hand-craft and train a network yourself. The interface gives real-time feedback on several key learning metrics over training epochs and has full dataset and model management. At any point you can save an existing model and add it to your own store of pre-trained models for further work downstream. This greatly facilitates experimentation and drastically reduces subsequent training times.
The second session was a deep dive on text structure, parsing and prediction with Recurrent Neural Networks (RNNs) in TensorFlow – RNNs are commonly used in tasks such as machine translation and sentence/word prediction. Here the RNN was trained to predict whole sentences given only a few words from a corpus of training data.
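For readers new to RNNs, the recurrence that gives these networks their memory is small enough to sketch in NumPy. All names and dimensions below are hypothetical – this is not the workshop’s TensorFlow code, just the bare update rule:

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_step(x, h, Wx, Wh, b):
    """One vanilla RNN step: the new hidden state mixes the current input
    with the previous state - this carry-over is the network's memory."""
    return np.tanh(x @ Wx + h @ Wh + b)

# Hypothetical toy dimensions: 8-dim inputs, 16-dim hidden state.
Wx = rng.normal(0, 0.1, (8, 16))
Wh = rng.normal(0, 0.1, (16, 16))
b = np.zeros(16)

h = np.zeros(16)
for t in range(5):                       # feed a 5-step input sequence
    h = rnn_step(rng.normal(size=8), h, Wx, Wh, b)
print(h.shape)                           # final state summarizes the whole sequence
```

Prediction works by reading the final state out through a softmax layer over the vocabulary; LSTMs (as used in the translation work earlier in the day) replace this simple update with a gated one that remembers over longer spans.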
In the final session, the day’s work was combined and taken a step further by developing another RNN, this time trained on Microsoft’s COCO dataset – a set of images of common objects with descriptive captions. The idea was to create a model that, given an image (or frame from a video), could generate one or more captions describing the scene.
While some of the results were odd, my RNN took only 12 minutes to train and still produced fairly good results. Bear in mind that this was all done out-of-the-box, without spending any time tuning the model – which is usually where all the time goes.
It was hard to fully appreciate what we had done over the course of just a few hours. The DIGITS framework handles all the heavy lifting in large-scale deep learning almost transparently, allowing the practitioner to build on previous work and focus on tuning for best results rather than worrying about checkpointing, model management and scalability.
For anyone interested in learning more about Deep Learning or the DIGITS framework, the Deep Learning Institute has a series of online labs and an excellent blog for you to keep up with the latest developments.
Deep Learning Is Now
Take pause here a moment.
It’s hard to really appreciate that just a few short years ago undertaking deep learning wasn’t even possible at this scale. Tasks such as image classification required weeks of training and large investments in hardware, and still produced less than human-level accuracy.
But over the last few years a confluence of events has changed the landscape. Firstly, the explosion of cloud-based technologies made it feasible to capture, curate, store and share extremely large datasets, and to easily create large, scalable and cheap computational networks.
Then a number of key breakthroughs in neural network theory came from academia and industry research, addressing several issues with learning at scale in deep, wide networks, followed by the release of a number of hardened, industrial-strength deep learning frameworks such as Google’s TensorFlow.
This perfect storm of events has resulted in Machine Learning models that today regularly outperform humans in a wide variety of tasks such as object detection, image classification and language translation.
ICML 2017 was all about the very latest in cutting edge Machine Learning research. There’s no looming singularity here – this is just high-end applied technology at its best.
We live in exciting times.