I’m posting this because it’s a really cool graphic. This isn’t mine – it’s been floating around the tech internet for a year now, so I thought I’d put it up here for […]
Pro Tip #2 – Excel Preparation
It often happens that you need to provide a dataset to non-tech personnel in your company. The most common use case for this that I’ve encountered is when a business unit wants […]
Pro Tip #1 – The Joy of TABs
Exporting data into flat files is a really common task for a data scientist, yet over the last few years I’ve seen almost everyone fall down a particular rabbit hole when it […]
Deep Learning – Part Two
Originally published 22/08/2017 as part of the Bulletproof Technical Blog. In part one I wrote about some of the history and theoretical basics of Artificial Neural Networks (ANNs). Now it’s time to […]
Deep Learning – Part One
Originally published 21/08/2017 as part of the Bulletproof Technical Blog. In this blog and the next I’m covering some of the history and basics of Deep Learning. Earlier this year I went to the […]
Adventures in Python #1
Lately I’ve been getting re-acquainted with the boto libraries for interacting with AWS. The newest version is boto3, which is a more stable, service-oriented API – very different from the original […]
ICML 2017 – Part Two
Originally published 22/08/2017 as part of the Bulletproof Technical Blog. Training & Prediction: Better, Faster and Cheaper. The second half of the conference consisted of more presentations, a series of workshops and […]
Real-time Data with Snowplow Analytics
Originally published 07/04/2016 as part of the Bulletproof Technical Blog. The importance of collecting data: As a society we need to measure, plan, predict and test, and to do that we need data. […]
Analytics with Microsoft Azure
With every day that passes, more devices come online, more applications are deployed and more data sources are spun up. There has never been a time in our history when data is […]
Spark SQL Aggregations – Gotcha!
Background: A few months ago we were writing a Spark job to process AWS billing data. The idea was that every day we’d automatically spin up an Amazon EMR cluster which would do […]