“Orders are Orders!” OK – I so fell into another hole. This time it was using the assign method on a pandas data frame. It turns out that the assign method, which ostensibly […]
Adventures in Python #3
The iterrows Experience There are a number of ways to iterate over the rows of a pandas data frame, and without a doubt the worst is to use the iterrows method. I didn’t […]
Adventures in Python #2

Two Paths to Comprehension? When I first learned about Python’s list comprehensions, I remember reading that the easiest way to think about them is as a rolled-up for loop. But there’s […]
Spark SQL Aggregations – Gotcha!

Background A few months ago we were writing a Spark job to process AWS billing data. The idea was that every day we’d automatically spin up an Amazon EMR cluster which would do […]