Adventures in Python #4

“Orders are Orders!”

OK – so I fell into another hole. This time it was using the assign method on a pandas data frame.

It turns out that the assign method, which ostensibly takes a dictionary of columns to add to the data frame (but actually doesn't, unless you unpack the dictionary with the ** operator), will re-order the new columns depending on which version of Python you happen to be using.
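A minimal sketch of the call in question, assuming a toy data frame with hypothetical price and qty columns. DataFrame.assign accepts keyword arguments, so a dict of new columns has to be unpacked with ** (passing it positionally raises a TypeError). On Python 3.6+ with pandas 0.23 or later the new columns keep the dict's order; on Python 3.5 pandas inserts them alphabetically.

```python
import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# Calculated columns, listed in the order we want them to appear.
new_cols = {
    "total": df["price"] * df["qty"],
    "discount": df["price"] * 0.1,
}

# assign() only takes keyword arguments, so the dict must be unpacked with **.
out = df.assign(**new_cols)
print(list(out.columns))
# Python 3.6+ (pandas >= 0.23): ['price', 'qty', 'total', 'discount']
# Python 3.5: new columns sorted by name: ['price', 'qty', 'discount', 'total']

# Passing the dict positionally fails instead of adding columns.
try:
    df.assign(new_cols)
except TypeError as exc:
    print("TypeError:", exc)
```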

This isn't really pandas' fault – it's a consequence of the fact that dictionaries (and therefore keyword arguments) don't preserve insertion order prior to Python 3.6.

So here's what happened – I was using Python 3.6 locally to clean some basic data, add some calculated columns, reformat others, etc., as part of an ETL pipeline in preparation for an automated load into a data store.

The new data frame is then written out and triggers the next stage in the pipeline. It seems simple and it is – example notebook here:

reorder3.6

However, when I ran it on an EC2 instance, the ingest kept failing. It turned out that the columns weren't being written out in the same order, so the data types no longer matched the target schema.
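Whatever the Python version, a cheap safeguard is to pin the column order against the target schema just before writing. A minimal sketch, assuming a hypothetical TARGET_SCHEMA of (name, dtype) pairs and a hypothetical helper conform_to_schema: it selects the columns in schema order and raises a ValueError on missing, unexpected or mistyped columns, rather than letting the load fail downstream.

```python
import pandas as pd

# Hypothetical target schema: (column name, expected dtype) in load order.
TARGET_SCHEMA = [
    ("price", "float64"),
    ("qty", "int64"),
    ("total", "float64"),
    ("discount", "float64"),
]


def conform_to_schema(df, schema):
    """Return df with its columns in schema order, or raise on a mismatch.

    Selecting by an explicit list pins the output order no matter how (or on
    which Python version) the columns were added.
    """
    names = [name for name, _ in schema]
    missing = [name for name in names if name not in df.columns]
    extra = [name for name in df.columns if name not in names]
    if missing or extra:
        raise ValueError(
            "missing columns %s, unexpected columns %s" % (missing, extra)
        )
    out = df[names]
    for name, dtype in schema:
        if str(out[name].dtype) != dtype:
            raise ValueError(
                "column %r has dtype %s, expected %s" % (name, out[name].dtype, dtype)
            )
    return out
```

Raising on unexpected columns, rather than silently dropping them, is a deliberate choice in this sketch; a pipeline that is allowed to carry extra columns might simply use df[names]. The % formatting (instead of f-strings) keeps it parseable on Python 3.5.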

Here's the same code under Python 3.5:

reorder3.5

The Amazon Linux AMI used to launch the instance had Python 3.5 on it.

So code that looks right can be wrong depending on where you run it, because under Python 3.5 the columns you add via a dictionary are inserted in alphabetical order of the column names.
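Why alphabetical? An illustrative model (a hypothetical function, not pandas' actual source) of the decision assign() has to make about new-column order. From Python 3.6, PEP 468 guarantees that **kwargs preserves the caller's order; before that, kwargs is a plain dict with no reliable order, so sorting by name is the only deterministic option, and that is what pandas documents doing on Python 3.5 and earlier.

```python
import sys


def new_column_order(**kwargs):
    """Illustrative model of where assign() places new columns.

    On Python 3.6+ the keyword arguments arrive in the order the caller wrote
    them. On older interpreters that order is lost, so the names are sorted
    to make the result at least predictable.
    """
    if sys.version_info >= (3, 6):
        return list(kwargs)
    return sorted(kwargs)


print(new_column_order(total=1, discount=2))
# Python 3.6+: ['total', 'discount']
# Python 3.5:  ['discount', 'total']
```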

To be fair, the Python team is aware of this issue, but I still lost half a day tracking it down.
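One way to avoid losing that half day is to fail fast at the start of the pipeline. A minimal sketch, assuming a hypothetical check_runtime helper: it logs the interpreter and pandas versions and raises if they are older than Python 3.6 or pandas 0.23 (the release in which assign() started preserving keyword order). It avoids f-strings so that it still parses on Python 3.5, where it is meant to trip.

```python
import sys

import pandas as pd


def check_runtime(min_python=(3, 6), min_pandas=(0, 23)):
    """Fail fast if assign() can't be trusted to keep keyword-argument order."""
    # Major/minor only, e.g. "0.23.4" -> (0, 23), "2.2.3" -> (2, 2).
    pandas_version = tuple(int(part) for part in pd.__version__.split(".")[:2])
    print("Python %s, pandas %s" % (sys.version.split()[0], pd.__version__))
    if sys.version_info[:2] < min_python:
        raise RuntimeError("Python %d.%d+ required for ordered assign()" % min_python)
    if pandas_version < min_pandas:
        raise RuntimeError("pandas %d.%d+ required for ordered assign()" % min_pandas)
    return sys.version_info[:2], pandas_version
```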

The simple answer, of course, is to upgrade to Python 3.6+, update your AMIs to the latest version, etc. But in the real world you can't always do that: there are plenty of reasons why you may have to deploy on an earlier version of Python, not least that you don't control the entire stack.
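If you are stuck on an older interpreter, one version-independent workaround is to add the columns one assign() call at a time. A minimal sketch, assuming a hypothetical assign_in_order helper that takes a list of (name, values) pairs; a list is used rather than a dict because a plain dict doesn't reliably remember insertion order before Python 3.6.

```python
import pandas as pd


def assign_in_order(df, pairs):
    """Add columns in exactly the order given, on any Python version.

    `pairs` is a list of (name, values) tuples; values can be anything
    assign() accepts, including a callable that receives the frame so far.
    """
    out = df
    for name, values in pairs:
        # One keyword per assign() call, so there is no order left to decide.
        # assign() returns a new frame, so the caller's df is never modified.
        out = out.assign(**{name: values})
    return out


df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})
out = assign_in_order(df, [
    ("total", lambda d: d["price"] * d["qty"]),
    # Callables receive the frame built so far, so this can use "total".
    ("discount", lambda d: d["total"] * 0.1),
])
print(list(out.columns))  # ['price', 'qty', 'total', 'discount']
```

Note that unpacking a collections.OrderedDict into a single assign() call would not help here: on Python 3.5 the keyword arguments still arrive as an unordered dict, and pandas sorts them anyway.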

For the record I have never encountered a situation in any other language where code as written is semantically changed by the compiler/interpreter.

When I add columns to a data frame in a specific order, it's because that's the order they need to be in.

It wasn’t a request. Orders are Orders.

And this exhibit is closed.