A little while back I stumbled upon an article called Data Looks Better Naked and then read Storytelling With Data. The combination of these two resources completely changed my thinking on data presentation.
See, when we start playing with plots (or charts, or graphs), it’s amazing how much we love all those flashy effects: we spend hours on drop shadows, specular lighting and wild colour schemes, and suddenly we’re all experts in font families, faces and sizes.
But I think the “Less Is More” mantra stands us in good stead in data science, and what follows is a step-by-step example of applying it.
Imagine you’re working at an observatory tracking solar flares, and your boss has asked you to make a business case for a 10x increase in cloud compute. Processing the imagery takes longer for certain types of flare, and over half your workload is bottlenecked. How can you best tell this story to a non-technical executive team?
Here’s a sample of the Solar Flare data set from the UCI repository:
In this case, the task is to show what percentage of the observations fall into each class of largest spot size (the second column above).
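Here is a minimal sketch of that calculation in Python. The author’s own code is in R. The sketch assumes each record is a whitespace-separated line whose second field is the largest-spot-size code, as in the sample above. The function name `spot_size_percentages` and the toy records are made up for illustration. With the real UCI files, you’d also need to skip any header or comment lines first.

```python
from collections import Counter

def spot_size_percentages(records):
    """Return {spot-size code: percentage of records} for an iterable of lines.

    Assumes each non-blank line is whitespace-separated and that the
    largest-spot-size code is the second field (index 1).
    """
    codes = [line.split()[1] for line in records if line.strip()]
    counts = Counter(codes)
    total = len(codes)
    return {code: 100.0 * n / total for code, n in counts.items()}

# Toy records (first three fields only), invented for illustration.
toy = ["C S O", "D A O", "C S I", "B X O"]
print(spot_size_percentages(toy))  # {'S': 50.0, 'A': 25.0, 'X': 25.0}
```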
Seven Drafts to Clarity
R source code to create these plots can be found here.
The obvious plot type here is a bar chart, and here’s the original plot:
The data-to-pixel ratio here is pretty small: all we’re really showing is six categories and their percentage values. Does this plot really need to be this dense? Let’s apply a series of refinements and simplify the message…
First, let’s get rid of those background colours, as they add no information:
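The plots in this walkthrough were built in R. As a rough equivalent, here is a matplotlib sketch of this step. The category codes are the six largest-spot-size codes (X, R, S, A, H, K), but the percentages are placeholders rather than real figures, and `drop_background` is a hypothetical helper name. Matplotlib’s built-in `ggplot` style is used only to stand in for a cluttered starting point with a grey panel.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Spot-size codes with made-up placeholder percentages (not the real data).
classes = ["X", "R", "S", "A", "H", "K"]
pct = [10, 15, 30, 20, 10, 15]

def drop_background(fig, ax):
    """Replace any grey panel or figure fill with plain white."""
    ax.set_facecolor("white")
    fig.patch.set_facecolor("white")

# The "ggplot" style gives a busy starting point with a grey panel.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.bar(classes, pct)

drop_background(fig, ax)
```

Call `fig.savefig(...)` afterwards to write out the draft.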
The title already tells us what the plot is showing, so we don’t need the axis labels. The legend adds nothing here either, and it takes up real estate. Let’s lose them too:
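Here is a matplotlib sketch of this step, again with placeholder percentages and a hypothetical helper name (`drop_axis_labels_and_legend`). Clearing an axis label just means setting it to an empty string. The legend is removed outright.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

classes = ["X", "R", "S", "A", "H", "K"]
pct = [10, 15, 30, 20, 10, 15]  # placeholder values, not the real data

# A draft carrying redundant axis labels and a legend.
fig, ax = plt.subplots()
ax.bar(classes, pct, label="% of observations")
ax.set_title("Observations by largest spot size class (%)")
ax.set_xlabel("Largest spot size class")
ax.set_ylabel("Percentage")
ax.legend()

def drop_axis_labels_and_legend(ax):
    """Remove axis titles and the legend; the plot title carries the meaning."""
    ax.set_xlabel("")
    ax.set_ylabel("")
    legend = ax.get_legend()
    if legend is not None:
        legend.remove()

drop_axis_labels_and_legend(ax)
```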
Next, let’s take out the major and minor gridlines and remove the borders:
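Here is the same step in matplotlib terms. This is a sketch, not the author’s R code. Gridlines are toggled with `Axes.grid`, and the border is made up of four spines, each of which can be hidden. The data and the helper name `drop_grid_and_borders` are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

classes = ["X", "R", "S", "A", "H", "K"]
pct = [10, 15, 30, 20, 10, 15]  # placeholder values, not the real data

with plt.style.context("ggplot"):  # this style turns gridlines on
    fig, ax = plt.subplots()
    ax.bar(classes, pct)

def drop_grid_and_borders(ax):
    """Hide major and minor gridlines and all four spines (the plot border)."""
    ax.grid(False, which="both")
    for spine in ax.spines.values():
        spine.set_visible(False)

drop_grid_and_borders(ax)
```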
To focus the eye on the data, we want the text to be less conspicuous, so we remove the bold face and change the text colour to a light grey. We also shorten the title and left-align it.
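Here is a matplotlib sketch of this step, using placeholder data. One wrinkle: a matplotlib axes has separate centre and left title slots. If you don’t clear the centred title before setting a left-aligned one, both appear. The grey value and the helper name `soften_text` are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

classes = ["X", "R", "S", "A", "H", "K"]
pct = [10, 15, 30, 20, 10, 15]  # placeholder values, not the real data

fig, ax = plt.subplots()
ax.bar(classes, pct)
ax.set_title("Percentage of observations by largest spot size class",
             fontweight="bold")

def soften_text(ax, short_title, grey="#aaaaaa"):
    """Mute the text in light grey and left-align a shorter, non-bold title."""
    ax.set_title("")  # clear the centred title slot
    ax.set_title(short_title, loc="left", fontweight="normal", color=grey)
    ax.tick_params(colors=grey)  # tick marks and tick labels

soften_text(ax, "Largest spot size")
```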
Next, label the bars directly to reduce the cognitive load on the viewer:
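Direct labelling means writing each value next to its bar, so the reader doesn’t have to trace it back to an axis. Here is a matplotlib sketch with placeholder values and a hypothetical helper, `direct_label`.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

classes = ["X", "R", "S", "A", "H", "K"]
pct = [10, 15, 30, 20, 10, 15]  # placeholder values, not the real data

fig, ax = plt.subplots()
bars = ax.bar(classes, pct)

def direct_label(ax, bars, fmt="{:.0f}%", colour="#777777"):
    """Write each bar's value just above it, centred on the bar."""
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2, height, fmt.format(height),
                ha="center", va="bottom", color=colour)

direct_label(ax, bars)
```

Matplotlib 3.4 and later also provide `Axes.bar_label`, which does the same job in one call. The explicit loop just makes the positioning visible.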
We know it’s a percentage plot, so we don’t need the vertical axis. And, as always, sorting the data brings out the trend, while a highlight colour picks out the bottleneck:
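This step combines three changes: sorting, highlighting and dropping the vertical axis. Here is a matplotlib sketch that assumes placeholder percentages and a hypothetical set, `slow`, of spot-size classes that are expensive to process. The post doesn’t say which classes those are.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

classes = ["X", "R", "S", "A", "H", "K"]
pct = [10, 15, 30, 20, 10, 15]  # placeholder values, not the real data
slow = {"S", "A"}  # hypothetical: classes assumed to be slow to process

def sorted_highlighted_bars(ax, classes, pct, slow,
                            highlight="#d62728", base="#cccccc"):
    """Sort bars largest first, colour the slow classes, and hide the y axis."""
    ranked = sorted(zip(classes, pct), key=lambda item: item[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    colours = [highlight if name in slow else base for name in names]
    bars = ax.bar(names, values, color=colours)
    ax.yaxis.set_visible(False)  # direct labels make the vertical axis redundant
    return bars

fig, ax = plt.subplots()
bars = sorted_highlighted_bars(ax, classes, pct, slow)
```

Python’s sort is stable, even with `reverse=True`, so classes with equal values keep their original relative order.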
A simple text annotation then summarises the point of the plot.
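Here is a sketch of the annotation in matplotlib. It positions the text in axes coordinates (0 to 1 across the plot area), so the note stays in the corner whatever the data range. The message, the already-sorted placeholder data and the helper name `annotate_message` are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

# Placeholder data, already sorted, with the first two bars highlighted.
classes = ["S", "A", "R", "K", "X", "H"]
pct = [30, 20, 15, 15, 10, 10]

fig, ax = plt.subplots()
ax.bar(classes, pct, color=["#d62728"] * 2 + ["#cccccc"] * 4)

def annotate_message(ax, message, colour="#d62728"):
    """Place a one-line takeaway in the top-right corner of the axes."""
    return ax.text(0.98, 0.95, message, transform=ax.transAxes,
                   ha="right", va="top", color=colour)

note = annotate_message(ax, "Highlighted classes: slowest to process")
```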
So here’s a side-by-side comparison of the first and final drafts. As you can see, the final plot focusses the eye on the data itself, allowing for a cleaner, leaner message:
It’s important to remember that many people have to look at plots many times a day, and a key part of a data scientist’s role is communicating effectively with business stakeholders as well as technical audiences.
We need fewer pixels in this world!