Data Visualisation

Less is More

A little while back I stumbled upon an article called “Data Looks Better Naked” and then read “Storytelling With Data”. The combination of these two resources completely changed my thinking on data presentation.

See, when we start playing with plots (or charts/graphs), it’s amazing how much we love all those awesome effects. We spend hours playing with drop shadows, specular lighting effects and wild colour schemes, and suddenly we all become experts on font families, faces and sizes.

But I think the “Less Is More” mantra stands us in good stead in data science, and what follows is a step-by-step example of its application.

Scenario

You’re working at an observatory tracking solar flares, and your boss has asked you to make a business case for a 10x increase in cloud compute. Processing the imagery takes longer for certain types of flares, and over half your workload is bottlenecked. How can you best tell this story to a non-technical executive team?

Dataset

Here’s a sample of the Solar Flare data set from the UCI repository:

solar

In this case the task is to present the percentage of observations falling into each class of largest spot size (the second column above).
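As a rough sketch of that breakdown, the percentage for each class can be computed by counting the codes in the second column. The snippet is in Python rather than the R used for the plots, and the `spot_sizes` list is invented for illustration; it is not taken from the UCI file. The six codes (X, R, S, A, H, K) are the ones I understand the data set's documentation to list for this column.

```python
from collections import Counter


def class_percentages(labels):
    """Return {label: percentage of observations} for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical sample of the second column (largest spot size code).
# These values are invented for illustration; they are not the real data.
spot_sizes = ["S", "A", "S", "X", "K", "S", "A", "H", "R", "S"]

percentages = class_percentages(spot_sizes)
for label, pct in percentages.items():
    print(f"{label}: {pct:.0f}%")
```

With the real file you would read the second column into `spot_sizes` instead of typing it in by hand.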

Seven Drafts to Clarity

R source code to create these plots can be found here.

First Draft

The obvious plot type here is the bar chart and here’s the original plot:

solar_plot1

The data-to-pixel ratio here is pretty small: all we’re really showing is six categories and their percentage values. Does this plot really need to be this dense? Let’s apply a series of refinements and simplify the message…

Second Draft

Let’s get rid of those background colours as they add no information:

solar_plot2

Third Draft

The title already tells us what the plot is showing so we don’t need the axis labels. Further, the legend adds zero value here and takes up real estate. Let’s lose them too:

solar_plot3

Fourth Draft

Next, let’s take out the major and minor gridlines and remove the borders:

solar_plot4

Fifth Draft

To focus the eye on the data, we want the text to be less conspicuous, so remove the bold face and change the text colour to a light grey. Shorten the title and left-align it.

Finally, label the bars directly to reduce cognitive load on the viewer:

solar_plot5

Sixth Draft

Because each bar is now labelled with its percentage, the vertical axis is redundant and can go. As always, sorting the data brings out the trend, and a single highlight colour draws attention to the bottleneck:

solar_plot6
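The sorting and highlighting step can be sketched in a few lines. This is Python rather than the R behind the plots. The percentages are invented for illustration, and the `bottleneck` set and the two colour values are hypothetical choices, not the ones used in the plot.

```python
def sort_and_colour(percentages, bottleneck, highlight="#d62728", muted="#bbbbbb"):
    """Sort classes by percentage (largest first) and pick a colour for each bar.

    Classes named in `bottleneck` get the highlight colour; the rest are muted.
    """
    ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return [
        (label, pct, highlight if label in bottleneck else muted)
        for label, pct in ordered
    ]


# Invented percentages for the six spot size classes (illustration only).
percentages = {"X": 10.0, "R": 15.0, "S": 35.0, "A": 20.0, "H": 12.0, "K": 8.0}

# Hypothetical: suppose classes S and A are the slow ones to process.
bars = sort_and_colour(percentages, bottleneck={"S", "A"})

for label, pct, colour in bars:
    # Direct label next to a text bar, so no axis is needed to read the value.
    print(f"{label} {'#' * round(pct / 5)} {pct:.0f}%  ({colour})")
```

The resulting list can be handed to whichever plotting library you use, with the third element of each tuple as the bar's fill colour.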

Final Draft

A simple text annotation summarises the point of the plot:

solar_plot7

So here’s the side-by-side comparison between start and end. As you can see, the final plot focusses the eye on the data itself, allowing for a cleaner, leaner message:

solar_plot1 solar_plot7

It’s important to remember that many people look at plots many times a day, and that a key part of a data scientist’s role is communicating effectively with business stakeholders as well as technical audiences.

We need fewer pixels in this world!
