Data Storytelling part 1: Introduction
As data scientists, it’s crucial to understand the difference between data visualization intended to gain insights and visualizations to communicate those insights to others. Exploratory visualization is used to form hypotheses and find hidden gems in the data, while explanatory visualization is used to clearly communicate insights to an audience.
It’s important to realize that the best visualization for exploration may not be the best for explanation.
So let’s start with the basics. Why do we bother visualizing data to begin with?
Some people are under the impression that charts are simply “pretty pictures”, and that all of the important information can be derived through statistical analysis. However, visualizations are hugely beneficial for uncovering information that simply staring at the raw numbers won’t reveal.
Take the datasets visualized below, for example. The images show 13 distinct datasets that share the same summary statistics to two decimal places, yet look drastically different when plotted. (courtesy of Same Stats, Different Graphs)
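The same effect can be checked in a few lines of code. The sketch below does not use the 13 datasets shown above. As a stand-in, it assumes the first two sets of Anscombe's quartet, a different and much smaller classic example: one set follows a noisy straight line, the other a smooth curve. The helper names (pearson, summary) are illustrative, not part of any library.

```python
from math import sqrt
from statistics import mean, variance

# Stand-in data: sets I and II of Anscombe's quartet (not the datasets in the image).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    covariation = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    spread_a = sum((p - mean_a) ** 2 for p in a)
    spread_b = sum((q - mean_b) ** 2 for q in b)
    return covariation / sqrt(spread_a * spread_b)


def summary(xs, ys):
    """Summary statistics rounded to two decimals, as a report might show them."""
    return {
        "mean_y": round(mean(ys), 2),
        "var_y": round(variance(ys), 2),  # sample variance
        "corr_xy": round(pearson(xs, ys), 2),
    }


summary_linear = summary(x, y_linear)
summary_curved = summary(x, y_curved)
print(summary_linear)
print(summary_curved)
```

Both calls print the same rounded mean, variance and correlation, even though the underlying points differ. Only a plot of y against x reveals that one set is roughly linear and the other is curved.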
But most data scientists know that visualization is an incredibly useful tool for understanding your data better. So what does that have to do with data storytelling?
First of all, there are different types of visualization. We have data visualizations like the one below, which is a simple scatter plot of nine data points made with matplotlib. This was made, just like our dinosaur plot, to get a little more insight into the data at hand.
But this is a very different chart from the one we see below. Whereas one is a simple plot, most likely intended for data analysis, the other is designed to evoke an emotion. The title, the color and the orientation of the bars all contribute to the message. You don’t even have to read the numbers to know what that message is.
And just as easily, the message can be flipped. The data visualized is identical, but because it is presented differently, it guides you to a wildly different conclusion.
Why does that matter to me?
The first chart we saw, the simple matplotlib scatter plot, is an example of exploratory data visualization. This is what we do if we, as individuals, are looking for insights hidden in our data. We form hypotheses (for instance, is customer satisfaction going up over time?) and check these against our data through visualization. This is what you do when you’re looking for whatever is noteworthy or interesting about your data. The hidden gems, so to speak.
The second type of data visualization is explanatory data visualization. This is the visualization you make when you want to communicate an insight you found to someone else: an audience that, most likely, does not consist of data scientists.
It doesn’t have to be a fancy infographic, like we saw before. But where exploratory data visualization is experimentation-driven and doesn’t require a clear conclusion, explanatory data visualization intentionally guides the reader to a certain conclusion.
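To make the contrast concrete, here is a minimal matplotlib sketch that draws the same numbers twice: once as a quick exploratory look, and once as an explanatory chart that points the reader at a conclusion. The monthly satisfaction scores are placeholder values invented for this illustration, and the function names are hypothetical. The styling choices are just one possible way to guide the eye, not a recommended recipe.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values invented for illustration only; not real survey data.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
satisfaction = [6.1, 6.3, 6.2, 6.8, 7.4, 7.9]


def exploratory_plot(labels, values):
    """Quick look for the analyst: default styling, no title, no guidance."""
    fig, ax = plt.subplots()
    ax.scatter(range(len(labels)), values)
    return fig, ax


def explanatory_plot(labels, values, highlight_from):
    """Same data, but title and color steer the reader to one conclusion."""
    fig, ax = plt.subplots()
    positions = list(range(len(labels)))
    # Full series in a muted gray as background context.
    ax.plot(positions, values, color="lightgray", linewidth=2)
    # The part of the series that carries the message, in a strong color.
    ax.plot(positions[highlight_from:], values[highlight_from:],
            color="tab:green", linewidth=3)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_ylabel("Satisfaction score")
    # The title states the takeaway instead of describing the axes.
    ax.set_title("Satisfaction has risen every month since March")
    # Remove visual clutter that does not support the message.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    return fig, ax


fig_explore, ax_explore = exploratory_plot(months, satisfaction)
fig_explain, ax_explain = explanatory_plot(months, satisfaction, highlight_from=2)
fig_explain.savefig("satisfaction_story.png")
```

The data passed to both functions is identical. Everything that differs (the title, the highlighted segment, the removed spines) is a presentation decision made on behalf of the audience.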
And it turns out:
exploratory visualization is not necessarily the best explanatory visualization
When you are exploring your data, you have the luxury of time. You can look at summary statistics, create different visualizations, spend time dissecting the information in the graph in front of you, and make connections to insights you gathered before.
When you are communicating your insights, you do not have that luxury of time. If you’re presenting to a stakeholder, you cannot take them through the whole process that led you to your insight; the insight should be as clear as possible from the get-go.
And more often than not, there is a point you want to make. An action you want them to take. Maybe your analysis revealed that the recent changes to the marketing campaign have made people look at your company more favourably, so you recommend continuing with them. Or maybe you have discovered that farm chickens grow to be the healthiest on a certain food, so you recommend this diet.
The point is: explanatory visualizations aren’t there to give some color to your slides. They are there to support a point. Because data is only valuable if given context.
And while data itself may be objective, every decision you make on how to present it shapes a different story.
And that’s where data storytelling comes in. The art of visualizing your data in such a way that it starts the conversation you want to start.
So how can you shape your own data story? Focus on the following aspects:
See the next parts of this series for details.