This week I was on a project team that learned an incredibly important lesson about creating data-driven visualisations: get to know your data really well before you start. Any visualisation you build is considerably shaped, not only by the meaning implicit in the data, but also by how the data has been captured. The issue we faced was trying to explore a story that depended on finely grained geocoding, but we realised too late that our data reflected a ‘coarser grain’ of location. For another visualisation this would have been perfect – our issue is no reflection on the quality of the (super cool) data we were using – but, in our context, the data clearly didn’t support the narrative we wanted to construct or the meaning we wanted to convey. At the last minute we had to re-think our visualisation and explore a completely different facet of the data. Though we managed to get a new project finished, the process made for some frazzling moments and a late night.
What makes ‘getting to know’ your data difficult?
We were working with 8000 records under significant time pressure, so we wanted to dive straight into building. But, though we began with a great idea of what we wanted to explore, our hurrying meant we didn’t take the time to carefully assess how the location data might function on a map, or how it might relate to the other fields we were trying to represent. It sounds painfully simple, but on the next project I would take the time early on to think these questions through. With a deadline looming, it would be far less hectic to drop a few final features than to have to re-think the whole visualisation.
Starting with an undefined idea of what you (ideally) want to explore.
It’s paradoxical: the final story and visualisation must appropriately reflect how the underlying data has been captured; but, in order to assess your data, you must have a clear idea of that story and the visualisation you want to build. We began with only a general idea of what we wanted to convey. This meant that, while we conjectured about how it could look and work, we didn’t sharply envision the vital features needed to convey that idea. And… you can’t always work it out as you go along. Next time I would follow an iterative process: start with a clear idea, assess what foundational features demonstrate it effectively, check that the data can support them, reshape the idea, and so on. Again, taking the time for this at the beginning saves a whole world of coffee and confusion at 11pm.
Applying this lesson outside of the Fieldschool?
The data I work with on a day-to-day basis focuses on Wellington’s print history. My biggest dataset is spreadsheet upon spreadsheet of details about late 19th-century printers, publishers, booksellers and engravers. It is meticulously geocoded and can be sliced by year or by print service. At this stage, some records are geocoded by numbered street address but some only by street name. After this week’s project I can see clearly that there are limitations to what this data can tell me, and to how it will convey meaning most effectively. It could tell an interesting story about the frequency of available print services in individual streets of the city. But – at this stage – I would not be able to create a network diagram that links individual addresses.
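One quick way to apply the lesson before building anything is to audit the geocoding granularity up front. The sketch below is a minimal, hypothetical example (the column names and sample records are assumptions, not the actual spreadsheet schema): it counts how many records carry a numbered street address versus a street name only, which is exactly the kind of check that would have flagged our problem early.

```python
import pandas as pd

# Hypothetical sample of print-trade records; names, services and
# addresses are invented for illustration only.
records = pd.DataFrame({
    "name": ["Smith & Co.", "J. Brown", "Wellington Press"],
    "service": ["printer", "bookseller", "engraver"],
    "year": [1885, 1890, 1887],
    "address": ["12 Lambton Quay", "Willis Street", "47 Cuba Street"],
})

# A record geocoded to a numbered street address starts with digits;
# street-name-only records do not.
has_number = records["address"].str.match(r"\d+\s")

print(f"{has_number.sum()} of {len(records)} records have a street number")
print("Street-name only:", records.loc[~has_number, "address"].tolist())
```

A two-line check like this tells you immediately whether an address-level network diagram is feasible, or whether the honest unit of analysis is the street.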