Visualizing research data: Challenges of combining different datasources

Description

PyData Berlin 2016

Does your source data come in multiple formats? Do you have to pull data from multiple APIs? This talk goes through common problems you will face when combining several research data sources programmatically, along with solutions to them. We walk through a real-world web project that visualizes poverty data using JSON APIs, Shapefiles and Excel spreadsheets as data sources.

Introduction

  • Give the talk goals: this talk aims to provide the tools for solving the engineering challenges of combining different data sources
  • Set the context: we use geodata from humanitarian projects as the example, but the solutions apply to other areas as well.
  • Go through the talk outline

Part 2: Quickly introduce the project

Give the audience an idea of the real-world project in preparation for Part 3

  • Show screenshots of the final project
  • Go through the technologies used (ESRI Shapefiles, GeoJSON/TopoJSON, XLS, APIs, Python libraries)
  • Introduce the data pipeline
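One way to picture such a data pipeline is as a chain of small stages: load each source, normalize every record into a common shape, then merge on a shared key. The following is a minimal sketch, not the project's actual pipeline. It assumes each source has already been reduced to text that the standard library can parse (a JSON API response, a spreadsheet exported to CSV). In practice, Excel files and Shapefiles would typically be read with a library such as pandas (`pandas.read_excel`) or geopandas (`geopandas.read_file`). The function names, the `{"data": [...]}` response shape, the `region_id` key and the sample values are all hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def from_json_api(payload: str) -> List[Record]:
    """Parse a JSON API response, assuming the shape {"data": [{...}, ...]}."""
    return [dict(item) for item in json.loads(payload)["data"]]


def from_csv(text: str) -> List[Record]:
    """Parse CSV text (for example a spreadsheet exported to CSV) into dict rows."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def run_pipeline(sources: Iterable[List[Record]],
                 normalize: Callable[[Record], Record],
                 key: str) -> Dict[str, Record]:
    """Normalize every record, then merge records that share the same key."""
    merged: Dict[str, Record] = {}
    for records in sources:
        for record in records:
            clean = normalize(record)
            # Later sources add fields to (or overwrite fields of) earlier ones.
            merged.setdefault(str(clean[key]), {}).update(clean)
    return merged


def normalize_region(record: Record) -> Record:
    """Hypothetical rule: trim and upper-case region IDs so the sources agree."""
    out = dict(record)
    out["region_id"] = str(out["region_id"]).strip().upper()
    return out


# Illustrative inputs only (made-up values, not real statistics).
api_payload = '{"data": [{"region_id": "r01", "poverty_rate": 0.25}]}'
sheet_csv = "region_id,population\nR01 ,1000\n"

result = run_pipeline(
    [from_json_api(api_payload), from_csv(sheet_csv)],
    normalize_region,
    key="region_id",
)
print(result)
# {'R01': {'region_id': 'R01', 'poverty_rate': 0.25, 'population': '1000'}}
```

Keeping each loader small and returning plain dicts means a new source only needs a new loader. Note that the CSV value stays a string, so type coercion belongs in the normalize step.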

Part 3: Explain common problems and our solutions for them

This is the meat of the talk: each point introduces a problem and suggests at least one solution. The solutions are based on Python technologies.

  • Handling different data formats
  • How to manage the data sources (validation, automation, etc.)
  • Normalizing units
  • Mapping problems (different projects may follow different standards for IDs)
  • Normalizing data and metadata
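Several of the problems listed above (mismatched IDs, inconsistent units, validation, and keeping metadata) can be handled together with explicit lookup tables that fail loudly on anything unexpected. This is a minimal sketch under stated assumptions, not necessarily the solution presented in the talk. The crosswalk table, the source names, the unit table and the `poverty_rate` field are hypothetical.

```python
from typing import Dict, Union

# Hypothetical crosswalk: each source's own region code -> one canonical ID.
ID_CROSSWALK: Dict[str, Dict[str, str]] = {
    "survey_xls": {"North": "R01", "South": "R02"},
    "census_api": {"A1": "R01", "B2": "R02"},
}

# Hypothetical unit table: divide by this to get a rate as a fraction (0..1).
UNIT_DIVISORS: Dict[str, float] = {"fraction": 1.0, "percent": 100.0, "per_mille": 1000.0}


class MappingError(ValueError):
    """Raised when a source uses an ID, unit or value we have no rule for."""


def canonical_id(source: str, raw_id: str) -> str:
    """Translate a source-specific region code into the canonical ID."""
    try:
        return ID_CROSSWALK[source][raw_id.strip()]
    except KeyError:
        raise MappingError(f"no canonical ID for {raw_id!r} from {source!r}") from None


def to_fraction(value: float, unit: str) -> float:
    """Convert a rate given in a known unit into a plain fraction."""
    try:
        return value / UNIT_DIVISORS[unit]
    except KeyError:
        raise MappingError(f"unknown unit {unit!r}") from None


def normalize_row(source: str, raw_id: str,
                  value: Union[str, float], unit: str) -> Dict[str, object]:
    """Map the ID, convert the unit and validate the range in one place."""
    rate = to_fraction(float(value), unit)
    if not 0.0 <= rate <= 1.0:
        raise MappingError(f"rate {rate} out of range for {raw_id!r}")
    # Keep the source name as metadata so merged values stay traceable.
    return {"region_id": canonical_id(source, raw_id), "poverty_rate": rate, "source": source}


# The same region reported by two sources with different codes and units.
row_a = normalize_row("survey_xls", " North", "36", "percent")
row_b = normalize_row("census_api", "A1", 0.36, "fraction")
print(row_a["region_id"] == row_b["region_id"])  # True
```

Raising an error on an unknown ID or unit, rather than silently dropping the row, is a design choice. It makes a source that has changed its codes or units visible as soon as the pipeline runs.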

Part 4: Wrap up

  • Quickly explain how we applied these solutions in the project
  • Sum up the things you should consider (checklist)
