Contribute Media
A thank you to everyone who makes this possible: Read More

Build your own data warehouse for personal analytics with SQLite and Datasette


Simon Willison

Many data enthusiasts dream of analyzing their own personal data, but few find time to build their own pipeline for it. This talk will show you how to get started with personal analytics with the highest possible return on your invested effort.

SQLite is the ideal tool for building a personal data analysis pipeline: it's free, fast and widely supported. Each database is a single file on disk, so you don't need to set up a database server to start using it. Tools that import data into SQLite can be written in any programming language, and its JSON support means it can even ingest data that may not fit neatly in a standard relational database table.

Datasette is a Python application that provides an interface over SQLite. It lets you bookmark and queries in your browser and export the results as JSON and CSV. The Datasette plugin ecosystem has over 30 plugins that extend Datasette in different ways, adding visualization tools, alternative export formats and more.

I'll show how to combine SQLite, Datasette and some simple Python scripts to ingest personal data from multiple different sources and build a personal data warehouse for your digital life. Data sources will include:

  • Twitter
  • Facebook
  • Apple Photos
  • LinkedIn
  • Google (via Google Takeout)
  • Foursquare / Swarm
  • GitHub
  • Apple Health
  • 23AndMe

Techniques that work for an individual can work for organizations too. I'll finish by showing how this approach to working with data can scale up to solving professional problems in addition to personal analytics.

Produced by NDV:

Python, PyCon, PyConAU, PyConline

Fri Sep 4 16:00:00 2020 at Curlyboi

Improve this page