Database Sharding with Django
Wednesday 4 p.m.--4:45 p.m.
Audience level: Intermediate
Django works great for single-database webapps, but did you know it gives us the tools to create webapps with sharded data right out of the box? This talk will go over how we leveraged Django's features to add sharding to our webapp infrastructure.
Topics to cover:
- Why do you need to shard your data?
  - Lots of data.
  - Lots of writes.
- What should you do before sharding?
  - Scale up: throw money at it.
  - Feature partitioning: split feature data across different databases.
  - Sharded databases.
- How to set up multiple databases in Django.
  - Tools that we used at Wave.
  - What Django gives us right out of the box (DB routers).
  - How database routers work.
  - Gotchas when using South for DB migrations.
- The downsides of multi-database systems.
  - What we lose across databases (ForeignKeys, transaction management, select_related(), prefetch_related()).
- Scaling the database via sharding.
  - What does this mean?
  - How do I pick a key to shard on?
  - What makes a good or bad sharding key?
- Database routers for sharding.
- New tooling that needs to be written to deal with database migrations.
- How to interact with your data through the ORM when it's sharded.
- The need to start using globally unique PKs.
  - A few different strategies used by other companies.
  - A strategy for unique ID generation in Python.
- Transactions + sharded data.
  - Transactions are only useful within a single shard.
- Balancing sharded data across servers.
  - We chose multiple DBs per node.
  - Makes DB migration a DevOps task.
  - Downsides (limited DB connections).
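
To make the router and unique-ID topics above concrete, here is a minimal sketch in plain Python. It is not the code used at Wave. The shard aliases, the bit layout of the ID, and every helper name are assumptions for illustration. The router class follows the method names Django's database router interface expects (`db_for_read`, `db_for_write`, `allow_relation`), where returning `None` means "no opinion".

```python
import itertools
import time
import zlib

# Assumption: a fixed number of logical shards, each a key in settings.DATABASES.
NUM_SHARDS = 4
SHARD_ALIASES = ["shard_%d" % i for i in range(NUM_SHARDS)]  # hypothetical aliases

# Assumed ID layout: [timestamp in ms][13-bit shard index][10-bit sequence].
SHARD_BITS = 13
SEQ_BITS = 10
_sequence = itertools.count()  # per-process counter; NOT unique across processes


def shard_index_for(shard_key):
    """Map a shard key (for example a customer ID) to a logical shard index."""
    return zlib.crc32(str(shard_key).encode("utf-8")) % NUM_SHARDS


def make_id(shard_index, now_ms=None):
    """Pack time, shard index and a counter into one integer primary key."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    seq = next(_sequence) % (1 << SEQ_BITS)
    return (now_ms << (SHARD_BITS + SEQ_BITS)) | (shard_index << SEQ_BITS) | seq


def shard_index_from_id(pk):
    """Recover the shard index embedded in a primary key built by make_id."""
    return (pk >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)


def alias_for_id(pk):
    """Return the database alias that holds the row with this primary key."""
    return SHARD_ALIASES[shard_index_from_id(pk)]


class ShardRouter:
    """Router sketch: route by the shard index embedded in the instance's pk."""

    def _alias(self, hints):
        instance = hints.get("instance")
        pk = getattr(instance, "pk", None)
        # No instance hint or no pk yet: return None and let Django fall through.
        return alias_for_id(pk) if pk is not None else None

    def db_for_read(self, model, **hints):
        return self._alias(hints)

    def db_for_write(self, model, **hints):
        return self._alias(hints)

    def allow_relation(self, obj1, obj2, **hints):
        # Allow a relation only when both objects are known to live on the same database.
        db1 = getattr(getattr(obj1, "_state", None), "db", None)
        db2 = getattr(getattr(obj2, "_state", None), "db", None)
        if db1 is None or db2 is None:
            return None
        return db1 == db2
```

Queries that carry no instance hint still have to name their shard explicitly, for example with `Model.objects.using(alias_for_id(pk))`. Because the shard index is derived with a modulo, changing `NUM_SHARDS` would remap existing keys; keeping the number of logical shards fixed and moving whole databases between nodes avoids that.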