Hey everyone, it's Ozgun. When I first wrote this blog post, it was much longer. Based on initial feedback, I edited out parts of it to keep the post focused.

If you have any questions that aren't covered in the post, happy to answer them here!

Thanks for writing the post. Sharding is something I’m consudering at my current job.

How long do these sharding projects usually take? Do you know of any posts that break down the steps in more detail?

Timeframes for sharding projects vary quite a bit. If you have a B2B database, we find that sharding projects usually take between one to eight weeks of engineering (not clock) time. Most take two to three weeks.

A good way to tell is by looking at your database schema. If you have a dozen tables, you'll likely migrate with one week's of effort. If your database has 250+ tables, then you'll take about eight weeks.

When you're looking to shard your B2B database, you usually need to take the following steps:

1. Find tables that don't have a customer / tenant column, and add that column. Change primary and foreign key definitions to include this column. (You'll have a few tables that can't have a customer column, and these will be reference tables)

2. Backfill data to tables that don't didn't have customer_id / tenant_id

3. Change your application to talk to this new model. For Rails/Django, we have libraries available that make the app changes simpler (100-150 lines). For example: https://github.com/citusdata/activerecord-multi-tenant

4. Migrate your data over to a distributed database. Fortunately, online data migrations are starting to become possible with logical decoding in Postgres.

If you have a B2C app, these estimates and steps will be different. In particular, you'll need to figure out how many dimensions (columns) are central to your application. From there on, you'll need to separate out the data and shard each data group separately.