I never fully solved this problem, so don’t trust me, but I can tell you what I learned...

I think for one thing you can’t really rely on joins for query efficiency, because as you say there are too many combinations so it’s impossible to optimize everything.

Instead you have to try to query each data type separately. So you get a query for users. You do an SQL call and gather up a bunch of requests for offices, and then you do a single request to your office backend.

I think the best case is something like n SQL queries per request, where n is the depth of the tree you are querying (users->office->address is depth 3).

That means you’re doing all your queries after the first one by ID (not by arbitrary columns). So you have to have some way to “pre-join” your tables. You can do this either by optimistically joining your data to everything around it (query the node plus all of its edges) or you need to store your edges in your data model (which I have to assume is what FB does).

In the end your resolvers need to be using some standardized way of grabbing objects by is (or edge), something like https://github.com/graphql/dataloader

Whether it’s possible to do this efficiently I don’t know. At my last job we messed it up, and then we started applying a strategy like I described above, but then I switched jobs.

Would love to hear from others who have dealt with the same challenges.