What does HackerNews think of dataloader?
DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.
WRT random requests, there are libraries for query cost estimation you can use as a gate.
You can also join additional data to reference in deeper resolvers, but that's an antipattern AFAIK.
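A cost gate like the one mentioned can be sketched in a few lines. This is an illustrative toy, not any particular library's API (the query-tree shape, `args.first`, and the per-field cost of 1 are all assumptions for the example): estimate a cost for the parsed query before executing it, and reject queries over budget.

```javascript
// Hypothetical query representation: { name, args, children }.
// A list field's `first` argument multiplies the cost of its children.
function estimateCost(field, multiplier = 1) {
  const limit = (field.args && field.args.first) || 1;
  const childCost = (field.children || []).reduce(
    (sum, child) => sum + estimateCost(child, limit),
    0
  );
  return multiplier * (1 + childCost); // 1 point per field, scaled by parent fan-out
}

function gate(query, budget) {
  const cost = estimateCost(query);
  if (cost > budget) throw new Error(`query cost ${cost} exceeds budget ${budget}`);
  return cost;
}

// users(first: 100) { office { address } } — the fan-out dominates the cost.
const query = {
  name: "users",
  args: { first: 100 },
  children: [{ name: "office", children: [{ name: "address" }] }],
};
console.log(estimateCost(query)); // 201: 1 + 100 * (1 office + 1 address)
```

Real implementations also weight fields by backend expense and account for pagination arguments, but the shape is the same: compute a number from the query alone, then gate on it.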
The fundamental problem is where the data resides. Microservices are well understood today, but taking them to the edge isn't; there isn't a path to do that for typical apps. So most microservices which are being used at the edge are doing caching/transcoding/resizing etc.
> Latency is key for some important applications like self-driving cars and industrial automation
They keep compute on-vehicle or on-prem. For data services (not media delivery), latency is:
a) either supremely important to be fully local (vehicles, automation)
b) or it doesn't matter enough to be on a 3rd-party edge network. The diminishing returns in typical apps are what the article is alluding to.
> not really to make some queries in GraphQL
You're misrepresenting what I said - and it seems deliberate.
I mentioned GraphQL as one of the attempts to solve latency issues in typical apps.
For example, some apps use graphql/dataloader[1] because it can "coalesce all individual loads which occur within a single frame of execution before calling your batch function with all requested keys. This ensures no additional latency while capturing many related requests into a single batch."
So in typical apps, there isn't a big benefit to putting general compute on the edge - because the network calls are chunky and not chatty, and their data is centralized. GraphQL (along with libs/frameworks) being one way to turn chatty into chunky.
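The coalescing behavior that quote describes can be sketched without the real library. This is a minimal toy (the class name and internals are mine, not dataloader's actual implementation): loads requested during the same synchronous frame of execution are queued, and the batch function runs once on the microtask queue with all collected keys.

```javascript
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // async (keys) => values, in the same order as keys
    this.queue = [];        // pending { key, resolve, reject }
  }
  load(key) {
    return new Promise((resolve, reject) => {
      if (this.queue.length === 0) {
        // Flush once the current frame of synchronous execution finishes.
        queueMicrotask(() => this.flush());
      }
      this.queue.push({ key, resolve, reject });
    });
  }
  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((b) => b.key));
      batch.forEach((b, i) => b.resolve(values[i]));
    } catch (err) {
      batch.forEach((b) => b.reject(err));
    }
  }
}

// Usage: three loads issued in one tick become a single batch call.
const calls = [];
const userLoader = new TinyLoader(async (ids) => {
  calls.push(ids); // stands in for one SELECT ... WHERE id IN (...)
  return ids.map((id) => ({ id, name: `user${id}` }));
});

Promise.all([userLoader.load(1), userLoader.load(2), userLoader.load(3)])
  .then((users) => {
    console.log(calls.length); // 1 — chatty loads turned into one chunky request
    console.log(users.map((u) => u.name).join(","));
  });
```

That is the chatty-to-chunky conversion in miniature: the callers still write naive per-object loads, and the scheduling turns them into one round trip.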
For one thing, I think you can't really rely on joins for query efficiency, because as you say there are too many combinations, so it's impossible to optimize everything.
Instead you have to try to query each data type separately. So you get a query for users. You do an SQL call and gather up a bunch of requests for offices, and then you do a single request to your office backend.
I think the best case is something like n SQL queries per request, where n is the depth of the tree you are querying (users->office->address is depth 3).
That means you’re doing all your queries after the first one by ID (not by arbitrary columns). So you have to have some way to “pre-join” your tables. You can do this either by optimistically joining your data to everything around it (query the node plus all of its edges) or by storing your edges in your data model (which I have to assume is what FB does).
In the end your resolvers need to be using some standardized way of grabbing objects by ID (or edge), something like https://github.com/graphql/dataloader
Whether it’s possible to do this efficiently I don’t know. At my last job we messed it up, and then we started applying a strategy like I described above, but then I switched jobs.
Would love to hear from others who have dealt with the same challenges.
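The "one query per depth" strategy described above can be sketched concretely. The data and fetch functions here are hypothetical in-memory stand-ins for real SQL calls: fetch all users in one query, collect their office IDs, then fetch every needed office in a single batched call by ID.

```javascript
const db = {
  users: [
    { id: 1, name: "ada", officeId: 10 },
    { id: 2, name: "bob", officeId: 11 },
    { id: 3, name: "cyd", officeId: 10 },
  ],
  offices: { 10: { id: 10, city: "Oslo" }, 11: { id: 11, city: "Lima" } },
};

let sqlCalls = 0;

async function fetchUsers() {
  sqlCalls++; // stands in for: SELECT * FROM users
  return db.users;
}

async function fetchOfficesByIds(ids) {
  sqlCalls++; // stands in for: SELECT * FROM offices WHERE id IN (...)
  return ids.map((id) => db.offices[id]);
}

async function resolveUsersWithOffices() {
  const users = await fetchUsers();
  // De-duplicate IDs so each office is fetched at most once per request.
  const ids = [...new Set(users.map((u) => u.officeId))];
  const offices = await fetchOfficesByIds(ids);
  const byId = new Map(offices.map((o) => [o.id, o]));
  return users.map((u) => ({ ...u, office: byId.get(u.officeId) }));
}

resolveUsersWithOffices().then((result) => {
  console.log(sqlCalls); // 2 queries for depth-2 users->office, not 1 + N
  console.log(result[0].office.city);
});
```

A dataloader makes the collection step implicit (each resolver just calls `load(officeId)`), but the resulting query count is the same: roughly one batched fetch per level of the tree.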
How do you do that? https://github.com/graphql/dataloader
However, to be super efficient you need to give up on some consistency. You simply can't have data points which join directly in the db. Instead, you need to make separate parallel requests for those datapoints and let the dataloader be in charge of merging them into larger batches of requests for the db to fulfill.
This can result in some additional latency on a request, but ultimately provides the best way to be able to scale things out.
The benefit of REST is that it's easier to make a really fast single-request endpoint. You can precisely tune your db indexes to match your queries. For GraphQL to be fast and not kill your DB with a malicious query, you need to introduce that wait time/batching.
https://www.apollographql.com/docs/react/caching/cache-confi...
https://relay.dev/docs/en/network-layer#caching
You can also implement things like Dataloader to batch/cache your requests:
https://github.com/graphql/dataloader
Hasura in particular implements two forms of caching. While not directly data-related, it does cache both the GraphQL query-plan and SQL query-plan with prepared statements:
https://hasura.io/blog/fast-graphql-execution-with-query-cac...
Hasura's architecture ensures that 2 "caches" are automatically hit so that performance is high:
GraphQL query plans get cached at Hasura: This means that the code for SQL generation, adding authorization rules, etc. doesn't need to run repeatedly.
SQL query plans get cached by Postgres with prepared statements: Given a SQL query, Postgres needs to parse, validate and plan its own execution for actually fetching data. A prepared statement executes significantly faster, because the entire execution plan is already "prepared" and cached!
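The first kind of caching described above can be sketched in plain code. This is an illustrative toy keyed on the query text, not Hasura's actual internals (the placeholder SQL and function names are mine): the expensive compile step, which stands in for parsing, applying authorization rules, and generating SQL, runs once, and later requests reuse the cached plan with fresh variables.

```javascript
let compilations = 0;
const planCache = new Map();

function compilePlan(queryText) {
  compilations++; // expensive step: parse, apply auth rules, generate SQL
  return {
    // Placeholder SQL; a real plan would be parameterized from the query.
    sql: "SELECT id, name FROM users WHERE id = $1",
  };
}

function getPlan(queryText) {
  let plan = planCache.get(queryText);
  if (!plan) {
    plan = compilePlan(queryText);
    planCache.set(queryText, plan);
  }
  return plan;
}

const q = "query ($id: Int!) { user(id: $id) { name } }";
getPlan(q); // first request compiles the plan
getPlan(q); // repeat requests are cache hits
console.log(compilations); // 1
```

The second cache layer is the mirror image on the database side: because the generated SQL is stable and parameterized (`$1`), Postgres can keep its own prepared-statement plan for it too.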
So caching at the client layer, combined with caching at the server layer (especially if you implement something like Relay, which can fetch only individual fragments), leads to pretty small, performant queries.
So any improved developer experience to solve the N+1 problem[0] is welcome :)
[0] https://itnext.io/what-is-the-n-1-problem-in-graphql-dd4921c...