What does HackerNews think of dataloader?

DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.

Language: JavaScript

#3 in GraphQL
#10 in Node.js
The most common practice is to turn N+1 into 1+1 using dataloaders (https://github.com/graphql/dataloader for JS, there are equivalents for most implementations). The N resolvers invoke a single batched loader which receives a list of keys and returns a list of values.
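The 1+1 pattern described above can be sketched without the library itself. This is a minimal stand-in for DataLoader's batching mechanism (not its real implementation): loads requested within one tick are coalesced, and the batch function — which, per the DataLoader contract, takes an array of keys and resolves to an array of values in the same order — runs once. `makeLoader` and `loadUser` are illustrative names.

```javascript
// Minimal sketch of the batching idea behind DataLoader (not the library):
// all load(key) calls in the same tick are coalesced into one batch call.
function makeLoader(batchFn) {
  let queue = [];
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (queue.length === 1) {
        // Flush after the current tick, once every load has been queued.
        process.nextTick(async () => {
          const batch = queue;
          queue = [];
          try {
            const values = await batchFn(batch.map((item) => item.key));
            batch.forEach((item, i) => item.resolve(values[i]));
          } catch (err) {
            batch.forEach((item) => item.reject(err));
          }
        });
      }
    });
  };
}

// N "resolvers" each ask for one user, but the batch function runs once.
let batchCalls = 0;
const loadUser = makeLoader(async (ids) => {
  batchCalls++; // one SELECT ... WHERE id IN (ids) instead of N queries
  return ids.map((id) => ({ id, name: `user-${id}` }));
});

Promise.all([loadUser(1), loadUser(2), loadUser(3)]).then((users) => {
  console.log(batchCalls); // 1
  console.log(users.map((u) => u.name)); // [ 'user-1', 'user-2', 'user-3' ]
});
```

The real library adds per-request caching, key deduplication, and scheduling options on top of this core idea.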
Dataloader (https://github.com/graphql/dataloader) eliminates many n+1 issues. Basically, entity requests are queued up during evaluation of the query and then you can batch them up for fulfillment. It’s a great addition to other asynchronous contexts as well.

WRT random requests, there are libraries for query cost estimation you can use as a gate.

GraphQL is designed around CRUD fetching patterns - so typically you would have a query per data type slice, and that query can be as complex as you like. In order to avoid N+1 queries, you should use dataloader: https://github.com/graphql/dataloader.

You can also join additional data to reference in deeper resolvers, but that's an antipattern AFAIK.

A dataloader implementation is your first step.

https://github.com/graphql/dataloader
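As a sketch of the resolver wiring this suggests: each field resolver calls a shared per-request loader instead of querying the database directly. `SimpleLoader` is a stand-in for a DataLoader instance, and `fetchOfficesByIds`, the resolver map, and the context shape are illustrative assumptions, not part of any library's API.

```javascript
// Stand-in for a DataLoader instance: coalesces same-tick loads into one batch.
class SimpleLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      if (this.queue.length === 1) {
        queueMicrotask(async () => {
          const batch = this.queue.splice(0);
          const values = await this.batchFn(batch.map((b) => b.key));
          batch.forEach((b, i) => b.resolve(values[i]));
        });
      }
    });
  }
}

let queries = 0;
async function fetchOfficesByIds(ids) {
  queries++; // one batched query for every office requested this tick
  return ids.map((id) => ({ id, city: `city-${id}` }));
}

// Resolver map: User.office resolves through the loader held in context,
// so resolving many users still issues a single offices query.
const resolvers = {
  User: {
    office: (user, _args, ctx) => ctx.officeLoader.load(user.officeId),
  },
};

// A fresh loader per request, as the DataLoader docs recommend.
const ctx = { officeLoader: new SimpleLoader(fetchOfficesByIds) };
const users = [{ officeId: 7 }, { officeId: 8 }, { officeId: 7 }];
Promise.all(users.map((u) => resolvers.User.office(u, {}, ctx))).then(
  (offices) => {
    console.log(queries); // 1
    console.log(offices.map((o) => o.city)); // [ 'city-7', 'city-8', 'city-7' ]
  }
);
```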

> ... and microservices is one example of how to do it.

The fundamental problem is where the data resides. Microservices are well understood today, but taking them to the edge isn't; there isn't a path to do that for typical apps. So most microservices which are being used at the edge are doing caching/transcoding/resizing etc.

> Latency is key for some important applications like self-driving cars and industrial automation

They keep compute on-vehicle or on-prem. For data services (not media delivery), latency is:

a) either supremely important to be fully local (vehicles, automation)

b) or it doesn't matter enough to justify a 3rd party edge network. The diminishing returns in typical apps are what the article is alluding to.

> not really to make some queries in GraphQL

You're misrepresenting what I said - and it seems deliberate.

I mentioned GraphQL as one of the attempts to solve latency issues in typical apps.

For example, some apps use graphql/dataloader[1] because it can "coalesce all individual loads which occur within a single frame of execution before calling your batch function with all requested keys. This ensures no additional latency while capturing many related requests into a single batch."

So in typical apps, there isn't a big benefit to putting general compute on the edge - because the network calls are chunky and not chatty, and their data is centralized. GraphQL (along with its libs/frameworks) is one way to turn chatty into chunky.

[1]: https://github.com/graphql/dataloader

I never fully solved this problem, so don’t trust me, but I can tell you what I learned...

I think for one thing you can’t really rely on joins for query efficiency, because as you say there are too many combinations so it’s impossible to optimize everything.

Instead you have to try to query each data type separately. So you get a query for users. You do an SQL call and gather up a bunch of requests for offices, and then you do a single request to your office backend.

I think the best case is something like n SQL queries per request, where n is the depth of the tree you are querying (users->office->address is depth 3).

That means you’re doing all your queries after the first one by ID (not by arbitrary columns). So you have to have some way to “pre-join” your tables. You can do this either by optimistically joining your data to everything around it (query the node plus all of its edges) or you need to store your edges in your data model (which I have to assume is what FB does).

In the end your resolvers need to be using some standardized way of grabbing objects by id (or edge), something like https://github.com/graphql/dataloader
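The "n queries for depth n" idea above can be sketched as follows: one query for the root type, then one batched by-id query per level of the tree, regardless of how many rows each level contains. The in-memory `db`, table contents, and helper names are all made up for illustration.

```javascript
// Sketch: resolving users -> office -> address issues 3 queries total
// (one per level of the tree), no matter how many users there are.
let queries = 0;
const db = {
  offices: [{ id: 10, addressId: 100 }, { id: 11, addressId: 101 }],
  addresses: [{ id: 100, street: 'A St' }, { id: 101, street: 'B Ave' }],
};

async function selectByIds(table, ids) {
  queries++; // SELECT * FROM table WHERE id IN (ids) -- one query per level
  const unique = [...new Set(ids)];
  return db[table].filter((row) => unique.includes(row.id));
}

async function loadTree() {
  queries++; // SELECT * FROM users -- the only query not done by id
  const users = [
    { id: 1, officeId: 10 },
    { id: 2, officeId: 10 },
    { id: 3, officeId: 11 },
  ];
  const offices = await selectByIds('offices', users.map((u) => u.officeId));
  const addresses = await selectByIds(
    'addresses',
    offices.map((o) => o.addressId)
  );
  return { users, offices, addresses };
}

loadTree().then(({ addresses }) => {
  console.log(queries); // 3  (the depth of users -> office -> address)
  console.log(addresses.map((a) => a.street)); // [ 'A St', 'B Ave' ]
});
```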

Whether it’s possible to do this efficiently I don’t know. At my last job we messed it up, and then we started applying a strategy like I described above, but then I switched jobs.

Would love to hear from others who have dealt with the same challenges.

The trick is to abstract away what can make the DB tip over. This complicates your servers, for sure, but pays dividends on performance/scale.

How do you do that? https://github.com/graphql/dataloader

However, to be super efficient you need to give up on some consistency. You simply can't rely on data points being joined directly in the db. Instead, you need to make separate parallel requests for those data points and let the dataloader be in charge of merging them into larger batches of requests for the db to fulfill.

This can result in some additional latency on a request, but ultimately provides the best way to be able to scale things out.
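A sketch of that trade-off: instead of one SQL JOIN, the server issues a batched lookup per data type and merges the results in application code. The fetch helpers here are stand-ins for batched loaders; their names and the shape of the rows are illustrative assumptions.

```javascript
let queries = 0;
async function fetchUsersByIds(ids) {
  queries++; // batched: SELECT * FROM users WHERE id IN (...)
  return ids.map((id) => ({ id, officeId: id + 100 }));
}
async function fetchOfficesByIds(ids) {
  queries++; // batched: SELECT * FROM offices WHERE id IN (...)
  return ids.map((id) => ({ id, city: `city-${id}` }));
}

async function usersWithOffices(userIds) {
  const users = await fetchUsersByIds(userIds);
  const offices = await fetchOfficesByIds(users.map((u) => u.officeId));
  const officeById = new Map(offices.map((o) => [o.id, o]));
  // The "join" happens here, in the server, not in the database.
  return users.map((u) => ({ ...u, office: officeById.get(u.officeId) }));
}

usersWithOffices([1, 2, 3]).then((rows) => {
  console.log(queries); // 2 batched queries, instead of a JOIN or 1+N lookups
  console.log(rows[0].office.city); // 'city-101'
});
```

The cost is that the two result sets are fetched at slightly different moments, which is the consistency give-up mentioned above.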

The benefit of REST is that it's easier to make a really fast single-request endpoint. You can precisely tune your db indexes to match your queries. For GraphQL to be fast and not kill your DB with a malicious query, you need to introduce that wait time/batching.

The n+1 problem can be fixed without introspection by using dataloaders: https://github.com/graphql/dataloader. It's actually quite pleasant to work with them, but it's definitely one more thing you need to know.
In large part, caching in GraphQL stems from client-side tools like Apollo or Relay, which have fairly sophisticated caching capabilities.

https://www.apollographql.com/docs/react/caching/cache-confi...

https://relay.dev/docs/en/network-layer#caching

You can also implement things like Dataloader to batch/cache your requests:

https://github.com/graphql/dataloader
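The caching half of that can be sketched as per-key request memoization — a minimal stand-in for DataLoader's per-request cache, not the library itself. Repeated loads of the same key during a request reuse one in-flight promise, so the backend is hit once per distinct key. `makeCachedLoader` and `loadPost` are illustrative names.

```javascript
// Minimal sketch of DataLoader-style per-key caching: the first load of a
// key stores its promise; later loads of that key return the same promise.
function makeCachedLoader(fetchOne) {
  const cache = new Map();
  return function load(key) {
    if (!cache.has(key)) cache.set(key, fetchOne(key));
    return cache.get(key); // same promise for repeated keys
  };
}

let fetches = 0;
const loadPost = makeCachedLoader(async (id) => {
  fetches++;
  return { id, title: `post-${id}` };
});

Promise.all([loadPost(1), loadPost(1), loadPost(2)]).then((posts) => {
  console.log(fetches); // 2 distinct keys -> 2 fetches, not 3
  console.log(posts[0] === posts[1]); // true: same cached result object
});
```

Because the cache is memoization rather than invalidation-aware caching, the library recommends one loader per request, so stale data can't leak across requests.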

Hasura in particular implements two forms of caching. While not directly data-related, it does cache both the GraphQL query-plan and SQL query-plan with prepared statements:

https://hasura.io/blog/fast-graphql-execution-with-query-cac...

Hasura's architecture ensures that 2 "caches" are automatically hit so that performance is high:

GraphQL query plans get cached at Hasura: This means that the code for SQL generation, adding authorization rules etc doesn't need to run repeatedly.

SQL query plans get cached by Postgres with prepared statements: Given a SQL query, Postgres needs to parse, validate and plan its own execution for actually fetching data. A prepared statement executes significantly faster, because the entire execution plan is already "prepared" and cached!

So caching at the client layer, combined with caching at the server layer, especially if you implement something like Relay, which can fetch only individual fragments, leads to pretty small, performant queries.

My biggest problem using GraphQL is regarding N+1 queries[0] and the usability of data-loading libraries like dataloader[1].

So any improved developer experience to solve the N+1 problem is welcome :)

[0] https://itnext.io/what-is-the-n-1-problem-in-graphql-dd4921c...

[1] https://github.com/graphql/dataloader