The major news is:
- SQL/PGQ - A Graph Query Language
- JSON improvements (a JSON data type, simplified accessor notation)
Peter Eisentraut gives a nice overview here: https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-fin...
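To make that a bit more concrete, here's a rough sketch of what the two additions look like in the standard's syntax. The graph, table, and column names are made up, exact keywords vary between implementations, and Peter's post has fuller examples:

    -- SQL/PGQ: declare a property graph over ordinary tables...
    CREATE PROPERTY GRAPH social
      VERTEX TABLES (person KEY (id))
      EDGE TABLES (
        follows KEY (follower_id, followee_id)
          SOURCE KEY (follower_id) REFERENCES person (id)
          DESTINATION KEY (followee_id) REFERENCES person (id)
      );

    -- ...then query it with a MATCH pattern inside GRAPH_TABLE.
    SELECT follower, followee
    FROM GRAPH_TABLE (social
      MATCH (a IS person)-[IS follows]->(b IS person)
      COLUMNS (a.name AS follower, b.name AS followee)
    );

    -- JSON: a real JSON data type plus simplified dot-style accessors.
    CREATE TABLE events (id integer, payload JSON);
    SELECT e.payload.customer.name FROM events e;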
I'm a pretty average SQL user, but I've heard expert consultants say that they can do many things with SQL databases that developers like me would have reached for a different tool for, like a graph database. So this makes me wonder: once SQL/PGQ sees broader adoption, is that going to be a killer for niche databases like Neo4j, in favor of, say, Postgres?
Graph databases are about as different from RDBMSes, storage-wise, as column-stores are from row-stores. It comes down to how you plan to shard data and distribute queries when data doesn't fit on a single node.
Using a graph DB with many underlying KV-store nodes, you can have a single graph spread over many machines, representing e.g. Facebook's social graph, and run a query that "chases around" edges between vertices that live on different nodes. A good graph DB ensures as little of that cross-node chasing happens as possible: it rebalances vertices so that data is sharded at low-connection-degree points in the graph, and it consolidates the steps of a query that occur on the same node into single batch queries, so that the whole thing becomes (close to) a single map/reduce step.
There's nothing in Postgres that knows how to do that; if you had e.g. a graph stored in a Citus distributed table and ran a recursive CTE over it to do graph search, you'd get pretty dang bad perf.
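For reference, "graph search with a recursive CTE" looks roughly like the sketch below (the edges(src, dst) table and the starting node are made up). Each recursive step is another self-join, and nothing about it is shard-aware, so on a distributed table every step can fan out across the whole cluster:

    -- BFS-style reachability from node 42, capped at 4 hops,
    -- over a hypothetical edges(src, dst) table.
    WITH RECURSIVE reachable (node, depth) AS (
        SELECT dst, 1
        FROM edges
        WHERE src = 42
      UNION
        SELECT e.dst, r.depth + 1
        FROM edges e
        JOIN reachable r ON e.src = r.node
        WHERE r.depth < 4
    )
    SELECT DISTINCT node FROM reachable;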
> It comes down to how you plan to shard data and distribute queries when data doesn't fit on a single node.
A problem everybody would love to have but pretty much nobody actually has.
> A problem everybody would love to have
Except the people who do have it and need to keep their business running off of one postgres instance.
For example, at one place I worked just over a year ago, they were well into this territory. One of the weird problems for them was pgBadger's memory usage (https://github.com/darold/pgbadger). It's written in Perl, which doesn't seem to do garbage collection well. So even on a reporting node with a few hundred GB of RAM, it could take more than 24 hours to do a "monthly" reporting run to analyse PG usage for the time period.
There wasn't a solution in place at the time I left, so they're probably still having the issue... ;)