What does HackerNews think of riegeli?

Riegeli/records is a file format for storing a sequence of string records, typically serialized protocol buffers.

Language: C++

> didn’t find any standard for separating protobuf messages

The fact that protobufs are not self-delimiting is an endless source of frustration, but I know of 2 standards for doing this:

- SerializeDelimited* is part of the protobuf library: https://github.com/protocolbuffers/protobuf/blob/main/src/go...

- Riegeli is "a file format for storing a sequence of string records, typically serialized protocol buffers. It supports dense compression, fast decoding, seeking, detection and optional skipping of data corruption, filtering of proto message fields for even faster decoding, and parallel encoding": https://github.com/google/riegeli
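
As a rough illustration of the first approach, the sketch below frames each record with a varint length prefix, which is the layout the protobuf delimited helpers write. It works on plain byte strings rather than real messages, and the names `encode_varint`, `write_delimited` and `read_delimited` are made up for this example, not taken from the protobuf library.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if n < 0:
        raise ValueError("varint length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_delimited(stream, payload: bytes) -> None:
    """Write one record: varint length, then the raw payload bytes."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Yield payloads until a clean end of stream; raise EOFError on truncation."""
    while True:
        first = stream.read(1)
        if not first:
            return  # end of stream exactly on a record boundary
        length, shift, byte = 0, 0, first[0]
        while True:
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
            nxt = stream.read(1)
            if not nxt:
                raise EOFError("truncated length prefix")
            byte = nxt[0]
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("truncated record")
        yield payload


# Round trip three records, including an empty one and one longer than 127 bytes.
buf = io.BytesIO()
for record in [b"first", b"", b"x" * 300]:
    write_delimited(buf, record)
buf.seek(0)
records = list(read_delimited(buf))
```

Framing like this only detects truncation. It offers no seeking, compression or recovery after a corrupted length prefix, which is the gap Riegeli is meant to fill.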

> Compare that to something like protobuf: it's not a self-synchronizing stream, so if you send someone multiple messages without framing them (prefix by length or delimited are popular approaches), they're going to decode a single message that doesn't make much sense on the other end. And they won't be able to fix it at all.

FWIW, this is a conscious design decision with Protobuf: it allows for easy upsert operations on serialized messages by appending another message with the updated field values. This is very useful for middleware that wants to either just add its own context to a message it doesn't even parse [1], or for middleware that might handle protobuf messages serialized with unknown fields.
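
The append-to-update behavior can be seen at the wire-format level. The sketch below hand-encodes varint fields and uses a toy decoder (not the protobuf library) that keeps the last value seen for each field number, which is how protobuf parsers treat repeated occurrences of a non-repeated scalar field. The helper names and the field numbers 1 and 2 are illustrative assumptions.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one field of wire type 0: tag (field_number << 3), then the value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_last_wins(data: bytes) -> dict:
    """Toy decoder for varint-only messages; later occurrences overwrite earlier ones."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        fields[field_number] = value
    return fields


# A middleware appends an update for field 1 without parsing the original bytes.
original = encode_varint_field(1, 150) + encode_varint_field(2, 7)
update = encode_varint_field(1, 42)
merged = decode_last_wins(original + update)
```

Here `merged` keeps field 2 from the original and takes field 1 from the appended update. The same property is why two unframed messages written back to back decode as one merged message. Real protobuf parsers additionally concatenate repeated fields and merge embedded messages, which this sketch does not model.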

On the other hand, 'newline delimited protobuf' is much less useful day-to-day than ndjson, as gRPC provides message streaming, which solves the issue of wanting to stream small elements of a long response (which is the general use case for ndjson in my experience). For on-disk storage of sequential protobufs (or any other data, really), you should be using something like riegeli [2], as it provides critical features like seek offsets, compression and corruption resiliency.

[1] - e.g. passing a Request message from some web server frontend, through request routers, logging, ACL and rate-limit systems, up to the actual service handling the request.

[2] - https://github.com/google/riegeli

It looks like https://github.com/google/riegeli might be what you're looking for? (from a search of "RecordIO")

There is no listed equivalent of RecordIO. What do people use for high-reliability journals?

When I needed something like RecordIO to store market data, I couldn't find anything. So I implemented https://github.com/romkatv/ChunkIO. I later learned of https://github.com/google/riegeli (work in progress), which could've saved me a lot of time if only I had found it earlier. I think my ChunkIO is better, though.

If you were interested in RecordIO, then this project might also be of interest to you: https://github.com/google/riegeli