In another decade or so the world might replicate half of the very nice internal tools Google has.

Suggestion for a project: make a tool that takes a proto description and a file of concatenated proto messages stored as binary strings (sort of like RecordIO at Google) and lets you run simple SQL queries on the data, extracting a subset of the fields from messages matching a predicate and maybe even doing simple aggregations. The internal tool that did this was pretty handy. I really wish Google would open source some or most of this stuff. It’s not like keeping it closed source creates any kind of insurmountable competitive advantage, especially compared to the advantages that would accrue from broader adoption of protobufs.
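For anyone tempted to try it, here is a rough Python sketch of the core loop: read framed records, decode each one, keep the rows matching a predicate, and project a few fields. Everything named here is hypothetical. In particular, the framing (a varint length prefix before each record) is an assumption made for illustration and is not RecordIO's actual on-disk format, and `decode_json` is only a stand-in so the example runs without protobuf installed. With real protos you would presumably pass something like a generated message class's `FromString` as the `decode` argument.

```python
import io
import json
from types import SimpleNamespace
from typing import Any, BinaryIO, Callable, Iterator, Optional, Sequence


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one base-128 varint; return None at a clean end of stream."""
    result, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # EOF exactly between two records
            raise EOFError("stream ended inside a length prefix")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear: last byte of the varint
            return result
        shift += 7


def write_record(stream: BinaryIO, payload: bytes) -> None:
    """Write one record as <varint length><payload> (assumed framing)."""
    n = len(payload)
    while True:
        low, n = n & 0x7F, n >> 7
        stream.write(bytes([low | 0x80] if n else [low]))
        if not n:
            break
    stream.write(payload)


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload of each record in the stream."""
    while (size := read_varint(stream)) is not None:
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated record")
        yield payload


def select(
    stream: BinaryIO,
    decode: Callable[[bytes], Any],
    fields: Sequence[str],
    where: Callable[[Any], bool],
) -> Iterator[tuple]:
    """Rough equivalent of SELECT fields FROM stream WHERE where(msg)."""
    for raw in read_records(stream):
        msg = decode(raw)
        if where(msg):
            yield tuple(getattr(msg, name) for name in fields)


def decode_json(raw: bytes) -> SimpleNamespace:
    """Stand-in decoder for the demo; not a protobuf parser."""
    return SimpleNamespace(**json.loads(raw))


if __name__ == "__main__":
    buf = io.BytesIO()
    for rec in ({"user": "a", "latency_ms": 120}, {"user": "b", "latency_ms": 30}):
        write_record(buf, json.dumps(rec).encode())
    buf.seek(0)
    for row in select(buf, decode_json, ["user", "latency_ms"],
                      lambda m: m.latency_ms > 100):
        print(row)
```

Parsing an actual SQL string into the `fields` and `where` arguments, and adding aggregations on top of the row iterator, is left out of the sketch; those are the parts where most of the real work would go.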

When I was at Google, I kept an eye on the open sourcing of RecordIO. Apparently nobody was opposed to open sourcing it; it was simply that nobody had the time to disentangle it and/or clean it up for release.

Looks like some parts of it have escaped… https://github.com/eclesh/recordio

If you are interested in RecordIO, this project might also be of interest to you: https://github.com/google/riegeli