IME the XML spec is so complex that you either end up with a slow but compliant parser or a fast one that doesn't implement the spec completely.
JSON, unlike XML, is minimal enough that writing a fully compliant parser with SIMD intrinsics [1] is actually practical. That library claims 3 GB/s parsing speed, which could theoretically process your 120 kB of data in 1/25000th of a second instead of 2/1000ths of a second.
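Sanity check on that arithmetic (the 3 GB/s figure is simdjson's claim, not mine):

    size_bytes = 120_000            # the 120 kB payload
    simd_rate = 3e9                 # ~3 GB/s claimed by simdjson
    print(size_bytes / simd_rate)   # 4e-05 s, i.e. 1/25000th of a second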
I would wager that JSON is faster to parse, on balance.
[0] https://web.archive.org/web/20080209172554/https://rapidxml....
[1] https://github.com/simdjson/simdjson
Used simdjson [1] together with the Python bindings [2]. Achieved massive speedups analyzing the data: before, it took on the order of minutes; afterwards it was fast enough that I didn't have to leave my desk. Reading from disk became the bottleneck, not CPU power or memory.
[1] https://github.com/simdjson/simdjson [2] https://pysimdjson.tkte.ch/
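For a flavor, here's a minimal sketch of the bindings in use (the Parser object and parse() call follow pysimdjson's documented API; the file name is made up):

    import simdjson  # pip install pysimdjson

    parser = simdjson.Parser()             # reusable parser, reuses buffers
    with open("events.json", "rb") as f:   # hypothetical input file
        doc = parser.parse(f.read())       # lazy document proxy
    print(doc["name"])                     # fields are decoded on access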
I did this once for reading JSON on the fast path: the sending system laid out the arrays in a periodic pattern in memory, which enabled parse-free retrieval of individual values.
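To illustrate (a made-up layout, not the actual system): if every record is fixed-width and fields sit at fixed byte offsets, the reader can slice values straight out of the buffer without invoking a parser at all.

    # Fixed-width records: {"id":00000001,"val":00000042}\n  -- 31 bytes each
    RECORD = 31
    VAL_OFF, FIELD_W = 21, 8            # byte offset/width of the "val" digits

    def get_val(buf: bytes, row: int) -> int:
        base = row * RECORD
        return int(buf[base + VAL_OFF : base + VAL_OFF + FIELD_W])

    buf = (b'{"id":00000001,"val":00000042}\n'
           b'{"id":00000002,"val":00000007}\n')
    assert get_val(buf, 1) == 7         # no JSON parser involved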
CBOR could be another option: https://en.wikipedia.org/wiki/CBOR
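Quick taste in Python, via the third-party cbor2 package (the payload is just an example):

    import cbor2  # pip install cbor2

    blob = cbor2.dumps({"name": "foo", "age": 2})  # compact binary encoding
    print(len(blob), cbor2.loads(blob))            # round-trip, typically smaller than the JSON text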
Anyway, he has a point: `cout` is used extensively as a logging mechanism. If you don't see that "single millisecond" making any difference, you certainly haven't worked on a relevant system.
> XML and Javascript are tree structures and not suitable for efficiently storing tabular data (plus other issues).
You can certainly be efficient with JSON (newline-delimited, i.e. ndjson). See:
Notice how they are separate objects:
{"name": "foo", "age": 2}
{"name": "cat", "age": 6}
You can do it very efficiently: https://github.com/simdjson/simdjson. Compress it if you need it compact.
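The nice property of one-object-per-line is that you can stream it; a minimal sketch (file name assumed):

    import json

    def iter_records(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():              # skip blank lines
                    yield json.loads(line)    # each line is a standalone object

    # for rec in iter_records("pets.ndjson"):
    #     print(rec["name"], rec["age"])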
There's also UBF, but it never saw much traction: https://ubf.github.io/ubf/ubf-user-guide.en.html#specificati...
The Python protobuf implementation has a somewhat checkered history (I used protobuf v1 and v2 for a long time, and reviewed v3 a tiny bit).
The type system issue is that protobufs, to a large extent, "replace" your language's types. It's essentially a language-independent type system. That means you're limited to a lowest common denominator, and you have the issue of "winners" and "losers"... I would call Python somewhat of a "loser" in the protobuf world, i.e. it feels more second-class and is more of a compromise.
This doesn't mean that anybody did a bad job; it's just a fundamental issue with such IDLs. In contrast, JSON/XML/CSV are "data-first", and there are multiple ways of using and parsing them: you can parse them lazily, with DOM or SAX styles, with push or pull parsers, etc. Protobufs have grown some of that, but it wasn't the primary usage, and many people don't know about it.
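To make the schema-first vs data-first point concrete, here's roughly what the generated Python side looks like (a sketch assuming a trivial person.proto compiled with protoc; person_pb2 is the generated module):

    # person.proto (assumed):  message Person { string name = 1; int32 age = 2; }
    from person_pb2 import Person   # protoc-generated code

    p = Person(name="foo", age=2)
    blob = p.SerializeToString()    # compact wire format, no field names on the wire

    q = Person()
    q.ParseFromString(blob)         # you need the schema to decode at all --
    print(q.name, q.age)            # unlike JSON, the bytes aren't self-describing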
---
If your worry is parsing speed, then JSON not only has battle-tested parsers, but also has SIMD-assisted parsers which can process gigabytes a second on a single core (e.g. https://github.com/simdjson/simdjson). It would take Internet Object years to develop parsers as performant as that, even if it did, by some miracle, achieve wide uptake. So the notional advantage afforded by not having keys on each row is neither here nor there.
And incidentally, as someone who's written a handful of parsers, I suspect that this scheme would not be particularly easy to parse. You need lookahead because of optional fields, as well as maintaining state and a lookup table for mapping positions to keys, etc. I can draw up a quick parser in pseudocode or Python to explain, if you disagree.
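To give a flavor of the bookkeeping, here's a toy version in Python (my guess at the scheme, not Internet Object's actual grammar):

    # Header: keys in declaration order; a trailing '?' marks a field optional.
    # Rows: positional values; optional fields may be omitted.
    def parse_rows(header, rows):
        keys = [(k.rstrip("?"), k.endswith("?")) for k in header]
        required = sum(1 for _, opt in keys if not opt)
        for row in rows:
            if len(row) < required:
                raise ValueError("row missing required fields")
            skips = len(keys) - len(row)    # how many optionals were omitted --
            out, vi = {}, 0                 # ambiguous without lookahead/typing;
            for name, optional in keys:     # here we just drop the earliest ones
                if optional and skips:
                    skips -= 1
                    continue
                out[name] = row[vi]
                vi += 1
            yield out

    header = ["name", "nickname?", "age"]
    rows = [["foo", "fu", "2"], ["cat", "6"]]   # second row omits the optional
    print(list(parse_rows(header, rows)))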
PyPI: https://pypi.org/project/pysimdjson/
There's a Rust port: https://github.com/simd-lite/simd-json
... From ijson (https://pypi.org/project/ijson/#id3), which supports streaming JSON:
> Ijson provides several implementations of the actual parsing in the form of backends located in ijson/backends: [yajl2_c, yajl2_cffi, yajl2, yajl, python]
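The pull style looks like this (minimal sketch; the file name and the top-level-array shape are assumptions):

    import ijson  # pip install ijson

    with open("big.json", "rb") as f:
        for record in ijson.items(f, "item"):   # "item" = each top-level array element
            print(record.get("name"))           # objects stream in one at a time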
If so, I've been following it for a couple of years, but I put it out of my mind recently after moving to AMD. I could swear it was an Intel-only project, but a quick scan of that git repo suggests I'm wrong. So either I'm totally misremembering, or AMD support was added later.
Anyway, I can't wait to try it out again. I wonder why most projects don't just use this as their default JSON parser now?