> Both the Ruby Oj and the C parser OjC are the best performers in their respective languages.

Um, no, they aren't:

https://github.com/simdjson/simdjson

According to the OjC README:

> No official benchmarks are available but in informal tests Oj is 30% faster that [sic] simdjson.

Source: https://github.com/ohler55/ojc/blob/master/README.md#benchma...

If you have benchmarks that show otherwise, that would be great for the discussion here, but your point appears to have already been addressed?

That's one person saying it's faster. Without a reproducible test, that's pretty much worthless.

You don’t see the irony in that statement? You’re one person saying simdjson is faster, and you’re also lacking a reproducible test. That doesn’t make your side of the argument particularly compelling either.

I’ve never used either library, so I don’t care which one is faster, but I’d generally trust the author of a library, who has presumably tested its performance at some point, over someone who isn’t the author commenting on the performance of code they’ve never tested.

The author of that library isn’t present to make their case, so you’re the only one of the two who could provide evidence right now.

EDIT: the author is present now: https://news.ycombinator.com/item?id=23738512

Given how things turned out, this might be a good lesson as to whether "I measured my own library informally on some benchmarks that I couldn't be bothered to describe to you and it WINS!" is trustworthy in future.

When we introduced simdjson, we went out of our way to explain (a) why we thought it might be faster than other libraries and (b) how people could replicate our measurements or find fault in our methodology. There were zero claims, public or private, about simdjson's performance before this material was available.

In my long experience, undocumented and non-reproducible performance claims suffer from the fact that people generally stop benchmarking when they get the result that they wanted to hear, even if the absolute numbers involved are unreasonable. It's very easy to make mistakes in performance measurement (compiling simdjson with anything short of -O2 or -O3, doing repeated runs of a tiny input and thus getting perfect branch prediction behavior), but people generally fix these mistakes only on the side that makes their stuff look good.
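
To make the second mistake concrete, here's a minimal sketch (my own illustration, with a toy string-counter standing in for the parser, not anyone's real benchmark) of why a hot loop over the same tiny document flatters the numbers compared to a single pass over a varied corpus; either way, compile it with -O2 or -O3 before trusting the output:

```cpp
// Sketch of the "tiny input + repeated runs => perfect branch prediction" pitfall.
// count_strings() is a hypothetical stand-in for the parser under test; the point
// is the shape of the measurement loop, not the parsing itself.
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Toy workload: scan a document and count quoted strings.
static size_t count_strings(const std::string &doc) {
    size_t n = 0;
    bool in_string = false;
    for (char c : doc) {
        if (c == '"') { in_string = !in_string; if (!in_string) ++n; }
    }
    return n;
}

int main() {
    const std::string tiny = R"({"a":1,"b":[true,false,null]})";

    // Pitfall: a million runs over the same ~30-byte document. After a few
    // iterations the branch predictor has memorized every branch outcome and
    // the data sits in L1, so the reported ns/doc is wildly optimistic.
    size_t sink = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1'000'000; ++i) sink += count_strings(tiny);
    auto t1 = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("hot tiny input: %.2f ns/doc (sink=%zu)\n", ns / 1e6, sink);

    // Less misleading: one pass over a large, varied corpus, so neither the
    // predictor nor the cache can simply memorize the input.
    std::vector<std::string> corpus;
    for (int i = 0; i < 100'000; ++i)
        corpus.push_back("{\"id\":" + std::to_string(i) + ",\"ok\":" +
                         (i % 3 ? "true" : "false") + "}");
    sink = 0;
    t0 = std::chrono::steady_clock::now();
    for (const auto &doc : corpus) sink += count_strings(doc);
    t1 = std::chrono::steady_clock::now();
    ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("varied corpus:  %.2f ns/doc (sink=%zu)\n", ns / 1e5, sink);
    return 0;
}
```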

Back when I worked on the Hyperscan project ( https://github.com/intel/hyperscan ), we had an API call that allocated scratch memory (necessary due to Hyperscan's rather constrained approach to memory usage). I lost count of the number of times we had users run this allocation (which only ever needed to be done once per thread, effectively) inside their measurement loop. And whenever we were competing against an incumbent system and the people doing the measurement were the authors of that incumbent system, it was amazing how quickly they'd accept the "truth" of their benchmark run.
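
For illustration, here's a sketch of that mistake using the public Hyperscan calls (hs_compile, hs_alloc_scratch, hs_scan, hs_free_scratch); the timing harness around them is hypothetical, not any particular user's benchmark:

```cpp
// Sketch: scratch allocation inside vs. outside the measurement loop.
#include <hs.h>   // Hyperscan public API (header location may vary by install)
#include <chrono>
#include <cstdio>
#include <string>

// Match callback: just count matches and keep scanning.
static int on_match(unsigned int, unsigned long long, unsigned long long,
                    unsigned int, void *ctx) {
    ++*static_cast<size_t *>(ctx);
    return 0;
}

int main() {
    hs_database_t *db = nullptr;
    hs_compile_error_t *err = nullptr;
    if (hs_compile("foo[0-9]+bar", HS_FLAG_DOTALL, HS_MODE_BLOCK, nullptr,
                   &db, &err) != HS_SUCCESS) {
        std::fprintf(stderr, "compile failed: %s\n", err->message);
        hs_free_compile_error(err);
        return 1;
    }

    const std::string corpus(1 << 20, 'x');  // 1 MiB of non-matching data
    size_t matches = 0;

    // The mistake: scratch allocation inside the timed loop. It only needs to
    // happen once per database per thread, so this charges allocator overhead
    // to every scan.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000; ++i) {
        hs_scratch_t *scratch = nullptr;
        hs_alloc_scratch(db, &scratch);
        hs_scan(db, corpus.data(), corpus.size(), 0, scratch, on_match, &matches);
        hs_free_scratch(scratch);
    }
    auto t1 = std::chrono::steady_clock::now();

    // The fix: allocate scratch once, reuse it for every scan on this thread.
    hs_scratch_t *scratch = nullptr;
    hs_alloc_scratch(db, &scratch);
    auto t2 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000; ++i)
        hs_scan(db, corpus.data(), corpus.size(), 0, scratch, on_match, &matches);
    auto t3 = std::chrono::steady_clock::now();
    hs_free_scratch(scratch);
    hs_free_database(db);

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("alloc in loop: %.1f ms, alloc once: %.1f ms (matches=%zu)\n",
                ms(t1 - t0).count(), ms(t3 - t2).count(), matches);
    return 0;
}
```

The only difference between the two timed loops is whether hs_alloc_scratch is hoisted out of them, which is precisely the change those users kept failing to make before declaring their numbers final.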