All of these should come with the disclaimer that your use case, among other things, will determine how much your mileage varies. For instance, when I benchmarked (with wrk) my JSON-based API with a 145-200 byte payload plus a minimal header, it came out ahead of gRPC with protobufs. I was getting ~180,000 requests per second, including serialization and deserialization, on my laptop (a ThinkPad T25). You're already saturating a 1 Gbps connection at a third of that throughput, and when you consider the extra headroom that leaves for processing, not to mention the smaller box you can run it on, it's pretty nice.
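Rough back-of-envelope on the saturation claim (the ~2 KB per exchange is an implied figure, not something I measured directly; it's plausible once you count request and response headers plus TCP/IP framing on top of the 145-200 byte payload):

    1 Gbps              ~ 125 MB/s
    180,000 / 3         = 60,000 req/s
    125 MB/s / 60,000   ~ 2,083 bytes, i.e. ~2 KB on the wire per request/response pair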

This wasn't in Java, the environments are different, the code is different, the benchmarking tool is different, and so on, so maybe it's not directly applicable. But the point stands: you should investigate whether the tech will really be worth the investment compared to optimizing elsewhere. For comparison, I've tried similar things with Python and only gotten 20-40k requests per second across various frameworks.

Please take everything above with a grain of salt and do your own research.

Interesting. What was your test setup like, out of curiosity?

Kore.io with 4 workers, a modified version of the C++ example, plus a small state machine: https://github.com/jorisvink/kore
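Roughly the shape of it, if anyone wants to reproduce. This is a minimal sketch of a Kore handler, not my exact code; the handler name and response body are placeholders:

    // kore.conf: workers 4
    #include <kore/kore.h>
    #include <kore/http.h>

    int page(struct http_request *);

    // Minimal handler: Kore calls this per request; real code would
    // parse the POSTed JSON body and drive the state machine here.
    int
    page(struct http_request *req)
    {
        http_response(req, 200, "ok", 2);
        return (KORE_RESULT_OK);
    }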

JSON handling via nlohmann/json: https://github.com/nlohmann/json
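For reference, the parse/serialize path with nlohmann/json looks like this (minimal sketch; the field names are made up, not my actual schema):

    #include <string>
    #include <nlohmann/json.hpp>

    using json = nlohmann::json;

    // Deserialize the incoming payload, read a field, serialize a reply.
    std::string handle(const std::string &body)
    {
        json req = json::parse(body);      // throws json::parse_error on bad input
        json resp = { {"id", req.value("id", 0)}, {"status", "ok"} };
        return resp.dump();                // compact serialization
    }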

wrk, POSTing the JSON payload via a Lua script: https://github.com/wg/wrk
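The script is basically the stock post.lua pattern from the wrk repo (the payload here is a stand-in, not my actual one):

    -- post.lua: attach a small JSON body to every request
    wrk.method = "POST"
    wrk.body   = '{"id": 1, "name": "test"}'
    wrk.headers["Content-Type"] = "application/json"

Then something like `wrk -t4 -c64 -d30s -s post.lua http://127.0.0.1:8888/`, adjusting the URL to wherever your server listens.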