What does HackerNews think of picohttpparser?

tiny HTTP parser written in C (used in HTTP::Parser::XS et al.)

Language: C

Yeah, it is definitely a fake HTTP server, which I acknowledge in the article [1]. However, based on the size of the requests and my observation that the number of packets per second in and out was symmetrical at the network interface level, I didn't have a concern about doubled responses.

Skipping the parsing of the HTTP requests definitely gives a performance boost, but for this comparison both sides got the same boost, so I didn't mind being less strict. Seastar's HTTP parser was being finicky, so I chose the easy route and just removed it from the equation.

For reference though, in my previous post [2] libreactor was able to hit 1.2M req/s while fully parsing the HTTP requests using picohttpparser [3]. But that is still a very simple and highly optimized implementation. FYI, from what I recall, when I played with disabling HTTP parsing in libreactor, I got a performance boost of about 5%.

1. https://talawah.io/blog/linux-kernel-vs-dpdk-http-performanc...

2. https://talawah.io/blog/extreme-http-performance-tuning-one-...

3. https://github.com/h2o/picohttpparser

I'm skeptical of the performance numbers. First, like others here I don't believe nginx's performance will be a bottleneck for HTTP/2. Beyond that, I suspect there are cases in which this code is much worse than nginx.

Here's one. Look at the example request loop on <https://github.com/h2o/picohttpparser/>. It reads from a socket, appending to an initially-empty buffer. Then it tries to parse the buffer contents as an HTTP request. If the request is incomplete, the loop repeats. (h2o's lib/http1.c:handle_incoming_request appears to do the same thing.)

In particular, phr_parse_request doesn't retain any state between attempts. Each time, it goes through the whole buffer. In the degenerate case in which a client sends a large (n-byte) request one byte at a time, it uses O(n^2) CPU for parsing. That extreme should be rare when clients are not malicious, but the benchmark is probably testing the other extreme, where every request arrives in a single read. Typical conditions are probably somewhere in between.
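To make the quadratic case concrete, here is a minimal sketch in C. It does not call picohttpparser and is not a claim about what phr_parse_request does internally: `find_header_end` and `simulate_trickle` are hypothetical names, and a plain scan for `\r\n\r\n` stands in for a real header parse. It only counts how many buffer positions get inspected when a request trickles in one byte at a time, once rescanning from offset 0 on every attempt and once resuming just before the previous end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a header parser (not picohttpparser's code).
 * Looks for "\r\n\r\n" in buf[0..len), starting at offset `start`.
 * Returns the offset just past the terminator, or -1 if it has not arrived.
 * *examined is incremented once per position inspected. */
static long find_header_end(const char *buf, size_t len, size_t start,
                            size_t *examined)
{
    for (size_t i = start; i + 4 <= len; i++) {
        (*examined)++;
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}

/* Simulate a client that delivers `req` one byte per read.
 * resume == 0: every attempt rescans from offset 0 (no retained state).
 * resume != 0: each attempt starts 3 bytes before the previous buffer end,
 *              so a terminator straddling the boundary is still found.
 * Returns the total number of positions inspected across all attempts. */
static size_t simulate_trickle(const char *req, size_t n, int resume)
{
    assert(req != NULL);
    size_t examined = 0;
    size_t prev_len = 0;
    for (size_t len = 1; len <= n; len++) {
        size_t start = (resume && prev_len > 3) ? prev_len - 3 : 0;
        if (find_header_end(req, len, start, &examined) >= 0)
            break; /* request headers are complete */
        prev_len = len;
    }
    return examined;
}
```

For an n-byte request whose only terminator is at the very end, the from-scratch variant inspects (n-3)(n-2)/2 positions and the resuming variant inspects n-3. With the 37-byte request `GET / HTTP/1.1\r\nHost: example.com\r\n\r\n` that is 595 versus 34.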