I got run times from the simplest single-threaded directory walk that are only 1.8x slower than git ls-files. (Min time of 10 runs with the git repo housed by /dev/shm on Linux 5.15.)
The "simple" code is in https://github.com/c-blake/cligen/blob/master/cligen/dents.n... (just `dents find` does not require the special kernel batch system call module to be fast. That kernel module is more about statx batching but IO uring can also do that these days. For those unfamiliar with Nim, it's a high productivity, high performance systems language.)
I believe that GNU find is slow because it is specifically written to allow arbitrary filesystem depth as opposed to "open file descriptor limit-limited depth". (I personally consider this over-engineering, a holdover from bygone systems with default ulimits of, say, only 64 open fds. There should at least be a "fast mode", since, let's be real, file hierarchies deeper than 256..1024 levels, while possible, are rare; one should optimize for the common case, or at least let the user declare that it holds. AFAIK there is no such "fast mode". Maybe on some BSD?)
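For illustration, here is a minimal sketch (my own toy code, not dents) of what such an fd-bounded "fast mode" walk looks like in C: each recursion level holds one directory fd open while descending, so depth is limited by the fd ulimit, and `d_type` (where the filesystem fills it in; it can be DT_UNKNOWN) stands in for per-file stat calls:

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* One open fd per depth level: max depth is bounded by the fd
       ulimit, but there is no path re-resolution and no per-file stat. */
    static void walk(int atfd, const char *name) {
        int fd = openat(atfd, name, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (fd < 0) return;
        DIR *d = fdopendir(fd);              /* takes ownership of fd */
        if (!d) { close(fd); return; }
        struct dirent *e;
        while ((e = readdir(d))) {
            if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, "..")) continue;
            puts(e->d_name);                 /* bare names, for brevity */
            if (e->d_type == DT_DIR)         /* d_type, not stat */
                walk(dirfd(d), e->d_name);
        }
        closedir(d);
    }

    int main(int argc, char **argv) {
        walk(AT_FDCWD, argc > 1 ? argv[1] : ".");
        return 0;
    }

With a default ulimit of 1024 fds, that caps depth at roughly 1000, which is exactly the "rare case" trade-off above.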
Meanwhile, I think the Rust `fd` is slow because of (probably counterproductive) multi-threading (at the very least, it makes 11,000 calls to futex).
> I believe that GNU find is slow because it is specifically written to allow arbitrary filesystem depth as opposed to "open file descriptor limit-limited depth".
I haven't benchmarked find specifically, but I believe the most common Rust library for this purpose, walkdir[1], also allows arbitrary file system recursion depth, and is extremely fast. It was fairly close to some "naive" limited-depth code I wrote in C for the same purpose. A lot of go-to C approaches seem to needlessly call stat on every file, so they're even slower.
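My guess is that needless stat is what hurts those go-to approaches: the cheap alternative (not necessarily what walkdir does internally, just the general pattern; the helper name here is mine) is to trust readdir's `d_type` and only fall back to fstatat on filesystems that report DT_UNKNOWN:

    #include <dirent.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    /* "Is this entry a directory?" with no syscall in the common case;
       fstatat is only needed when the filesystem reports DT_UNKNOWN. */
    static int is_dir_entry(int atfd, const struct dirent *e) {
        if (e->d_type != DT_UNKNOWN)
            return e->d_type == DT_DIR;
        struct stat st;
        if (fstatat(atfd, e->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            return 0;
        return S_ISDIR(st.st_mode);
    }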
I'd be curious to see benchmarks of whether this actually makes a difference.
It may actually be that, because most ftw()s call stat (and so are quite slow, at least without some kernel magic like io_uring or sys_batch), the non-stat-calling mode is poorly optimized. In that context, it may look like a more minor optimization.
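For what it's worth, glibc's fts(3) already exposes a non-stat mode via FTS_NOSTAT (I don't know offhand whether GNU find's gnulib copy turns it on when it can); a minimal sketch:

    #include <fts.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        char *paths[] = { argc > 1 ? argv[1] : ".", NULL };
        /* FTS_NOSTAT: don't stat every entry; directories are still
           found (d_type / link-count heuristics) and everything else
           comes back as FTS_NSOK with fts_statp undefined. */
        FTS *f = fts_open(paths, FTS_PHYSICAL | FTS_NOSTAT, NULL);
        if (!f) return 1;
        for (FTSENT *e; (e = fts_read(f)); )
            if (e->fts_info != FTS_DP)       /* skip post-order dir visits */
                puts(e->fts_path);
        fts_close(f);
        return 0;
    }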