For nix: https://github.com/google/walk
You can also use dmenu's stest to get the equivalent of find's -type f: `walk $PWD | stest -f`
Found it: https://github.com/google/walk
I cannot speak to why your "naive" C variant might have been slower than necessary. I might (wildly) guess that you did unnecessary string handling/allocation. You really just need one re-used buffer and a memcpy out of the dirents to the tail of said buffer (or even directly to stdio's output buffer). With modern Linux FSes you can use the d_type to decide recursion, not a stat. EDIT: Output might also have been hamstrung by not using fwrite_unlocked on Linux. Really just wild guesses, though. I can also say that https://github.com/google/walk mentioned in other subthreads is almost as fast as `dents find` and over 2x faster than GNU find on the same Linux git tree problem (up to commit 923dcc5eb0c111eccd51cc7ce1658537e3c38b25, btw).
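To make the "one re-used buffer plus d_type" idea concrete, here is a minimal sketch of what I have in mind. It is not the code of walk or of `dents find`; the function name `walk_tree` and its parameters are made up for illustration, and it assumes glibc on Linux (for `fwrite_unlocked` and `d_type`). It falls back to lstat only when the filesystem reports DT_UNKNOWN.

```c
#define _GNU_SOURCE             /* glibc: exposes d_type, DT_* and fwrite_unlocked */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: print every non-directory path below buf[0..len).
 * buf is the single shared path buffer (capacity cap, NUL-terminated at len).
 * Each entry name is memcpy'd onto the tail of buf, so nothing is allocated
 * per entry. Returns the number of paths written to out. */
static long walk_tree(char *buf, size_t len, size_t cap, FILE *out)
{
    assert(len < cap);
    DIR *dir = opendir(buf);
    if (dir == NULL)
        return 0;                       /* unreadable directory: skip it */

    long count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        const char *name = de->d_name;
        /* Skip "." and ".." */
        if (name[0] == '.' &&
            (name[1] == '\0' || (name[1] == '.' && name[2] == '\0')))
            continue;

        size_t nlen = strlen(name);
        if (len + 1 + nlen + 1 > cap)
            continue;                   /* would overflow the buffer: skip */

        buf[len] = '/';
        memcpy(buf + len + 1, name, nlen + 1);   /* copies the NUL as well */
        size_t newlen = len + 1 + nlen;

        /* Decide recursion from d_type; stat only if the FS did not fill it in. */
        int is_dir = (de->d_type == DT_DIR);
        if (de->d_type == DT_UNKNOWN) {
            struct stat st;
            is_dir = (lstat(buf, &st) == 0 && S_ISDIR(st.st_mode));
        }

        if (is_dir) {
            count += walk_tree(buf, newlen, cap, out);
        } else {
            fwrite_unlocked(buf, 1, newlen, out);  /* no per-call stdio locking */
            putc_unlocked('\n', out);
            count++;
        }
    }
    buf[len] = '\0';                    /* restore the caller's view of the path */
    closedir(dir);
    return count;
}
```

Something like `char buf[4096] = "."; walk_tree(buf, 1, sizeof buf, stdout);` would list the current tree. The recursion keeps one DIR open per directory level, which is fine for a sketch but is one of the things a tuned tool would handle more carefully.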
The non-stat-calling mode may be poorly optimized precisely because most ftw() implementations call stat (and so are quite slow, at least without some kernel magic like io_uring or sys_batch). In that context, skipping the stat may seem like a more minor optimization.
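For anyone wondering why the ftw() family tends to stat everything: the POSIX nftw() callback is handed a `struct stat` for each entry, so the library has to fill one in. A small sketch (the names `visit` and `count_plain_files` are mine, purely illustrative) that counts non-directory files this way:

```c
#define _XOPEN_SOURCE 700       /* exposes nftw() and FTW_PHYS */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() has no user-data argument, so the results live in file-scope state. */
static long plain_files;
static long long total_bytes;

/* Callback invoked by nftw() once per entry. The stat buffer arrives
 * already populated, which is the per-entry cost discussed above. */
static int visit(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void)path;
    (void)ftwbuf;
    if (typeflag == FTW_F) {    /* not a directory, and (with FTW_PHYS) not a symlink */
        plain_files++;
        total_bytes += sb->st_size;
    }
    return 0;                   /* 0 tells nftw() to keep walking */
}

/* Hypothetical wrapper: returns the number of FTW_F entries under root,
 * or -1 if nftw() reports an error. */
static long count_plain_files(const char *root)
{
    assert(root != NULL);
    plain_files = 0;
    total_bytes = 0;
    /* 16 = maximum number of directories nftw() may hold open at once */
    if (nftw(root, visit, 16, FTW_PHYS) != 0)
        return -1;
    return plain_files;
}
```

If all you want is the path list, that stat data is pure overhead, which is the gap the d_type approach exploits.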
Walk is faster than find. https://github.com/google/walk