I still don't know of any RISC-V implementation that supports the POPC (and related) instructions. Would like to be pointed at any.

It is generally a bad idea to field an instruction-set architecture that turns common, brilliant optimizations into gross pessimizations.

The bit manipulation extension [0] has been ratified for a while now and is part of the RVA22 application profile [1]. Its Zbb subset includes the scalar popcount instruction, cpop.

You can already buy boards with SoCs that support it, e.g. the VisionFive 2 and the Star64.
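
As a rough sketch of what this means for C code: with GCC or Clang you would normally just call the popcount builtin and let the compiler pick the instruction. When building for a target with Zbb (e.g. -march=rv64gc_zbb) I'd expect that to become a single cpop, and a software routine otherwise. The function names below are made up for illustration, and the SWAR fallback is only there to show what you pay without the instruction.

```c
#include <stdint.h>

/* Portable fallback: classic SWAR bit counting, no special instruction needed. */
static inline unsigned popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;                           /* 8-bit sums */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);                 /* add the bytes */
}

/* Hypothetical wrapper: prefer the compiler builtin where it exists. */
static inline unsigned popcount64(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Assumption: the compiler lowers this to cpop when Zbb is enabled. */
    return (unsigned)__builtin_popcountll(x);
#else
    return popcount64_swar(x);
#endif
}
```

Checking the generated assembly for popcount64 with and without _zbb in -march is an easy way to confirm what your toolchain actually emits.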

Interestingly, the RISC-V vector extension [2] has its own popcount instructions for vector mask registers. This is needed because the scalable architecture doesn't guarantee that a vector mask fits into a 64-bit register, so a vector mask is stored in a single LMUL=1 vector register. This works out really well: with LMUL=8 and SEW=8 a register group holds exactly VLEN elements, one mask bit each, so you get 100% utilization of the single LMUL=1 vector register.
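
The arithmetic behind that, as a minimal sketch: the maximum element count is VLMAX = LMUL * VLEN / SEW, and a mask needs one bit per element. The helper name is hypothetical and it only handles integer LMUL (fractional LMUL is left out for brevity).

```c
#include <stddef.h>

/* Mask bits needed for a full register group: one bit per element,
 * with VLMAX = LMUL * VLEN / SEW. Integer LMUL only (illustrative helper). */
static inline size_t rvv_mask_bits(size_t vlen_bits, size_t lmul, size_t sew_bits)
{
    return lmul * vlen_bits / sew_bits;
}
```

With VLEN=128, LMUL=8 and SEW=8 this gives 128 mask bits, i.e. exactly one full 128-bit register. Any other legal setting needs fewer bits, so the mask always fits in one LMUL=1 register.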

Another interesting thing is that the vector crypto extension will likely introduce an element-wise popcount instruction [3].

[0] https://github.com/riscv/riscv-bitmanip

[1] https://github.com/riscv/riscv-profiles

[2] https://github.com/riscv/riscv-v-spec

[3] https://github.com/riscv/riscv-crypto/blob/master/doc/vector...