How many of these are still relevant? x86 now has instructions for crc and bswap, as well as nop.
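
To make the bswap case concrete, here is a rough sketch of the hand-written trick next to the compiler route. The function names are made up for illustration, and `__builtin_bswap32` is a GCC/Clang builtin rather than standard C, so treat this as an assumption about the toolchain. On x86 those compilers typically lower the builtin to a single bswap instruction, and they often recognize the shift-and-mask pattern as well.

```c
#include <stdint.h>

/* Hypothetical name: the classic shift-and-mask byte swap. */
uint32_t bswap32_manual(uint32_t x) {
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
}

/* Hypothetical name: same result via the GCC/Clang builtin
 * (assumes one of those compilers; not portable ISO C). */
uint32_t bswap32_builtin(uint32_t x) {
    return __builtin_bswap32(x);
}
```

Both give the same result, so the manual version still works; it just no longer buys anything where the instruction is available.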

Nowadays almost all code is cache-bound rather than cycle-bound when it comes to performance. So these gems, while they may still work, may not be the best means of achieving optimal performance. Optimizing cache access is what brings the biggest speedups today -- see for example NumExpr[1].
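
A minimal sketch of the kind of thing I mean, in C. This is not NumExpr's actual implementation, only an illustration of the memory-traffic idea; the function names and the expression 2*a + 3*b are chosen just for the example. The first version does three passes over memory with two full-size temporaries, the second does one pass with none.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical name: computes out = 2*a + 3*b the "one operation at a
 * time" way, with two temporaries and three passes over n elements.
 * Returns 0 on success, -1 if allocation fails. */
int axpby_unfused(const double *a, const double *b, double *out, size_t n) {
    if (n == 0) return 0;
    double *t1 = malloc(n * sizeof *t1);
    double *t2 = malloc(n * sizeof *t2);
    if (!t1 || !t2) { free(t1); free(t2); return -1; }

    for (size_t i = 0; i < n; i++) t1[i] = 2.0 * a[i];      /* pass 1 */
    for (size_t i = 0; i < n; i++) t2[i] = 3.0 * b[i];      /* pass 2 */
    for (size_t i = 0; i < n; i++) out[i] = t1[i] + t2[i];  /* pass 3 */

    free(t1);
    free(t2);
    return 0;
}

/* Hypothetical name: same expression in a single pass. Each element of
 * a and b is loaded once and no intermediate array is written. */
void axpby_fused(const double *a, const double *b, double *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = 2.0 * a[i] + 3.0 * b[i];
}
```

The arithmetic is identical in both; for arrays much larger than the cache, the difference I would expect comes from how many times the data has to travel to and from main memory, not from the number of multiplies.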

[1] https://github.com/pydata/numexpr