One big issue to point out:
>Floating Point: NAMD
GCC sucks at automatic hardware vectorization. So does LLVM. Really the only time you can count on getting automatic hardware vectorization is if you shell out for ICC AND write your code in Fortran. I'm gonna bet they didn't vectorize a goddamn thing, but we can't inspect Anand's binary so we'll never know. The results are still likely correct, but IBM should have lost by a smaller margin.
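If you want to check this on your own code instead of guessing, here's a minimal sketch. `saxpy` is a made-up kernel for illustration, not anything from NAMD or the benchmark build, and the file name in the comment is just an assumption. Both GCC and Clang can report which loops they actually vectorized:

```c
#include <stddef.h>

/* Hypothetical kernel (not taken from NAMD): y[i] += a * x[i].
   `restrict` promises the compiler that x and y do not overlap,
   which removes one common obstacle to auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Compile without linking and ask the compiler what it vectorized
   (assuming the file is saved as saxpy.c):
     gcc   -O3 -c -fopt-info-vec-optimized saxpy.c
     clang -O3 -c -Rpass=loop-vectorize    saxpy.c
   If a loop doesn't show up in the report, it wasn't vectorized. */
```

A trivial loop like this is the easy case; real code with aliasing, branches, or non-unit strides is where the reports get interesting.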
TL;DR
POWER8 is fun but costs 5k more than Xeon per rack mount and uses about 2x the power for 10% less performance on generalized workloads. But it can pull off 10-15% more performance on _some specialized_ workloads. So meh?
I agree that gcc sucks for this; we didn't have LLVM last time I was doing this kind of work, so I can't comment on that.
This ends up being -very- fast.