The only loop-carried dependency chain is the state update: state += 0x60bee2bee120fc15ull (wyhash) or state += UINT64_C(0x9E3779B97F4A7C15) (splitmix). The rest of the calculations are independent per iteration.
Anyway, the more important fact is that a 64x64b -> 128b mul might be one instruction on x86, but it's broken into 2 µops, because modern CPUs generally aren't designed around µops being able to write two registers in the same set.
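To make the two points above concrete, here is a minimal C sketch of the wyhash-style step, not the benchmark's actual source. The additive constant is the one quoted above; the two multipliers are as I recall them from Lemire's code, so treat them as an assumption. mul_fold and fill_random are hypothetical names, and __uint128_t is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: full 64x64 -> 128-bit multiply, then XOR the high
   and low halves. __uint128_t is a GCC/Clang extension, not standard C. */
static inline uint64_t mul_fold(uint64_t a, uint64_t b) {
    __uint128_t p = (__uint128_t)a * b;
    return (uint64_t)(p >> 64) ^ (uint64_t)p;
}

/* One generator step. The addition is the only operation whose result
   feeds the next iteration; both multiplies depend on it but nothing in
   the next iteration depends on them. */
static inline uint64_t wyhash64_next(uint64_t *state) {
    *state += UINT64_C(0x60bee2bee120fc15);
    /* Multipliers assumed, see the note above. */
    uint64_t m1 = mul_fold(*state, UINT64_C(0xa3b195354a39b70d));
    return mul_fold(m1, UINT64_C(0x1b03738712fad5c9));
}

/* Fill an array so the outputs are observable by the caller. */
void fill_random(uint64_t *out, size_t n, uint64_t seed) {
    uint64_t state = seed;
    for (size_t i = 0; i < n; i++) {
        out[i] = wyhash64_next(&state);
    }
}
```

Since only the add is serialized, an out-of-order core can overlap the multiplies from several iterations, so throughput ends up limited by multiply µops rather than by latency.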
It's a shame we can't see the rest of the code. What is happening to the result value? Is it being compared to something, put into an array, or what? All of that code probably outweighs what you pointed out here, or at least it should. I have a bad feeling the whole computation might be getting dead-code eliminated, since compilers are super aggressive about that nowadays, but I hope he's somehow controlled for that.
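For what it's worth, one common way to rule out dead-code elimination is to fold every output into a checksum and write it to a volatile variable. A rough sketch, assuming the standard splitmix64 constants; sink and bench_checksum are hypothetical names, and I am not claiming this is how the benchmark in question does it.

```c
#include <stddef.h>
#include <stdint.h>

/* A volatile store is an observable side effect, so the compiler has to
   keep everything the stored value depends on. */
static volatile uint64_t sink;

/* splitmix64 step, using the usual published constants. */
static inline uint64_t splitmix64_next(uint64_t *state) {
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Hypothetical benchmark body: XOR all outputs together so each one is
   live, then publish the result through the volatile sink. */
uint64_t bench_checksum(size_t n, uint64_t seed) {
    uint64_t state = seed;
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc ^= splitmix64_next(&state);
    }
    sink = acc;
    return acc;
}
```

The XOR adds one cheap op per iteration, so it perturbs the measurement a little, but far less than letting the loop be deleted.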
On godbolt, clang (targeting AArch64) compiles it to:
  .LBB5_2:                        // =>This Inner Loop Header: Depth=1
      mul   x13, x11, x10
      umulh x14, x11, x10
      eor   x13, x14, x13
      mul   x14, x13, x12
      umulh x13, x13, x12
      eor   x13, x13, x14
      str   x13, [x0, x8, lsl #3]
      add   x8, x8, #2            // =2
      cmp   x8, x1
      add   x11, x11, x9
      b.lo  .LBB5_2
[1] https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...