One part seems strange: clang is run with "-O2" to generate the code:

  $ clang -O2 -target bpf -Xclang -target-feature -Xclang +alu32 -c sub64.c -o - | llvm-objdump -S -
> Apparently the compiler decided it was better to operate on 64-bit registers and discard the upper 32 bits.

The workaround was to use the `volatile` keyword.
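For anyone who wants to poke at this, here is a minimal sketch of what a `volatile` workaround of that kind can look like. This is not the article's sub64.c: the function names and bodies are hypothetical, and the only assumption is that the value of interest is a 32-bit difference.

```c
/* Hypothetical sketch, not the article's sub64.c. */

/* Plain version: the compiler may do the subtraction in a 64-bit
 * register and truncate the result afterwards. */
unsigned int sub32_plain(unsigned long long a, unsigned long long b)
{
    return (unsigned int)(a - b);
}

/* volatile version: the write to and read from `diff` must really be
 * performed as 32-bit accesses, so the optimiser cannot fold them away. */
unsigned int sub32_volatile(unsigned long long a, unsigned long long b)
{
    volatile unsigned int diff = (unsigned int)a - (unsigned int)b;
    return diff;
}
```

Both functions return the same value in C; the interesting part is whether the generated instructions differ, which the clang | llvm-objdump pipeline quoted above should show.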

It sounds like one of the LLVM optimisation passes made the change.

http://releases.llvm.org/8.0.0/docs/Passes.html#transform-pa...

Wonder if disabling optimisations ("-O0") would have also worked?

Compiling to eBPF is hard! The compiler must:
- avoid loops (backwards jumps)
- unroll everything
- inline everything
- avoid function calls
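A rough sketch of how those constraints tend to show up in the C source. The helper name and the fixed bound are made up for illustration; `#pragma unroll` and `always_inline` are requests to clang, and whether they are fully honoured depends on the optimisation level.

```c
/* Hypothetical helper; the name and the fixed bound are assumptions. */
#define N_WORDS 4

/* always_inline asks the compiler to paste the body into the caller
 * instead of emitting a function call. */
static inline __attribute__((always_inline))
unsigned int sum_words(const unsigned int *words)
{
    unsigned int total = 0;

    /* Constant trip count, so the loop can be fully unrolled and no
     * backwards jump needs to remain in the output. */
#pragma unroll
    for (int i = 0; i < N_WORDS; i++)
        total += words[i];

    return total;
}

unsigned int checksum4(const unsigned int *words)
{
    return sum_words(words);
}
```

With optimisations off, the loop and the call would presumably stay in the output as written.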

Basically, it's impossible to use clang to generate eBPF without -O2. Sorry.

I'm new to this area of work, but has anyone stepped back and thought: "this is not the right way to build eBPF programs"? Would it be better to create a new high-level language and toolset? All this voodoo hackery to try to trick a C compiler into making eBPF-compatible code feels unsustainable.

There's bcc (https://github.com/iovisor/bcc) if you need low-level abstractions, or bpftrace (https://github.com/iovisor/bpftrace) if you need something simpler (similar to DTrace).