Via https://github.com/zoidbergwill/awesome-ebpf#user-space-ebpf
- QBE [1] - small compiler backend with nice IL
- DynASM [2] - IIUC LuaJIT's backend, which can be and is used by other languages
- uBPF - Userspace eBPF VM. Depending on your DSL, the eBPF toolchain could fit your use case, but this would probably be the biggest excursion. There is a basic assembler written in Python.
The point being that you could rip the eBPF implementation out of the kernel, remove the verifier check, and have a very usable VM.
Here's an implementation that exists because of the GPL: https://github.com/iovisor/ubpf.
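To make the "eBPF minus the verifier is a usable VM" idea concrete, here is a minimal sketch of an interpreter for a tiny subset of eBPF. It is not uBPF's code or API. The `encode` and `run` helpers are hypothetical names made up for illustration, and only four ALU64 opcodes plus `exit` are handled. It assumes the standard 8-byte instruction layout (opcode, dst/src register nibbles, 16-bit offset, 32-bit immediate).

```python
import struct

# Opcode values from the eBPF instruction set (64-bit ALU class, plus exit).
MOV64_IMM = 0xB7  # dst = imm
MOV64_REG = 0xBF  # dst = src
ADD64_IMM = 0x07  # dst += imm
ADD64_REG = 0x0F  # dst += src
EXIT = 0x95       # return r0

MASK64 = (1 << 64) - 1


def encode(opcode, dst=0, src=0, offset=0, imm=0):
    """Pack one 8-byte instruction (hypothetical helper, little-endian)."""
    return struct.pack("<BBhi", opcode, (src << 4) | dst, offset, imm)


def run(program):
    """Interpret the toy subset. There is no verifier: bad input just raises."""
    regs = [0] * 11  # r0..r10
    pc = 0
    while pc * 8 < len(program):
        opcode, regbyte, _offset, imm = struct.unpack_from("<BBhi", program, pc * 8)
        dst, src = regbyte & 0x0F, regbyte >> 4
        pc += 1
        if opcode == MOV64_IMM:
            regs[dst] = imm & MASK64  # sign-extended imm, wrapped to 64 bits
        elif opcode == MOV64_REG:
            regs[dst] = regs[src]
        elif opcode == ADD64_IMM:
            regs[dst] = (regs[dst] + imm) & MASK64
        elif opcode == ADD64_REG:
            regs[dst] = (regs[dst] + regs[src]) & MASK64
        elif opcode == EXIT:
            return regs[0]
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    raise ValueError("program ended without exit")


# mov r0, 40; mov r1, 2; add r0, r1; exit
program = b"".join([
    encode(MOV64_IMM, dst=0, imm=40),
    encode(MOV64_IMM, dst=1, imm=2),
    encode(ADD64_REG, dst=0, src=1),
    encode(EXIT),
])
result = run(program)
print(result)
```

A real userspace VM also needs memory loads/stores, jumps, helper calls and ideally a JIT, which is what a project like uBPF provides. The sketch only shows how small the core dispatch loop is once the verifier is out of the picture.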
> "anything that takes a context switch per traced function will probably be dramatically slower."
Good thing uprobes don't context switch:
# perf stat -e context-switches -e probe_libc:re_search_internal sed '/./d' /mnt/data.txt
Performance counter stats for 'sed /./d /mnt/data.txt':
6 context-switches
15,122,432 probe_libc:re_search_internal
19.744738204 seconds time elapsed
You mean mode switch? Cheaper, but yes, still costly. Here's the runtime without the probe:
# time sed '/./d' /mnt/data.txt
real 0m3.349s
user 0m3.345s
sys 0m0.004s
Which means we can calculate the cost: (19.74 s - 3.35 s) / 15.1M probe hits is ~1.1 us per probe (on my system). Anyone know what XRay is clocking in at?
AFAIK, LTTng has done work for user<->user instrumentation. I think uBPF will be doing this (https://github.com/iovisor/ubpf), although that project is very new. It could use some help from some more good engineers (please do!).
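For anyone who wants to redo the ~1.1 us per-probe estimate with their own numbers, the arithmetic is just the extra wall-clock time divided by the probe hit count. A quick sketch using the figures from the perf stat and time output above (the variable names are mine):

```python
# Figures copied from the perf stat / time output quoted in this thread.
traced_seconds = 19.744738204   # elapsed time with the uprobe attached
baseline_seconds = 3.349        # "real" time without the probe
probe_hits = 15_122_432         # probe_libc:re_search_internal count

# Attribute all of the extra wall-clock time to the probe.
overhead_seconds = traced_seconds - baseline_seconds
cost_per_probe_us = overhead_seconds / probe_hits * 1e6

print(f"{cost_per_probe_us:.2f} us per probe")
```

That comes out at roughly 1.08 us, which rounds to the ~1.1 us quoted. It assumes the two runs are otherwise comparable (same file, warm cache), so treat it as a ballpark rather than a benchmark.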
> "In "speed while not tracing", anything much more expensive than nop-sleds will be too slow to run in production."
I'm not sure anyone is suggesting anything more than nop-sleds. When not tracing, dynamic tracing costs zero, and static tracing costs nop-sleds.
> "probably won't be able to completely hook functions that get inlined"
Sure. Sometimes there are static tracing probes (nop-sled based), sometimes there aren't and it's dynamic probes, sometimes the function you want to probe dynamically is inlined and you walk up the stack to find one that isn't. If it is inlined, maybe you need to trace the address rather than the function entry.
In my experience it's pretty rare that something is just untraceable because inlining is so insane. But yes, it does happen sometimes. Usually I figure out a workaround before giving up.
> "a mechanism to execute arbitrary code at function entry or exit in a way that's runtime-customizable and very low overhead when you want it to be"
BPF! In-kernel virtual machine that runs JIT'd code on events, and is part of mainline Linux. Lots of enhancements in the Linux 4.x series.