It didn't have to be done this way. In case somebody wants to fix this, here is an alternative implementation:

Put the trampolines in a separate library, or in a part of libgcc, that can be mapped in multiple times. That allows any number of trampolines, limited only by memory. Alternatively, just decide in advance how many you want and place them in the executable.

Trampolines are conceptually in an array, to be allocated one by one, but there are actually two arrays because we'd want to separate the executable part from the non-executable part. On a 64-bit architecture with 4096-byte pages we might have a read-write page with 256 pairs of pointers (256 × 16 bytes fills the page), an execute-only page with 256 trampoline functions of 16 bytes each (which use those pairs of pointers), and possibly another execute-only page with helper code. A trampoline function can use the return address on the stack to determine its own location, and therefore which pointer pair belongs to it, though x86_64 also offers RIP-relative addressing that could be used instead.
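A portable-C stand-in for the "decide how many in advance and place them into the executable" variant might look like the sketch below. It is an illustration only: the names (`slots`, `tramp_alloc`, `tramp_free`, `handler_fn`, `callback_fn`, `add_ctx`) are hypothetical, only 4 slots are used, and each trampoline is an ordinary C function hard-wired to its slot index. Real trampolines would be hand-written machine code that finds its pointer pair from its own address, which is what lets the code page be mapped again next to a fresh data page. Here the toolchain already puts the functions in `.text` and the pairs in `.bss`, so code and data are separated. As written it is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4 /* fixed in advance; a full page might hold 256 */

typedef int (*handler_fn)(void *ctx, int arg); /* the "nested function" */
typedef int (*callback_fn)(int arg);           /* what callers receive */

/* Read-write data: one (handler, context) pair per trampoline. */
static struct {
    handler_fn fn;
    void *ctx;
} slots[NSLOTS];

/* Executable code: NSLOTS ordinary functions, each hard-wired to its slot.
 * Real trampolines would locate their slot from their own address. */
#define TRAMPOLINE(i) \
    static int tramp_##i(int arg) { return slots[i].fn(slots[i].ctx, arg); }
TRAMPOLINE(0)
TRAMPOLINE(1)
TRAMPOLINE(2)
TRAMPOLINE(3)

static const callback_fn tramps[NSLOTS] = {tramp_0, tramp_1, tramp_2, tramp_3};

/* Bind the first free trampoline to (fn, ctx); NULL when all are in use. */
static callback_fn tramp_alloc(handler_fn fn, void *ctx) {
    assert(fn != NULL);
    for (size_t i = 0; i < NSLOTS; i++) {
        if (slots[i].fn == NULL) {
            slots[i].fn = fn;
            slots[i].ctx = ctx;
            return tramps[i];
        }
    }
    return NULL;
}

/* Return a trampoline to the pool so its slot can be reused. */
static void tramp_free(callback_fn t) {
    for (size_t i = 0; i < NSLOTS; i++) {
        if (tramps[i] == t) {
            slots[i].fn = NULL;
            slots[i].ctx = NULL;
        }
    }
}

/* Example handler: adds the captured int to the argument. */
static int add_ctx(void *ctx, int arg) { return *(const int *)ctx + arg; }
```

The caller gets a plain `int (*)(int)` that can be handed to any API expecting a bare function pointer, with no executable stack involved; the cost is a hard limit of `NSLOTS` live trampolines.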

These arrays could be thread-private or protected by locking. Pick your poison. The allocation scheme changes accordingly, as do the space requirements, the cache-line bouncing, and so on. Be sure to update setjmp and longjmp as required.
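For the "done with locking" option, slot allocation could be a shared bitmap guarded by a mutex, as in this sketch. The names `slot_alloc`/`slot_free` and the bitmap layout are assumptions for illustration; a thread-private variant would instead give each thread its own pages and bitmap and drop the mutex entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NSLOTS 256 /* one page worth of pointer pairs, per the layout above */

/* One bit per trampoline slot; a set bit means the slot is in use. */
static uint64_t used[NSLOTS / 64];
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the lowest free slot index, or -1 if all NSLOTS are taken. */
int slot_alloc(void) {
    int result = -1;
    pthread_mutex_lock(&slot_lock);
    for (int w = 0; w < NSLOTS / 64 && result < 0; w++) {
        if (used[w] != UINT64_MAX) {
            int b = 0;
            while (used[w] & (UINT64_C(1) << b))
                b++;
            used[w] |= UINT64_C(1) << b;
            result = w * 64 + b;
        }
    }
    pthread_mutex_unlock(&slot_lock);
    return result;
}

/* Marks slot i free again. */
void slot_free(int i) {
    assert(i >= 0 && i < NSLOTS);
    pthread_mutex_lock(&slot_lock);
    used[i / 64] &= ~(UINT64_C(1) << (i % 64));
    pthread_mutex_unlock(&slot_lock);
}
```

Every allocation and free takes the same lock and touches the same cache lines, which is where the bouncing comes from when many threads create closures concurrently.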

There is also a much simpler answer to the usual security problems. If the stack would become executable and the option to explicitly allow that hasn't been given, linking should fail at every step. It should be impossible to propagate the lack of a no-exec flag without confirmation of intent.

I use them, and I don't care about the trampolines because I don't have protected memory. Having used them, I find them really useful, which leads me to think that the biggest thing missing from C is closures. Fix that and C becomes a much better language.
two approaches i use -

each closure is a function which takes a number of saved arguments and a number of new arguments. construct a set of macros to define a closure structure. this structure contains a pointer to the closure's handler function, plus all the saved arguments. another macro defines a function that takes the new arguments and constructs a complete call to the handler with the saved arguments. this solution carries the type information for all the arguments, so attempts to call the closure with new arguments whose types don't unify are flagged at compile time. this general approach is from Sergei T and is used quite heavily here: https://github.com/nanovms/nanos
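A minimal sketch of the macro approach for the simplest case, one saved argument and one new argument. The macro `DECLARE_CLOSURE_1_1` and the generated `_new`/`_apply` names are hypothetical and are not the nanos API; they only show the shape: a struct holding the handler pointer plus the saved argument, a heap constructor, and a typed apply function that completes the call.

```c
#include <assert.h>
#include <stdlib.h>

/* Declares closure type `name`: saves one S, later takes one N,
 * and finishes the call as handler(saved, arg), returning R. */
#define DECLARE_CLOSURE_1_1(name, R, S, N)                      \
    typedef struct name {                                       \
        R (*handler)(S, N);                                     \
        S saved;                                                \
    } name;                                                     \
    static inline name *name##_new(R (*h)(S, N), S s) {         \
        name *c = malloc(sizeof *c);                            \
        if (c) {                                                \
            c->handler = h;                                     \
            c->saved = s;                                       \
        }                                                       \
        return c;                                               \
    }                                                           \
    static inline R name##_apply(name *c, N arg) {              \
        assert(c != NULL);                                      \
        return c->handler(c->saved, arg);                       \
    }

/* Example handler: saved value plus the new argument. */
static int add(int saved, int arg) { return saved + arg; }

/* Generates the type `adder`, plus adder_new() and adder_apply(). */
DECLARE_CLOSURE_1_1(adder, int, int, int)
```

Usage: `adder *add5 = adder_new(add, 5);` followed by `adder_apply(add5, 10)` yields 15. Because `adder_apply` has a real prototype, passing, say, a `char *` as the new argument gets a compiler diagnostic. The closure lives on the heap and must be `free`d by someone.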

dynamically synthesize a function that shifts the new arguments to the right in the call and fills in the saved arguments from immediates. this is faster and the application can use normal function calls, but it's not type safe. i'm working on a little c->c front end that maintains type safety
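A sketch of the synthesis approach for one saved and one new `long` argument, assuming x86-64 with the System V calling convention (first argument in `rdi`, second in `rsi`) and a POSIX `mmap`/`mprotect`. The stub moves the caller's argument from `rdi` to `rsi`, loads the saved value into `rdi` as an immediate, and tail-jumps to the handler. The names `synthesize_partial`, `release_partial`, and `sub_from` are hypothetical; it returns NULL on other architectures or if the system refuses to make the page executable. Other architectures would also need an instruction-cache flush (e.g. `__builtin___clear_cache`) before running the code.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef long (*handler2)(long saved, long arg); /* the real handler */
typedef long (*fn1)(long arg);                  /* what the caller sees */

#define STUB_BYTES 4096 /* one page per stub: simple, but wasteful */

/* Example handler: saved value first, the caller's argument second. */
static long sub_from(long saved, long arg) { return saved - arg; }

/* Build `long f(long x)` that behaves like h(saved, x). */
static fn1 synthesize_partial(handler2 h, long saved) {
#if defined(__x86_64__)
    unsigned char code[25];
    size_t n = 0;
    uint64_t imm_saved = (uint64_t)saved;
    uint64_t imm_target = (uint64_t)(uintptr_t)h;

    /* mov rsi, rdi: shift the caller's argument one slot to the right */
    code[n++] = 0x48; code[n++] = 0x89; code[n++] = 0xfe;
    /* movabs rdi, imm64: the saved argument becomes the first argument */
    code[n++] = 0x48; code[n++] = 0xbf;
    memcpy(code + n, &imm_saved, 8); n += 8;
    /* movabs rax, imm64: address of the real handler */
    code[n++] = 0x48; code[n++] = 0xb8;
    memcpy(code + n, &imm_target, 8); n += 8;
    /* jmp rax: tail call, so the handler returns straight to our caller */
    code[n++] = 0xff; code[n++] = 0xe0;
    assert(n == sizeof code);

    /* Write while the page is RW, then flip it to RX (never W and X at once). */
    void *page = mmap(NULL, STUB_BYTES, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, code, n);
    if (mprotect(page, STUB_BYTES, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, STUB_BYTES);
        return NULL;
    }
    return (fn1)page; /* object-to-function pointer cast: POSIX, not ISO C */
#else
    (void)h;
    (void)saved;
    return NULL;
#endif
}

/* Unmap a stub created by synthesize_partial (NULL is ignored). */
static void release_partial(fn1 f) {
    if (f)
        munmap((void *)f, STUB_BYTES);
}
```

Nothing checks that `h` really takes `(long, long)`: hand it a handler with a different signature and it still compiles, which is the type-safety gap. Each stub also has to be unmapped explicitly.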

these are both really implementations of partial application rather than general closures. they are both heap-based, which allows them to be used for general callbacks without worrying about lifetime, but then they become a big memory management problem.

with the exception of the memory issues - i think c+closures is a great environment for systems programming.