Why does hello world take 1.8ms to run? That's ~5,000,000 cycles. By my measurements with Actually Portable Executable, the latency of vfork+execve+exit+wait4 on a physical computer running Linux should be 25µs. That's a difference of nearly two orders of magnitude. https://github.com/jart/cosmopolitan/blob/master/test/libc/s...

How would you time this? I tried a simple C hello world and got a similar ~2ms run time. But I was just running "time ./hello" at a Bash prompt, which a) only has 1ms resolution and b) probably measures some shell overhead. Running it under "strace -tt" so I could see where the slow bits were made it 10x slower :-(

Edit: FWIW, building with "gcc foo.c -o foo -O2 -s -Wall --static" was slightly faster than without the --static. And "musl-gcc foo.c -o foo -O2 -s -Wall --static" was slightly better again. Maybe 0.5ms improvement in total. I'm on Ubuntu 20 x64, gcc 9, Intel Xeon W-2133.

You could write a non-shell launcher using fork+exec+wait and present the results with finer granularity than time(1). There may be something adequate available off-the-shelf but I’m not familiar with it.
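
For what it's worth, a minimal sketch of such a launcher could look like the following. It times one fork+execve+exit+wait cycle with `clock_gettime(CLOCK_MONOTONIC)`, which has nanosecond granularity, and keeps the fastest of several runs. The helper names `time_spawn_ns` and `best_spawn_ns` are made up for illustration, and the timing still includes the child's own startup and exit, so treat the numbers as an upper bound on process-spawn overhead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper (not from any library): wall time in nanoseconds
   for one fork + execve + exit + wait cycle, or -1 on failure.
   argv[0] must be a path, since execve() does not search $PATH. */
static int64_t time_spawn_ns(char *const argv[], int *ws) {
  struct timespec t0, t1;
  pid_t pid;
  assert(argv && argv[0] && ws);
  if (clock_gettime(CLOCK_MONOTONIC, &t0) == -1) return -1;
  if ((pid = fork()) == -1) return -1;
  if (pid == 0) {
    execve(argv[0], argv, environ);
    perror("execve"); /* only reached if execve failed */
    _exit(127);
  }
  if (waitpid(pid, ws, 0) != pid) return -1;
  if (clock_gettime(CLOCK_MONOTONIC, &t1) == -1) return -1;
  return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 +
         (t1.tv_nsec - t0.tv_nsec);
}

/* Repeat the measurement and keep the fastest run, which filters out
   cold-cache and scheduler noise. Returns -1 if any run fails. */
static int64_t best_spawn_ns(char *const argv[], int runs) {
  int64_t best = -1;
  for (int i = 0; i < runs; ++i) {
    int ws;
    int64_t ns = time_spawn_ns(argv, &ws);
    if (ns == -1) return -1;
    if (best == -1 || ns < best) best = ns;
  }
  return best;
}
```

From a `main` you would call something like `best_spawn_ns((char *const[]){"./hello", 0}, 100)` and print the result divided by 1000 to get microseconds. Taking the minimum rather than the mean is a judgment call: it approximates the best case and hides first-run effects.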

Got you covered. https://github.com/jart/cosmopolitan/blob/master/examples/ru...

    $ git clone https://github.com/jart/cosmopolitan && cd cosmopolitan
    $ make -j8 o//examples/rusage.com o//examples/hello.com
    $ o//examples/rusage.com o//examples/hello.com
    hello world 123
    RL: took 90µs wall time
    RL: ballooned to 108kb in size
    RL: needed 106µs cpu (0% kernel)
    RL: caused 13 page faults (100% memcpy)

This one has a tiny bit more overhead than the practical minimum you saw earlier in the benchmark. Note that APE binaries start off as shell scripts so the first run is going to be slower. It's something I'm working towards improving in a variety of ways. Here's the executive summary of the upcoming changes:

    int ws, pid;
    CHECK_NE(-1, (pid = fork()));
    if (!pid) {
      execve("ape.com", (char *const[]){"ape.com", 0}, environ);
      perror("execve");
      _Exit(127);
    }
    CHECK_EQ(pid, wait(&ws));
    CHECK_TRUE(WIFEXITED(ws));
    CHECK_EQ(0, WEXITSTATUS(ws));

Here's the latency of the various execution strategies:

    FORK+EXEC+EXIT+WAIT      │       APE │ APE-LOADER │ BINFMT_MISC
    ───────────────────────────────────────────────────────────────
    fork() + execve()        │      55µs │       66µs │        56µs
    vfork() + execve()       │      25µs │      446µs │        35µs
    /bin/sh -c ./ape (1st)   │     485µs │      457µs │       170µs
    /bin/sh -c ./ape (avg)   │     148µs │      466µs │       159µs
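
If you want to reproduce the fork() versus vfork() rows on your own machine, a rough sketch is below. It is not the benchmark from the linked test, just an approximation under my own assumptions: `now_ns` and `spawn_ns` are made-up names, and a spawn is only counted as successful if the child exits with status 0.

```c
#define _DEFAULT_SOURCE /* exposes vfork() and clock_gettime() on glibc/musl */
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

extern char **environ;

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical helper: nanoseconds for one spawn + wait, using vfork()
   when use_vfork is nonzero and fork() otherwise. Returns -1 if the
   spawn fails or the child does not exit with status 0. */
static int64_t spawn_ns(int use_vfork, char *const argv[]) {
  int ws;
  int64_t start = now_ns();
  pid_t pid = use_vfork ? vfork() : fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    /* After vfork() the child borrows the parent's memory, so it
       should do nothing except execve() or _exit(). */
    execve(argv[0], argv, environ);
    _exit(127);
  }
  if (waitpid(pid, &ws, 0) != pid) return -1;
  if (!WIFEXITED(ws) || WEXITSTATUS(ws) != 0) return -1;
  return now_ns() - start;
}
```

Calling `spawn_ns(0, argv)` and `spawn_ns(1, argv)` in a loop and comparing the minimums should show whether vfork() is cheaper on your system. The gap usually grows with the size of the parent process, because fork() has to copy its page tables.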