The author's lack of practical familiarity with the memory hierarchy, caching, and pipelining (evidenced by his surprise at the speed gained by fetching more than a byte at a time) also makes me suspicious of his understanding of the semantics of his program. (I understand that this is a form of ad hominem as it relates to arguing the correctness of software, but it's a consideration worth noting in a broader context of trust.)
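For anyone wondering what "fetching more than a byte at a time" looks like in practice, here is a rough sketch, not the author's code. The function names `xor_bytewise` and `xor_wordwise` are made up for illustration, and the XOR checksum is just a stand-in workload. Both functions return the same value; the second one simply loads 8 bytes per iteration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: XOR of all bytes, one byte per load. */
uint8_t xor_bytewise(const unsigned char *buf, size_t len) {
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc ^= buf[i];
    return acc;
}

/* Same result, but 8 bytes per load. memcpy sidesteps alignment and
   aliasing problems; compilers typically lower it to a single load. */
uint8_t xor_wordwise(const unsigned char *buf, size_t len) {
    uint64_t acc = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);
        acc ^= w;               /* XORs 8 independent byte lanes at once */
    }
    /* Fold the 8 byte lanes down into the lowest byte. */
    acc ^= acc >> 32;
    acc ^= acc >> 16;
    acc ^= acc >> 8;
    uint8_t out = (uint8_t)acc;
    /* Handle the tail that did not fill a whole word. */
    for (; i < len; i++)
        out ^= buf[i];
    return out;
}
```

How much faster the second version is depends on the compiler and the machine; an optimizing compiler may well vectorize the first loop on its own.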
It would be great if this was written in another language.
[0] Most "portable" code written in C is not portable. Look at any mature C project and see the layers and layers of macros and hacks to make things portable.
The ANSI C and C++ standards define the concept of an abstract machine for the language semantics, just like in most languages.
Additionally, you have the concepts of sequence points, the new memory model semantics for multi-threaded code, and the beloved UB.
UB doesn't exist in most languages, because they would rather leave such behavior implementation-defined, or give up the opportunity to target some strange CPU that can't support the language semantics.
Also, Assembly doesn't have UB, which makes it ironic that writing straight Assembly is safer, at the expense of portability, than writing C or C-derived languages.
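To make the UB point concrete, the classic case is signed integer overflow: in C it is undefined, while unsigned arithmetic is defined to wrap. A minimal sketch of staying on the defined side follows; `checked_add` and `wrapping_add` are hypothetical helper names, not anything from the standard library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true if the sum
   fits in an int. Returns false otherwise, without ever evaluating a
   signed addition that would overflow (which would be UB). */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Unsigned arithmetic is defined to wrap modulo 2^N, so this is never UB. */
unsigned wrapping_add(unsigned a, unsigned b) {
    return a + b;
}
```

The range check has to happen before the addition. Testing the result afterwards (for example `a + b < a`) already relies on the overflow having occurred, so the compiler is allowed to assume it never does and drop the test.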