In 1986 and 1987 I was working on Distributed Services for AIX, a distributed file system intended to compete with NFS. As the dev team built it out, my colleague and I pondered how to test the system that we had architected.

A file system can be in an enormous number of possible states. Were the conventional tests going to catch subtle bugs? Here's an example of an unusual but not-unheard-of situation: on a Unix system a file can be unlinked, and hence removed from every directory, while remaining open in one or more processes. Such a file can still be written and read by those processes until the last open file descriptor is closed, at which point it disappears from the system. Does the distributed file system model this correctly? Many other strange combinations of system calls might be applied to the file system. Would the tests exercise these?
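
For anyone who hasn't run into this, here is a minimal sketch in Python illustrating the behavior on a local Unix file system (the path name is purely illustrative):

```python
import os

# Create a file, open it, then unlink it while the descriptor is still open.
path = "/tmp/unlink_demo.txt"          # illustrative path

fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.unlink(path)                        # gone from every directory...

os.write(fd, b"still alive")           # ...but still writable through the fd
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 100))                # b'still alive'

os.close(fd)                           # last close: the data disappears
```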

It occurred to me that the "correct" behavior for any sequence of system calls could be defined by simply running the sequence on a local file system and comparing the results with those of an identical sequence of calls run against the distributed file system.
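
To make that concrete, here is a minimal sketch in Python of using the local file system as the oracle. The names (`apply_op`, `run_sequence`) and the tiny set of supported operations are inventions for illustration, not what I wrote back then:

```python
import os

def apply_op(root, op):
    """Apply one (call, args) pair under a directory root and return an
    observable outcome: a result value or the errno of the failure."""
    call, args = op
    try:
        if call == "create":
            name, = args
            fd = os.open(os.path.join(root, name), os.O_RDWR | os.O_CREAT, 0o600)
            os.close(fd)
            return "ok"
        if call == "unlink":
            name, = args
            os.unlink(os.path.join(root, name))
            return "ok"
        if call == "read":
            name, = args
            with open(os.path.join(root, name), "rb") as f:
                return f.read()
    except OSError as e:
        return ("errno", e.errno)

def run_sequence(local_root, remote_root, ops):
    """Replay the same ops under both roots; return the index of the first
    divergence, or None if the two file systems agree throughout."""
    for i, op in enumerate(ops):
        if apply_op(local_root, op) != apply_op(remote_root, op):
            return i          # outcomes differ: a candidate bug
    return None
```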

I built a test rig that generated random sequences of file-related system calls and ran each sequence against both a local file system and the remote distributed file system under test. As soon as the outcomes diverged, the rig would halt and save a log of the sequence of operations.
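
A sketch of the driver around that, again with invented helpers (`random_op`, plus `apply_op` from the sketch above), showing the halt-and-log behavior:

```python
import json
import random

def random_op(names):
    """Choose a random file-related operation; the small name pool and the
    operation mix here are invented for illustration."""
    name = random.choice(names + ["f%d" % random.randrange(4)])
    return random.choice([("create", (name,)),
                          ("unlink", (name,)),
                          ("read",   (name,))])

def fuzz(local_root, remote_root, max_ops=100000):
    """Apply the same random operations to both file systems and halt at the
    first divergence, saving the sequence so the bug can be replayed."""
    ops, names = [], []
    for _ in range(max_ops):
        op = random_op(names)
        ops.append(op)
        if op[0] == "create" and op[1][0] not in names:
            names.append(op[1][0])
        if apply_op(local_root, op) != apply_op(remote_root, op):
            with open("failure_log.json", "w") as log:
                json.dump(ops, log, indent=2)   # the reproducer
            return ops
    return None
```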

My experience with this test rig was interesting. At first, discrepancies happened right away. As the dev team fixed each bug, we would start the stochastic testing again, and then a new bug would be found. Over time the test would run for a few minutes before failing, then a few minutes longer, and finally for hours and hours. It was a really interesting and effective way to find some of the more subtle bugs in the code. I don't recall whether I published this information internally or not.

The key to modern fuzzing is feedback, usually some kind of coverage measurement of the program under test. This lets the fuzzer be much smarter about finding new code paths and discarding inputs that don't extend coverage, so it finds bugs far more quickly.
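
As a toy illustration, here is a coverage-guided loop in Python. The `run_with_coverage` callback, which runs the target and returns the set of code edges it touched, is a hypothetical stand-in for the instrumentation a real fuzzer uses:

```python
import random

def mutate(data):
    """Flip one random bit in one random byte; real fuzzers use many richer mutations."""
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    return data[:i] + bytes([data[i] ^ (1 << random.randrange(8))]) + data[i + 1:]

def coverage_guided_fuzz(run_with_coverage, seed, iterations=100000):
    """Keep only inputs that reach new code; discard the rest."""
    corpus = [seed]
    seen = set(run_with_coverage(seed))
    for _ in range(iterations):
        candidate = mutate(random.choice(corpus))
        edges = set(run_with_coverage(candidate))
        if edges - seen:              # input reached new code: keep it
            corpus.append(candidate)
            seen |= edges
        # inputs that add no coverage are simply dropped
    return corpus
```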

Google has a project, syzkaller, that fuzzes Linux system calls using coverage feedback: https://github.com/google/syzkaller