My experience of property based testing is that you end up either testing trivial stuff that you don't really care about (sure, maybe you have a bug with extremely large integers that can't actually happen in real life), or you're creating tests every bit as complex and buggy as the implementation itself to represent realistic states. Quite often you're just directly duplicating the logic you're testing to prove it works in all cases.

These just seem like the worst kinds of test, highly coupled with implementation details. For the benefit of finding some bugs with edge cases (which may or may not be spurious), I just don't believe it's worth the effort, given that you still need all your normal tests anyway. Maybe I am just unlucky not to work in a more pure, mathsy domain, I dunno.

I can see how you might avoid "highly coupled with implementation details" example-based tests. With a little abstraction, you can do the same with property-based tests.

Properties as external assertions that you code up as a test. An easy example is to think about the contract of a stack, and the properties that result, and how you could test those properties with arbitrary example: pushing arbitrary t of T on a stack of T results in a stack 1 deeper popping a non-empty stack results in a stack 1 shallower, etc.

How does this work in more complicated situations. For example, how do you use PBT to show that some operation is thread-safe?

> how do you use PBT to show that some operation is thread-safe?

One approach is to make values to represent the "actions" we care about (e.g. "increment", "sendMessage", "delete", etc.), and write tests which take random lists of actions as arguments and execute those actions.

We can calculate the expected result by either using a non-threaded 'oracle', or by generating actions with a known result (e.g. if we always generate equal numbers of "increment" and "decrement" actions, then the result should match the original value).

As an example, I wrote a dynamic binding mechanism based on thread-local storage, which required special handling of thread pools to ensure futures wouldn't see stale values. To test this, I made a tiny domain-specific language (AKA a handful of classes) defining futures/awaiting, dynamic binding/lookup, and string literals/appending. I made two interpreters (AKA recursive functions) for this DSL: one using real futures/threads/dynamic-binding, the other just a straightforward calculation with a couple of stacks. To test the correctness of my dynamic binding mechanism in the face of futures/threading, I simply test that randomly-generated programs in that DSL will give the same result from both interpreters. (I also wrote a few properties to test each interpreter too!)

One very nice example of this is Jepsen:

https://github.com/jepsen-io/jepsen