Property-based testing is typically introduced with simple functional examples, like reversing a list. It is a very profitable technique for testing pure code! It's less clear to the beginner how to generalise these techniques for a messy, effectful production application. It is easy to end up with outstanding test coverage for the edges of the system, but little assurance that the effectful core is correct.
Often the intermediate solution is to write glorified unit tests that use some generated inputs to perform a small, predetermined set of actions. This has a very slight advantage over regular old unit tests; for example, a randomised IO unit test is quite likely to identify the incorrect use of a third-party API, or a malformed database query. It's still a pretty heavy compromise.
Property-based testing promised a rich declarative style that looks more like the design than the implementation, and to find a lot more bugs than a unit test. How can we do better?
My complex effectful system designs don't often look like pseudocode -- they look like abstract state machines. If you're unfamiliar, state machines can be drawn up in this lovely whiteboard-friendly format:
We have:
The various states of the system have been encoded as labels, in this case Locked and Unlocked. The available actions in each state are encoded as arrows, in this case Coin and Push. Though both actions are available in both states, their effect on the machine's state depends entirely on the initial state when the action is performed.
This is some pretty straightforward logic. If the turnstile is locked, it can't be pushed. If we insert a coin, the turnstile is unlocked. If we push an unlocked turnstile, it locks.
It's fun to model this kind of system as a pure functional program:
What a pleasant model! We probably wouldn't even bother writing tests for it.
Unfortunately, as you probably already know, this model is insufficiently enterprise. We need to implement a turnstile server to provide gated access to all the threads, and the entire organisation. It would be nice, after the fact, to test that our big weird effectful machine behaves like this simple function.
The Hedgehog API in use here is still in flux, so this code may stop working in the near future, though it shouldn't break a whole lot.
First, let's build the program.
Since we've got mutable state now, let's model it with an IORef:
This extremely enterprise turnstile server needs to have atomic actions, so there's a fun and unreasonably effective hammer we can reach for: atomicModifyIORef'.
That's it! Ship it.
Most of the power of this technique comes from modelling state and control flow in a way that is easy to reason about.
This model should be written in terms of externally-visible information. It should not contain any implementation details. When testing a web application, for example, you might store generated passwords directly in the model; the application itself would store hashes.
The state in this case is pretty straightforward, so this should look a little familiar. Ignore the kind signatures and parameters for now:
Recall that we had two types of actions available, coin and push.
In Hedgehog, these are specified as discrete datatypes, parameterised by a functor. Again, ignore this detail for now.
Squint a bit, and observe that we're encoding two functions that take no arguments with two constructors that take no arguments:
The key function to be used is actions, which expects a list of Command. Commands are made up of:
I recommend looking at a worked example to see what's going on. The types are a little complicated, though for a good reason!
I'd suggest writing a command for each transition in the state machine, give or take. You might want additional commands to ensure bad inputs don't lead to bad states.
Let's write our simplest command, the one for Coin:
Since this generator always produces Coin without looking at the state, both our coin transitions are covered.
For push, outcomes are very different depending on the initial state. So, let's break them out into two different commands.
When the turnstile is locked, pushing should not change the state, and should not allow anyone in:
When the turnstile is unlocked, pushing should change the state to TLocked, and we should get a result of True:
That's it, we've encoded the whole state machine. Let's run it!
Once the machine has been specified in this way, our work is done. It's over to Hedgehog.
First, we create a turnstile. We then use the actions combinator to generate an arbitrary list of actions. The executeSequential function takes an initial model state, and applies that list of actions in order, ensuring preconditions and postconditions are satisfied.
Unfortunately, this test passed first try! The machine was too simple. Let's break it in an inane way to see the counterexample. I suggest flipping the model's initialState over to TUnlocked:
The first push failed. We thought it was unlocked and tried to push, but it was not. Our model did not line up with reality, and Hedgehog demonstrated this with the smallest counterexample.
Perhaps you haven't yet noticed, but this turnstile service isn't the most useful system in the world. There are way more useful concurrency control abstractions in Haskell's standard library! It's just a stand-in effectful program.
Thus far, we've been testing sequentially. We generate a list of actions, and run them one by one.
In practice, most effectful systems are concurrent. I commonly encounter threadsafe local abstractions, as well as external services, like a web application. Even this turnstile has atomicity guarantees that should be tested.
Since we're already generating arbitrary lists of actions, it shouldn't be much of a stretch to run arbitrary subsets of them in parallel, shrinking to find an unsafe interleaving. Indeed, this is the basis of Hedgehog's (as-yet unreleased) parallel state machine facilities!
These are easy to use: we just replace executeSequential with executeParallel in our property. If the actions being executed are not atomic, subsequences being run in parallel should hone in on the mismatch, and should shrink down to find the smallest such interleaving.
You may have noticed that our model and actions were parameterised by some higher-kinded type, (v :: * -> *). Likewise, I promised the complicated types in Command would have some use.
We didn't need it for our simple turnstile, but actions in an effectful system may produce return values that inform future interaction; think things like session IDs, user IDs, or timestamps. These are variable in the eyes of our state machine.
We need to reason about these values symbolically when generating sequences of actions; for example, you might want to say that the new UserId goes into a hash table, though we haven't performed the IO yet.
Once the action has been performed, we have the concrete result, and we can reason about it in more detail. In Ensure postconditions, we have access to the concrete value, and we can use it for assertions.
Hedgehog makes this distinction by parameterising things with a functor, which can be either Symbolic or Concrete. Ephemeral values should be represented in the model as Var UserId v.
Suppose we are interacting with an opaque ticketing system, where each ticket has some serial number. We don't care what the serial is, only that it's unique.
Our model might look something like this:
Our UseTicket command might look something like this:
... and our IssueTicket command will look something like this:
Jacob has ported a classic Erlang QuickCheck example to Hedgehog. You'll find it in the hedgehog-example repo. It uses all the facilities described above.
Tune in next time for a much more comprehensive example using WAI.
This is a really nice way to specify and test effectful systems. It's much more effective than example-testing with canned action sequences.
There's also a lot of potential for test reuse: anything implementing the same interface can use the same property. The in-memory prototype and the production database-backed version of the same service can share the same excellent test coverage.
I'm looking forward to consolidating lots of my glorified unit tests and half-baked properties into these simple machines!
These techniques are not new! Quviq et al have been doing this for a living for many moons, and quite profitably. If you've done serious testing this way, please get in touch, as I'd love to see more examples.
I've omitted several uninteresting implementation details, so the code presented in this post may not compile on its own. The unabridged source is available here.
© 2017-2019 Tim Humphries