PropEr introduction to Property-Based Testing

by Kostis Sagonas

This tutorial is for absolute beginners. We will introduce the idea of property-based testing and describe how one can employ it using PropEr. In doing this, we will also present one particular aspect of input generators that is probably unique to PropEr, namely generation and shrinking of general Erlang terms.

The programming task that this tutorial is based on is really simple. Let us assume that we want to write and test a program that sorts lists. Our first attempt, inspired by quicksort, is the following Erlang code:

-module(my_sort).
-export([sort/1]).

-spec sort([T]) -> [T].
sort([]) -> [];
sort([P|Xs]) ->
    sort([X || X <- Xs, X < P]) ++ [P] ++ sort([X || X <- Xs, P < X]).

To get some confidence that this module is correct, we can of course compile and run some unit tests for this module in the Erlang shell:

1> c(my_sort).
{ok,my_sort}
2> my_sort:sort([17,42]).
[17,42]
3> my_sort:sort([42,17]).    
[17,42]
4> my_sort:sort([3,1,4,2]).
[1,2,3,4]

Seems to be working alright. At this point we probably want to add these as EUnit tests in the module. This involves adding an appropriate include_lib directive below the export declaration:

-include_lib("eunit/include/eunit.hrl").

and also the following EUnit tests:

sort_test_() ->
    [test_zero(), test_two(), test_four()].

test_zero() ->
    ?_assertEqual([], sort([])). % notice underscore 
test_two() ->
    [?_assertEqual([17,42], sort([X,Y])) || {X,Y} <- [{17,42}, {42,17}]].
test_four() ->
    [?_assertEqual([1,2,3,4], sort([3,1,4,2]))].

At this point, we can recompile the module and run all the unit tests with a single command:

5> c(my_sort).
{ok,my_sort}
6> eunit:test(my_sort).
  All 4 tests passed.
ok

Unit testing is an important part of program development. It provides us with some confidence that our programs do what they are supposed to do in some hard-coded, manually selected test cases that we want to ensure are working properly when developing our program and will continue to work properly after program modifications. Note, however, that the process of writing unit tests becomes boring and tedious after a point: it's a manual process after all. More importantly, it's difficult to become really confident that the tests we have written exercise all aspects of program correctness that we care about.

This is where property-based testing comes to play. The idea is to specify only the properties that we want our programs to satisfy, and use a property-based testing tool to automatically generate inputs that test whether our code satisfies these properties or not.

But what exactly are these properties? They are logical statements that capture aspects of partial correctness of our functions/programs. In the case of our example, one property we can reasonably expect our sorting function to satisfy is that for all valid inputs, lists of some elements in this case, the result of applying the sort/1 function to some list gives us back a list whose elements are ordered. Let us write this property using the language of PropEr:

prop_ordered() ->
    ?FORALL(L, list(integer()), ordered(sort(L))).

PropEr uses the convention that all properties are functions whose name begins with prop_. The body of this function here begins with a ?FORALL, which uses a macro defined by PropEr. For the Erlang compiler to find this macro, we need to add an appropriate include_lib directive above the corresponding include_lib directive for EUnit. (We need to put the PropEr include before the EUnit one, because both tools define some macro with the same name, as we will soon see.) The two directives should look as follows:

-include_lib("proper/include/proper.hrl").
-include_lib("eunit/include/eunit.hrl").

The FORALL macro has three arguments. The first two go hand-in-hand: the first is an Erlang term containing variables that will get values which are produced by the generators in the second argument. In this particular case, there is only one variable whose values will be (semi-random) lists of integers. Both list/1 and integer/0 are built-in generators defined in the proper_types module. However, one does not need to specify the module name, as PropEr knows where to find them when used in PropEr macros. The third argument is a Boolean expression that specifies the property we are interested in. In this case the property is self-explanatory: calling the sort/1 function on some list L should return a list which is ordered.

To use this, we also need to define the ordered/1 function. But this is simple enough:

ordered([]) -> true;
ordered([_]) -> true;
ordered([A,B|T]) -> A =< B andalso ordered([B|T]).

We can now test whether this property holds for our program:

7> c(my_sort).
{ok,my_sort}
8> proper:quickcheck(my_sort:prop_ordered()).
....................................................................................................
OK: Passed 100 test(s).
true

Indeed, it holds. PropEr ran 100 tests, which is the default number of tests to run, generating one list of integers at a time, and checked whether the property was true for them.

We can actually test a particular property any number of times we wish:

9> proper:quickcheck(my_sort:prop_ordered(), 4711).
...................... <4711 dots> ..................................................................
OK: Passed 4711 test(s).
true

This is not the only property that we want a list sorting function to satisfy. Another one is that the input and output lists have the same length. Let's write it:

prop_same_length() ->
    ?FORALL(L, list(integer()), length(L) =:= length(sort(L))).

and test it:

10> c(my_sort).
{ok,my_sort}
11> proper:quickcheck(my_sort:prop_same_length()).
..........!
Failed: After 11 test(s).
[0,0,1,0]

Shrinking .(1 time(s))
[0,0]
false

Oops! This property does not hold for our program. PropEr ran some tests, eleven in total in this case, and found the input list [0,0,1,0] for which the property is false. Consequently, PropEr automatically shrank this input list to a list of smaller length, actually of minimal length, that also falsifies the property.

Let's test the property again:

12> proper:quickcheck(my_sort:prop_same_length()).
..................!
Failed: After 19 test(s).
[-8,14,-1,1,-1]

Shrinking ......(6 time(s))
[0,0]
false

Once again, PropEr discovered that the property does not hold. It took a total of nineteen tests this time for the property to be falsified. The input list in this case was [-8,14,-1,1,-1], which was subsequently shrunk down to the [0,0] list once again. In contrast to the previous shrinking process, which was done in one step, shrinking now involved six steps to find a minimal input that can be derived from the original one and still falsify the property.

What's the problem? We can find out by manually testing the sort/1 function in the Erlang shell using these inputs that PropEr discovered for us:

13> my_sort:sort([-8,14,-1,1,-1]).
[-8,-1,1,14]
14> my_sort:sort([0,0]).
[0]

It appears that the sort/1 function we wrote does not work when the list contains duplicate elements. A moment's thought will reveal that we have forgotten to handle the case that one of the elements in Xs is equal to the pivot element P. Consequently, our sort/1 function throws away duplicate elements in the input list. Oh dear!

Of course, we can easily correct the problem by changing one of the '<' tests in the code of the sort/1 function to '=<' and the function will now satisfy the prop_same_length property. However, this is not a tutorial on how to write a correct sorting function in Erlang. Instead, it's a tutorial on how to do property-based testing using PropEr.

Let us suppose that we actually wanted to write a program that only sorts lists without duplicates. How would we write the same-length property then?

One way of writing this property is to impose a precondition for it. That is, only test its validity if this precondition is satisfied. The IMPLIES macro of PropEr allows us to write the property in a way that the property is checked only in this case:

prop_same_length_conditional_check() ->
    ?FORALL(L, list(integer()),
            ?IMPLIES(no_duplicates(L), length(L) =:= length(sort(L)))).

that is, only check the property if the list L contains no duplicates. A possible implementation of function no_duplicates/1 is the following:

%% better implementations of no_duplicates/1 exist ...
no_duplicates([]) -> true;
no_duplicates([A|T]) ->
     not lists:member(A, T) andalso no_duplicates(T).

Let's test this property:

15> c(my_sort).
{ok,my_sort}
16> proper:quickcheck(my_sort:prop_same_length_conditional_check()).
........x.x.......x......xxx..xxxxxx....xxx.x.x..x.xxxxxxx.xxx.x.....x.xxxxx
xx.xxx.x.x.xxxx..xxxxx..x.xx.x..x.xx.xxxxx.xx.....xxx.xxx...xxx..x..xx.xxxx.
xxxx.xx.xx.xxxxx.xxxx.x..x...x.x.xxx..xx..xxxx.x....xx.x.x.xxxxx.x.xx..xx.
OK: Passed 100 test(s).
true

What's happening here? Well, for starters the property is now found to hold when running 100 tests, which enhances our confidence that our sort/1 function now satisfies this property for its intended uses.

But what are the x symbols that are intermixed with the . symbols above? This is PropEr's way of indicating that it generated some list of integers for which the no_duplicates/1 precondition is false. Consequently, the property was not checked for these inputs, and PropEr had to generate some other list(s) in order to reach one hundred randomly generated test cases for which the property passes. In fact, in this case, PropEr had to generate a total of 226 input lists of integers in order to find 100 of them that do not contain duplicates.

This is clearly wasteful. In situations where the precondition is more involved, it may take quite a while before random input generation comes up with a particular number of instances that satisfy the precondition. In such cases, we may be better off generating inputs for which the precondition holds by construction. Let's do this for lists that do not contain duplicate elements. We can do this in a really simple way: we will use PropEr's built-in machinery to generate random lists of integers and we will filter out any duplicate elements ourselves. The LET macro can be used for this:

list_no_dupls(T) ->
    ?LET(L, list(T), remove_duplicates(L)).

%% better versions of remove_duplicates/1 exist ...
remove_duplicates([]) -> [];
remove_duplicates([A|T]) ->
    case lists:member(A, T) of
        true -> remove_duplicates(T);
        false -> [A|remove_duplicates(T)]
    end.

This provides a custom generator (for lists of some type T that contain no duplicates). We can now use this generator for our same-length property:

prop_same_length_no_dupls() ->
    ?FORALL(L, list_no_dupls(integer()), length(L) =:= length(sort(L))).

which we can include in the module and test:

17> c(my_sort).
{ok,my_sort}
18> proper:quickcheck(my_sort:prop_same_length_no_dupls()).
....................................................................................................
OK: Passed 100 test(s).
true

Note that now PropEr does not disgard any input which is provided by the list_no_dupls/1 custom generator.

At this point we have a module that contains four different properties. In EUnit it was possible to run all tests of a module by calling eunit:test(ModuleName). Is there something analogous for PropEr? Indeed there is:

19> proper:module(my_sort).
Testing my_sort:prop_same_length_no_dupls/0
....................................................................................................
OK: Passed 100 test(s).

Testing my_sort:prop_same_length_conditional_check/0
.......x....x...xx.....x...x...x..x...x.xxx.....x..x....x.x.xx.xxxxx...x.xx.x.xxxxxxxx....xxxxxx.x.x
..xx..xx.xxxxx.x..x.xxxxx.x.xxxx..x.xx.x....x.x.x.x.x...x.xx.xx....xxx.x.xxxxxxx.x.xx.x.xxxxxx.xx..x
x..
OK: Passed 100 test(s).

Testing my_sort:prop_ordered/0
....................................................................................................
OK: Passed 100 test(s).

Testing my_sort:prop_same_length/0
.........................!
Failed: After 26 test(s).
[4,7,-5,-9,7,0,23]

Shrinking .......(7 time(s))
[0,0]

[{my_sort,prop_same_length,0}]

PropEr examines all properties, one at a time, and either successfully completes some specific number of tests for which the property is true, or discovers some input that falsifies a property (and shrinks this input to a mimimal one that also falsifies the property). Note that the return value of proper:module/1 is a list of MFAs of properties for which a counterexample is found.

We've come quite far in using property-based testing. Still, if we think about it, the two properties (ordered and same-length) that we specified for our sort function only partially capture its intended behaviour. We probably would like to also check that the output list contains not only the same number of elements, but exactly the same elements as the input list. There are various ways that this property can be specified, but this is the point where we can take a shortcut. Rather than specifying a function that ensures this, we can employ some already defined - and presumably well-tested - function that is known to have the property we are interested in and check that the two functions are equivalent. For our sort/1 function that does not need to handle duplicate elements in the input list, an appropriate such function is the usort/1 function from the lists module of the standard library. Specifying this property is now very easy:

prop_equiv_usort() ->
    ?FORALL(L, list(integer()), sort(L) =:= lists:usort(L)).

So is testing it:

20> c(my_sort).
{ok,my_sort}
21> proper:quickcheck(my_sort:prop_equiv_usort()).
....................................................................................................
OK: Passed 100 test(s).
true

Although this particular property may feel a bit trivial, testing two functions for equivalence using a simple FORALL property like the above is something that comes handy very often. For example, during program development, it allows us to start from a quite inefficient but 'obviously correct' prototype implementation, and gradually refine and optimize it using the prototype implementation as a reference to check whether the code refactorings and optimizations we performed have not somehow jeopardized its correctness.

Let us bring us one last point before we finish this tutorial. As can be seen in its spec, our sort/1 function is polymorphic. In fact, in Erlang it is supposed to be applicable to lists containing all sorts of Erlang terms, not some particular sort only, and certainly not only integers. How can we be sure that indeed it works for all Erlang terms? Shouldn't we have been testing that our properties hold for lists of any term instead of just for lists of integers? Well, let's do that. Let's modify this last property to work for general lists; for this we just need to use the built-in list/0 generator:

prop_equiv_usort() ->
    ?FORALL(L, list(), sort(L) =:= lists:usort(L)).

and test the property again:

22> c(my_sort).
{ok,my_sort}
23> proper:quickcheck(my_sort:prop_equiv_usort()).
....................................................................................................
OK: Passed 100 test(s).
true

Unsurprisingly, the property still passes 100 tests. But how can be sure that PropEr indeed generated lists containing Erlang terms of different sorts instead of some particular one? More importantly, how does shrinking work for such lists? We can see this by using the list/0 generator in the prop_same_length property which is false for our sort/1 function. Let's do that:

prop_same_length() ->
    ?FORALL(L, list(), length(L) =:= length(sort(L))).

and see it fail:

24> c(my_sort).
{ok,my_sort}
25> proper:quickcheck(my_sort:prop_same_length()).
.......................!
Failed: After 24 test(s).
[[],{[<<1:2>>,-3]},[<<6:3>>,-4,'^\227m-\220­',[{},{[]}],-3,1,'\e\214\024'],
[[],23.308667937715537,-7.223776166565898,{},11,5.782189592459757,-8],
2.06372249239632,[-1.2567064751604635,-8.020835435581166,{[]},[],-3,'',{}],
[],<<228,37>>]

Shrinking ....(4 time(s))
[[],[]]
false

So we see that PropEr has no problem in generating lists containing any Erlang term and then shrinking them down to a minimal list that still falsifies the property. This is something that other property-based testing tools for Erlang can not do (or consciously decide not to support).

Summary

In this introductory tutorial you've learned:

Note: You can get the complete final code of this tutorial, without this last modification to the prop_same_length property, by clicking on the following link: my_sort.erl