In this chapter, we take a look at an important technique in functional programming: recursion.
It is best to read this chapter after you have read the chapter on lists (and all preceding chapters, on which it builds).
It also helps if you are familiar with the following imperative features (however, they should be relatively self-explanatory):
ref()
: Consult “Mutation” in the ReasonML docs (until the corresponding chapter in this book is finished).for
loops: Consult “Imperative Loops” in the ReasonML docs (until the corresponding chapter in this book is finished).Recursive function calls are a convenient tool for processing recursive data. Before we get into the function calls, let’s first review how recursive data is built.
As an example, consider the following recursive data type:
type intTree =
| Empty /* atomic */
| Node(int, intTree, intTree); /* compound */
To create an intTree
:
Atomic: We start by creating one or more atomic pieces. In general, there may be several kinds of atomic pieces and each kind may or may not contain data. In the case of intTree
, all atomic pieces are Empty
. For example:
Empty
Compound: We continue by creating compound pieces, by assembling atomic pieces and/or other compound pieces. That is, a compound piece is one that contains one or more other pieces. Therefore, the corresponding part of the type definition recursively mentions the type. For example, the compound Node
refers to intTree
, twice. The following expression creates a tree with a single node that holds the integer 123
:
Node(123, Empty, Empty)
The way we process recursive data is similar to how we build it. Take the following code:
let rec summarizeTree = (t: intTree) =>
switch t {
| Empty => 0 /* base case */
| Node(i, leftTree, rightTree) => /* recursive case */
i + summarizeTree(leftTree) + summarizeTree(rightTree)
};
There are two kinds of cases:
Aside for advanced readers: this example and several upcoming examples are not tail-recursive. We’ll learn later how to change that.
Lists are a recursive data structure. Let’s examine code for lists through the lens of what we have just learned:
let rec len = (myList: list('a)) =>
switch myList {
| [] => 0 /* base case */
| [_, ...tail] => /* recursive case */
1 + len(tail)
};
Once again, we have:
Even when you work with natural numbers, you often encounter a similar pattern:
let rec factorial = (x) =>
if (x <= 0) { /* base case */
1;
} else { /* recursive case */
x * factorial(x - 1);
};
This function hints at the fact that you can define natural numbers via a recursive data type:
type nat = Zero | Succ(nat);
This type says: A natural number nat
is either zero or the successor of a natural number (i.e., one plus that number). The conversion from int
to nat
can also be defined recursively:
let rec natFromInt = (x: int) =>
if (x <= 0) { /* base case */
Zero;
} else { /* recursive case */
Succ(natFromInt(x-1));
};
natFromInt(2);
/* Succ((Succ(Zero))) */
Another core functional technique is to use recursion instead of loops.
repeat()
Consider the following typical example of using a loop in an imperative language: repeating a string str
zero or more times.
let repeat = (~times: int, str: string) => {
let result = ref("");
for (_count in 1 to times) {
result := result^ + str;
};
result^;
};
repeat(~times=3, "abc"); /* "abcabcabc" */
The same functionality can be implemented via recursion like this:
let rec repeat = (~times: int, str: string) =>
if (times <= 0) { /* base case */
"";
} else { /* recursive case */
str ++ repeat(~times = times - 1, str);
};
Once again, there is a base case and a recursive case. If you come from an imperative language, using recursion in this manner initially looks weird, but you eventually get used to it.
summarizeList()
Computing the sum of a list of ints is another typical example of looping in imperative languages:
let summarizeList = (l: list(int)) => {
let result = ref(0);
for (index in 0 to List.length(l)-1) {
let elem = List.nth(l, index);
result := result^ + elem;
};
result^;
};
summarizeList([1, 2, 3]); /* 6 */
Aside: Lists are not very efficient w.r.t. random-access operations such as List.length()
and List.nth()
. However, I only used lists, because I wanted the code to be similar to the more functional implementations. In real-world code, you have the option of using an array.
If, instead, you process the list recursively, you get to use pattern matching, via switch
:
let rec summarizeList = (l: list(int)) =>
switch l {
| [] => 0
| [head, ...tail] =>
head + summarizeList(tail)
};
Read on for another way of implementing summarizeList()
that does not involve recursion, yet is also functional.
One key technique in functional programming are higher-order helper functions for processing data structures. They are an alternative to using recursion. We can, for example, use fold_left()
to re-implement summarizeList()
from the previous section:
let summarizeList = (theList: list(int)) =>
ListLabels.fold_left(
~f = (sum, elem) => sum + elem,
~init = 0,
theList
);
~f
is invoked for each element elem
of theList
:
sum
is the current state (the sum computed thus far).~f
becomes the next state.The final state is returned by fold_left()
as its result. ~init
specifies the initial state.
Other higher-order functions for processing lists are (module ListLabels
):
map: (~f: ('a) => 'b, list('a)) => list('b)
~f
to each element of the list.filter: (~f: ('a) => bool, list('a)) => list('a)
~f
returns true
.for_all: (~f: ('a) => bool, list('a)) => bool
true
if ~f
returns true
for all elements of the list.exists: (~f: ('a) => bool, list('a)) => bool
true
if ~f
returns true
for at least one element of the list.iter: (~f: ('a) => unit, list('a)) => unit
~f
for each element of the list. iter()
only makes sense if ~f
has side effects (output, mutations, etc.).Except for filter()
, all of these functions are also available for arrays.
Sometimes, (naive) recursion is not how one thinks about a problem. Take for example, the following problem: Given a list of ints, return the greatest element. If we only use recursion, we get the following code. Alas, it is neither intuitive nor efficient:
let rec maxList = (l: list(int)) =>
switch (l) {
| [] => min_int
| [head, ...tail] when head > maxList(tail) => head /* A */
| [_, ...tail] => maxList(tail) /* B */
};
maxList([-3, 12, 1]); /* 12 */
In this code, we used two switch
cases at the end (line A, line B); one of which has a when
clause. Instead, we could have used a single case and an if
expression, but the flat code is easier to read.
So how does this code work? For each list, we compute the maximum of the tail and compare it with the head. The first time the recursive call in line A directly returns a value is at the end of the list (when l
is []
). That, in turn, enables the caller to return a value. Then the caller’s caller. Etc.
In contrast, the imperative version of maxList()
is more intuitive. It is also more efficient (apart from the random accesses):
let maxList = (l: list(int)) => {
let max = ref(min_int);
for (index in 0 to List.length(l)-1) {
let elem = List.nth(l, index);
if (elem > max^) {
max := elem;
}
};
max^;
};
Is it possible to write a functional version that is as intuitive? Yes it is, via a technique called accumulator. The following code uses this technique:
let maxList = (theList: list(int)) => {
let rec aux = (~max=min_int, l) =>
switch l {
| [] => max
| [head, ...tail] when head > max =>
aux(~max=head, tail); /* A */
| [_, ...tail] =>
aux(~max, tail);
};
aux(theList); /* B */
};
~max
is the accumulator: a parameter that is continually updated during recursion and returned at the very end. Similar to the first parameter of the callback of fold_left()
, it represents the current state. In line A, you can see that ~max
can be used almost as if it were mutable: It is set to a new value that is used from now on.
One downside of this technique is that you need to manage the additional parameter for the accumulator. There are two ways of doing so.
We have already seen the first way: Creating an internal recursive helper function called aux
or helper
or maxListRec
(etc.). I like to set up the parameters of aux
as follows:
maxList
are passed on to aux
and change there. I do that at the end of maxList
(line B).aux
can just refer to them.Alternatively, you can introduce the accumulator right away as a parameter and give it a default value. Callers will normally ignore it, but it is used by recursive calls. That looks as follows:
let rec maxList = (~max=min_int, l: list(int)) => {
switch l {
| [] => max
| [head, ...tail] when head > max =>
maxList(~max=head, tail);
| [_, ...tail] =>
maxList(~max, tail);
};
};
Given that one prefers recursion to loops in functional programming – won’t performance suffer due to all of those function calls? It turns out that tail calls, a specific kind of function calls, can be highly optimized.
A function call is a tail call if it is the last thing that the surrounding function does. Before we look at an example, let’s examine a function that does not have tail calls:
let rec repeat = (~times: int, str: string) =>
if (times <= 0) { /* base case */
"";
} else { /* recursive case */
str ++ repeat(~times = times - 1, str); /* A */
};
The recursive call in line A is not a tail call: after it returns, its result is appended to str
and returned. That is, the appending is the last thing that repeat()
does. Therefore, when we call repeat()
, we must save two kinds of information:
str
after repeat()
returns.The following call tree visualizes what happens when we make the function call repeat(~times=2, "a")
:
repeat(~times=2, "a")
"a" ++ repeat(~times=1, "a")
"a" ++ repeat(~times=0, "a")
""
Each indentation indicates that our storage needs grow. Parameters, variables and return addresses are usually stored on a stack, which explains why you get a stack overflow if there are too many nested (non-tail) function calls.
How can we make repeat()
more efficient? By introducing an accumulator (~result
):
let repeat = (~times: int, str: string) => {
let rec aux = (~result="", n) =>
if (n <= 0) { /* base case */
result;
} else { /* recursive case */
aux(~result = result ++ str, n-1); /* A */
};
aux(times);
};
Now the function call in line A is a tail call: it is the result of the surrounding function. Because no information needs to be saved for the caller, the call in line A is effectively a jump and does not increase our storage needs.
The function call repeat(~times=2, "a")
is executed as follows:
aux(~result="", 2)
aux(~result="a", 1)
aux(~result="aa", 0)
"aa"
The flatness indicates that the function calls are more like jumps and less like recursive calls.
Terminology:
Accumulators are a common way of introducing tail calls. In the following sections, we make more functions tail-recursive. And each time, we use an accumulator to do so.
summarizeList()
The following implementation of summarizeList()
calls itself recursively in line A and it is not a tail call:
let rec summarizeList = (l: list(int)) =>
switch l {
| [] => 0
| [head, ...tail] =>
head + summarizeList(tail) /* A */
};
If we introduce an accumulator, we get a tail call in line A:
let summarizeList = (theList: list(int)) => {
let aux = (~sum=0, l) =>
switch l {
| [] => sum
| [head, ...tail] =>
summarizeList(~sum=sum+head, tail); /* A */
};
aux(theList);
};
reverse()
The following function is code we haven’t seem before – a recursive implementation of reversing a list l
:
let rec reverse = (l: list('a)) =>
switch l {
| [] => l
| [head, ...tail] =>
List.append(reverse(tail), [head]) /* A */
};
reverse(["a", "b", "c"]);
/* ["c", "b", "a"] */
Line A says: a reversed list is the reversed tail followed by the head (append()
concatenates lists).
What would an iterative version of reverse()
look like? Consider the following intuitive algorithm for reversing a list l = ["a", "b", "c"]
:
The initialization value is the empty list:
let result0 = [];
We start with the head of ["a", "b", "c"]
:
let result1 = ["a", ...result0]; /* ["a"] */
We continue with the head of ["b", "c"]
:
let result2 = ["b", ...result1]; /* ["b", "a"] */
We continue with the head of ["c"]
:
let result3 = ["c", ...result2]; /* ["c", "b", "a"] */
Final result: result3
As it turns out, this algorithm is easy to implement via an accumulator:
let rec reverse = (theList: list('a)) => {
let rec aux = (~result=[], l) =>
switch l {
| [] => result
| [head, ...tail] =>
aux(~result=[head, ...result], tail)
};
aux(theList);
};
When we invoke reverse(["a", "b", "c"])
, the following steps are taken:
aux(~result=[], ["a", "b", "c"])
aux(~result=["a"], ["b", "c"])
aux(~result=["b", "a"], ["c"])
aux(~result=["c", "b", "a"], [])
["c", "b", "a"]
It is interesting to observe how both implementations process data in a different order:
reverse()
works with:
head
reverse(tail)
reverse()
works with:
result
head
In this case, the order in which the accumulator version processes data is a better fit for our intuitive understanding of the algorithm.
append()
If, however, we implement append()
, then a recursive algorithm is simpler:
let rec append = (l1: list('a), l2: list('a)) =>
switch l1 {
| [] => l2
| [head, ...tail] =>
[head, ...append(tail, l2)] /* A */
};
append(["a", "b"], ["c", "d"]);
/* ["a", "b", "c", "d"] */
Line A says: the concatenation of two lists l1
and l2
is the head of l1
followed by the concatenation of the tail of l1
and l2
.
If we think about appending iteratively, we get the following algorithm:
l2
.l1
in front of the previous result.l1
in front of the previous result.That is, we must reverse l1
via List.rev()
(line A), before we can work with it iteratively:
let append = (l1: list('a), l2: list('a)) => {
let rec aux = (~result, l) =>
switch l {
| [] => result
| [head, ...tail] =>
aux(~result=[head, ...result], tail)
};
aux(~result=l2, List.rev(l1)); /* A */
};
contains()
The following function checks whether theList
contains a given value
:
let rec contains = (~value: 'a, theList: list('a)) =>
switch theList {
| [head, ...tail] when head == value =>
true /* A */
| [_, ...tail] => contains(~value, tail)
| [] => false
};
contains(~value=1, [1,2,3]); /* true */
contains(~value=5, [1,2,3]); /* false */
Conveniently, this recursive implementation is also iterative: all recursive calls are tail calls. Because we are doing the recursion ourselves, we can stop as soon as we have found value
(line A).