Recursion • Exploring ReasonML

18 Recursion

In this chapter, we take a look at an important technique in functional programming: recursion.

18.1 Required knowledge

It is best to read this chapter after you have read the chapter on lists (and all preceding chapters, on which it builds).

It also helps if you are familiar with the following imperative features (however, they should be relatively self-explanatory):

Mutable state via ref(): Consult “Mutation” in the ReasonML docs (until the corresponding chapter in this book is finished).
for loops: Consult “Imperative Loops” in the ReasonML docs (until the corresponding chapter in this book is finished).

18.2 Processing data via recursive function calls

Recursive function calls are a convenient tool for processing recursive data. Before we get into the function calls, let’s first review how recursive data is built.

18.2.1 Building recursive data

As an example, consider the following recursive data type:

type intTree =
  | Empty /* atomic */
  | Node(int, intTree, intTree); /* compound */

To create an intTree:

Atomic: We start by creating one or more atomic pieces. In general, there may be several kinds of atomic pieces and each kind may or may not contain data. In the case of intTree, all atomic pieces are Empty. For example:
```
Empty
```
Compound: We continue by creating compound pieces, by assembling atomic pieces and/or other compound pieces. That is, a compound piece is one that contains one or more other pieces. Therefore, the corresponding part of the type definition recursively mentions the type. For example, the compound Node refers to intTree, twice. The following expression creates a tree with a single node that holds the integer 123:
```
Node(123, Empty, Empty)
```

18.2.2 Recursing over recursive data

The way we process recursive data is similar to how we build it. Take the following code:

let rec summarizeTree = (t: intTree) =>
  switch t {
  | Empty => 0 /* base case */
  | Node(i, leftTree, rightTree) => /* recursive case */
    i + summarizeTree(leftTree) + summarizeTree(rightTree)
  };

There are two kinds of cases:

One or more base cases process atomic data. This is where the recursion stops.
One or more recursive cases process compound data. Processing continues via recursive function calls.

Aside for advanced readers: this example and several upcoming examples are not tail-recursive. We’ll learn later how to change that.

18.2.3 Example: lists

Lists are a recursive data structure. Let’s examine code for lists through the lens of what we have just learned:

let rec len = (myList: list('a)) =>
  switch myList {
  | [] => 0 /* base case */
  | [_, ...tail] =>  /* recursive case */
    1 + len(tail)
  };

Once again, we have:

A base case that processes empty lists. Recursion stops at this point.
A recursive case that processes non-empty lists. Recursion continues with the tail of the list.

18.2.4 Example: natural numbers

Even when you work with natural numbers, you often encounter a similar pattern:

let rec factorial = (x) =>
  if (x <= 0) { /* base case */
    1;
  } else { /* recursive case */
    x * factorial(x - 1);
  };

This function hints at the fact that you can define natural numbers via a recursive data type:

type nat = Zero | Succ(nat);

This type says: A natural number nat is either zero or the successor of a natural number (i.e., one plus that number). The conversion from int to nat can also be defined recursively:

let rec natFromInt = (x: int) =>
  if (x <= 0) { /* base case */
    Zero;
  } else { /* recursive case */
    Succ(natFromInt(x-1));
  };

natFromInt(2);
  /* Succ((Succ(Zero))) */

18.3 Looping via recursion

Another core functional technique is to use recursion instead of loops.

18.3.1 Example: `repeat()`

Consider the following typical example of using a loop in an imperative language: repeating a string str zero or more times.

let repeat = (~times: int, str: string) => {
  let result = ref("");
  for (_count in 1 to times) {
    result := result^ + str;
  };
  result^;
};
repeat(~times=3, "abc"); /* "abcabcabc" */

The same functionality can be implemented via recursion like this:

let rec repeat = (~times: int, str: string) =>
  if (times <= 0) { /* base case */
    "";
  } else { /* recursive case */
    str ++ repeat(~times = times - 1, str);
  };

Once again, there is a base case and a recursive case. If you come from an imperative language, using recursion in this manner initially looks weird, but you eventually get used to it.

18.3.2 Example: `summarizeList()`

Computing the sum of a list of ints is another typical example of looping in imperative languages:

let summarizeList = (l: list(int)) => {
  let result = ref(0);
  for (index in 0 to List.length(l)-1) {
    let elem = List.nth(l, index);
    result := result^ + elem;
  };
  result^;
};
summarizeList([1, 2, 3]); /* 6 */

Aside: Lists are not very efficient w.r.t. random-access operations such as List.length() and List.nth(). However, I only used lists, because I wanted the code to be similar to the more functional implementations. In real-world code, you have the option of using an array.

If, instead, you process the list recursively, you get to use pattern matching, via switch:

let rec summarizeList = (l: list(int)) =>
  switch l {
  | [] => 0
  | [head, ...tail] =>
    head + summarizeList(tail)
  };

Read on for another way of implementing summarizeList() that does not involve recursion, yet is also functional.

18.4 Looping via higher-order helper functions

One key technique in functional programming are higher-order helper functions for processing data structures. They are an alternative to using recursion. We can, for example, use fold_left() to re-implement summarizeList() from the previous section:

let summarizeList = (theList: list(int)) =>
  ListLabels.fold_left(
    ~f = (sum, elem) => sum + elem,
    ~init = 0,
    theList
  );

~f is invoked for each element elem of theList:

sum is the current state (the sum computed thus far).
The result of ~f becomes the next state.

The final state is returned by fold_left() as its result. ~init specifies the initial state.

Other higher-order functions for processing lists are (module ListLabels):

map: (~f: ('a) => 'b, list('a)) => list('b)
Produces a new list by applying ~f to each element of the list.
filter: (~f: ('a) => bool, list('a)) => list('a)
Produces a new list that contains only those elements of the input list for which ~f returns true.
for_all: (~f: ('a) => bool, list('a)) => bool
Returns true if ~f returns true for all elements of the list.
exists: (~f: ('a) => bool, list('a)) => bool
Returns true if ~f returns true for at least one element of the list.
iter: (~f: ('a) => unit, list('a)) => unit
Invokes ~f for each element of the list. iter() only makes sense if ~f has side effects (output, mutations, etc.).

Except for filter(), all of these functions are also available for arrays.

18.5 Recursive technique: accumulator

Sometimes, (naive) recursion is not how one thinks about a problem. Take for example, the following problem: Given a list of ints, return the greatest element. If we only use recursion, we get the following code. Alas, it is neither intuitive nor efficient:

let rec maxList = (l: list(int)) =>
  switch (l) {
  | [] => min_int
  | [head, ...tail] when head > maxList(tail) => head /* A */
  | [_, ...tail] => maxList(tail) /* B */
  };
maxList([-3, 12, 1]); /* 12 */

In this code, we used two switch cases at the end (line A, line B); one of which has a when clause. Instead, we could have used a single case and an if expression, but the flat code is easier to read.

So how does this code work? For each list, we compute the maximum of the tail and compare it with the head. The first time the recursive call in line A directly returns a value is at the end of the list (when l is []). That, in turn, enables the caller to return a value. Then the caller’s caller. Etc.

In contrast, the imperative version of maxList() is more intuitive. It is also more efficient (apart from the random accesses):

let maxList = (l: list(int)) => {
  let max = ref(min_int);
  for (index in 0 to List.length(l)-1) {
    let elem = List.nth(l, index);
    if (elem > max^) {
      max := elem;
    }
  };
  max^;
};

Is it possible to write a functional version that is as intuitive? Yes it is, via a technique called accumulator. The following code uses this technique:

let maxList = (theList: list(int)) => {
  let rec aux = (~max=min_int, l) =>
    switch l {
    | [] => max
    | [head, ...tail] when head > max =>
      aux(~max=head, tail); /* A */
    | [_, ...tail] =>
      aux(~max, tail);
    };
  aux(theList); /* B */
};

~max is the accumulator: a parameter that is continually updated during recursion and returned at the very end. Similar to the first parameter of the callback of fold_left(), it represents the current state. In line A, you can see that ~max can be used almost as if it were mutable: It is set to a new value that is used from now on.

One downside of this technique is that you need to manage the additional parameter for the accumulator. There are two ways of doing so.

We have already seen the first way: Creating an internal recursive helper function called aux or helper or maxListRec (etc.). I like to set up the parameters of aux as follows:

The initial value of the accumulator is provided via a parameter default value.
Some parameters of maxList are passed on to aux and change there. I do that at the end of maxList (line B).
Sometimes, outer functions have parameters that don’t change. Then aux can just refer to them.

Alternatively, you can introduce the accumulator right away as a parameter and give it a default value. Callers will normally ignore it, but it is used by recursive calls. That looks as follows:

let rec maxList = (~max=min_int, l: list(int)) => {
  switch l {
  | [] => max
  | [head, ...tail] when head > max =>
    maxList(~max=head, tail);
  | [_, ...tail] =>
    maxList(~max, tail);
  };
};

18.6 Tail call elimination

Given that one prefers recursion to loops in functional programming – won’t performance suffer due to all of those function calls? It turns out that tail calls, a specific kind of function calls, can be highly optimized.

18.6.1 What are tail calls?

A function call is a tail call if it is the last thing that the surrounding function does. Before we look at an example, let’s examine a function that does not have tail calls:

let rec repeat = (~times: int, str: string) =>
  if (times <= 0) { /* base case */
    "";
  } else { /* recursive case */
    str ++ repeat(~times = times - 1, str); /* A */
  };

The recursive call in line A is not a tail call: after it returns, its result is appended to str and returned. That is, the appending is the last thing that repeat() does. Therefore, when we call repeat(), we must save two kinds of information:

The values of parameters and local variables, because we need str after repeat() returns.
The return address, which indicates where to continue computation after the function call is finished.

The following call tree visualizes what happens when we make the function call repeat(~times=2, "a"):

repeat(~times=2, "a")
  "a" ++ repeat(~times=1, "a")
    "a" ++ repeat(~times=0, "a")
      ""

Each indentation indicates that our storage needs grow. Parameters, variables and return addresses are usually stored on a stack, which explains why you get a stack overflow if there are too many nested (non-tail) function calls.

How can we make repeat() more efficient? By introducing an accumulator (~result):

let repeat = (~times: int, str: string) => {
  let rec aux = (~result="", n) =>
    if (n <= 0) { /* base case */
      result;
    } else { /* recursive case */
      aux(~result = result ++ str, n-1); /* A */
    };
  aux(times);
};

Now the function call in line A is a tail call: it is the result of the surrounding function. Because no information needs to be saved for the caller, the call in line A is effectively a jump and does not increase our storage needs.

The function call repeat(~times=2, "a") is executed as follows:

aux(~result="", 2)
aux(~result="a", 1)
aux(~result="aa", 0)
"aa"

The flatness indicates that the function calls are more like jumps and less like recursive calls.

Terminology:

A function whose recursive calls are tail calls is called tail-recursive.
A tail-recursive algorithm is also called iterative. Iterative algorithms can also be implemented via loops.

Accumulators are a common way of introducing tail calls. In the following sections, we make more functions tail-recursive. And each time, we use an accumulator to do so.

18.6.2 Example: `summarizeList()`

The following implementation of summarizeList() calls itself recursively in line A and it is not a tail call:

let rec summarizeList = (l: list(int)) =>
  switch l {
  | [] => 0
  | [head, ...tail] =>
    head + summarizeList(tail) /* A */
  };

If we introduce an accumulator, we get a tail call in line A:

let summarizeList = (theList: list(int)) => {
  let aux = (~sum=0, l) =>
    switch l {
    | [] => sum
    | [head, ...tail] =>
      summarizeList(~sum=sum+head, tail); /* A */
    };
  aux(theList);
};

18.6.3 Example: `reverse()`

The following function is code we haven’t seem before – a recursive implementation of reversing a list l:

let rec reverse = (l: list('a)) =>
  switch l {
  | [] => l
  | [head, ...tail] =>
    List.append(reverse(tail), [head]) /* A */
  };
reverse(["a", "b", "c"]);
  /* ["c", "b", "a"] */

Line A says: a reversed list is the reversed tail followed by the head (append() concatenates lists).

What would an iterative version of reverse() look like? Consider the following intuitive algorithm for reversing a list l = ["a", "b", "c"]:

The initialization value is the empty list:
```
let result0 = [];
```

We start with the head of ["a", "b", "c"]:

let result1 = ["a", ...result0]; /* ["a"] */

We continue with the head of ["b", "c"]:

let result2 = ["b", ...result1]; /* ["b", "a"] */

We continue with the head of ["c"]:

let result3 = ["c", ...result2]; /* ["c", "b", "a"] */

Final result: result3

As it turns out, this algorithm is easy to implement via an accumulator:

let rec reverse = (theList: list('a)) => {
  let rec aux = (~result=[], l) =>
    switch l {
    | [] => result
    | [head, ...tail] =>
      aux(~result=[head, ...result], tail)
    };
  aux(theList);
};

When we invoke reverse(["a", "b", "c"]), the following steps are taken:

aux(~result=[], ["a", "b", "c"])
aux(~result=["a"], ["b", "c"])
aux(~result=["b", "a"], ["c"])
aux(~result=["c", "b", "a"], [])
["c", "b", "a"]

It is interesting to observe how both implementations process data in a different order:

The recursive reverse() works with:
- Current data: head
- Future data: reverse(tail)
The iterative reverse() works with:
- Past data: result
- Current data: head

In this case, the order in which the accumulator version processes data is a better fit for our intuitive understanding of the algorithm.

18.6.4 Example: `append()`

If, however, we implement append(), then a recursive algorithm is simpler:

let rec append = (l1: list('a), l2: list('a)) =>
  switch l1 {
  | [] => l2
  | [head, ...tail] =>
    [head, ...append(tail, l2)] /* A */
  };
append(["a", "b"], ["c", "d"]);
  /* ["a", "b", "c", "d"] */

Line A says: the concatenation of two lists l1 and l2 is the head of l1 followed by the concatenation of the tail of l1 and l2.

If we think about appending iteratively, we get the following algorithm:

Start with l2.
Prepend the last element of l1 in front of the previous result.
Prepend the second last element of l1 in front of the previous result.
Etc.

That is, we must reverse l1 via List.rev() (line A), before we can work with it iteratively:

let append = (l1: list('a), l2: list('a)) => {
  let rec aux = (~result, l) =>
    switch l {
    | [] => result
    | [head, ...tail] =>
      aux(~result=[head, ...result], tail)
    };
  aux(~result=l2, List.rev(l1)); /* A */
};

18.6.5 Example: `contains()`

The following function checks whether theList contains a given value:

let rec contains = (~value: 'a, theList: list('a)) =>
  switch theList {
  | [head, ...tail] when head == value =>
    true /* A */
  | [_, ...tail] => contains(~value, tail)
  | [] => false
  };
contains(~value=1, [1,2,3]); /* true */
contains(~value=5, [1,2,3]); /* false */

Conveniently, this recursive implementation is also iterative: all recursive calls are tail calls. Because we are doing the recursion ourselves, we can stop as soon as we have found value (line A).