# Indefinite recursion

In most of the examples of recursion that we have seen so far, the shape of the recursive computation has been guided and controlled either by the shape of the data structure on which it operates (flat list recursion, deep list recursion, pair recursion) or by a natural-number counter that steps down to zero or up to some fixed limit.

In an indefinite recursion, the computation, in a sense, determines its own shape: There is still a base case, but it may be extremely difficult to determine when that base case will be reached, or even whether it will be reached at all.

For instance, here is a procedure that finds the prime factors of a given positive integer:

```
;;; prime-factors: construct and return a list of the
;;; prime factors of a given positive integer

;;; Given:
;;;   NUMBER, an exact integer.

;;; Result:
;;;   FACTORS, a list of exact integers.

;;; Precondition:
;;;   NUMBER is positive.

;;; Postconditions:
;;;   (1) Every element of FACTORS is a prime natural number.
;;;   (2) The product of the elements of FACTORS is NUMBER.

(define prime-factors
  (lambda (number)
    (if (not (and (integer? number)
                  (exact? number)
                  (positive? number)))
        (error "prime-factors: requires an exact positive integer")
        (prime-factors-kernel number 2))))

;;; prime-factors-kernel: construct and return a list of the
;;; prime factors of a given positive integer that are greater
;;; than or equal to another given positive integer

;;; Given:
;;;   NUMBER and DIVISOR, both exact integers.

;;; Result:
;;;   FACTORS, a list of exact integers.

;;; Preconditions:
;;;   (1) NUMBER is positive.
;;;   (2) DIVISOR is greater than or equal to 2.
;;;   (3) NUMBER is not divisible by any positive integer
;;;       greater than or equal to 2 but less than DIVISOR.

;;; Postconditions:
;;;   (1) Every element of FACTORS is a prime natural number
;;;       greater than or equal to DIVISOR.
;;;   (2) The product of the elements of FACTORS is NUMBER.

(define prime-factors-kernel
  (lambda (number divisor)
    (cond ((= number 1) '())
          ((zero? (remainder number divisor))
           (cons divisor
                 (prime-factors-kernel (quotient number divisor)
                                       divisor)))
          (else (prime-factors-kernel number (+ divisor 1))))))
```

For instance, here is a summary of the steps in the evaluation of `(prime-factors 60)`:

```
(prime-factors 60) -->
(prime-factors-kernel 60 2) -->
(cons 2 (prime-factors-kernel 30 2)) -->
(cons 2 (cons 2 (prime-factors-kernel 15 2))) -->
(cons 2 (cons 2 (prime-factors-kernel 15 3))) -->
(cons 2 (cons 2 (cons 3 (prime-factors-kernel 5 3)))) -->
(cons 2 (cons 2 (cons 3 (prime-factors-kernel 5 4)))) -->
(cons 2 (cons 2 (cons 3 (prime-factors-kernel 5 5)))) -->
(cons 2 (cons 2 (cons 3 (cons 5 (prime-factors-kernel 1 5))))) -->
(cons 2 (cons 2 (cons 3 (cons 5 '()))))
```

As it happens, the `prime-factors` procedure terminates and returns a value eventually, no matter what positive integer you give it, but this is not exactly obvious. Some of you are no doubt interested in knowing why `prime-factors` terminates, so I'll give you a link to the proof; the rest of you may, if you like, take my word for it.

Even after you study the proof of termination, however, it is not easy to determine in advance how much computation the procedure will do before the base case is reached and the answer can actually be constructed and returned. The number of recursive calls depends less on the magnitude of the number than on what its factors are. For instance, finding the prime factors of 1007 requires fifty-four calls to `prime-factors-kernel`; finding the prime factors of 1008 requires only thirteen; finding the prime factors of 1009 requires more than a thousand. There is a pattern, but it can't be anticipated in advance: To know how much computation will be required, you actually have to perform the computation and identify the prime factors.
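One way to see such counts for yourself is to mirror the kernel with a procedure that returns the number of calls instead of the factors. Here is a sketch of a hypothetical `count-kernel-calls` procedure (not part of the program above), which makes exactly the same case analysis as `prime-factors-kernel` but adds 1 at each step:

```
;;; count-kernel-calls: count the invocations of
;;; prime-factors-kernel that would be made in factoring
;;; NUMBER, beginning with the trial divisor DIVISOR.
;;; (A hypothetical instrumented variant, for illustration only.)

(define count-kernel-calls
  (lambda (number divisor)
    (cond ((= number 1) 1)
          ((zero? (remainder number divisor))
           (+ 1 (count-kernel-calls (quotient number divisor)
                                    divisor)))
          (else (+ 1 (count-kernel-calls number
                                         (+ divisor 1)))))))
```

For instance, `(count-kernel-calls 60 2)` yields 8, one for each call in the summary above, and `(count-kernel-calls 1008 2)` yields 13.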

Here is an even subtler example. The following procedure takes any positive integer as an argument and returns a list of positive integers -- if it returns at all ...

```
;;; Collatz-sequence: construct and return a Collatz
;;; sequence beginning with a given positive integer
;;; and ending with 1

;;; Given:
;;;   NUMBER, an exact integer.

;;; Result:
;;;   SEQUENCE, a list of exact integers.

;;; Precondition:
;;;   NUMBER is positive.

;;; Postcondition:
;;;   If this procedure terminates:
;;;     (1) SEQUENCE is not empty.
;;;     (2) The last element of SEQUENCE is 1.
;;;     (3) Each even element of SEQUENCE is followed
;;;         immediately by its half.
;;;     (4) Each odd element of SEQUENCE, except 1, is
;;;         followed immediately by the successor of its
;;;         triple.

(define Collatz-sequence
  (lambda (number)
    (cons number
          (cond ((= number 1) '())
                ((even? number)
                 (Collatz-sequence (quotient number 2)))
                (else
                 (Collatz-sequence (+ (* 3 number) 1)))))))
```

In other words: The Collatz sequence of `number` begins with `number`. If `number` is 1, the sequence ends there. Otherwise, if `number` is even, its sequence continues with the Collatz sequence for half of `number`; if `number` is odd, its sequence continues with the Collatz sequence for three times `number` plus one.
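For instance, here is the Collatz sequence that begins with 6; it halves at each even element, triples-and-adds-one at each odd element, and reaches 1 after eight steps:

```
> (Collatz-sequence 6)
(6 3 10 5 16 8 4 2 1)
```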

The Collatz sequences for many positive integers are known, and all those that are known eventually reach 1 and terminate. However, it is not known whether every Collatz sequence includes 1 -- it is possible that there is a number whose Collatz sequence goes on forever. Applying `Collatz-sequence` to such a number would cause a runaway recursion.

Worse yet, no general rule is known for determining the length of a number's Collatz sequence, or even putting an upper bound on it, so if we apply `Collatz-sequence` to a particular number and then wait and wait and wait for the result to appear, in general we won't know whether we have discovered an infinite Collatz sequence or just one that takes a long time for DrScheme to assemble. So we can't tell whether we have a runaway recursion or not!

Because some indefinite recursions have such inconvenient computational properties, it's good practice to regard them with suspicion and to avoid them whenever some more conventional form of recursion is a reasonable alternative. In some cases, you'll have to be a skillful mathematician to prove that a given indefinite recursion terminates -- whereas if you use list recursion or recursion with natural numbers, the structure of the list or the definition of "natural number" guarantees that the recursion eventually reaches its base case and gives you some idea of how long the process takes.

## Tail-call elimination

In previous labs, we've seen several examples illustrating the idea of separating the recursive kernel of a procedure from a husk that performs the initial call. Sometimes we've done this in order to avoid redundant precondition tests, or to prevent the user from bypassing the precondition tests. In other cases, we saw that the recursion can be written more naturally if the recursive procedure has an additional argument, not supplied by the original caller.

There is yet another reason for adopting the husk-and-kernel approach, and it has to do with efficiency. An implementation of Scheme is required to perform tail-call elimination -- to implement procedure calls in such a way that, if the last step in procedure A is a call to procedure B (so that A will simply return to its caller whatever value is returned by B), the memory resources supporting the call to A can be freed and recycled as soon as the call to B has been started. To make this possible, the implementer arranges for B to return its value directly to A's caller, bypassing A entirely. In particular, this technique is required to work when A and B are the same procedure, invoking itself recursively (in which case the recursion is called tail recursion), and even if there are a number of recursive calls, each of which will return to its predecessor the value returned by its successor. In the implementation, each of the intermediate calls vanishes as soon as its successor is launched.
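As a minimal sketch of the A-calls-B situation (with hypothetical procedure names, not drawn from any program in this reading):

```
;;; double and double-successor: a minimal illustration of a
;;; tail call.  (Hypothetical procedures, for illustration only.)

(define double
  (lambda (n)
    (* 2 n)))

(define double-successor
  (lambda (n)
    ;; The call to double is the last step: double-successor
    ;; simply returns whatever double returns, so the memory
    ;; supporting this call can be recycled as soon as the
    ;; call to double has been started.
    (double (+ n 1))))
```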

However, this clever technique, which speeds up procedure calling and sometimes enables Scheme to use memory very efficiently, is guaranteed to work only if the procedure call is the last step. For instance, tail-call elimination cannot be used in the `sum` procedure as we defined it in an earlier reading:

```
;;; sum: find the sum of the numbers in a given list

;;; Given:
;;;   LS, a list of exact numbers.

;;; Result:
;;;   TOTAL, an exact number.

;;; Preconditions:
;;;   None.

;;; Postconditions:
;;;   TOTAL is the sum of all of the elements of LS,
;;;   and is 0 if LS has no elements.

(define sum
  (lambda (ls)
    (if (null? ls)
        0
        (+ (car ls) (sum (cdr ls))))))
```

The recursive call in this case is not a tail call, since, after it returns its value, the first number on the list still has to be added to that value.
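A summary of the evaluation shows the pending additions piling up; none of them can be performed until the innermost recursive call returns:

```
(sum '(97 85 34 73 10)) -->
(+ 97 (sum '(85 34 73 10))) -->
(+ 97 (+ 85 (sum '(34 73 10)))) -->
(+ 97 (+ 85 (+ 34 (sum '(73 10))))) -->
(+ 97 (+ 85 (+ 34 (+ 73 (sum '(10)))))) -->
(+ 97 (+ 85 (+ 34 (+ 73 (+ 10 (sum '())))))) -->
(+ 97 (+ 85 (+ 34 (+ 73 (+ 10 0))))) -->
299
```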

However, it is possible to write a tail-recursive version of `sum`:

```
;;; sum: find the sum of the numbers in a given list

;;; Given:
;;;   LS, a list of exact numbers.

;;; Result:
;;;   TOTAL, an exact number.

;;; Preconditions:
;;;   None.

;;; Postconditions:
;;;   TOTAL is the sum of all of the elements of LS,
;;;   and is 0 if LS has no elements.

(define sum
  (lambda (ls)
    (sum-kernel ls 0)))

;;; sum-kernel: add the elements of a given list to a
;;; given running total

;;; Givens:
;;;   LS, a list of exact numbers.
;;;   RUNNING-TOTAL, an exact number.

;;; Result:
;;;   FINAL-TOTAL, an exact number.

;;; Preconditions:
;;;   None.

;;; Postconditions:
;;;   FINAL-TOTAL is the result of adding all of the
;;;   elements of LS to RUNNING-TOTAL.  (If LS has no
;;;   elements, FINAL-TOTAL is the same as RUNNING-TOTAL.)

(define sum-kernel
  (lambda (ls running-total)
    (if (null? ls)
        running-total
        (sum-kernel (cdr ls) (+ (car ls) running-total)))))
```

The idea is to provide, in each recursive call, a second argument, giving the sum of all the list elements that have been encountered so far: the running total of the previously encountered elements. When the end of the list is reached, the value of this running total is returned; until then, each recursive call strips one element from the beginning of the list, adds it to the running total, and finally calls itself recursively with the shortened list and the augmented running total. The "finally" part is important: `sum-kernel` is tail-recursive.

Here is a summary of the execution of a call to this version of `sum`:

```
(sum '(97 85 34 73 10)) -->
(sum-kernel '(97 85 34 73 10) 0) -->
(sum-kernel '(85 34 73 10) 97) -->
(sum-kernel '(34 73 10) 182) -->
(sum-kernel '(73 10) 216) -->
(sum-kernel '(10) 289) -->
(sum-kernel '() 299) -->
299
```

Note that the additions are performed on the way into the successive calls to `sum-kernel`, so that when the base case is reached no further calculation is needed -- the value of the second argument in that last call to `sum-kernel` is returned without further modification as the value of the original call to `sum`.

As another example, let's consider the following exercise:

Develop a procedure `index` that has two arguments, an item `a` and a list of items `ls`, and returns the index of `a` in `ls`, that is, the zero-based location of `a` in `ls`. If the item is not in the list, the procedure returns -1. Here are some sample calls to get you started:

```
> (index 'so '(do re mi fa so la ti do))
4
> (index 3 '(1 2 3 4 5 6))
2
> (index 'a '(b c d e))
-1
> (index 'cat '())
-1
```

The idea is to work down through the elements of the list, keeping track of how many of them we pass. If we reach the end of the list, we return -1; if we encounter the item that we are looking for, we return the number of bypassed items; otherwise, we continue down the list, adding 1 to the number of bypassed items.

In order to use tail calls, we need to keep track of the number of bypassed items in a third parameter to the recursive kernel procedure. The husk procedure `index` fills in the correct initial value for this third parameter -- 0, since initially we haven't bypassed any items -- and calls the kernel:

```
;;; index -- return the position of a given item in a
;;; given list

;;; Given:
;;;   A, a value.
;;;   LS, a list.

;;; Result:
;;;   POSITION, an exact integer.

;;; Preconditions:
;;;   None.

;;; Postcondition:
;;;   If A is an element of LS, POSITION is the least
;;;   position in LS that is occupied by A (using
;;;   zero-based indexing); otherwise, POSITION is -1.

(define index
  (lambda (a ls)
    (index-kernel a ls 0)))
```

In the kernel procedure, we first test for the base cases -- the situations in which the answer can be returned immediately. If the list that we're given is empty, it's pointless to look for the item in it, so we return -1 at once. Otherwise, we look at the first element of the list; if it's the one we want, its position number is equal to the number of bypassed elements, which we return at once.

Recursion comes into play only if neither of these conditions is met. We invoke the kernel procedure recursively to find the same item in the cdr of the current list, with the extra parameter increased by 1 to indicate that we're now bypassing the car of the current list.

Here's the resulting code:

```
;;; index-kernel -- return the position of a given item
;;; relative to a list from which a given number of elements
;;; have been stripped

;;; Given:
;;;   SOUGHT, a value.
;;;   LS, a list.
;;;   BYPASSED, an exact integer.

;;; Result:
;;;   POSITION, an exact integer.

;;; Preconditions:
;;;   BYPASSED is non-negative.

;;; Postcondition:
;;;   If SOUGHT is an element of LS, POSITION is the sum of
;;;   BYPASSED and the least position in LS that is occupied
;;;   by SOUGHT (using zero-based indexing); otherwise,
;;;   SOUGHT (using zero-based indexing); otherwise, POSITION
;;;   is -1.

(define index-kernel
  (lambda (sought ls bypassed)
    (cond ((null? ls) -1)
          ((equal? (car ls) sought) bypassed)
          (else
           (index-kernel sought (cdr ls) (+ bypassed 1))))))
```
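Here is a summary of the execution of the first of the sample calls. As with `sum-kernel`, the work is done on the way in, so the answer in the base case is returned without further modification:

```
(index 'so '(do re mi fa so la ti do)) -->
(index-kernel 'so '(do re mi fa so la ti do) 0) -->
(index-kernel 'so '(re mi fa so la ti do) 1) -->
(index-kernel 'so '(mi fa so la ti do) 2) -->
(index-kernel 'so '(fa so la ti do) 3) -->
(index-kernel 'so '(so la ti do) 4) -->
4
```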