Syntactic extensions

Patterns and templates

We have now studied almost all of the kinds of expressions that the Revised^5 Report on the Algorithmic Language Scheme describes: literal constants, variables, procedure calls, if-expressions, lambda-expressions, cond-expressions, and-expressions, or-expressions, let-expressions, let*-expressions, letrec-expressions, named let-expressions, begin-expressions, and do-expressions. Even with this rich variety, however, there are cases in which Scheme's linguistic structures themselves seem cumbersome or ill-adapted to the particular use to which we want to put them.

Scheme provides a mechanism for overcoming its own expressive limitations: It allows the programmer to define new kinds of expressions, with their own keywords, provided that she can describe how to transform expressions of these new kinds into semantically equivalent expressions that have more familiar structures.

She describes this transformation process by writing an expression of a kind we haven't seen before: a syntax-rules expression, which consists of one or more clauses, each of which contains a pattern and a template. Each pattern describes one possible form that expressions using the new keyword can have, and the corresponding template tells how to build an equivalent expression. When a syntax-rules transformer is applied to an expression, it checks the patterns one by one against the structure of the expression; as soon as it finds one that matches, it fills in the template and returns the resulting expression.

An example: provided-expressions

To fill in some of the details of how the process of matching a pattern and filling in a template works, it will be helpful to look at a particular example.

One expression structure that comes up often in Scheme programs, but is a little cumbersome in practice, occurs when the programmer calls a procedure like assoc or (binary-search <=), which conducts some kind of search and usually returns a useful value when the search succeeds -- a pair in the case of assoc, or a position number in the case of (binary-search <=) -- but returns #f to indicate that the search has failed. Often the programmer wants to carry out some operation involving the result of the search if it is successful, but simply to skip that operation and return #f if the search fails. For example, suppose that the programmer wants to look for the symbol ISBN as a key in an association list called book-facts, and to output the value that is associated with that key if there is one. Perhaps the most natural way to do this in Scheme is to write something like

(if (assq 'ISBN book-facts)
    (display (cdr (assq 'ISBN book-facts)))
    #f)

But this is clumsy and inefficient, because if the search is successful, it has to be performed all over again to get the value. So one would probably write a let-expression:

(let ((search-result (assq 'ISBN book-facts)))
  (if search-result
      (display (cdr search-result))
      #f))

It is possible to shrink this down a little bit by using an and-expression instead of an if-expression:

(let ((search-result (assq 'ISBN book-facts)))
  (and search-result (display (cdr search-result))))

Even this, however, seems forced and unnatural. What one would really like is a kind of expression that binds an identifier to the value of an expression conditionally, returning #f if the expression that is supposed to supply the value fails, but otherwise going ahead and evaluating the body of the expression. One might call such an expression a provided-expression and lay it out like this:

(provided search-result (assq 'ISBN book-facts)
  (display (cdr search-result)))

In other words: One wants a provided-expression to take three subexpressions, of which the first is an identifier, the second is an expression that supplies a value to that identifier, and the third is the body. In a provided-expression, one first evaluates the second subexpression. If the result is #f, that becomes the value of the entire provided-expression. Otherwise, the result is bound to the identifier, the body is evaluated, and the value of the body is the value of the entire provided-expression.

No provided-expression is built into Scheme, but we can write a transformer that shows how to convert a provided-expression into the equivalent if-expression. The transformer needs only one clause, since there is only one pattern that a provided-expression can match. Here's how one might write that pattern inside a syntax-rules clause:

(_ boundvar expr body)

In order to match a pattern, a transformer must find a kind of structural correspondence between the components of the given expression and the components of the pattern. Just as this pattern, considered as a datum, is a list with exactly four elements, so the expression that it matches, considered as a datum, must be a list with exactly four elements. In the example of a particular provided-expression given above, here's how the elements of the pattern match up with the elements of the provided-expression:

       _  <===>  provided
boundvar  <===>  search-result
    expr  <===>  (assq 'ISBN book-facts)
    body  <===>  (display (cdr search-result))

In other words: each identifier that occurs in the pattern is thought of as corresponding to some subexpression of the particular provided-expression that it matches.

Here is the template that we want the transformer to use in building a semantic equivalent for the provided-expression:

(let ((boundvar expr))
  (if boundvar body #f))

Note that the template uses the same identifiers that appear in the pattern. In the template, these identifiers mark slots into which the corresponding subexpressions are to be fitted. In this case, the template says to build a let-expression with one binding specification in its binding list and one if-expression as its body. It specifies that the boundvar and expr pieces of the provided-expression are to be fitted into the binding specification, and the boundvar and body pieces into the if-expression as its test and consequent, leaving the literal #f as the alternate of the if-expression.

In other words, the template implicitly says to use the elements discovered by the pattern match to build up the following expression:

(let ((search-result (assq 'ISBN book-facts)))
  (if search-result (display (cdr search-result)) #f))

which is exactly what we want our sample provided-expression to mean.

In order to write the syntax-rules expression that describes this transformer, we first need to assemble the pattern and the template into a clause; we do this just by enclosing them in a pair of structural parentheses.

((_ boundvar expr body)
 (let ((boundvar expr))
   (if boundvar body #f)))

A syntax-rules-expression, then, begins with the keyword syntax-rules, followed by a (possibly empty) list of internal keywords and one or more pattern-plus-template clauses. An internal keyword is a keyword that appears inside another expression and serves only as a syntactic marker, having no denotation or value of its own; for example, else is an internal keyword in cond-expressions.

Our provided-expressions don't have or need any internal keywords, so the syntax-rules-expression that we write to describe the transformer will have an empty list immediately after syntax-rules:

(syntax-rules ()
  ((_ boundvar expr body)
   (let ((boundvar expr))
     (if boundvar body #f))))

Now that we have this transformer, we can use it, in effect, to extend the Scheme programming language so that it includes provided-expressions. Normally, the programmer wants such an extension to be global and so introduces it by means of a syntax definition, which binds the new keyword to the transformer:

(define-syntax provided
  (syntax-rules ()
    ((_ boundvar expr body)
     (let ((boundvar expr))
       (if boundvar body #f)))))

The transformer will be applied automatically to any provided-expressions that appear farther on in the program, before those expressions are evaluated:

> (define book-facts
  '((author . "Dybvig, R. Kent")
    (title . "The Scheme programming language")
    (place-of-publication . "Cambridge, Massachusetts")
    (publisher . "The MIT Press")
    (date-of-publication . "2003")
    (edition . 3)
    (ISBN . "0-262-54148-3")))
> (provided search-result (assq 'ISBN book-facts)
    (display (cdr search-result)))
0-262-54148-3
> (provided search-result (assq 'price book-facts)
    (display (cdr search-result)))
#f
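
Because a provided-expression is an ordinary expression, its value can also be used directly rather than only displayed. The following is a minimal sketch of that idea; the association list ordinals and the names found and missing are hypothetical, introduced here only for illustration.

```scheme
(define-syntax provided
  (syntax-rules ()
    ((_ boundvar expr body)
     (let ((boundvar expr))
       (if boundvar body #f)))))

;; Hypothetical sample data, not taken from the text above.
(define ordinals '((1 . first) (2 . second) (3 . third)))

;; The search succeeds, so the value of the body becomes the value
;; of the whole provided-expression.
(define found
  (provided entry (assv 2 ordinals)
    (cdr entry)))

;; The search fails, so the body is never evaluated and the value is #f.
(define missing
  (provided entry (assv 7 ordinals)
    (cdr entry)))
```

Here found should be the symbol second and missing should be #f. In the failing case (cdr entry) is never evaluated, so there is no attempt to take the cdr of #f.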

Local syntax bindings

It is also possible to bind a keyword to a transformer locally rather than globally, using let-syntax-expressions and letrec-syntax-expressions. These are similar to let- and letrec-expressions, except that they bind identifiers to transformers rather than to values:

> (let-syntax ((local-provided (syntax-rules ()
                                 ((_ boundvar expr body)
                                  (let ((boundvar expr))
                                    (if boundvar body #f))))))
    (local-provided search-result (assq 'ISBN book-facts)
      (display (cdr search-result))))
0-262-54148-3

A letrec-syntax-expression should be used when one or more of the templates in a transformer includes the new keyword that is being defined.
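
As a rough illustration of when letrec-syntax is needed, here is a sketch of a local variant of provided that accepts either one binding or two. The second clause's template mentions local-provided itself, so the keyword must be visible inside its own transformer. The two-binding form, the list sample-facts, and the name title-and-edition are assumptions made up for this example.

```scheme
;; Hypothetical sample data for illustration.
(define sample-facts
  '((title . "The Scheme programming language")
    (edition . 3)))

(define title-and-edition
  (letrec-syntax
      ((local-provided
        (syntax-rules ()
          ;; One binding: the same template as the provided transformer.
          ((_ boundvar expr body)
           (let ((boundvar expr))
             (if boundvar body #f)))
          ;; Two bindings: the template uses local-provided itself,
          ;; which is why letrec-syntax is needed here.
          ((_ var1 expr1 var2 expr2 body)
           (local-provided var1 expr1
             (local-provided var2 expr2 body))))))
    (local-provided title-entry (assq 'title sample-facts)
                    edition-entry (assq 'edition sample-facts)
      (list (cdr title-entry) (cdr edition-entry)))))
```

Both searches succeed here, so title-and-edition should be a two-element list containing the title string and the number 3. The two clauses cannot be confused with each other, because the first pattern matches only four-element expressions and the second only six-element ones.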

Ellipsis patterns

Many of the patterns for which one would like to write transformers include subexpressions that can be repeated indefinitely. Ellipsis patterns make it possible for patterns to incorporate such repeated subexpressions.

For example, one of the small annoyances of Scheme's syntax is that, if you want the consequent of an if-expression to be a sequence of two or more expressions, you must enclose the sequence in a begin-expression:

(if (eof-object? next-char)
    (begin
      (close-input-port source)
      (close-output-port target)))

The rationale for this requirement is that, without the enclosing begin, it would be impossible for Scheme to figure out where the consequent of an if-expression ends and the alternate begins. But the requirement is still imposed even in cases like this one, where the if-expression is one-armed, that is, contains no alternate.

So it would be nice if Scheme provided a different expression -- a when-expression, perhaps -- that would have one subexpression as a test and an arbitrary number of subsequent subexpressions that are evaluated if, but only if, the value of the test expression is "truish" (that is, not #f). Then we could write, clearly and concisely,

(when (eof-object? next-char)
  (close-input-port source)
  (close-output-port target))

Scheme as described in the Revised^5 Report does not provide when-expressions (some later standards and many implementations do), but it does allow the programmer to define an appropriate transformer. The pattern for the transformer is

(_ test action follow-up ...)

Here the pattern variable test matches the first subexpression after the keyword when and the pattern variable action matches the subexpression after that. Then, because the pattern variable follow-up is followed by the ellipsis, ..., the pattern matcher treats it as matching any number of further subexpressions (including zero). In general, when ... immediately follows some structure in a pattern, it means zero or more repetitions of that structure.

The ellipsis is then also used in positions in the template where one wants a corresponding occurrence of zero or more subexpressions derived from the pattern. In the transformer for when, the template is

(if test (begin action follow-up ...))

and the ellipsis indicates that follow-up is the slot for zero or more subexpressions matched by the corresponding part of the pattern.

Here's the full syntax definition for when:

(define-syntax when
  (syntax-rules ()
    ((_ test action follow-up ...)
     (if test (begin action follow-up ...)))))
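
The following sketch exercises the same transformer with two actions, with a failing test, and with a single action (so that follow-up matches zero subexpressions). It is defined under the hypothetical keyword my-when so that it does not clash with implementations that already supply when; the variable events and the procedure note! are also made up for this example.

```scheme
(define-syntax my-when
  (syntax-rules ()
    ((_ test action follow-up ...)
     (if test (begin action follow-up ...)))))

;; A hypothetical log of which actions were actually carried out,
;; most recent first.
(define events '())
(define note!
  (lambda (event)
    (set! events (cons event events))))

;; The test holds, so both actions are evaluated, in order.
(my-when (> 3 2)
  (note! 'first)
  (note! 'second))

;; The test fails, so the action is skipped.
(my-when (> 2 3)
  (note! 'never))

;; Only one action: follow-up ... matches zero subexpressions.
(my-when (null? '())
  (note! 'third))
```

After these three expressions are evaluated, events should be the list (third second first), and the symbol never should not appear in it.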

Primitive and derived expression types

Early in the semester, we saw that some of Scheme's built-in procedures (such as reverse and list-ref) could if necessary be defined in terms of more basic procedures, so that they didn't really have to be built in.

(define reverse
  (lambda (ls)
    (let kernel ((so-far '())
                 (rest ls))
      (if (null? rest)
          so-far
          (kernel (cons (car rest) so-far) (cdr rest))))))
(define list-ref
  (lambda (ls position)
    (if (zero? position)
        (car ls)
        (list-ref (cdr ls) (- position 1)))))

A similar observation applies to Scheme's built-in expression types: Once we have the apparatus for defining syntactic extensions, some of the built-ins can be defined in terms of others. For example, the let-expression can be defined using procedure calls and lambda-expressions (for the unnamed version) and letrec-expressions (for the named let):

(define-syntax let
  (syntax-rules ()
    ((_ ((var expr) ...) action follow-up ...)
     ((lambda (var ...) action follow-up ...) expr ...))
    ((_ proc ((var expr) ...) action follow-up ...)
     (letrec ((proc (lambda (var ...) action follow-up ...)))
       (proc expr ...)))))

Notice that there are two clauses in the transformer described here, each with its own pattern and its own template. The first clause matches ordinary let-expressions and the second matches named let-expressions. The part of each pattern that reads "((var expr) ...)" means that a binding list can contain any number of binding specifications, each consisting of one var and one expr.

For instance, the first clause transforms the let-expression

(let ((alpha 3) (beta 6))
  (+ alpha beta))

into a call to an anonymous procedure:

((lambda (alpha beta) (+ alpha beta)) 3 6)

The second clause transforms the named let-expression

(let kernel ((so-far 0)
             (rest ls))
  (if (null? rest)
      so-far
      (kernel (+ so-far 1) (cdr rest))))

into

(letrec ((kernel (lambda (so-far rest)
                   (if (null? rest)
                       so-far
                       (kernel (+ so-far 1) (cdr rest))))))
  (kernel 0 ls))
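
One cautious way to try this transformer out without redefining the built-in let is to bind it to a different keyword. The sketch below does so under the hypothetical name my-let and runs the two examples above; the names alpha-plus-beta, ls, and element-count are likewise assumptions for illustration.

```scheme
(define-syntax my-let
  (syntax-rules ()
    ;; Ordinary let: call an anonymous procedure on the exprs.
    ((_ ((var expr) ...) action follow-up ...)
     ((lambda (var ...) action follow-up ...) expr ...))
    ;; Named let: bind proc with letrec so the body can call it.
    ((_ proc ((var expr) ...) action follow-up ...)
     (letrec ((proc (lambda (var ...) action follow-up ...)))
       (proc expr ...)))))

;; Matches the first clause.
(define alpha-plus-beta
  (my-let ((alpha 3) (beta 6))
    (+ alpha beta)))

;; Matches the second clause: kernel is an identifier, not a list,
;; so it cannot match the binding-list pattern of the first clause.
(define ls '(a b c))
(define element-count
  (my-let kernel ((so-far 0)
                  (rest ls))
    (if (null? rest)
        so-far
        (kernel (+ so-far 1) (cdr rest)))))
```

With these definitions, alpha-plus-beta should be 9 and element-count should be 3, the same values that the built-in let would produce for the corresponding expressions.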