Metaprogramming

Programs as data

Many real-world programs make very extensive use of records. After writing out a few sets of definitions to implement record types, however, Scheme programmers generally make an important discovery: Typing out these definitions is boring. They have a very predictable form -- only the name of the type and the number, names, and types of the fields change from one kind of record to another -- and yet one has to sit down and type out a dozen or so procedures each time one wants to use a new kind of record. The process soon becomes tedious and error-prone.

The solution to this problem is metaprogramming: the creation of procedures and programs that automatically construct the definitions of other procedures and programs. Metaprogramming automates some of the tedious and error-prone parts of the programmer's job. Scheme is particularly well suited to metaprogramming because definitions and commands in Scheme have the same form as data: They are basically pair structures in which the leaves include such symbols as lambda, if, and cons, together with literal constants of various sorts.

For instance, if we wanted to build a datum that would look just like the Scheme definition

(define square
  (lambda (n)
    (* n n)))

we could do it simply by using the list procedure to collect the right symbols in the right ways:

> (list 'define 'square (list 'lambda (list 'n) (list '* 'n 'n)))
(define square (lambda (n) (* n n)))
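
Because the result is an ordinary list, the usual list procedures can take it apart, and an evaluator can run it. The following is a minimal sketch, assuming an implementation that provides eval and interaction-environment (R5RS treats interaction-environment as optional); the names square-definition and square-procedure are introduced here only for illustration.

```scheme
;; Build the datum exactly as above and keep it in a variable.
;; SQUARE-DEFINITION is a hypothetical name, used only in this sketch.
(define square-definition
  (list 'define 'square (list 'lambda (list 'n) (list '* 'n 'n))))

;; The datum is a three-element list that ordinary list procedures can inspect.
(car square-definition)      ; the symbol define
(cadr square-definition)     ; the symbol square
(caddr square-definition)    ; the list (lambda (n) (* n n))

;; Assumption: EVAL and INTERACTION-ENVIRONMENT are available.
;; Evaluating the lambda-expression part of the datum yields a procedure.
(define square-procedure
  (eval (caddr square-definition) (interaction-environment)))

(square-procedure 7)         ; 49
```

Only the lambda-expression is evaluated here, because whether eval accepts a definition as its argument depends on the implementation.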

It's not much more difficult to metaprogram a procedure that takes a symbol that is the name of a record type and a list of symbols that are the names of its fields, and returns a datum that looks just like the definition of a constructor procedure:

;;; constructor-maker: build a datum that looks like the definition of a
;;; constructor procedure for a given record type with given fields

;; Givens:
;;   RECORD-NAME, a symbol.
;;   Some number of symbols, collectively called FIELD-NAMES.

;; Result:
;;   DEFINITION, a list.

;; Preconditions:
;;   (1) RECORD-NAME and all of FIELD-NAMES are valid Scheme identifiers.
;;   (2) All of FIELD-NAMES are different symbols.

;; Postconditions:
;;   DEFINITION is an S-expression which, if evaluated, defines a procedure
;;   that takes as many arguments as there are FIELD-NAMES and returns a
;;   record of type RECORD-NAME, with the argument values as the fields
;;   of the record.

(define constructor-maker
  (lambda (record-name . field-names)
    (let ((record-string (symbol->string record-name))
          (vector-size (+ (length field-names) 1)))
      (let ((constructor-name
             (string->symbol (string-append "make-" record-string)))
            (type-marker-name
             (string->symbol (string-append "produce-" record-string "-mark")))
            (mutator-name
             (lambda (field)
               (string->symbol (string-append record-string
                                              "-"
                                              (symbol->string field)
                                              "-set!")))))
        (let ((mutators (let kernel ((rest field-names))
                          (if (null? rest)
                              '()
                              (cons (list (mutator-name (car rest))
                                          'result
                                          (car rest))
                                    (kernel (cdr rest)))))))
          (list 'define
                constructor-name
                (list 'lambda
                      field-names
                      (cons 'let
                            (cons (list (list 'result
                                              (list 'make-vector vector-size)))
                                  (cons (list 'vector-set!
                                              'result
                                              0
                                              (list type-marker-name))
                                        (append mutators
                                                (list 'result))))))))))))

Here's what happens in a typical invocation of it:

> (constructor-maker 'compound 'name 'formula 'molecular-weight 'melting-point 'boiling-point 'color)
(define make-compound
  (lambda (name formula molecular-weight melting-point boiling-point color)
    (let ((result (make-vector 7)))
      (vector-set! result 0 (produce-compound-mark))
      (compound-name-set! result name)
      (compound-formula-set! result formula)
      (compound-molecular-weight-set! result molecular-weight)
      (compound-melting-point-set! result melting-point)
      (compound-boiling-point-set! result boiling-point)
      (compound-color-set! result color)
      result)))

If we copy this datum-that-looks-like-a-definition into the Definitions window and subsequently press the Execute button, or write it out to a file and then pull it back in with the load procedure, neither DrScheme nor any human reader can tell that it is ``only a datum''! It is, in fact, indistinguishable from a definition that a human programmer might have written into the Definitions window directly.

We can similarly metaprogram all the other components of the implementation of a record type. We can even collect them into a procedure -- let's call it generate-record-definition-file -- that writes the entire collection of procedure definitions (a constructor, a selector for each field, a mutator for each field, a type predicate, an equality test for values of the new type, and a copying procedure) into an appropriately named file. If we can write this metaprogram once, we won't have to write our own definitions for record types ever again.

The first argument to generate-record-definition-file can again be simply the symbol denoting the new record type. We can derive the name of the file in which the procedure definitions will be stored directly from this symbol. Instead of supplying mere field names as the remaining arguments, we'll need to provide a specification for each field: a list of five symbols, of which the first is the name of the field, the second is the name of a ``precondition predicate'' that values of the field are supposed to satisfy, the third is the name of an equality predicate for values of that field, the fourth is the name of a procedure that can be used to make a copy of the field (or the symbol identity, if no such procedure is necessary), and the fifth is the name of a procedure that outputs a value that could be stored in the field, in a human-readable format.
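
The two ingredients just described can be sketched in a few lines. The helper definition-file-name is hypothetical (the real package may derive the name differently); it assumes the NAME-definition.ss pattern that this reading uses for the compound record type. The sample specification is the one for a color field.

```scheme
;; DEFINITION-FILE-NAME is a hypothetical helper: it derives the output file
;; name from the record-type symbol, assuming the NAME-definition.ss pattern.
(define definition-file-name
  (lambda (record-name)
    (string-append (symbol->string record-name) "-definition.ss")))

(definition-file-name 'compound)   ; "compound-definition.ss"

;; One field specification: a list of five symbols.
(define color-spec '(color symbol? eq? identity display))

(car color-spec)                   ; field name: color
(cadr color-spec)                  ; precondition predicate: symbol?
(caddr color-spec)                 ; equality predicate: eq?
(cadddr color-spec)                ; copier: identity
(car (cddddr color-spec))          ; output procedure: display
```

Since every specification has the same fixed shape, a metaprogram can pick out each part by position, as the last five expressions do.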

To get a file containing the procedure definitions appropriate to the record type for chemical compounds, for instance, one could write

> (generate-record-definition-file 'compound
  '(name string? string-ci=? string-copy write)
  '(formula string? string=? string-copy write)
  '(molecular-weight positive-real? = identity display)
  '(melting-point Celsius-temperature? = identity display)
  '(boiling-point Celsius-temperature? = identity display)
  '(color symbol? eq? identity display))

For the procedures defined in the resulting file (compound-definition.ss) to work, of course, we must either define positive-real? and Celsius-temperature? interactively or add those definitions to compound-definition.ss before loading it:

;;; positive-real?: determines whether a given value is a positive real
;;; number

;; Given:
;;   SOMETHING, a value

;; Result:
;;   OUTCOME, a Boolean

;; Preconditions:
;;   None.

;; Postconditions:
;;   OUTCOME is #T if SOMETHING is a positive real number, #F if it is
;;   not.

(define positive-real?
  (lambda (something)
    (and (real? something) (positive? something))))

;;; Celsius-temperature?: determines whether a given value is a valid
;;; temperature, measured in degrees Celsius

;; Given:
;;   SOMETHING, a value

;; Result:
;;   OUTCOME, a Boolean

;; Preconditions:
;;   None.

;; Postconditions:
;;   OUTCOME is #T if SOMETHING is a real number greater than or equal to
;;   the one that represents absolute zero on the Celsius scale, #F if it
;;   is not.

(define Celsius-temperature?
  (let ((absolute-zero -273.15))
    (lambda (something)
      (and (real? something)
           (<= absolute-zero something)))))

The symbol identity, on the other hand, needs no definition: the copier-maker in generate-record-definition-file recognizes it as a special case and generates appropriate code for it.

Once we finish these preliminary steps, we will be able to load compound-definition.ss and use the procedures relating to that record type:

> (load "compound-definition.ss")
> (define sample (make-compound "gadolinium iodide" "GdI3" 537.96 926 1340 'yellow))
> (compound? sample)
#t
> (compound-formula sample)
"GdI3"
> (compound-color-set! sample 'yellowish-orange)
> (display-compound sample)
#[compound "gadolinium iodide" "GdI3" 537.96 926 1340 yellowish-orange]

Quasiquotation

As delightful as it will be to have the generate-record-definition-file procedure available, we still have the rather daunting problem of writing it. If we use the methods employed above in the definition of constructor-maker, we'll have to thread our way laboriously through the process of assembling the necessary symbols with list and cons. This is tricky and, again, error-prone (as I well know -- I got it wrong in several earlier editions of this reading).

To simplify the process of writing generate-record-definition-file itself, we need one more kind of expression that Scheme provides: the quasiquotation.

Ordinary quotation converts a symbol, a list, or a vector into a Scheme datum by setting up a barrier to evaluation. The apostrophe tells the Scheme expression evaluator not to go to work on the subexpression to which it is prefixed, but rather to take that subexpression literally, in its unevaluated form. The quasiquotation mark (a backquote, ` -- you'll find this character in the upper left-hand corner of the keyboard) also converts a symbol, a list, or a vector into a datum, but its message to the Scheme expression evaluator is more subtle: It prohibits the evaluation of any subexpression except one that is immediately preceded by a comma or by the two-character prefix ,@ (``comma-at''). For a subexpression that is preceded by a comma or comma-at, evaluation is switched back on again, and the result of the evaluation is inserted into the datum at the point at which the subexpression occurs.

Here are some simple examples of quasiquotations:

> `(The sum of 7 and 5 is ,(+ 7 5))
(the sum of 7 and 5 is 12)

The quasiquote marks the list that it is attached to as a datum, turning off evaluation; the comma turns evaluation back on again for the subexpression (+ 7 5).

> (let ((sum (+ 7 5))) `(The sum of 7 and 5 is ,sum))
(the sum of 7 and 5 is 12)
> (let ((name (string #\B #\o #\b))) `(My name is ,name))
(my name is "Bob")
> (reverse `(a b ,(reverse '(c d)) e))
(e (d c) b a)

If you change the quasiquote in the last example to a quote, you naturally get no evaluation within the argument to reverse:

> (reverse '(a b ,(reverse '(c d)) e))
(e ,(reverse '(c d)) b a)

The difference between a comma that switches evaluation back on and the comma-at switch is that the value that results from a comma-at evaluation, which must be a list, is spliced into the context in which it occurs, instead of just being inserted as a list element:

> (define full-name '(Robert Calvin Makkai))
> `(My full name is ,full-name and I like it)
(my full name is (robert calvin makkai) and i like it)
> `(My full name is ,@full-name and I like it)
(my full name is robert calvin makkai and i like it)
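
The same contrast can be stated in terms of the list-building procedures used earlier: a comma behaves like handing a value to list, while a comma-at behaves like handing it to append. The sketch below uses a hypothetical variable, middle, and lower-case symbols so that the results do not depend on how an implementation treats case.

```scheme
;; MIDDLE is a hypothetical name, used only in this sketch.
(define middle '(b c))

;; A comma inserts the list as a single element ...
`(a ,middle d)                          ; (a (b c) d)
;; ... which is what LIST does with its arguments.
(list 'a middle 'd)                     ; (a (b c) d)

;; A comma-at splices the elements of the list into the surrounding list ...
`(a ,@middle d)                         ; (a b c d)
;; ... which is what CONS and APPEND do together.
(cons 'a (append middle (list 'd)))     ; (a b c d)
```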

As a more realistic example, here is how to write the constructor-maker procedure with the help of quasiquotation:

;;; constructor-maker: build a datum that looks like the definition of a
;;; constructor procedure for a given record type with given fields

;; Givens:
;;   RECORD-NAME, a symbol
;;   Some number of symbols, collectively called FIELDS

;; Result:
;;   DEFINITION, a list

;; Preconditions:
;;   (1) RECORD-NAME and all of the elements of FIELDS are valid Scheme
;;       identifiers. 
;;   (2) No two elements of FIELDS are identical.

;; Postcondition:
;;   DEFINITION is an S-expression which, if evaluated, defines a procedure
;;   that takes as many arguments as there are elements of FIELDS and
;;   returns a record of type RECORD-NAME, with the argument values as the
;;   fields of the record.

(define constructor-maker
  (lambda (record-name . fields)
    (let ((record-string (symbol->string record-name))
          (vector-size (+ (length fields) 1)))
      (let ((constructor-name
             (string->symbol (string-append "make-" record-string)))
            (type-marker-name
             (string->symbol (string-append "produce-" record-string "-mark")))
            (mutator-name
             (lambda (field)
               (string->symbol (string-append record-string
                                              "-"
                                              (symbol->string field)
                                              "-set!")))))
        (let ((mutators (let kernel ((rest fields))
                          (if (null? rest)
                              '()
                              (cons `(,(mutator-name (car rest))
                                      result
                                      ,(car rest))
                                    (kernel (cdr rest)))))))
          `(define ,constructor-name
             (lambda ,fields
               (let ((result (make-vector ,vector-size)))
                 (vector-set! result 0 (,type-marker-name))
                 ,@mutators
                 result))))))))

The quasiquotation acts as a template; one fills in the template with the symbols and lists that are passed to constructor-maker as parameters or computed in the binding specifications of the let-expressions. A comma marks a position in the template that is to be filled in with the value of one of these identifiers; a comma-at marks a position at which a list that is associated with one of those identifiers is to be spliced in.

This version of constructor-maker produces exactly the same result as the earlier one, but it is much easier to write, because one can write out the structure of the desired result directly, as a quasiquotation, instead of having to build it up with cons, list, and append.
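
The same template style carries over to the other generated procedures. As an illustration only, here is a sketch of a hypothetical selector-maker; it is not taken from the record-building package. It assumes that the record is a vector with the type mark at position 0 and that the caller supplies the position of the field, and it omits any checking that the argument really is a record of the right type.

```scheme
;;; selector-maker: a hypothetical companion to constructor-maker.
;;; Assumptions: the record is a vector, and POSITION is the index at
;;; which FIELD is stored (position 0 holds the type mark).
(define selector-maker
  (lambda (record-name field position)
    (let ((selector-name
           (string->symbol (string-append (symbol->string record-name)
                                          "-"
                                          (symbol->string field)))))
      ;; The template shows the shape of the result directly; the commas
      ;; mark the two places where computed values are filled in.
      `(define ,selector-name
         (lambda (record)
           (vector-ref record ,position))))))

(selector-maker 'compound 'formula 2)
;; (define compound-formula (lambda (record) (vector-ref record 2)))
```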

The full implementation of the record-building package, including the definition of generate-record-definition-file, is available in the file /home/stone/courses/scheme/examples/record-builder.ss. Note that quasiquotation is used extensively.