Lab: Introducing Emacs Lisp

CSC 323: Software design · Spring, 2012

Department of Computer Science · Grinnell College

0: Expression evaluation

Emacs Lisp is a special-purpose programming language, supporting the imperative and functional models of programming. Its special purpose is to make it simple and convenient to extend the capabilities of the GNU Emacs editor. Although Emacs Lisp, broadly speaking, is closely related to Common LISP, Scheme, and other variants and derivatives of LISP, Emacs Lisp supports a variety of built-in data types relating to GNU Emacs, such as buffers, frames, processes, keymaps, and fonts, and functions that perform various operations on values of those types.

Emacs Lisp's syntax for expressions and its approach to expression evaluation are much the same as Scheme's. Numerals and string literals look much the same in the two languages, as do function calls (although some of the functions have different names).

To evaluate an Emacs Lisp expression, open an Emacs window and press M-S-: (that is, hold down the Alt [in Emacs parlance, “meta”] and Shift keys and press the colon key). The prompt Eval: appears in the minibuffer at the bottom of the window. Type in the expression and press return. The value of the expression will appear in the minibuffer.
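As a quick illustration, here are a few sample expressions (chosen for this handout, not taken from any particular library) that can be typed at the Eval: prompt one at a time. The comment after each one shows the decimal value to expect.

```elisp
(* 12 (- 10 3))        ; value: 84
(max 3 9 4)            ; value: 9
(expt 2 10)            ; value: 1024
(upcase "emacs lisp")  ; value: "EMACS LISP"
```

As in Scheme, the operator comes first inside the parentheses, and nested calls are evaluated from the inside out.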

Exercise: Evaluate an Emacs Lisp expression that finds the sum of 763, 318, and 297. (Note that the value-printer displays exact integer results in octal and hexadecimal formats as well as the normal decimal format.)

Exercise: Find the square root of 882 by evaluating an Emacs Lisp expression.

Exercise: Determine the length of the string "abracadabra". (Hint: Emacs Lisp uses a generic function to determine the number of elements of any of its built-in, linearly structured data types.)

1: Definitions

Emacs Lisp uses three different styles of definition, depending on whether the identifier being defined is supposed to denote a function, a non-function value that can change during execution, or a non-function value that is not supposed to change. These styles are signalled by the use of different keywords: defun, defvar, and defconst respectively.

A minimal variable definition consists of the keyword defvar and an identifier, enclosed in parentheses: (defvar counter). This allocates a storage location and binds it to the identifier counter, but does not initialize it.

To initialize a variable as part of its definition, add an expression that evaluates to the initial value just before the right parenthesis: (defvar counter 0).

The GNU Emacs help system can recover and display documentation about a variable that you introduce if you include a documentation string in the variable definition. Only initialized variables can be documented in this way. The documentation takes the form of an expression with a string value, usually a string literal, placed between the initialization expression and the right parenthesis in the definition: (defvar counter 0 "a tally of inserted characters.").

Definitions return values in Emacs Lisp, just as expressions do. The value of a variable definition is the identifier being defined (as a symbol).

To initialize or change the value of a variable, use an assignment expression, which consists of the keyword setq, the variable to be changed, and an expression whose value is to be stored into the location associated with the variable, all enclosed in parentheses: (setq counter (+ counter 1)). It is not actually necessary to define a variable before using it; if GNU Emacs's first encounter with a variable is an assignment expression, GNU Emacs simply allocates storage at that point and associates it with the identifier.

In constant definitions, the keyword is defconst and an initialization expression is required rather than optional; otherwise, constant definitions are just like variable definitions. In particular, the documentation string works the same way.

GNU Emacs does not actually prevent changes in the value of identifiers introduced through constant definitions. It's up to the programmer to abstain from such changes. (So the difference between defvar and defconst is mostly rhetorical.)
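The three non-function forms discussed so far can be sketched together as follows. The names lab-counter and lab-days-per-week are hypothetical, invented for this illustration; the lab- prefix is only there to reduce the chance of clashing with a variable that GNU Emacs already defines.

```elisp
;; A documented, initialized variable (hypothetical name).
(defvar lab-counter 0
  "A tally of inserted characters.")

;; An assignment expression; after a first evaluation, lab-counter is 1.
(setq lab-counter (+ lab-counter 1))

;; A documented constant (hypothetical name).
(defconst lab-days-per-week 7
  "The number of days in a week.")

;; The documentation string can be recovered from Lisp as well as
;; through the help system.
(documentation-property 'lab-counter 'variable-documentation)
;; value: "A tally of inserted characters."
```

Evaluating lab-counter or lab-days-per-week by itself afterwards shows the current value in the minibuffer.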

A function definition consists of the keyword defun, an identifier to denote the function being defined, a list of parameters, and one or more expressions, making up the body of the function, all enclosed in a pair of parentheses. Here's the definition of a function that takes a pair as its argument and returns a similar pair, but with the car and the cdr swapped:

(defun swap-contents (pr) (cons (cdr pr) (car pr)))

And here's a stock example of list recursion (finding the sum of a list of numbers):

(defun sum (ls) (if (null ls) 0 (+ (car ls) (sum (cdr ls)))))

The value of a function definition is the identifier being defined, as a symbol.

You can associate a documentation string with a function by placing it immediately after the parameter list. It is optional and has no effect on function execution.
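For instance, swap-contents might be documented as in the sketch below. The wording of the documentation string is invented for this illustration; the built-in documentation function is used only to show that the string has been attached to the function.

```elisp
(defun swap-contents (pr)
  "Return a pair like PR, but with the car and the cdr swapped."
  (cons (cdr pr) (car pr)))

(swap-contents '(1 . 2))          ; value: (2 . 1)
(documentation 'swap-contents)
;; value: "Return a pair like PR, but with the car and the cdr swapped."
```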

It's not very convenient to enter such definitions into the minibuffer. Fortunately, there's another way to get GNU Emacs to evaluate an expression or to process a definition: Put the definition in a buffer. The C-x C-e key sequence (that is, control-X followed by control-E) directs GNU Emacs to process the expression or definition that immediately precedes the position of the editing cursor. Again, the value of the expression will appear in the minibuffer.

Exercise: Write the definition for a constant avogadro denoting Avogadro's number, 6.02 · 10²³. (Emacs Lisp uses the same convention for exponential notation as Scheme and C, with e as the marker separating the coefficient from the exponent.) Use C-x C-e to tell GNU Emacs to learn that definition. Then have GNU Emacs evaluate the expression (* avogadro 2).

Exercise: Add a documentation string to your definition of avogadro and have GNU Emacs learn the new definition. Then press C-h v and type in the name avogadro to see how the GNU Emacs help system reports on it.

Exercise: A more precise value for Avogadro's number is 6.02214 · 10²³. Write an assignment expression changing the value of the so-called constant avogadro to this more precise value. Have GNU Emacs evaluate avogadro afterwards to confirm that the old value has been replaced.

Exercise: Write the definition for a function called cube that computes the third power of its argument. Include a documentation string. Use C-x C-e to tell GNU Emacs to learn the definition, then have GNU Emacs evaluate the expression (cube avogadro). Press C-h f and type in the name cube to see how the GNU Emacs help system describes your function.

Exercise: Write the definition for a function called rev that takes a list as argument and returns a similar list, but with the elements in the opposite order.

For comparison, here is the definition of rev in Scheme:

(define (rev ls) (letrec ((rev-helper (lambda (rest so-far) (if (null? rest) so-far (rev-helper (cdr rest) (cons (car rest) so-far)))))) (rev-helper ls '())))

The exercise is essentially to port this definition to Emacs Lisp. (Hint: Define rev-helper as a separate function. Emacs Lisp does not support either letrec-expressions or named let-expressions.)

2: Interactive functions

If you are defining a function that someone might use interactively during an editing session, you should also place an interactive declaration at the beginning of the function's body (but after the documentation string). An interactive declaration consists of the keyword interactive and, optionally, an expression indicating how the function should receive arguments from the user. Section 21.2 of the GNU Emacs Lisp reference manual goes into detail about the great variety of expressions that can appear in interactive declarations. In this lab, however, we'll consider just a few of the common examples.

If the expression in an interactive declaration is a string beginning with a lower-case x, an interactive invocation of the function prompts the user for the argument, which can be any Emacs Lisp value. The part of the string following the initial x is used as the prompt. For instance, here is sum, revised so that it can receive the list of numbers directly from the interactive user:

(defun sum-helper (rest) (if (null rest) 0 (+ (car rest) (sum-helper (cdr rest)))))

(defun sum (ls) "determines the sum of the elements of a given list." (interactive "xList of numbers: ") (prin1 (sum-helper ls)))

The prin1 function invoked in the body of sum is the Emacs Lisp equivalent of display; it writes the value of its argument into the minibuffer. (Interactive functions are normally invoked only for their side effects, so the return values resulting from such invocations are discarded. Thus we need the explicit call to prin1 in order to have the result show up in the minibuffer.)

To invoke an interactive function, once GNU Emacs has learned it, you press M-x (that is, hold down the Alt key and press x), type in the name of the function (in this case, sum) and press Return. The List of numbers: prompt appears, and you type the desired argument into the minibuffer and press Return again at the end.

Instead of calling prin1, we could invoke the built-in function insert to place the result in the buffer, at the point of the editing cursor. However, insert would misinterpret a numeric argument as the Unicode code point for a character, so we need to convert the sum into a string value before turning it over to insert:

(defun insert-sum (ls) "inserts the sum of the elements of a given list at point." (interactive "*xList of numbers: ") (insert (number-to-string (sum-helper ls))))

The extra asterisk at the beginning of the string literal in the interactive declaration indicates that GNU Emacs should signal an error if the current buffer is read-only when this function is invoked.

You can also use M-S-: to evaluate an expression like (sum '(7 2 4 3)). With the present definition of sum, however, this causes the sum of the elements to be written to the minibuffer twice, once by the call to prin1 and again by the value-printer. (Unlike Scheme's display, prin1 returns the object that it prints as the value of the function call.)
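One possible way to avoid the double display, sketched here under the assumption that the built-in predicate called-interactively-p is available in your version of GNU Emacs, is to print only when the function was invoked as a command and to return the sum in every case. The name lab-sum-list is hypothetical, and the sketch uses apply with + in place of sum-helper so that it stands on its own.

```elisp
(defun lab-sum-list (ls)
  "Return the sum of the numbers in LS.
When invoked interactively, also show the sum in the minibuffer."
  (interactive "xList of numbers: ")
  (let ((total (apply #'+ ls)))          ; (apply #'+ '()) is 0
    ;; Print only for M-x invocations; a call from Lisp code or
    ;; from M-S-: relies on the return value instead.
    (when (called-interactively-p 'any)
      (message "%s" total))
    total))

(lab-sum-list '(7 2 4 3))   ; value: 16
```

With this arrangement, M-x lab-sum-list shows the result once through message, and M-S-: shows it once through the value-printer.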

Exercise: Make your definition of rev interactive. Give it a documentation string, if it does not already have one. Test it interactively (using M-x rev).

Exercise: The built-in insert function always returns the symbol nil as its value. If the M-S-: expression evaluator is used to evaluate a call to the insert-sum function, where, if anywhere, will this symbol be displayed?

Exercise: The built-in function current-time-string takes no arguments and returns a string giving the current time of day and the date. Evaluate an invocation of this function. Then use it, along with the insert and substring functions, to write an interactive insert-time-of-day function that, when invoked, inserts the current time of day (hour and minute only) into the current buffer.

The substring function in GNU Emacs is like Scheme's: Given a string, a starting position, and an ending position, it returns the substring starting with the character in the given starting position and ending just before the given ending position. Positions are zero-based, as in Scheme. (Hint: Since insert-time-of-day takes no arguments and requires that its buffer not be read-only, the appropriate expression for the interactive declaration is simply "*".)

3: Notes on nil, t, and self-denoting symbols

The symbol nil actually plays three different roles in Emacs Lisp. In addition to being a symbol, it represents both the Boolean false value and the empty list, much as the integer 0 represents the Boolean false value and a null pointer in C.

It is useless to point out that this conflation is arbitrary and confusing. Like the names car and cdr, it is just a convention that we have inherited from early implementations of LISP, and so much Emacs Lisp code already depends on it that the convention is not likely to change.

Similarly, the symbol t represents the Boolean true value. In addition, both nil and t are predefined identifiers, each denoting itself (as a symbol). It is therefore unnecessary to precede either one with a quote when referring to it in Emacs Lisp code.

Emacs Lisp also provides that any symbol that begins with a colon is automatically self-denoting, as nil and t are. Thus evaluating the identifier :foo yields the symbol :foo, whereas evaluating the identifier foo is an error (unless some variable definition, constant definition, or assignment expression has already allocated and initialized a storage location and associated it with that identifier).
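The following sample expressions, chosen for illustration, show these roles in action. The predicates symbolp, listp, and keywordp are built in.

```elisp
(symbolp nil)        ; value: t    (nil is a symbol)
(listp nil)          ; value: t    (nil is also a list, the empty one)
(if nil 'yes 'no)    ; value: no   (nil counts as false)
(if t 'yes 'no)      ; value: yes
(eq t 't)            ; value: t    (quoting t changes nothing)
(eq :foo ':foo)      ; value: t    (:foo denotes itself)
(keywordp :foo)      ; value: t
```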

Exercise: The built-in predicate eq (no question mark) takes two arguments and returns a Boolean value indicating whether they are identical. What is the value of (eq '() nil)?

Exercise: What is the value of :nil? Is it identical to the empty list?

4: Notes on characters, character literals, and strings

Characters in Emacs Lisp are identified with their Unicode code points. That is, they are integers, rather than being a distinct data type of their own -- more like C's char values than like characters in Scheme. A character literal is formed by placing a question mark in front of the character itself. (For instance, the literal ?A denotes 65, the Unicode code point for the capital Latin letter A.) Backslash escapes are used to extend this convention to characters without graphic representations, such as ?\s for 32 (the space character) and ?\n for 10 (the newline character). The same escapes (without the initial question marks) can also be used inside string literals.

Strings are stored as arrays of character values. As in Scheme, there is a make-string function that takes a non-negative integer and a character as arguments and returns a string of the specified length consisting entirely of copies of the character; for instance, the call (make-string 80 ?\s) returns a string of eighty space characters. Also as in Scheme, there is a string function that takes any number of characters as arguments and assembles them into a string. There is a function for concatenating any number of strings, although it is called concat in Emacs Lisp (Scheme calls it string-append). The predicate that determines whether two strings are the same is called string= in Emacs Lisp.
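A few sample expressions, chosen for illustration, show characters behaving as integers and the string functions just mentioned. The aref function, which indexes into any array, is included to show that the elements of a string are character values.

```elisp
?A                          ; value: 65
(+ ?a 1)                    ; value: 98, the code point for b
(string ?a ?b ?c)           ; value: "abc"
(make-string 3 ?x)          ; value: "xxx"
(concat "foo" "bar" "baz")  ; value: "foobarbaz"
(string= "abc" "abc")       ; value: t
(string= "abc" "ABC")       ; value: nil
(aref "abc" 1)              ; value: 98 (positions are zero-based)
```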

Exercise: Write an Emacs Lisp expression that, when evaluated, inserts eight newline characters into the current buffer at the position of the editing cursor.

5: Initialization scripts

When GNU Emacs starts up, any definitions and expressions that are in the user's ~/.emacs file are processed non-interactively, for their side effects only. For instance, the assignment expression (setq-default require-final-newline t) establishes the true Boolean value as the default value of the predefined variable require-final-newline, so that GNU Emacs will ensure that any text buffer that is saved to a file has a newline character at the end of the last line, even if the user did not explicitly insert one. If this expression is placed in the ~/.emacs file, it will be executed automatically whenever GNU Emacs is launched (unless the user directs otherwise by specifying the --no-init-file command-line option or some option that implies it).

A small ~/.emacs file is placed in each MathLAN account when it is created, replacing some of GNU Emacs's default behaviors with others that are arguably more appropriate for novice users. You might want to look through your ~/.emacs file at this point to see how it affects the editing environment.

You can bind a function to a key by invoking the global-set-key function, which takes an expression denoting a key or key combination as its first argument and a symbol that names a function as its second argument. For instance, if you find the Alt-Shift-colon combination that fires up the interactive eval-expression function difficult to remember or awkward to execute, you can bind that function to the F11 key at the top of the keyboard by evaluating the expression

(global-set-key [f11] 'eval-expression)

which can also be placed in your ~/.emacs file. Although GNU Emacs itself binds most of the easy key combinations using control, alt, shift, and a letter key, it leaves the function keys available for customization by users.
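To confirm that a binding took effect, one option is to ask GNU Emacs which function a key currently invokes. The sketch below uses the built-in functions global-key-binding and kbd; the choice of F11 simply follows the example in the text.

```elisp
;; Bind F11 globally, as in the text.
(global-set-key [f11] 'eval-expression)

;; Ask which command the global keymap associates with F11.
(global-key-binding [f11])          ; value: eval-expression

;; The same key can also be written in the notation that the
;; help system uses, by way of kbd.
(global-set-key (kbd "<f11>") 'eval-expression)
```

Interactively, C-h k followed by the key reports the same information through the help system.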

Exercise: Revise your ~/.emacs file so that it binds your insert-time-of-day function to the F6 key. Exit from GNU Emacs and restart it (or, alternatively, evaluate the expression (load-file "~/.emacs")) and confirm that pressing the F6 key inserts the current time of day.

Exercise: Write an insert-current-date function that inserts the current date (in ISO format, 2012-04-13) at the position of the editing cursor. Then bind that function to the F7 key. (Hint: The predefined procedure format-time-string will deliver the date in that format if you give it the format string "%Y-%m-%d".)

Exercise: Write an interactive Emacs Lisp function that accepts a string and inserts, at the position of the editing cursor, a decorative header consisting of three lines. The first line should consist of two slashes, a space, and seventy-two plus signs; the second should consist of two slashes, a space, and the string provided interactively; the third should be a duplicate of the first. Bind this function to the F8 key. (Hint: If the expression in the interactive declaration is a string literal beginning with the characters *s, the asterisk causes GNU Emacs to signal an error if the function is invoked when the current buffer is read-only, and the s causes whatever input the user supplies to be read in as a string, even if it is not enclosed in quotation marks. The part of the string literal that comes after the *s is the prompt that GNU Emacs will issue.)

6: Point, mark, and region

GNU Emacs keeps track of the position of the editing cursor within the current buffer as a positive integer, equal to one plus the number of characters in the buffer that precede the editing cursor. The predefined function point takes no arguments and returns this position number.

The user often wants to mark some region of the current buffer as the operand for some editing operation, most often cutting or copying. Under a graphical user interface, this is generally done by left-clicking at one end of the region, dragging to the other end, and releasing. If the mouse is not available, the user can do essentially the same thing by positioning the editing cursor at one end, pressing C-space (that is, holding down the Control key and pressing the space bar), and then moving the editing cursor to the other end of the region. The C-space key combination invokes an Emacs Lisp function that stores the integer designating the position into one of GNU Emacs's internal variables. The current region is the part of the buffer's text that lies between the editing cursor and the most recently set mark. The function mark, which takes no arguments, returns the position of the most recently set mark.

In addition to the most recently set mark, GNU Emacs maintains a stack of previously set marks. So the usual way for an Emacs Lisp program to set a mark at a particular position is to invoke the push-mark function, which pushes the current mark onto this stack and replaces it with a position specified by the argument, or with the position of the editing cursor if no argument is given. The pop-mark function, which takes no arguments, pops the stack of previously set marks and replaces the current mark with the position popped from the stack.

The buffer-substring function returns a string consisting of the part of the current buffer that lies between two given positions. So, for instance, you can obtain the contents of the current region by evaluating (buffer-substring (point) (mark)). (The arguments to buffer-substring can be in either order; the value returned always begins with the character in the lesser position and ends just before the greater position.)

Often an Emacs Lisp function carries out some operations that would normally have side effects on the point and mark positions. Such side effects can confuse or disorient a user who is not expecting them. Emacs Lisp therefore supports a kind of expression that automatically saves the point and mark positions at the beginning of a block of code and restores them at the end of the block: the save-excursion expression. As you would expect, it consists of the keyword save-excursion and any number of expressions, all enclosed in parentheses.
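The sketch below illustrates point, buffer-substring, and save-excursion together. It works in a scratch buffer created by with-temp-buffer, a built-in form not otherwise covered in this lab, so that nothing in your own buffers is disturbed. It also uses goto-char, which moves the editing cursor to a given position.

```elisp
(with-temp-buffer
  (insert "hello world")             ; point is now 12, just past the d
  (goto-char 7)                      ; just before the w
  (let ((before (point))
        (word (save-excursion
                (goto-char 12)       ; moves point, but only inside this block
                (buffer-substring 7 (point)))))
    ;; Point has been restored to 7 by save-excursion.
    (list before word (point))))
;; value: (7 "world" 7)
```

The last element of the result shows that the motion inside the save-excursion expression left no trace once the block was finished.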

Exercise: What value does point return when the editing cursor is at the very beginning of the current buffer? Check your answer.

Exercise: Write an expression that sets the mark at the beginning of the current buffer without changing the position of the editing cursor.

7: Motion functions

Emacs Lisp supports several functions for moving the position of the editing cursor around under program control: forward-char and backward-char, which move it forwards or backwards through a specified number of characters, basically by incrementing or decrementing the variable that keeps track of point; forward-word and backward-word, which move it forwards or backwards through the specified number of stretches of word-constituent characters alternating with word-separator characters; and goto-char, which moves it directly to the position specified by the argument (i.e., assigns the value of the argument to the point variable).

The point-min and point-max functions take no arguments and return, respectively, the positions of the beginning and end of the buffer.

You can also move around by lines in the text: (beginning-of-line) moves you to the left end of the current line, (end-of-line) to the right end; the forward-line function moves you to the left end of the next line, and you can give it an argument to advance a specified number of lines forwards (positive integer arguments) or backwards (negative integer arguments) before positioning the editing cursor at the left end. An argument of 0 moves you to the left end of the current line.

Still another alternative is to move forward or backward until a copy of a given string is encountered; the search-forward and search-backward commands do this.
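Here is a small sketch that exercises several of these motion functions on a two-line sample text. The text is invented for the illustration, and the built-in with-temp-buffer form supplies a scratch buffer so that the cursor in your own buffer stays put.

```elisp
(with-temp-buffer
  (insert "alpha beta\ngamma delta\n")
  (goto-char (point-min))            ; point = 1
  (forward-char 3)                   ; point = 4, inside "alpha"
  (forward-word 1)                   ; end of "alpha", point = 6
  (let ((after-word (point)))
    (forward-line 1)                 ; left end of the second line, point = 12
    (let ((line-start (point)))
      (search-forward "delta")       ; point lands just after the match
      (list after-word line-start (point)))))
;; value: (6 12 23)
```

Note that search-forward leaves the editing cursor at the end of the matched text, not at its beginning.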

Exercise: Write a sequence of Emacs Lisp expressions that have the effect of moving the editing cursor to the beginning of the next line containing a left parenthesis.

8: Cutting and pasting

The kill-region function is used for deleting a region from a buffer. It takes two positive integers as arguments, interprets both of them as character positions, and cuts the characters from the first position (inclusive) to the second (exclusive). When invoked interactively, it simply fills in the two arguments with the values (point) and (mark), rather than prompting the user for arguments.

The kill-region function places the deleted region in GNU Emacs's “kill ring”, so the user can restore it with C-y, possibly after moving the editing cursor. The kill ring is a data structure rather like a stack of strings, except that one can access elements other than the top one by following up the initial C-y with repetitions of M-y.

The copy-region-as-kill function is similar to kill-region, but copies the specified region into the kill ring (for subsequent pasting) without deleting it.

The yank function inserts a string from the kill ring at the position of the editing cursor. It takes an optional argument, specifying the (one-based) position, counting from the top of the kill ring, of the string to be recovered and inserted. If the argument is omitted, it defaults to 1, indicating that the most recently killed string should be inserted.
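A minimal sketch of cutting and pasting under program control follows. The sample text is invented, and with-temp-buffer (a built-in form) supplies a scratch buffer to work in. Be aware that the killed text is added to your real kill ring, so a later C-y will bring it back.

```elisp
(with-temp-buffer
  (insert "one two three")
  (kill-region 1 5)            ; cuts positions 1 through 4: "one "
  (goto-char (point-max))      ; move to the end of what remains
  (insert " ")
  (yank)                       ; pastes the most recently killed string
  (buffer-string))
;; value: "two three one "
```

The built-in buffer-string function returns the entire contents of the current buffer, which makes the effect of the two operations easy to inspect.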

Exercise: Write an interactive Emacs Lisp function swap-next-words that swaps the two words immediately following the editing cursor (converting "the and" into "and the", for instance). Use save-excursion so that changes in point and mark during the function are discarded.

Exercise: Add the definition of the swap-next-words function to your ~/.emacs file and bind it to the F9 key.

9: Libraries

A file containing some thematically related Emacs Lisp definitions (and optionally some expressions as well) is called a library. You can tell GNU Emacs to process all of the definitions and expressions in a library by giving the interactive command M-x load-library and supplying the name of the file. If the file name ends in .el, as Emacs Lisp libraries generally do, you can leave off the suffix when naming the file, and the load-library function will find it anyway.

In searching for a library file, GNU Emacs looks in directories specified in a list of strings stored in the variable load-path. You can add directories to this list with a command such as

(setq load-path (append load-path (list "/home/spelvin/emacs-libs")))

If you have written some Emacs Lisp libraries of your own, and placed them in some directory, this command is a good one to put into your ~/.emacs file.

Thousands of Emacs Lisp libraries have been released by their authors as free software. MathLAN makes over 1300 of these available to users. If you're curious to see Emacs Lisp in action, you can find many of these libraries in /usr/share/emacs and its subdirectories. Dependencies among libraries are signalled by the use of functions called require and provide. Calling require directs GNU Emacs to load a specified library if it is not already loaded; calling provide announces the availability of a library to be "required" by other libraries.
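The require and provide convention can be sketched as follows. The library name lab-greeting and the function it defines are hypothetical, invented for this illustration. Evaluated in a single session, as here, the call to require finds the feature already provided and loads nothing.

```elisp
;;; Contents of a hypothetical library file, lab-greeting.el:

(defun lab-greeting (name)
  "Return a short greeting addressed to NAME."
  (concat "Hello, " name "!"))

;; Announce that the feature lab-greeting is now available.
(provide 'lab-greeting)

;;; In another library, or in ~/.emacs:

;; Load lab-greeting.el from a directory in load-path, unless the
;; feature has already been provided.
(require 'lab-greeting)

(lab-greeting "spelvin")      ; value: "Hello, spelvin!"
(featurep 'lab-greeting)      ; value: t
```

By convention the symbol given to provide matches the name of the file without its .el suffix, which is how require knows which file to look for.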

If a library is so large that loading it takes a noticeable amount of time, it is usually compiled and distributed in compiled form, as described in Chapter 16 (“Byte compilation”) of the GNU Emacs Lisp reference manual. The file extension .elc conventionally indicates a compiled Emacs Lisp library.

If a function is defined in a library, as most GNU Emacs functions are, you can bring up the source code for that library by typing C-h f, followed by the name of the function, to bring up the GNU Emacs help system's entry on that function, and then left-clicking on the name of the library in the text of that entry. The source code will appear in the help window. This is sometimes instructive if you're trying to figure out exactly what the function does or how it does it.

Exercise: Place your insert-time-of-day and insert-current-date functions in a file named insert-time.el. Add a call to the load-library function to your ~/.emacs file so that insert-time.el is loaded every time GNU Emacs starts up. Exit from Emacs, start it up again, and confirm that the definitions in your library were indeed processed.

Exercise: The sunrise-sunset function computes the local time of sunrise and sunset for the current day. What is the name of the library in which it is defined? Where is that library located on MathLAN workstations? (Hint: Use the find command.) Is it byte-compiled? What happens if you run this function at the South Pole, where the sun doesn't get above the horizon at all today [April 16, 2012]? What would you need to know in order to make this function report the time of sunrise and sunset for your home town?