When a Scheme program is designed to work with large volumes of data, it is often more convenient for the user to prepare its input in one or more separate files, using an appropriate tool (such as a text editor or a statistical package), than to type the data in as the program is running. The Scheme program itself finds the files containing the data and reads them, without user intervention.
To provide for this possibility, each of Scheme's input procedures can be given an extra argument that specifies the input port through which the data will be read. In principle, any kind of device that supplies data on demand can be on the other side of an input port, and some implementations of Scheme provide several ways of creating such ports. However, we'll consider only the default input port, through which data typed at the keyboard are transmitted to a Scheme program interactively, and file input ports, through which Scheme programs read data stored in files.
When DrScheme or MzScheme starts up, it automatically creates the default
input port and connects the keyboard to it. This is the input port on
which the read procedure normally operates. When the user
exits from Scheme, this port is closed as part of the cleanup process.
To read data from a file, however, the programmer must explicitly open an
input port and connect that file to it. There is a built-in Scheme
procedure to do this: open-input-file takes one argument, a
string, and returns an input port to which the file named by the string is
connected. For instance, the call (open-input-file
"/home/stone/courses/scheme/examples/sample.dat") returns an input
port to which the file
/home/stone/courses/scheme/examples/sample.dat is connected.
Constructing the input port does you no good unless you give it a name, so
open-input-file is almost always invoked within some binding
construction, such as a definition or a let-expression:
(define source (open-input-file "/home/stone/courses/scheme/examples/sample.dat"))
The sample.dat file is a text file that contains one line,
consisting of the cheerful greeting Hi!. One can now access
the contents of the file by calling Scheme's built-in input procedures,
giving them the input port source as an argument.
An input port can be used as an argument to two primitive input procedures:
read-char, which reads in (and returns) one character from the
file on the other side of the input port, and peek-char, which
looks through the input port to see what the next character in the file is,
and returns that character, but does not actually read it in from the file.
The difference is that you can peek at the next character as often as you
like, and it remains accessible through the input port, but once you read
in a character there is no way to "un-read" it; the port advances
inexorably to the next character in the file.
For example, using the source input port that we defined
above:
> (read-char source)
#\H
> (peek-char source)
#\i
> (peek-char source)
#\i
> (read-char source)
#\i
> (read-char source)
#\!
> (read-char source)
#\newline
Notice that the peek-char procedure peeks through the port to
see what the next available character of the file is, and returns the
character it sees. The read-char procedure pulls that
character in through the port and returns it, leaving the port open with
the following character accessible through it.
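Because peek-char does not consume the character it returns, it is the natural tool when a procedure must decide whether to read a character before actually reading it. The following is a minimal sketch. The procedure name skip-whitespace is hypothetical and not part of Scheme. The procedure discards any whitespace characters at the current position of an input port. It then returns the next non-whitespace character without consuming it. It guards with char? because peek-char can also return the end-of-file object, which is discussed next.

```scheme
;; skip-whitespace is a hypothetical helper, not a built-in procedure.
;; It reads and discards whitespace characters from the input port
;; source, then returns (without reading it) the next character, or
;; the end-of-file object if no non-whitespace characters remain.
(define skip-whitespace
  (lambda (source)
    (let kernel ((next (peek-char source)))
      (if (and (char? next) (char-whitespace? next))
          (begin
            (read-char source)              ; consume the whitespace character
            (kernel (peek-char source)))    ; then look at the one after it
          next))))
```

After a call to skip-whitespace, the character it returned is still accessible through the port, so a subsequent read-char returns that same character.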
Scheme automatically provides a sentinel for every file input port it
opens. The sentinel is a special value known as the end-of-file
object. It is returned by any of the three input procedures (read-char,
peek-char, and read, which is described below) when there is nothing
left to be read from the file. MzScheme prints the
end-of-file object as #<eof>. To continue the preceding
example:
> (peek-char source)
#<eof>
> (read-char source)
#<eof>
> (read-char source)
#<eof>
The end-of-file object is not a character, and there is no standard Scheme
name for the end-of-file object, but there is a primitive predicate
eof-object? that detects it:
> (eof-object? (read source))
#t
As an example of the use of read-char, here's the definition
of a procedure called read-line, which reads in characters
through a given input port until it reaches the end of the file or
encounters a #\newline character, then returns a string
containing all of the characters that it has read in:
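(define read-line
  (lambda (source)
    (list->string
     (let kernel ((next (read-char source)))
       ;; Stop at the end of the file or at the end of the line;
       ;; a #\newline is consumed but not included in the result.
       (if (or (eof-object? next)
               (char=? next #\newline))
           '()
           (cons next (kernel (read-char source))))))))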
When all of the data have been read from a file, the programmer should
explicitly close the input port by invoking the
close-input-port procedure, giving it the input port as an
argument. The close-input-port procedure is invoked only for its
side effect.
> (close-input-port source)
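A port bound with define stays accessible everywhere. A procedure that needs a file only briefly can instead bind its port with a let-expression, as mentioned above, and close the port before returning. The following minimal sketch combines open-input-file, read-char, and close-input-port in this way. The name first-char-of-file is hypothetical, and the sketch assumes that the named file exists and can be read.

```scheme
;; first-char-of-file is a hypothetical illustration, not a built-in.
;; It opens the named file, reads its first character (or the
;; end-of-file object, if the file is empty), closes the port,
;; and returns whatever it read.
(define first-char-of-file
  (lambda (file-name)
    (let ((source (open-input-file file-name)))   ; source is visible only inside this let
      (let ((ch (read-char source)))
        (close-input-port source)
        ch))))
```

Applied to the sample.dat file described above, this procedure should return #\H.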
It is also possible to use a one-argument form of the read
procedure, which pulls a complete Scheme datum through a given input port
instead of just one character. It too leaves the port open, with the next
character accessible through it.
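Since read also returns the end-of-file object once the data run out, the pattern used in read-line carries over to complete data. As a minimal sketch (the name read-all-data is hypothetical), the following procedure collects every datum in a file into a list, in the order in which the data appear in the file.

```scheme
;; read-all-data is a hypothetical illustration, not a built-in.
;; It returns a list of all the Scheme data in the named file.
(define read-all-data
  (lambda (file-name)
    (let ((source (open-input-file file-name)))
      (let kernel ((datum (read source))
                   (so-far '()))             ; data read so far, most recent first
        (if (eof-object? datum)
            (begin
              (close-input-port source)
              (reverse so-far))              ; restore the order of the file
            (kernel (read source) (cons datum so-far)))))))
```

For a file containing the text 42 hello (1 2), this procedure returns a list of three data: the number 42, the symbol hello, and the list (1 2).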
Here's another example of how to use Scheme's facilities for input from a
file. The sum-of-file procedure takes one argument, a string
that names a file full of numbers; the procedure opens that file, reads in
the numbers it contains one by one, adds each one in turn to a running
total, closes the file, and returns the total.
(define sum-of-file
  (lambda (file-name)
    (let ((source (open-input-file file-name)))
      (let kernel ((candidate (read source)))
        (cond ((eof-object? candidate)
               (begin
                 (close-input-port source)
                 0))
              ((number? candidate)
               (+ candidate (kernel (read source))))
              (else
               (begin
                 (close-input-port source)
                 (error "sum-of-file: The file contains a non-number."))))))))
In the base case of the recursion, no numbers remain to be read from the
file, and the call to the read procedure immediately returns the
end-of-file object. The sequencing specified by the
begin-expression ensures that the input port will be closed
before the answer, 0, is returned.
If the value of (read source) is a number, it is added to the
value of a recursive call to kernel, which is the sum of all
the subsequent numbers in the file.
If sum-of-file discovers a non-number in the file whose
contents it is adding up, then one of its preconditions has been violated,
and it closes the file and reports the error.
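The sum-of-file procedure leaves each addition pending until the recursive call that computes the rest of the sum has returned. A common alternative passes the running total to kernel as a second argument. With that change, every recursive call is a tail call, and no additions pile up while the file is being read. The sketch below uses the hypothetical name sum-of-file-tail. It is a variation on the design above, not a correction of it. For a file of numbers, both versions should return the same total.

```scheme
;; sum-of-file-tail is a hypothetical variant of sum-of-file.
;; The parameter total holds the sum of the numbers read so far.
(define sum-of-file-tail
  (lambda (file-name)
    (let ((source (open-input-file file-name)))
      (let kernel ((candidate (read source))
                   (total 0))
        (cond ((eof-object? candidate)
               (close-input-port source)
               total)
              ((number? candidate)
               (kernel (read source) (+ total candidate)))   ; a tail call
              (else
               (close-input-port source)
               (error "sum-of-file-tail: The file contains a non-number.")))))))
```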
The file /home/stone/courses/scheme/examples/numbers contains five hundred and twenty-eight natural numbers. What is their sum?
Using sum-of-file as a pattern, write a Scheme procedure
file-size that takes as argument a string that names a file
and returns the number of characters in that file (that is, the number of
times that read-char can be called to read a character from
the file without returning the end-of-file object).
Find out what happens if sum-of-file or file-size
is given a string that does not name any existing file.
Similarly, when a Scheme program generates a lot of output, it is often more convenient to have it store the output in one or more files, instead of displaying it in the window that the interactive interface is using. Other programs can recover the results from such files if further processing is needed.
To provide for this possibility, each of Scheme's output procedures can be provided with an extra argument that specifies the output port through which the data will be written. As before, we'll consider only the default output port -- the interaction box, under DrScheme -- and file output ports, through which Scheme programs write data to files.
If you followed the discussion of input ports, there are few surprises
about output ports. The default output port is created when the
Scheme interactive interface starts up and closed when it shuts down; in
between, Scheme uses this port for every call to write,
display, or newline that does not specify some other output port.
To write data to a file instead, the programmer must explicitly invoke
open-output-file, which returns a file output port; once this
output port is given a name, it can be used as an extra argument to any of
the output procedures, with the effect that the values will be written to
the file rather than to the interaction window. When no more output is to
be written to the file, the programmer must explicitly close the port by
invoking close-output-port.
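Here is a minimal sketch of that sequence of steps. The hypothetical procedure write-greeting opens a file output port and writes the same one-line greeting that sample.dat contains. It does this by passing the port as the extra argument to display and newline, and then it closes the port. Standard Scheme leaves unspecified what happens when an output port is opened to an existing file, so the sketch assumes that no file with the given name exists yet.

```scheme
;; write-greeting is a hypothetical illustration, not a built-in.
;; It assumes that no file named file-name exists yet.
(define write-greeting
  (lambda (file-name)
    (let ((target (open-output-file file-name)))
      (display "Hi!" target)      ; written to the file, not the interaction window
      (newline target)
      (close-output-port target))))
```

Reading the new file back with read-char should yield #\H, #\i, #\!, and #\newline, followed by the end-of-file object.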
As an example, here's a procedure that takes two arguments -- the first a string that names the output file to be created, the second a positive integer -- and writes the exact divisors of the positive integer into the specified output file:
(define store-divisors
  (lambda (file-name dividend)
    (let ((target (open-output-file file-name)))
      (let kernel ((trial-divisor 1))
        (if (< dividend trial-divisor)
            (close-output-port target)
            (begin
              (if (zero? (remainder dividend trial-divisor))
                  (begin
                    (write trial-divisor target)
                    (newline target)))
              (kernel (+ trial-divisor 1))))))))
Use the store-divisors procedure to draw up a list of the
divisors of 120, storing them in a file named divisors-of-120.
Examine the file afterwards and confirm that the answer is correct. (Don't
give this procedure an extremely large number as argument -- it's too slow.
There are more efficient ways to find divisors!)
The Scheme standard says that if you try to open an output port to a file that already exists, "the effect is unspecified," i.e., anything might happen. Find out what DrScheme does in this situation. (DrScheme actually gives the programmer the opportunity to modify this behavior; the Help Desk document "Opening file ports" describes the possibilities, if you're curious.)
Incidentally, to enable the programmer to test the precondition for
open-output-file, DrScheme supplies a
file-exists? predicate, which takes a string as argument and
returns #t if it is the name of an existing file and
#f if it is not. It also supplies a delete-file
procedure that takes a string as argument and tries to annihilate the file
that it names (if there is such a file). Neither of these procedures is
part of the R5RS standard, however, so other Scheme implementations do
not always provide them.
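With these two procedures, a program can avoid the unspecified case entirely by removing any existing file before opening a new one. The following is a minimal sketch. Its name, open-fresh-output-file, is hypothetical. It assumes an implementation such as DrScheme that provides both file-exists? and delete-file. It also assumes that discarding the old file is what the programmer actually wants.

```scheme
;; open-fresh-output-file is a hypothetical helper, not a built-in.
;; It relies on file-exists? and delete-file, which are not part of
;; R5RS, so it works only where an implementation supplies them.
(define open-fresh-output-file
  (lambda (file-name)
    (if (file-exists? file-name)
        (delete-file file-name)   ; discard the existing file and its contents
        #f)                       ; no existing file: nothing to remove
    (open-output-file file-name)))
```

Note that this silently destroys whatever the old file contained. A more cautious program might instead report an error when file-exists? returns #t.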
Two positive integers are said to be relatively prime if they have
no common divisors other than 1 -- in Scheme: (= (gcd first second)
1). With store-divisors as a model, write a Scheme
procedure store-relative-primes that takes two arguments, the
first a string that names the output file to be created and the second a
positive integer, and writes into the specified output file every positive
integer that is less than the specified positive integer and relatively
prime to it.
Besides write, display, and newline,
Scheme provides a primitive procedure write-char that is used
to create an output file one character at a time. It takes the character
to be written and, like the other output procedures, an optional second
argument: the output port through which the character is to be sent.
The following procedure uses write-char to write a given
string to a file through a given output port, converting any upper-case
letters to lower case en route:
(define display-string-in-lower-case
  (lambda (str out)
    (let ((len (string-length str)))
      (let kernel ((written 0))
        (if (< written len)
            (begin
              (write-char (char-downcase (string-ref str written)) out)
              (kernel (+ written 1))))))))
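The same one-character-at-a-time approach works when the characters come from a file instead of a string. In the following sketch, the hypothetical procedure copy-in-lower-case reads every character of one file with read-char. It writes the lower-case form of each character to another file with write-char. It assumes that the input file exists and, as before, that no file with the output name exists yet.

```scheme
;; copy-in-lower-case is a hypothetical illustration, not a built-in.
;; It assumes that input-name names an existing file and that no
;; file named output-name exists yet.
(define copy-in-lower-case
  (lambda (input-name output-name)
    (let ((source (open-input-file input-name))
          (target (open-output-file output-name)))
      (let kernel ((next (read-char source)))
        (if (eof-object? next)
            (begin                                       ; end of input: close both ports
              (close-input-port source)
              (close-output-port target))
            (begin
              (write-char (char-downcase next) target)   ; non-letters are copied unchanged
              (kernel (read-char source))))))))
```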
Scheme provides the type predicate input-port?, which can be
applied to any object to determine whether it is an input port, and the
analogous predicate output-port?.
The current-input-port procedure, which takes no arguments,
returns the default input port, in case you want to give it a name, pass it
as an argument to a procedure that expects a port, and so on. Similarly,
the current-output-port procedure takes no arguments and
returns the default output port.
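The following small sketch combines these procedures. The hypothetical procedure port-kind uses input-port? and output-port? to classify any object. Applying it to the values of (current-input-port) and (current-output-port) confirms that they are ordinary port objects. Such objects can be passed to any procedure that expects a port.

```scheme
;; port-kind is a hypothetical illustration, not a built-in.
;; It returns the symbol input, output, or neither.  Because
;; input-port? is tested first, an object that happened to be both
;; kinds of port would be reported as input.
(define port-kind
  (lambda (obj)
    (cond ((input-port? obj) 'input)
          ((output-port? obj) 'output)
          (else 'neither))))
```

For instance, (port-kind (current-output-port)) should return output. By contrast, (port-kind "sample.dat") should return neither, since a string naming a file is not itself a port.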
It is a bad idea to attempt to close the default ports. The best thing that can happen is that whatever implementation of Scheme you're using will ignore the attempt or report it as an error.
Copy the definition of display-string-in-lower-case, above,
into DrScheme and test it by displaying a mixed-case string in the current
output port.
This document is available on the World Wide Web as
http://www.cs.grinnell.edu/~stone/courses/scheme/files.xhtml
created October 23, 1997
last revised March 17, 2000