On MathLAN, the utility program cp is often used to create an
exact duplicate of a given text file. In a terminal-emulator window, the
command
cp original duplicate
copies the file named original into a new file named
duplicate. You wind up with two files that have exactly the
same contents.
At this point, we can write a Scheme program to do exactly the same thing:
(define copy-file
  (lambda (name-of-original name-of-duplicate)
    (let ((source (open-input-file name-of-original))
          (target (open-output-file name-of-duplicate)))
      (let kernel ((next-character (read-char source)))
        (if (eof-object? next-character)
            (begin
              (close-input-port source)
              (close-output-port target))
            (begin
              (write-char next-character target)
              (kernel (read-char source))))))))

(copy-file "original" "duplicate")
Here's the definition of copy-file in English: Let
source be a port through which we can pull characters in from
the file to be copied, and let target be a port through which
we can push characters out to the new file. Try to read a character from
source. If it's the end-of-file object, close both ports and
we're done; otherwise, write the character to target, try to
read another character from source, and repeat this step.
Since every new call to the kernel procedure consumes one
character from the source file, the end of that file will ultimately be
reached and the recursive calls will cease.
After the definition, we complete the program with an appropriate call to
the copy-file procedure, giving it the file names as arguments.
The copy-file procedure exemplifies one of the common patterns
of complete-file recursion -- recursion guided by the structure of
the file from which data is read. The base case in a complete-file
recursion is the case in which the file contains no data, or at least no
more data, so that the value of a call to some input procedure is the
end-of-file object. If that base case has not yet been reached, a
complete-file recursion procedure performs some operation on the value that
has just been read in -- in copy-file, the character
next-character -- and invokes itself recursively to deal with
the rest of the file, starting with an attempt to read in another datum.
The copy-file procedure illustrates the tail-recursive version
of complete-file recursion. (It is tail-recursive because the transfer of
each character from the source input port to the
target output port takes place before the recursive call is
made; after the recursive call has been evaluated, there is no more work to
be done.)
The sum-of-inputs procedure from the lab
on files illustrates complete-file recursion in its non-tail-recursive
form: Each recursive call to sum-of-inputs returns the sum
of the part of the file that has not been read yet at the time the call is
made, and the current element is added to that sum after the
recursive call returns it.
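For comparison with copy-file, here is a minimal sketch of the non-tail-recursive shape. It is not the definition of sum-of-inputs from the lab; the name port-sum-recursive is hypothetical, and the sketch assumes that the caller supplies an open input port whose file contains nothing but numbers.

```scheme
;; Hypothetical sketch of non-tail-recursive complete-file recursion.
;; Assumes SOURCE is an open input port and that every datum is a number.
(define port-sum-recursive
  (lambda (source)
    (let ((next-number (read source)))
      (if (eof-object? next-number)
          0   ; base case: no more data, so the remaining sum is zero
          ;; The addition is performed after the recursive call returns,
          ;; so the recursive call is not in tail position.
          (+ next-number (port-sum-recursive source))))))
```

Each pending addition has to wait for the sum of the rest of the file, which is why work remains to be done after every recursive call.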
The arguments that the caller supplies to the copy-file
procedure are the strings that name the files. The
copy-file procedure itself is responsible for opening and
closing the ports to those files. An alternative approach, frequently used
because of its greater flexibility, is to write the copying procedure so
that it takes the ports as arguments, making the caller
responsible for opening them before the procedure call and closing them
afterwards. Here's how the program looks if this approach is used:
(define port-copy
  (lambda (source target)
    (let kernel ((next-character (read-char source)))
      (if (not (eof-object? next-character))
          (begin
            (write-char next-character target)
            (kernel (read-char source)))))))
This is a much simpler and clearer procedure. On the other hand, whoever
calls it has to open the input and output ports before invoking
port-copy and close them afterwards, and it's easy to forget
to do this.
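One way to see the extra flexibility is that the target need not be a file at all. The sketch below repeats port-copy so that it stands alone and then defines a hypothetical procedure, show-file, that copies a file to the current output port, so that its contents appear on the screen. The name show-file is an assumption made for illustration.

```scheme
;; port-copy, repeated from above so that this sketch is self-contained.
(define port-copy
  (lambda (source target)
    (let kernel ((next-character (read-char source)))
      (if (not (eof-object? next-character))
          (begin
            (write-char next-character target)
            (kernel (read-char source)))))))

;; Hypothetical example: display the contents of a file.
;; The caller of port-copy opens the input port and closes it afterwards;
;; the current output port is already open and is left open.
(define show-file
  (lambda (name-of-file)
    (let ((source (open-input-file name-of-file)))
      (port-copy source (current-output-port))
      (close-input-port source))))
```

A call such as (show-file "original") would then print the file, assuming that a file with that name exists.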
If you have the definition of port-copy in a DrScheme
definitions window, what expression would you add after this definition to
complete a program that has the same effect as the shell command cp
original duplicate?
Adapt the port-copy procedure so that it copies only
letters and whitespace characters to the output port, discarding all
others.
Let's say that the complement of the character in position n in the ASCII character set is the character in position 127 - n. (For example, the complement of the capital Y, which is in position 89, is the ampersand, &, which is in position 38.) Adapt either version of the copying program so that, instead of echoing each character from the source file into the target file without change, the program replaces each character with its complement, producing an encrypted file.
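The arithmetic in the example can be checked directly. The sketch below covers only the character computation, not the adapted copying program; the name char-complement is hypothetical, and the sketch assumes that char->integer reports ASCII positions for these characters, as it does in the usual implementations.

```scheme
;; Hypothetical helper: the character in position 127 - n,
;; where n is the position of CH.
;; Assumes char->integer and integer->char use ASCII positions.
(define char-complement
  (lambda (ch)
    (integer->char (- 127 (char->integer ch)))))
```

Under that assumption, (char-complement #\Y) is the ampersand, and applying char-complement twice gives back the original character, since 127 - (127 - n) = n.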
An input port operation is a Scheme procedure that takes an input
port as its only argument. For instance, it would be easy to rewrite the
sum-of-file procedure from the first lab
on files as an input port operation, by requiring the caller to create
the port before invoking the procedure and to close it afterwards:
(define port-sum
  (lambda (source)
    (if (not (input-port? source))
        (error 'port-sum "The argument must be an input port"))
    (let kernel ((total 0)
                 (next-number (read source)))
      (if (eof-object? next-number)
          total
          (kernel (+ total next-number) (read source))))))
One advantage of writing this procedure as an input port operation is that
one can then use the primitive Scheme procedure
call-with-input-file to invoke it. The
call-with-input-file procedure takes two arguments, the first
of which is a string that names an existing file and the second an input
port operation. The call-with-input-file procedure automatically opens the
file, invokes the input port operation (giving it the port to the input
file), collects the value that it returns, closes the port, and returns the
value collected from the input port operation. In other words, it works
essentially as if it were defined like this:
(define call-with-input-file
  (lambda (name-of-input-file operation)
    (let* ((source (open-input-file name-of-input-file))
           (result (operation source)))
      (close-input-port source)
      result)))
If the file numbers.dat contains nothing but numbers, the
following expression computes the sum of those numbers:
(call-with-input-file "numbers.dat" port-sum)
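Because read and read-char can each be called with an input port as the only argument, they already qualify as input port operations. The following sketch wraps that observation in two small procedures; the names first-datum and first-character are hypothetical, and both assume that the named file exists.

```scheme
;; Hypothetical examples: built-in input procedures used directly as
;; input port operations. Both assume that the named file exists.

;; Returns the first datum in the file, or the end-of-file object
;; if the file contains no data.
(define first-datum
  (lambda (name-of-file)
    (call-with-input-file name-of-file read)))

;; Returns the first character in the file, or the end-of-file object
;; if the file is empty.
(define first-character
  (lambda (name-of-file)
    (call-with-input-file name-of-file read-char)))
```

For instance, (first-datum "numbers.dat") would return the first number in that file without any explicit opening or closing of ports.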
Write an input port operation port-size that reads characters
one at a time through a given port until it encounters the end-of-file
object, then returns the number of characters read (not including the
end-of-file object).
Use port-size and call-with-input-file to
determine how many characters are in the file
/home/stone/courses/scheme/examples/sample.dat.
Naturally, there is a corresponding notion of an output port
operation -- a procedure that takes an output port as its only
argument. Scheme provides a built-in procedure
call-with-output-file that takes as its arguments a string
that names a file to be created and an output port operation, opens a port
to the specified output file, runs the output port operation on that port,
closes the port, and returns the result of the output port operation.
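By analogy with the model of call-with-input-file, the behavior just described can be sketched as follows. This is an illustration under a different name, call-with-output-file-sketch, which is hypothetical; it is not how any particular implementation defines the built-in procedure. What open-output-file does when the named file already exists depends on the implementation.

```scheme
;; A sketch of how call-with-output-file behaves, mirroring the model
;; of call-with-input-file. The name is hypothetical.
(define call-with-output-file-sketch
  (lambda (name-of-output-file operation)
    (let* ((target (open-output-file name-of-output-file))
           (result (operation target)))
      (close-output-port target)
      result)))

;; newline can be called with an output port as its only argument, so it
;; is a (not very interesting) output port operation. With a hypothetical
;; file name:
;;   (call-with-output-file-sketch "blank-line.dat" newline)
;; would create a file containing a single newline character.
```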
At this point, call-with-output-file seems much less useful
than call-with-input-file, because it's hard to think of
plausible output-port operations -- all the interesting output procedures
take two or more arguments. Shortly we'll see how to get around this
restriction.
Using the read-line procedure defined in the lab on files, write a Scheme procedure
line-lengths that takes as arguments an input port and an
output port, reads a line at a time from the input port, and writes to the
output port the length of each line that it reads (i.e., the number of
characters on that line, including the newline character that terminates
the line).
Since read-line never returns the end-of-file object, you'll
have to identify the base case for your file recursion differently in
defining line-lengths. (Note that read-line
always returns a string containing at least one character until the end of
the file is reached, at which point read-line returns a null
string.)
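For readers who do not have the earlier lab at hand, here is a minimal sketch of a procedure that behaves as the note describes. It is not the lab's definition of read-line; the name read-line-sketch is hypothetical, chosen so that it does not collide with any read-line that an implementation may already provide with different behavior.

```scheme
;; Hypothetical sketch matching the behavior described above:
;; returns the next line, including its terminating newline, as a string,
;; and returns the null string once the end of the file has been reached.
;; A final line that lacks a newline is returned as it stands.
(define read-line-sketch
  (lambda (source)
    (let kernel ((chars-so-far '()))
      (let ((next-character (read-char source)))
        (cond ((eof-object? next-character)
               (list->string (reverse chars-so-far)))
              ((char=? next-character #\newline)
               (list->string (reverse (cons next-character chars-so-far))))
              (else
               (kernel (cons next-character chars-so-far))))))))
```

With a procedure of this kind, the base case of a file recursion can be recognized by testing whether the string returned has length zero.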
Figure out how to test the procedure you wrote in the preceding exercise and run the test.
This document is available on the World Wide Web as
http://www.cs.grinnell.edu/~stone/courses/scheme/recursion-with-files.xhtml
created October 28, 1997
last revised March 17, 2000