A ``permuted-word puzzle'' is a string of letters that, when rearranged, form a familiar word. The object of the puzzle is to determine the word. For instance, the string `regano' is a permuted-word puzzle; its solution is `orange'.
It is not difficult to write a Scheme procedure that can take a string of letters and rearrange them in every possible way. However, since the number of rearrangements is typically quite large, detecting the solution amid the large volume of gibberish output is often a problem in its own right. The project is to write a Scheme procedure that will identify and display only the most ``plausible-looking'' permutations.
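The rearranging step can be sketched as follows. This is a minimal illustration, not the code in the project file; the names remove-first, permutations, and string-permutations are hypothetical and chosen only for this example.

```scheme
;; Hypothetical helper: LST with the first occurrence of X removed.
(define (remove-first x lst)
  (cond ((null? lst) '())
        ((equal? x (car lst)) (cdr lst))
        (else (cons (car lst) (remove-first x (cdr lst))))))

;; All orderings of the items in LST.  If LST contains repeated items,
;; the result contains repeated orderings as well.
(define (permutations lst)
  (if (null? lst)
      '(())
      (apply append
             (map (lambda (x)
                    (map (lambda (rest) (cons x rest))
                         (permutations (remove-first x lst))))
                  lst))))

;; All rearrangements of the letters of STR, as a list of strings.
(define (string-permutations str)
  (map list->string (permutations (string->list str))))
```

A six-letter string with no repeated letters yields 720 rearrangements this way, which is why some filtering of the output is needed.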
How can the computer distinguish plausible arrangements of letters from
gibberish? Well, one mark of gibberish is that it contains impossible or
rare combinations of letters, such as hj or pw. The
solution to a permuted-word puzzle is likely to be made up entirely of more
common combinations. In our sample puzzle, for instance, `gnraoe' can be
recognized as gibberish, because the combinations ao and
nr are uncommon; on the other hand, such permutations as `angore'
and `garone' are plausible guesses that look vaguely like English words,
partly because they consist entirely of high-frequency letter-pairs.
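One way to turn this idea into a number is to add up a frequency value for each adjacent pair of letters. The sketch below assumes a lookup procedure that maps two characters to a non-negative number; the name pair-score and the tiny table in toy-frequency are hypothetical, and the numbers in that table are invented for demonstration rather than measured from any text. The sketch also ignores how often letters begin or end a word.

```scheme
;; Sum (pair-frequency c1 c2) over each adjacent pair of letters in STR.
;; PAIR-FREQUENCY is an assumed procedure: two characters -> number.
(define (pair-score str pair-frequency)
  (let loop ((i 0) (total 0))
    (if (>= (+ i 1) (string-length str))
        total
        (loop (+ i 1)
              (+ total (pair-frequency (string-ref str i)
                                       (string-ref str (+ i 1))))))))

;; Toy lookup for demonstration only.  The values are made up;
;; any pair not listed counts as 0.
(define (toy-frequency c1 c2)
  (let ((entry (assoc (string c1 c2)
                      '(("an" . 5) ("ng" . 3) ("or" . 4) ("re" . 6)))))
    (if entry (cdr entry) 0)))
```

With this toy table, (pair-score "angore" toy-frequency) is 18 and (pair-score "gnraoe" toy-frequency) is 0, so the plausible-looking arrangement outranks the gibberish.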
A Scheme program that implements such a procedure can be found at /home/stone/courses/scheme/html/unjumble-project.ss. Copy this file into your home directory by opening a dtterm window and giving the shell command
cp /home/stone/courses/scheme/html/unjumble-project.ss ~/unjumble-project.ss
Load the program into Chez Scheme and try it on a few five- and six-letter permuted-word problems, such as the following (taken from the ``Jumble'' feature by Henri Arnold and Bob Lee, as printed in the Des Moines Register for March 8, 1998):
For instance, to have it generate the ten most plausible guesses for the first of these problems, you would type
(unjumble "neetic" 10)
How accurate are the program's guesses? How helpful would it be to a baffled solver of permuted-word puzzles?
One defect of the program, in its current form, is that the scoring system is heavily weighted towards common letter combinations. It simply adds up the frequencies of the letter-pairs occurring in a permutation. If an e occurs anywhere in the puzzle, the program will try very hard to force it to the end of the word, because an e at the end of a word receives a score almost twice as high as an e before any letter of the alphabet (because, in the sample from which the frequency statistics were drawn, e occurred at the end of a word almost twice as often as before any single letter). On the other hand, the program hardly distinguishes between combinations that are possible but infrequent (such as xa) and those that are virtually impossible (such as mx).
To correct this problem, we should transform the raw frequency values by applying some function that grows rapidly for small positive values and more slowly for large ones, such as the square-root or logarithm function. Make this change in the program, re-load it into Scheme, and try to judge whether the performance of the program improves.
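The kind of transformation meant here might look like the sketch below. The names damp-sqrt, damp-log, and damped are hypothetical, and the sketch assumes the frequencies are reachable through a procedure taking two characters, which may not match how the project file stores them.

```scheme
;; Square-root damping: 0 stays 0, and large values are compressed.
(define (damp-sqrt frequency)
  (sqrt frequency))

;; Logarithmic damping.  Adding 1 first keeps a frequency of 0 at 0
;; instead of asking for the logarithm of zero.
(define (damp-log frequency)
  (log (+ 1 frequency)))

;; Wrap an assumed two-character lookup procedure so that every value
;; it returns is passed through DAMP.
(define (damped pair-frequency damp)
  (lambda (c1 c2)
    (damp (pair-frequency c1 c2))))
```

Under the square root, raw values of 1, 100, and 400 become 1, 10, and 20. A pair that is 400 times as common now counts only 20 times as much, while the gap between a frequency of 0 and a frequency of 1 becomes a larger share of the whole scale.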
Write a Scheme procedure that takes a permuted-word puzzle and its
solution as arguments and determines how many guesses the strategy you have
formulated will take to get the right answer. (In other words, determine
the position of the correct answer on the sorted list of permutations
that the expression (sort (score-permutations jumbled))
returns.)
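The counting part of this task can be sketched on its own. The procedure name position-of is hypothetical, and the sketch assumes that the sorted list is a plain list of strings; if its entries pair each string with a score, the comparison would have to look at the string component instead.

```scheme
;; 1-based position of TARGET in LST, or #f if TARGET does not appear.
(define (position-of target lst)
  (let loop ((rest lst) (n 1))
    (cond ((null? rest) #f)
          ((equal? target (car rest)) n)
          (else (loop (cdr rest) (+ n 1))))))
```

For example, (position-of "orange" '("angore" "garone" "orange")) is 3, meaning the solver would reach the answer on the third guess.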
Suppose that you want to start with a given word and to construct a permuted-word puzzle that has the given word as its solution. The puzzle is, in a way, more interesting if the jumbled version (a) looks like an English word rather than like gibberish, but (b) does not look at all like the solution. Write a Scheme procedure that takes the given word as its argument and comes up with a puzzle permutation that satisfies these requirements.
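Requirement (b) needs some measure of resemblance. One possible measure, offered only as an assumption about what ``looks like the solution'' might mean, is the number of positions at which the jumbled version and the solution hold the same letter. The name matching-positions is hypothetical.

```scheme
;; Number of positions at which the equal-length strings A and B
;; hold the same letter.
(define (matching-positions a b)
  (let loop ((i 0) (count 0))
    (if (>= i (string-length a))
        count
        (loop (+ i 1)
              (if (char=? (string-ref a i) (string-ref b i))
                  (+ count 1)
                  count)))))
```

By this measure `regano' shares no positions with `orange', while `garone' shares one (the final e). A puzzle generator could prefer high-scoring permutations for which this count is small.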
This document is available on the World Wide Web as
http://www.math.grin.edu/~stone/courses/scheme/unjumble-project.html
created March 8, 1998
last revised June 9, 1998