• Java 2 Platform Standard Edition 5.0 API specification
• OpenJDK source code repository for library classes
• Source code from Data structures and problem-solving using Java, third edition
One of the difficulties involved in compiling the index of a book is simply consolidating and sorting the data that the indexer collects. Typically, the indexer reads through the book, making a note of every occurrence of a name, term, or topic that seems appropriate for an index entry. Such a note might contain the number of the page (or range of pages) containing that occurrence and the text of the heading and sub-heading (if any) for the relevant index entry. For instance, here is a selection of the notes that an indexer might make in going through a book:
79: Exponential sums
95: Matrix/inverse
100: Subtraction/floating-point
105: Exponential sums
113: Exponential sums
152: Primitive recursive function
178: Subtraction
191-192: Subtraction
197: Subtraction
265: Subtraction
314: Matrix/inverse
366: Exponential sums
425: Matrix/null space
468: Subtraction/complex
482: Matrix/inverse
506: Subtraction/power series
602: Subtraction/continued fractions
625-626: Matrix/null space
214: Subtraction/floating-point
219: Subtraction/floating-point
230: Subtraction/floating-point
232: Subtraction/floating-point
238-239: Subtraction/floating-point
249: Subtraction/floating-point
250: Subtraction
657: Matrix/inverse
197: Subtraction
The same information would show up in the finished index thus:
Exponential sums, 79, 105, 113, 366
Matrix
    inverse, 95, 314, 482, 657
    null space, 425, 625-626
Primitive recursive function, 152
Subtraction, 178, 191-192, 197, 250, 265
    complex, 468
    continued fractions, 602
    floating-point, 100, 214, 219, 230, 232, 238-239, 249
    power series, 506
Your program should read in an indexer's notes from a text file specified
on the command line, compile the index, and write it out to standard output
(System.out).
Whoever prepares the input file is supposed to make sure that each line of the input file is either empty (so that BufferedReader.readLine() returns a string of length 0) or contains an indexer's note, consisting of a page number or page range, a colon, a space, and a heading or a heading and sub-heading (separated by a slash). The text of a heading or sub-heading will never contain either a colon or a slash. A page range will consist of two page numbers separated by a hyphen.
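One way to check a line against this format is with a regular expression. The sketch below is only an illustration, not a required design: the class name NoteParser, the method parse, and the limit of nine digits per page number (chosen so that Integer.parseInt cannot overflow later) are all assumptions of mine, not part of the assignment.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: recognizes one indexer's note with a regular expression.
class NoteParser {
    // page or page range, colon, space, heading, optional slash and sub-heading.
    // Assumption: page numbers have at most nine digits.
    private static final Pattern NOTE = Pattern.compile(
        "(\\d{1,9})(?:-(\\d{1,9}))?: ([^:/]+)(?:/([^:/]+))?");

    /**
     * Returns {firstPage, lastPage, heading, subHeading}, where subHeading is
     * null if the note has none, or returns null if the line is not a note.
     */
    static String[] parse(String line) {
        Matcher m = NOTE.matcher(line);
        if (!m.matches()) {
            return null;                       // wrong structure
        }
        // A single page is treated as a range that starts and ends on that page.
        String last = (m.group(2) != null) ? m.group(2) : m.group(1);
        return new String[] { m.group(1), last, m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("238-239: Subtraction/floating-point")));
        System.out.println(Arrays.toString(parse("Subtraction, page 100")));
    }
}
```

Because the heading and sub-heading are matched with a character class that excludes colons and slashes, a line with a second slash or a stray colon is rejected rather than silently misread.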
If your program encounters a line in the input file that isn't empty and
doesn't have the right structure to be an indexer's note, it should echo
that line to System.err, appropriately labelled, but leave it out of
the index. It should then go on to the next line of the input file
(instead of, say, crashing).
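The skip-and-report behavior described above might be organized along the following lines. This is a sketch under stated assumptions: the names NoteReader and readNotes, the wording of the error label, and the nine-digit limit in the pattern are mine, and the code uses language features newer than Java 5 (try-with-resources, UncheckedIOException), so it assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader: keeps well-formed notes, reports the rest, never stops early.
class NoteReader {
    // Assumed pattern for a note: page or range, ": ", heading, optional "/sub-heading".
    static final String NOTE_REGEX = "\\d{1,9}(-\\d{1,9})?: [^:/]+(/[^:/]+)?";

    static List<String> readNotes(BufferedReader in, PrintStream err) {
        List<String> notes = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;                               // empty lines are legal
                }
                if (line.matches(NOTE_REGEX)) {
                    notes.add(line);
                } else {
                    err.println("Malformed note skipped: " + line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);              // I/O failure, not bad data
        }
        return notes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NoteReader <notes-file>");
            return;
        }
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            for (String note : readNotes(in, System.err)) {
                System.out.println(note);
            }
        }
    }
}
```

Passing the error stream in as a parameter, rather than writing to System.err directly, makes it possible to test the reporting without touching the console.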
In the index that you write out, the headings should be alphabetized, the
sub-headings associated with each heading should be alphabetized and
indented four spaces, and the page numbers associated with each heading or
sub-heading should be arranged in ascending numerical order. Page ranges
should be sorted according to their starting page numbers. Duplicate page
numbers associated with the same heading (or the same heading and
sub-heading), such as the two "197: Subtraction" notes in the sample
data set above, should be consolidated (that is, the duplicated page number
should appear only once in the index).
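Sorted maps and sets can take care of the alphabetizing, numerical ordering, and consolidation all at once. The following is a minimal sketch, not the only reasonable design. The class IndexBuilder and its methods are hypothetical names; it assumes Java 8 or later, uses case-sensitive String ordering for "alphabetized", stores a heading's own pages under the empty sub-heading "" (which sorts before any real sub-heading), and treats a single page n as the range n to n.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical index structure: heading -> sub-heading -> sorted page entries.
class IndexBuilder {
    // Order page entries by starting page, then by ending page.
    private static final Comparator<int[]> BY_PAGE =
        Comparator.<int[]>comparingInt(r -> r[0]).thenComparingInt(r -> r[1]);

    // "" as a sub-heading key means "pages for the heading itself".
    private final TreeMap<String, TreeMap<String, TreeSet<int[]>>> entries = new TreeMap<>();

    void add(int first, int last, String heading, String sub) {
        entries.computeIfAbsent(heading, h -> new TreeMap<>())
               .computeIfAbsent(sub == null ? "" : sub, s -> new TreeSet<>(BY_PAGE))
               .add(new int[] { first, last });   // the set drops exact duplicates
    }

    String format() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, TreeMap<String, TreeSet<int[]>>> h : entries.entrySet()) {
            out.append(h.getKey());
            for (Map.Entry<String, TreeSet<int[]>> s : h.getValue().entrySet()) {
                if (!s.getKey().isEmpty()) {
                    out.append("\n    ").append(s.getKey());   // four-space indent
                }
                for (int[] r : s.getValue()) {
                    out.append(", ").append(r[0]);
                    if (r[1] != r[0]) {
                        out.append('-').append(r[1]);
                    }
                }
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        IndexBuilder index = new IndexBuilder();
        index.add(197, 197, "Subtraction", null);
        index.add(100, 100, "Subtraction", "floating-point");
        index.add(191, 192, "Subtraction", null);
        index.add(197, 197, "Subtraction", null);   // duplicate, consolidated
        index.add(95, 95, "Matrix", "inverse");
        System.out.print(index.format());
    }
}
```

Note that this sketch consolidates only exact duplicates; it does not merge a page with a range that happens to contain it, since the specification above does not ask for that.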
Submit the output from a test run on the data set above, along with the data sets and output for any other test runs you'd like me to consider. I reserve the right to run your program on additional data sets of my own nefarious contrivance.
This assignment will be due on Friday, April 18.