Fundamentals of Computer Science II (CSC-152 98S)


Lab 7: Page Counting with Dictionaries

Description: In this laboratory, you will develop a program that analyzes simplified web log files. Your program will identify a page that has the most hits (if multiple pages have the same number of hits, you need only identify one) and a machine that generated the most hits.

Purpose: In this lab, you will gain experience with Dictionaries in Java. You will also gain some knowledge of pages on the web and prepare yourself for assignment 3. More advanced users may gain some experience with Enumerations in Java.


You can find some sample log files in /home/rebelsky/Examples. The files are log.short, log.medium, and log.long. Each line of each file contains a host name and the "path" to a file on the web server. You may want to look at these files to better familiarize yourself with their content. You can look at access_log to see the format our server normally creates.

I've started to develop a simple analysis tool for log files. Right now, all it does is count the number of accesses for each page by storing a Counter for each page it finds. The code is a little weird, but you should be able to understand it. Make a copy of that code with
% example

You may want to read the documentation for and java.util.Hashtable. In reading the documentation for Hashtable, pay particular attention to get, put, and containsKey.
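To see how get, put, and containsKey fit together, here is a minimal sketch that counts repeated strings in a Hashtable. It is not part of the lab code. It uses Integer values instead of the lab's Counter (an assumption made to keep the example self-contained), so every update has to put a new value back, because an Integer cannot be changed in place. The page names are made up, and the code uses the generic form Hashtable<String, Integer> from later versions of Java, so no cast is needed.

```java
import java.util.Hashtable;

/** Demonstrates Hashtable's put, get, and containsKey by counting strings. */
public class HashtableDemo {
  /** Count how many times each string appears in the array. */
  public static Hashtable<String, Integer> countWords(String[] words) {
    Hashtable<String, Integer> counts = new Hashtable<String, Integer>();
    for (String w : words) {
      if (counts.containsKey(w)) {
        // get returns the current count; put replaces it with the new count.
        counts.put(w, counts.get(w) + 1);
      } else {
        // First time we see this key: store an initial count of 1.
        counts.put(w, 1);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Hashtable<String, Integer> counts =
        countWords(new String[] { "/index.html", "/lab7.html", "/index.html" });
    System.out.println(counts.get("/index.html"));    // 2
    System.out.println(counts.get("/lab7.html"));     // 1
    System.out.println(counts.containsKey("/nope"));  // false
    System.out.println(counts.get("/nope"));          // null: get returns null for a missing key
  }
}
```

The last line shows why the lab code checks containsKey before calling get: looking up a key that was never stored gives back null rather than a usable object.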

Notes on Code


One of the strange parts of the page counter is the line

          ((Counter) pages.get(page)).increment();
What does it say? It says: look up page in the dictionary pages. Since get is declared to return an Object, but we know that this particular object is a Counter, we tell Java its real type. Once Java knows it's a Counter, we can (and do) increment it.

To tell Java more about the type of an object, you preface the object with the type in parentheses. This is called casting the object.
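Here is a small sketch of the same idiom outside the lab code. Because rebelsky.util.Counter may not be available, it defines a hypothetical stand-in Counter class with only the constructors and methods the lab code calls. Declaring the table's values as Object mimics the 1998-era Hashtable, so get hands back an Object that must be cast before increment can be called.

```java
import java.util.Hashtable;

public class CastDemo {
  /** Hypothetical stand-in for rebelsky.util.Counter (only what the lab uses). */
  public static class Counter {
    private int count;
    public Counter() { this(0); }
    public Counter(int start) { count = start; }
    public void increment() { count++; }
    public int value() { return count; }
  }

  /** Store a Counter, increment it through a cast, and return its value. */
  public static int demo() {
    // Values are declared as Object, so get() only promises an Object.
    Hashtable<String, Object> pages = new Hashtable<String, Object>();
    pages.put("/index.html", new Counter(1));
    // The cast tells Java the object's real type, making increment() available.
    ((Counter) pages.get("/index.html")).increment();
    return ((Counter) pages.get("/index.html")).value();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // 2
  }
}
```

A cast does not convert anything; if the object is not actually a Counter, the cast fails at run time with a ClassCastException.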


Finally, a program that actually uses the args parameter that we've included in main so frequently. As you may be able to tell from the documentation, Java assumes that some programs will be run from the command line, just like mkdir and a host of others. Hence, your Java program needs to be able to access the other values on the command line. Java passes them to your main routine as an array of strings.

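For example, if someone typed
% ji YourProgram alpha beta gamma
then args would hold three strings: args[0] would be "alpha", args[1] would be "beta", and args[2] would be "gamma".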

In this program, we use the 0th argument as the name of the file to process.
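A minimal sketch for experimenting with args; the class name ShowArgs is hypothetical. Running it with the standard launcher as `java ShowArgs alpha beta gamma` should print one line per argument, starting with `args[0] = alpha`.

```java
public class ShowArgs {
  /** Build one line per command-line argument, e.g. "args[0] = alpha". */
  public static String describe(String[] args) {
    StringBuilder sb = new StringBuilder();
    // args holds the words typed after the class name, in order.
    for (int i = 0; i < args.length; i++) {
      sb.append("args[").append(i).append("] = ").append(args[i]).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(describe(args));
  }
}
```

With no arguments, args is an empty array (not null), which is why PageCounter can safely test args.length before touching args[0].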

Basic Extensions

Extend the page counter so that it prints out the most frequently accessed page. If several pages are accessed the same number of times, you need only print one of those pages.

Extend the page counter so that it counts the number of different sites that accessed pages on this server.

Extend the page counter so that it prints out the host (machine) that accessed the most pages.
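One way to approach all three basic extensions is sketched below, under these assumptions: Counter is a hypothetical stand-in for rebelsky.util.Counter, the names LogStats, record, and mostFrequent are illustrative, and the host names are made up. It keeps a second table keyed by host alongside the page table. The number of different sites is then just the size of the host table, and one loop over a table's keys finds an entry with the largest count.

```java
import java.util.Enumeration;
import java.util.Hashtable;

public class LogStats {
  /** Hypothetical stand-in for rebelsky.util.Counter. */
  public static class Counter {
    private int count;
    public Counter(int start) { count = start; }
    public void increment() { count++; }
    public int value() { return count; }
  }

  // Hits per page and hits per host, each keyed by name.
  public final Hashtable<String, Counter> pages = new Hashtable<String, Counter>();
  public final Hashtable<String, Counter> hosts = new Hashtable<String, Counter>();

  /** Record one log line: bump the counters for its page and its host. */
  public void record(String host, String page) {
    bump(pages, page);
    bump(hosts, host);
  }

  /** Same containsKey/get/put pattern as PageCounter. */
  private static void bump(Hashtable<String, Counter> table, String key) {
    if (table.containsKey(key)) {
      table.get(key).increment();
    } else {
      table.put(key, new Counter(1));
    }
  }

  /** A key with the largest count (any one on a tie), or null if the table is empty. */
  public static String mostFrequent(Hashtable<String, Counter> table) {
    String best = null;
    int bestCount = 0;
    Enumeration<String> keys = table.keys();
    while (keys.hasMoreElements()) {
      String key = keys.nextElement();
      int count = table.get(key).value();
      if (best == null || count > bestCount) {
        best = key;
        bestCount = count;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    LogStats stats = new LogStats();
    stats.record("a.example.edu", "/index.html");
    stats.record("b.example.edu", "/index.html");
    stats.record("a.example.edu", "/lab7.html");
    System.out.println("Most popular page: " + mostFrequent(stats.pages)); // /index.html
    System.out.println("Distinct hosts: " + stats.hosts.size());           // 2
    System.out.println("Busiest host: " + mostFrequent(stats.hosts));      // a.example.edu
  }
}
```

Because a Hashtable enumerates its keys in no particular order, which of several tied entries mostFrequent returns is unspecified; the lab only asks for one of them.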

Advanced Extension

Extend the page counter so that if several pages are accessed the "maximum" number of times, it prints out all of them. How will you do this? Once you've determined this maximum number of times, you can use pages.keys(), which gives you an Enumeration (almost like a list) of the keys. Then you can step through those keys, looking up the counter for each one and checking whether it equals the maximum number of accesses.
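A sketch of the advanced extension, following the two steps described above: first find the maximum count, then step through the keys and keep each one whose counter matches. The Counter class and the method name allMostFrequent are hypothetical, and the result is sorted only so that the output is predictable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.List;

public class AllMostFrequent {
  /** Hypothetical stand-in for rebelsky.util.Counter. */
  public static class Counter {
    private int count;
    public Counter(int start) { count = start; }
    public void increment() { count++; }
    public int value() { return count; }
  }

  /** Every key whose count equals the maximum; empty if the table is empty. */
  public static List<String> allMostFrequent(Hashtable<String, Counter> table) {
    // Pass 1: find the maximum count by stepping through the values.
    int max = 0;
    Enumeration<Counter> counters = table.elements();
    while (counters.hasMoreElements()) {
      max = Math.max(max, counters.nextElement().value());
    }
    // Pass 2: step through the keys, keeping those whose counter equals max.
    List<String> result = new ArrayList<String>();
    Enumeration<String> keys = table.keys();
    while (keys.hasMoreElements()) {
      String key = keys.nextElement();
      if (table.get(key).value() == max) {
        result.add(key);
      }
    }
    Collections.sort(result); // Hashtable order is arbitrary; sort for stable output
    return result;
  }

  public static void main(String[] args) {
    Hashtable<String, Counter> pages = new Hashtable<String, Counter>();
    pages.put("/index.html", new Counter(4));
    pages.put("/lab7.html", new Counter(4));
    pages.put("/syllabus.html", new Counter(2));
    System.out.println(allMostFrequent(pages)); // [/index.html, /lab7.html]
  }
}
```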


Here's the code for the sample usage analyzer.

import java.io.EOFException;		// So we can determine end-of-file
import java.util.Hashtable;		// For storing information on accesses
import;	// Yes, we're generating output
import;	// And reading input
import rebelsky.util.Counter;		// For counting accesses

/**
 * Count a series of web page accesses and report on the most
 * frequently accessed page.  Takes the name of the file
 * containing this information from the command line.
 * @author Samuel A. Rebelsky
 * @version 1.0 of February 1998
 */
public class PageCounter {
  /**
   * Count those pages.
   * @exception Exception
   *   when any trouble occurs.  Yes, that's right.  This crashes and
   *   burns horribly without any real error checking.
   */
  public static void main(String[] args)
    throws Exception
  {
    // Input to the program.
    SimpleReader file;
    // Output from the program.
    SimpleOutput out = new SimpleOutput();
    // The hash table that stores the pages we've seen.
    Hashtable pages = new Hashtable();
    // The host that requested the page
    String host;
    // The page that was requested
    String page;
    // The number of lines we've processed
    Counter processed = new Counter();

    // Sanity check.  Was the program called correctly?
    if (args.length != 1) {
      out.println("Usage: java PageCounter filename");
      out.println("   or: ji PageCounter filename");
      return;
    }

    // Initialize input
    file = new SimpleReader(args[0]);

    // Read lines until end of file
    try {
      while (true) {
        // Read the host and page.
        host = file.readString();
        page = file.readString();
        // Skip anything else on the line.
        // And process ...
        // If we've already seen the page, just increment its counter.
        if (pages.containsKey(page)) {
          ((Counter) pages.get(page)).increment();
        }
        // If we haven't seen the page, build a new counter with base
        // value 1.
        else {
          pages.put(page, new Counter(1));
        }
        // Note that we've processed another line
        processed.increment();
      } // while
    } // try
    catch (EOFException e) {
      // Do nothing except exit the loop.
    }
    // Okay, we're done, report anything interesting.
    out.println("We've processed " + processed.value() + " lines.");
    out.println("We've seen " + pages.size() + " different pages.");
    // ...

    // That's it.
  } // main
} // PageCounter
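If the rebelsky classes are not available, the same counting loop can be sketched with only the standard library. This is a substitute under stated assumptions, not the lab's code: BufferedReader.readLine replaces SimpleReader (it returns null at end of file instead of throwing EOFException), System.out replaces SimpleOutput, Integer counts replace Counter, and the class name PageCounterStd is hypothetical. Lines are split on whitespace, so any fields after the host and page are ignored; lines with fewer than two fields are skipped and not counted.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;

public class PageCounterStd {
  /** Count hits per page into pages; return the number of lines processed. */
  public static int countPages(BufferedReader in, Hashtable<String, Integer> pages)
      throws IOException {
    int processed = 0;
    String line;
    // readLine returns null at end of file, so no EOFException is involved.
    while ((line = in.readLine()) != null) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length < 2) {
        continue; // skip blank or malformed lines
      }
      // fields[0] is the host and fields[1] the page; the rest is ignored.
      String page = fields[1];
      if (pages.containsKey(page)) {
        pages.put(page, pages.get(page) + 1);
      } else {
        pages.put(page, 1);
      }
      processed++;
    }
    return processed;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.out.println("Usage: java PageCounterStd filename");
      return;
    }
    Hashtable<String, Integer> pages = new Hashtable<String, Integer>();
    // try-with-resources closes the file even if reading fails.
    try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      int processed = countPages(in, pages);
      System.out.println("We've processed " + processed + " lines.");
      System.out.println("We've seen " + pages.size() + " different pages.");
    }
  }
}
```

Taking a BufferedReader rather than a file name makes countPages easy to try on a StringReader without creating a log file.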


Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

Source text last modified Thu Feb 12 21:15:53 1998.

This page generated on Thu Feb 12 21:19:44 1998 by SiteWeaver.
