A Web Use Analysis System for SiteWeaver
Request to Grinnell College Grant Board for Summer 1998
Samuel A. Rebelsky
Mathematics and Computer Science

Background

The growth of the World-Wide Web has led to the development of a number of commercial and non-commercial tools for authoring webs of information. Unfortunately, most tools support page-level authoring, rather than site-level authoring, and emphasize physical design, rather than logical design. For teachers, a drawback of most tools is that they provide little support for course-specific tasks, such as generating syllabi, and that they do not provide sufficient support for analyzing student use of course webs.

To help meet these needs, I am developing SiteWeaver, a tool suite that supports site-level authoring. It includes tools that support a number of common authoring tasks and situations, particularly for course webs. Among its features (current and planned) are:

This summer, I will be working on tools to support more sophisticated analysis of web usage. I am requesting funding for three summer research students to work on the project.

I. Narrative

Hypertext and hypermedia are systems for organizing information in which the information is segmented into individual nodes which are connected by links which indicate relationships. Each node may be connected to a number of other nodes. While this is not a new way of organizing information, computers have simplified the design and presentation of hypertext systems.

The World-Wide Web was originally designed as a general hypertext system which made it simple for average information authors to build hypertext systems. HTML, the hypertext markup language, is a human-readable language used to describe the links and other components of nodes in a hypertext.

Although HTML is easy to learn and use, many potential authors felt daunted by the prospect of learning a "language" for creating hypertexts. In addition, many of the steps in writing HTML are repetitive and error-prone. As the web grew, these deficiencies led developers to create web-authoring tools.

I.a. Deficiencies of authoring tools

While the original specification of HTML emphasized logical design (in which one describes the roles of pieces of text) over physical design (in which one describes the appearance of pieces of text), most authoring tools emphasize physical authoring. Emphasis on physical authoring often leads to texts that are difficult to navigate and that may be unusable on some platforms or for some users. Given that few information authors understand all the requirements of good page design, it would be preferable if tools separated design from writing.

The separation of information from its presentation also pertains to the problem of retargeting. Often, it becomes necessary to take the same information and present it in different ways. For example, a designer might determine a better way to lay out information, or an author might choose to present information in one way for onscreen viewing and another for printing.

Hypertext authoring must also consider more than individual nodes; it must include the design of a site as a whole, as well as the roles and relationships of the nodes. Typical web-authoring tools provide little, if any, support for such site-level authoring tasks.

Because of these deficiencies, new and better site-level authoring tools are needed. Some support is provided by new languages and add-ons, such as cascading style sheets and XML, the Extensible Markup Language. However, tools can provide more comprehensive facilities that are not possible or appropriate with these extensions.

I.b. Hypertext and education

The growth of the web has also led a number of faculty to put course resources online, building small course webs. As a computer scientist and educator, I have been building course webs and experimenting with the effects of course webs and the design of course webs since 1994.

A particularly important need for many faculty is the ability to analyze student use of course webs (primarily anonymously, but also with student names or codes if students agree). To most, this means more than simply checking which pages students visit. It also involves understanding the paths students take through materials (e.g., given a question that students need to answer, in what order do they use examples, discussions, tutorials, and such), and the amount of time they spend at each place. For many educational hypertext systems, especially those based on Apple's HyperCard, this information is readily available and there are tools for analyzing the information, such as MacSQEAL. For typical HTML-based webs, this information is nearly impossible to determine, and there are no readily-available tools that support the types of analyses that faculty members need to do.

I.c. SiteWeaver

My observations about the limitations of authoring tools, and my need for tools that help with course-specific tasks, led me to develop a number of tools, both small and large, to help with the creation and analysis of course webs. I am gradually unifying the tools and building additional tools to create a tool suite that I call SiteWeaver. Because SiteWeaver contains many partially developed tools, it is not yet available for general use. However, I have already received a number of requests to provide a more generally usable version and intend to spend some time polishing existing tools. One student is working with me this term to do some of this polishing as well as some planning for the coming semesters.

I.d. Analyzing Web Use

This summer, I will be concentrating my effort on developing tools for analyzing usage of course webs. While some web analysis tools are currently available, they do not support the types of analyses mentioned above. There are four aspects to developing a comprehensive web analysis system: a logging subsystem, an analysis subsystem, an identification subsystem and a translation subsystem.

Current analysis tools provide only limited kinds of analyses because the typical web server does not log sufficient information. In particular, the server does not log the user for a page nor does it report when the user moves to a new page. Because of this, an important aspect of developing an analysis tool is finding ways to extend web pages (or scripts that generate web pages) so that they provide sufficient information to do more comprehensive analyses.

A logging subsystem will provide this additional information. There are a number of techniques for sending this data including (1) using cookies (pieces of information stored on the local computer and automatically sent when a page is requested); (2) relying on queries as part of the URL; (3) adding Java applets to the pages; and (4) more ad-hoc methods for estimating information, such as running a finger on the source machine. Some testing will be necessary to determine which is best (in terms of categories like difficulty to implement, level of information available, and overhead in usage). In addition, since it is possible to turn off some of these features in the browser, it may be necessary to use some combination.
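As an illustration of technique (2), the following is a minimal Python sketch of carrying logging information in the query part of a URL. The parameter names "sid" (a student code) and "from" (the page the student came from), the function names, and the example URL are all hypothetical; the sketch is not a description of how SiteWeaver does or will do this.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def tag_url(url, student_code, from_page):
    """Return url with the hypothetical 'sid' and 'from' parameters appended."""
    parts = urlsplit(url)
    # Keep any query parameters the link already had.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("sid", student_code), ("from", from_page)]
    return urlunsplit(parts._replace(query=urlencode(query)))

def read_tag(url):
    """Recover the (student_code, from_page) pair from a tagged URL."""
    params = dict(parse_qsl(urlsplit(url).query))
    return params.get("sid"), params.get("from")

tagged = tag_url("http://example.edu/course/examples.html", "s042", "question.html")
student, came_from = read_tag(tagged)
```

A server-side script receiving the tagged request could record both the student code and the previous page, which is the information the typical server log lacks. Cookies could carry the same student code without altering links, at the cost of depending on a browser feature that users may turn off.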

Once the information is generated, it is necessary to have a subsystem that reads the log files and permits appropriate analysis. The analysis subsystem permits gradual exploration of the data, categorization of pages, and visualization of particular parts of the data set. Through the use of simulated log files, it will be possible to develop this tool before the logging subsystem is complete.
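To make the idea of working from simulated log files concrete, here is a small Python sketch of one possible analysis: reconstructing each student's path and estimating the time spent on each page. The log entry layout (student code, time in seconds, page name) and the function name are assumptions made for illustration only. Time on a page is estimated as the gap until the same student's next request, so the time spent on the last page in a path is unknown.

```python
from collections import defaultdict

def paths_and_times(entries):
    """Map each student to a list of (page, seconds_on_page_or_None) pairs.

    entries is an iterable of (student, seconds, page) tuples in any order.
    """
    by_student = defaultdict(list)
    for student, seconds, page in sorted(entries, key=lambda e: (e[0], e[1])):
        by_student[student].append((seconds, page))

    result = {}
    for student, visits in by_student.items():
        path = []
        for i, (seconds, page) in enumerate(visits):
            if i + 1 < len(visits):
                dwell = visits[i + 1][0] - seconds  # gap until the next request
            else:
                dwell = None  # no later request, so the time is unknown
            path.append((page, dwell))
        result[student] = path
    return result

# A simulated log with two students and hypothetical page names.
simulated_log = [
    ("a", 0, "question.html"),
    ("b", 5, "question.html"),
    ("a", 40, "example.html"),
    ("b", 65, "discussion.html"),
    ("a", 100, "tutorial.html"),
]
paths = paths_and_times(simulated_log)
```

With this simulated log, student "a" moves from the question to an example to a tutorial, while student "b" moves from the question to a discussion; categorization and visualization would build on structures of this kind.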

To conserve data and to ensure that student use is logged, it will be necessary to extend pages so that they "check" to ensure that a student identification is included (if students agree to being logged and the instructor deems it important to do so). In addition, it may be reasonable to require that only hits from within a certain domain (e.g., grin.edu) be logged. The identification subsystem is responsible for these tasks.
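The two checks described above might be sketched in Python as follows. The function names, the require_id flag, and the example host names other than the grin.edu domain are hypothetical, and the sketch assumes the requesting machine's host name is already known.

```python
def in_domain(hostname, domain="grin.edu"):
    """True if hostname is the domain itself or a machine inside it."""
    host = hostname.lower().rstrip(".")
    # Require a dot before the domain so that "notgrin.edu" does not match.
    return host == domain or host.endswith("." + domain)

def should_log(hostname, student_code, require_id=True, domain="grin.edu"):
    """Decide whether a hit should be written to the log."""
    if not in_domain(hostname, domain):
        return False
    if require_id and not student_code:
        return False  # identification is required but missing
    return True
```

Here require_id stands in for the instructor's decision that identification matters; when it is False, anonymous hits from within the domain are still logged.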

Since the logging and identification subsystems will most likely require that pages include special logging and identification components, it will be necessary to develop a translation subsystem that inserts those components into the page.
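A minimal Python sketch of such a translation step appears below. It assumes that the component is a fragment of HTML placed just before the closing body tag; the placeholder component and the function name are hypothetical, and a real translator would likely need to handle pages generated by scripts as well.

```python
import re

# Placeholder for whatever the logging subsystem eventually requires.
LOGGING_COMPONENT = "<!-- hypothetical logging component -->"

def insert_component(html, component=LOGGING_COMPONENT):
    """Insert component before the last closing body tag (or append it)."""
    if component in html:
        return html  # already translated, so leave the page alone
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        return html + "\n" + component + "\n"
    position = matches[-1].start()
    return html[:position] + component + "\n" + html[position:]

page = "<HTML><BODY><P>Syllabus</P></BODY></HTML>"
translated = insert_component(page)
```

Making the translation idempotent (running it twice changes nothing) means the translator can be rerun safely whenever a course web is regenerated.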

It will also be necessary to investigate a number of technical details, such as methods for ensuring that only one process writes to a log file at a time.
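One candidate method is advisory file locking. The Python sketch below uses the flock call from the standard fcntl module, which is available only on Unix systems; the function name and log line format are hypothetical, and the approach works only if every process that writes to the log follows the same locking convention.

```python
import fcntl

def append_log_line(path, line):
    """Append one line to the log while holding an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as log:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(log, fcntl.LOCK_EX)
        try:
            log.write(line.rstrip("\n") + "\n")
            log.flush()  # push the data out before releasing the lock
        finally:
            fcntl.flock(log, fcntl.LOCK_UN)
```

Other methods, such as separate lock files or a single logging process that receives entries from the page scripts, may behave better on networked file systems and would also need to be evaluated.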

II. Scholarly products

II.a. Prior

My qualitative research on the effects of course webs on learning was presented at a number of conferences and published in the December 1996 issue of the Journal of Universal Computer Science.

CourseWeaver, a precursor to SiteWeaver, was a HyperCard stack that supported many of the tasks pertaining to course webs, including the automation of schedule creation (given component parts), the numbering of course components, and the retargeting of course materials for both online and printed viewing. A paper on CourseWeaver was presented at the 1997 World Conference on Multimedia and Hypermedia in Education and appeared in the proceedings of that conference.

II.b. Current/Recent

In the Fall of 1997, I conducted a preliminary study of student in-class web usage of large course webs that, when printed, totaled over 200 pages each. A paper reporting on those results has been accepted to the 1998 EdMedia World Conference on Multimedia and Hypermedia in Education, which will be held in June in Freiburg, Germany. The study helped highlight some of the needs for more sophisticated tools for analyzing web usage.

In the Fall of 1997, I began analyzing current site-level authoring tools with an eye towards their role in the development of course webs. A seminar on site-level authoring has been accepted to the 1998 SIGCSE conference on Computer Science Education, which will be held in late February in Atlanta. The acceptance rate for seminars at SIGCSE is approximately 33%. I will also be presenting a poster and running a tutorial on site-level authoring at EdMedia 1998.

Current course webs that use some of my tools can be viewed at http://www.math.grin.edu/~rebelsky/Courses/CS103/98S/index.html , http://www.math.grin.edu/~rebelsky/Courses/CS152/98S/index.html , and http://www.math.grin.edu/~rebelsky/Courses/CS302/98S/index.html .

II.c. Planned

The analysis tool (or tools) we develop will be one of the core products from this phase of the project. The author of MacSQEAL (mentioned earlier) has expressed interest in using such a tool, and it is likely that others will too. The manual for this tool will be added to the departmental technical report series.

It is likely that the summer students and I will submit one or more papers on the analysis system we develop to appropriate conferences (EdMedia and WebNet are likely candidates) and, possibly, to one of the journals on technology in education published by the Association for the Advancement of Computing in Education.

I have also begun planning a special issue of the Journal of Multimedia Tools and Applications devoted to site-level authoring. While the analysis tool may be appropriate for that issue, it is likely that a more general article on SiteWeaver will be included in that issue.

III. Funds requested

Funding is requested for three summer research students to work on the development of tools to analyze usage of course webs. The development of a sophisticated analysis tool requires four main components, as well as some related development for underlying support of the components. The components are discussed in section I.d. above. It is likely that one student will work on the logging subsystem, one on the analysis subsystem, and one on the identification and translation subsystems, which are likely to be smaller than the other two. Students will share the responsibility of developing the related utilities, such as file locking methods.

Each subsystem is of appropriate length for completion by a summer research student. None is so complicated that it cannot be completed by a diligent student within one summer. At the same time, each subsystem is interesting enough that it will require significant effort. If a student moves particularly quickly, there will be opportunities to begin summary papers, write user documentation, and develop ancillary tools.

III.a. Equipment

The HP workstations in the MathLAN, and the equipment for Glimmer (the Grinnell laboratory for interactive multimedia experimentation and research) should provide sufficient equipment for this project.

IV. Notes

Given the relatively small number of women in computer science, I am happy to report that two women have already approached me about working on this project, even though I have yet to give my public presentation inviting students to join.

To further support this project, I have sent a separate proposal to the Noyce Program to provide additional software and other resources. The outcome of that proposal has not yet been determined.