
Assignment 4: Processing files and data

Due
Tuesday, 26 September 2017 by 10:30pm
Summary
For this assignment, you will put your data science skills to work by processing data and files.
Collaboration
You must work with your assigned partner(s) on this assignment. You may discuss this assignment with anyone, provided you credit such discussions when you submit the assignment.
Submitting
Email your answers to csc151-01-grader@grinnell.edu. The subject of your email should be [CSC151 01] Assignment 4, and the email should contain your answers to all parts of the assignment. Scheme code should be in the body of the message, not in an attachment.
Warning
So that this assignment is a learning experience for everyone, we may spend class time publicly critiquing your work.

Problem 1: File Summary

Topics: Files, Lists, Strings

Most word processors include a “word-count” feature that reports various statistics about the current document. For example, LibreOffice Writer’s word-count tool reports the number of words, the number of characters including spaces, and the number of characters without spaces.

Write a procedure (file-summary filename) that takes a path to a text file and returns a string containing a summary of the file. Your procedure should report

  1. the number of words in the file,
  2. the number of characters including whitespace characters,
  3. the number of characters excluding whitespace characters, and
  4. the number of lines in the file.
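As one possible starting point, here is a minimal sketch of reading a text file with Racket's racket/file library, assuming #lang racket. file->string returns the whole contents as one string (convenient for character counts), and file->lines returns a list with one string per line (convenient for counting lines). The temporary file and its contents are made up so that the example runs on its own; the course may also provide its own file procedures.

```racket
#lang racket
(require racket/file) ; already included by #lang racket; listed for clarity

;; Create a small temporary file to stand in for a real input file.
(define tmp-path (make-temporary-file))
(display-to-file "first line\nsecond line\n" tmp-path #:exists 'truncate)

;; Whole contents as a single string, newlines included.
(define contents (file->string tmp-path))
;; One string per line, with the line terminators removed.
(define file-lines (file->lines tmp-path))

(string-length contents) ; → 23
(length file-lines)      ; → 2

;; Clean up the temporary file.
(delete-file tmp-path)
```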

For example, if you have saved the file us-zip-codes.txt on your Desktop, the output should look like the following.

> (file-summary "/home/username/Desktop/us-zip-codes.txt")
"Summary of /home/username/Desktop/us-zip-codes.txt:\nNumber of words: 89\nNumber of characters including whitespace: 557\nNumber of characters excluding whitespace: 471\nNumber of lines: 21"
> (display (file-summary "/home/username/Desktop/us-zip-codes.txt"))
Summary of /home/username/Desktop/us-zip-codes.txt:
Number of words: 89
Number of characters including whitespace: 557
Number of characters excluding whitespace: 471
Number of lines: 21
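The two interactions above show the same string twice: the REPL prints the returned string with its newlines written as \n, while display prints them as actual line breaks. The sketch below shows one way to assemble such a string once the four counts are known. format-summary and the file name "sample.txt" are hypothetical; computing the counts from the file is the actual exercise.

```racket
#lang racket
;; Hypothetical helper: build the summary string from counts that have
;; already been computed. "\n" inserts a newline character.
(define (format-summary filename words chars-with chars-without lines)
  (string-append "Summary of " filename ":\n"
                 "Number of words: " (number->string words) "\n"
                 "Number of characters including whitespace: "
                 (number->string chars-with) "\n"
                 "Number of characters excluding whitespace: "
                 (number->string chars-without) "\n"
                 "Number of lines: " (number->string lines)))

(define summary (format-summary "sample.txt" 89 557 471 21))

;; Evaluating summary shows \n escapes; display shows real line breaks.
summary
(display summary)
```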

Hint: Use the char-whitespace? procedure to check if a character is a whitespace character.
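A minimal sketch of how char-whitespace? might be used, assuming #lang racket: convert a string to a list of characters and count those that are not whitespace. count-non-whitespace is a hypothetical helper name, and the sample string is made up. Note that spaces, tabs, and newlines all count as whitespace.

```racket
#lang racket
;; Hypothetical helper: count the characters in str that are not
;; whitespace (spaces, tabs, newlines, and so on).
(define (count-non-whitespace str)
  ;; string->list gives a list of characters; count (from racket/list)
  ;; tallies the characters for which the predicate returns true.
  (count (lambda (ch) (not (char-whitespace? ch)))
         (string->list str)))

(define sample "Zip codes\tand cities\n")

(string-length sample)        ; → 21, whitespace included
(count-non-whitespace sample) ; → 17, whitespace excluded
```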

Problem 2: Searching for words

Topics: Files, Lists, Strings

Write a procedure (file-find str filename) that finds all the occurrences of the string str in the file filename. The return value of file-find should be a table (a list of lists) with one entry for each line that contains str. Each entry should be a two-element list containing the line number, counting from 1, and the text of that line. For example,

> (define file "/home/username/Desktop/us-zip-codes.txt")
> (file-find "Zip" file)
'((1 "US Zip Codes ")
  (4 "Zip codes and other information of cities in the US.")
  (8 "* 0: Zip code"))
> (file-find "Cleanup" file)
'((17 "Cleanup by Samuel A. Rebelsky on 2017-09-11: Removed quotation marks from")
  (20 "Cleanup by Samuel A. Rebelsky on 2017-09-12: \"Federated States of Micro\""))
> (file-find "curmudgeon" file)
'()

Hint: Scheme has a built-in procedure, string-contains?, that may be helpful for this problem.
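The sketch below, assuming #lang racket, shows string-contains? together with one way to pair lines with 1-based line numbers. It works on a list of strings rather than a file, so it is not a complete file-find. number-lines and lines-containing are hypothetical helper names, and the sample lines are made up. In Racket, (string-contains? s contained) checks whether contained appears inside s, and the comparison is case-sensitive.

```racket
#lang racket
;; Hypothetical helper: pair each line with its line number,
;; counting from 1, e.g. '("a" "b") becomes '((1 "a") (2 "b")).
(define (number-lines lines)
  (map list (range 1 (add1 (length lines))) lines))

;; Hypothetical helper: keep only the numbered lines whose text
;; contains str.
(define (lines-containing str lines)
  (filter (lambda (entry) (string-contains? (cadr entry) str))
          (number-lines lines)))

(lines-containing "Zip"
                  (list "US Zip Codes " "" "Zip codes and other information"))
; → '((1 "US Zip Codes ") (3 "Zip codes and other information"))
```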

Problem 3: Who’s teaching that class again?

Topics: Lists, Strings

Write a procedure (instructors-of schedule classname) that takes a list schedule of faculty teaching responsibilities and a string classname, and returns a list of the last names of the instructors currently teaching that class. The schedule parameter has the same format as the following table, in which each entry gives an instructor's last name, first name, title, and list of classes.

(define teaching-fall-2017
  (list
    (list "Curtsinger" 
          "Charlie" 
          "Professor" 
          (list "TUT-100" "CSC-211"))
    (list "Klinge" 
          "Titus" 
          "Professor" 
          (list "CSC-151" "CSC-341" "CSC-395"))
    (list "Osera" 
          "Peter-Michael" 
          "Professor" 
          (list "TUT-100" "CSC-207" "MAT-208"))
    (list "Rebelsky" 
          "Samuel" 
          "Campus Curmudgeon" 
          (list "CSC-151" "CSC-301" "CSC-321" "CSC-322"))
    (list "Vostinar" 
          "Anya" 
          "Professor" 
          (list "CSC-207" "CSC-301"))
    (list "Weinman" 
          "Jerod" 
          "Department Chair" 
          (list "CSC-161"))))
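To make the format concrete, here is a sketch of taking apart a single entry, assuming #lang racket. entry-last-name and entry-classes are hypothetical helper names. member returns the tail of the list starting at the match (a true value) when the class appears, and #f otherwise.

```racket
#lang racket
;; Each entry has the form (last-name first-name title classes).
(define (entry-last-name entry) (list-ref entry 0))
(define (entry-classes entry) (list-ref entry 3))

(define sample-entry
  (list "Weinman" "Jerod" "Department Chair" (list "CSC-161")))

(entry-last-name sample-entry)                  ; → "Weinman"
(entry-classes sample-entry)                    ; → '("CSC-161")
(member "CSC-161" (entry-classes sample-entry)) ; → '("CSC-161")
(member "CSC-151" (entry-classes sample-entry)) ; → #f
```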

Here are a few example executions of the procedure.

> (instructors-of teaching-fall-2017 "CSC-151")
'("Klinge" "Rebelsky")

> (instructors-of teaching-fall-2017 "TUT-100")
'("Curtsinger" "Osera")

Problem 4: Testing the instructors-of procedure

Topics: Testing

Write a RackUnit test suite for instructors-of. You may assume that the inputs to instructors-of have the proper types and are formatted correctly, but be sure to consider schedules different from the example table.

We are likely to run your tests against several non-working versions of instructors-of. Your tests should catch any reasonable error.
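Because the same tests may be run against several versions of instructors-of, one possible design is a procedure that takes the implementation under test as an argument and returns a test suite. The sketch below assumes #lang racket with the rackunit and rackunit/text-ui libraries. make-instructors-of-tests, broken-instructors-of, and the small schedule are hypothetical. The assignment's examples list instructors in schedule order; since it does not say whether order matters, this sketch sorts results before comparing, which is an assumption you may want to drop.

```racket
#lang racket
(require rackunit rackunit/text-ui)

;; Hypothetical: build a test suite for any procedure with the same
;; interface as instructors-of.
(define (make-instructors-of-tests instructors-of)
  ;; Made-up schedule, deliberately different from the example table.
  (define small-schedule
    (list (list "Alpha" "Ann" "Professor" (list "CSC-151" "CSC-207"))
          (list "Beta" "Bo" "Professor" (list "CSC-207"))
          (list "Gamma" "Gil" "Lecturer" '())))
  ;; Sort results so the checks do not depend on the order of names.
  (define (sorted-result schedule classname)
    (sort (instructors-of schedule classname) string<?))
  (test-suite
   "instructors-of"
   (test-case "empty schedule"
     (check-equal? (instructors-of '() "CSC-151") '()))
   (test-case "one instructor"
     (check-equal? (sorted-result small-schedule "CSC-151") '("Alpha")))
   (test-case "several instructors"
     (check-equal? (sorted-result small-schedule "CSC-207") '("Alpha" "Beta")))
   (test-case "class nobody teaches"
     (check-equal? (sorted-result small-schedule "MAT-215") '()))
   (test-case "class name must match exactly, not as a substring"
     (check-equal? (sorted-result small-schedule "CSC-15") '()))))

;; Hypothetical non-working version that ignores its inputs.
(define (broken-instructors-of schedule classname) '())

;; run-tests returns the number of failures and errors, so a nonzero
;; result means the suite caught the bug.
(run-tests (make-instructors-of-tests broken-instructors-of) 'quiet) ; → 2
```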

Evaluation

We will primarily evaluate your work on correctness (does your code compute what it’s supposed to and are your procedure descriptions accurate); clarity (is it easy to tell what your code does and how it achieves its results; is your writing clear and free of jargon); and concision (have you kept your work short and clean, rather than long and rambly).