Algorithms and OOD (CSC 207 2014S) : Assignments

Project: Designing a JSON Library


Phase 1 due: 10:30 p.m., Wednesday, 16 April 2014.

Phase 2 due: 10:30 p.m., Wednesday, 23 April 2014.

Summary: In this project, you will implement a library that other programmers could use to deal with the JSON Data Interchange format.

Purposes: To give you experience working with lists and hash tables. To encourage you to explore problems of parsing. To help you think about writing public utility code (which is not the same as public-utility code). To explore other issues of design.

Collaboration: I encourage you to work in groups of size two to four, although you may work alone. You may discuss this project with anyone, provided you credit such discussions when you submit the assignment.

Submitting your work: Create a GitHub repository for your project with an appropriate name. Choose a name that might promote (or at least describe) your project. Email your grader the address of that repository. Please title your email “CSC207 2014S Phase 1 (Your Names)” or “CSC207 2014S Phase 2 (Your Names)”, as appropriate. You will also submit a short essay to Prof. Rebelsky (see below). Finally, you will present your API to the class (and anyone else I can get to attend the presentation) on Monday, 28 April 2014.

Warning: So that this assignment is a learning experience for everyone, we may spend class time publicly critiquing your work.

Background

One problem that creators of Web services face is designing interfaces through which clients can easily get complex data. XML was designed as one approach. However, many RESTful Web services instead use the JSON Data Interchange Format. JSON is a simple way to describe not only simple values and objects, but also arrays and compound objects (objects that include other objects).

The full details of JSON are specified in ECMA Standard 404, The JSON Data Interchange Format, available online at http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf. You can also find a more concise definition at http://www.json.org/.

JSON is pretty simple. Section 5 of the standard specifies that there are five kinds of values: objects, arrays, numbers, strings, and symbolic constants. There are three symbolic constants, true, false, and null.

Section 9 of the standard gives a description of JSON strings that you will find describes strings very much like those you have worked with in C and Java: Strings are sequences of characters surrounded by quotation marks. Certain characters within the string are represented by an escape sequence that starts with a backslash. For example, "hello" and "Sam says \"Hello\"" are both strings.
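To make the escape rule concrete, here is a minimal sketch of a helper that turns a Java string into a JSON string literal. The names JsonQuote and quote are hypothetical, and the sketch assumes that escaping the quotation mark, the backslash, and three common control characters is enough for a first version.

```java
/** Hypothetical helper, not part of any required API. */
class JsonQuote {
  /** Wrap text in quotation marks, escaping characters that need it. */
  static String quote(String text) {
    StringBuilder out = new StringBuilder("\"");
    for (int i = 0; i < text.length(); i++) {
      char ch = text.charAt(i);
      switch (ch) {
        case '"':  out.append("\\\""); break;
        case '\\': out.append("\\\\"); break;
        case '\n': out.append("\\n");  break;
        case '\r': out.append("\\r");  break;
        case '\t': out.append("\\t");  break;
        default:   out.append(ch);     break;
      }
    }
    return out.append('"').toString();
  }
}
```

Calling JsonQuote.quote on the eleven-character text Sam says "Hello" should give back the second example string above, with a backslash before each inner quotation mark.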

Section 8 of the standard gives a description of JSON numbers that you will find look much like numbers in C or Java: an optional negative sign, a bunch of digits, an optional period and sequence of digits, and an optional exponent. For example, 23, -11.23, 0.1, 4e25, and 1.4e25 are all numbers.
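One way to pin down that description is a regular expression. The sketch below is an illustration only: the class name JsonNumberCheck is hypothetical, and the pattern follows the rule in the standard that the integer part has no superfluous leading zero.

```java
import java.util.regex.Pattern;

/** Hypothetical checker for the JSON number grammar. */
class JsonNumberCheck {
  // Optional minus sign, integer part without a leading zero,
  // optional fraction, optional exponent.
  static final Pattern NUMBER =
      Pattern.compile("-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?");

  /** Return true if the whole of text is a single JSON number. */
  static boolean isJsonNumber(String text) {
    return NUMBER.matcher(text).matches();
  }
}
```

All five examples above are accepted by this pattern, while inputs such as 01, 1., .5, and +1 are rejected. A real parser would more likely consume the number character by character, but the pattern can be handy in unit tests.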

Section 7 of the standard gives a description of JSON arrays. In essence, an array is a sequence of values, surrounded by square brackets and separated by commas. For example, [1,2,"hello"] is an array of two integers and a string.

Section 6 of the standard gives a description of JSON objects. JSON objects are sequences of key/value pairs, surrounded by curly braces and separated by commas. Keys are always strings. Within each pair, the key is separated from the value by a colon. For example, an object to represent this class might look like {"Department":"CSC","Number":207,"Prof":{"LName":"Rebelsky","FName":"Sam"}}

And that's about it. As the last example suggests, the definition is recursive. Since objects are values, objects can contain other objects and arrays can contain objects. Similarly, since arrays are values, arrays can contain other arrays and objects can contain arrays.

Assignment

As you might expect, there are a few parsers for JSON written in Java. But evidence suggests that it's never a bad idea to have more - a well-designed API or a particularly efficient implementation can convince people to switch. Your goal in this assignment is to create a simple but powerful library for handling JSON.

You will do this project in three phases. In phase 1, you will build the basic structure of the library: The design of the API, the unit tests, the core procedures. In phase 2, you will provide the additional aspects that might turn your class project into something that other people would use: additional procedures, documentation, even a license. In phase 3, you will report on the project to me and to the class.

Phase 1

Design and implement a basic API for JSON. Your API should include at least two basic methods, one to convert a string of JSON code to a Java object that corresponds to the code and one to convert such an object back to JSON.

You may choose what objects are best to represent the various JSON values.

One approach is to use standard Java objects. That is, you could represent JSON numbers as Java Integer objects or Double objects (or BigInteger objects or BigDecimal objects), JSON strings as Java String objects, JSON arrays as Vector<Object> objects (or ArrayList<Object> objects or something similar), and JSON objects as Java Hashtable<String,Object> objects.
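As a rough illustration of this first approach, the sketch below builds by hand the structures a parser might return for two of the examples from the Background section. The class and method names are hypothetical, and it assumes HashMap and ArrayList as the "something similar" choices rather than Hashtable and Vector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the "standard Java objects" representation. */
class StandardObjectsExample {
  /** The course object from the Background section, as nested maps. */
  static Map<String, Object> courseExample() {
    Map<String, Object> prof = new HashMap<>();
    prof.put("LName", "Rebelsky");
    prof.put("FName", "Sam");

    Map<String, Object> course = new HashMap<>();
    course.put("Department", "CSC");
    course.put("Number", Integer.valueOf(207));
    course.put("Prof", prof); // objects nest as maps inside maps
    return course;
  }

  /** The array [1,2,"hello"] as a list of mixed values. */
  static List<Object> arrayExample() {
    List<Object> items = new ArrayList<>();
    items.add(Integer.valueOf(1));
    items.add(Integer.valueOf(2));
    items.add("hello");
    return items;
  }
}
```

One trade-off of this design is that clients must cast: courseExample().get("Prof") has static type Object, so reading the professor's name requires a cast to Map first.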

Another approach is to define your own classes and perhaps a hierarchy. You might define a JSONValue interface, and create classes for each kind of value. For example, a JSONString would implement the JSONValue interface and would wrap an underlying Java string.
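A minimal sketch of this second approach might look like the following. Only JSONValue and JSONString are named in the text; the toJSONString method, the JSONNumber class, and the choice of BigDecimal are assumptions made for illustration. The types are nested in one outer class only so that the sketch fits in a single file.

```java
import java.math.BigDecimal;

/** Hypothetical sketch of a small value hierarchy. */
class ValueHierarchySketch {
  /** Common supertype for every kind of JSON value. */
  interface JSONValue {
    String toJSONString();
  }

  /** A JSON string that wraps an underlying Java String. */
  static final class JSONString implements JSONValue {
    private final String value;

    JSONString(String value) { this.value = value; }

    String getValue() { return value; }

    @Override
    public String toJSONString() {
      // Escape backslashes first so the backslashes added for quotes
      // are not escaped a second time.
      return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
  }

  /** A JSON number backed by a BigDecimal (an assumed design choice). */
  static final class JSONNumber implements JSONValue {
    private final BigDecimal value;

    JSONNumber(String text) { this.value = new BigDecimal(text); }

    @Override
    public String toJSONString() { return value.toString(); }
  }
}
```

Compared with the first approach, clients get a type that says "this is JSON" and can call toJSONString on any value without casting, at the cost of more classes to write and document.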

There are, of course, many possible approaches. These are just two.

In implementing your parser from strings representing JSON to Java objects, you need only support simple JSON strings; no Unicode is necessary.

You should also build a reasonable set of unit tests for your interface. In writing the unit tests, you may find it helpful to store some of the JSON in a file. (As you've seen in previous assignments, dealing directly with strings that include backslashes may be cumbersome.)

Phase 2: Extend, Document, and License

At this point, you should have a useful and usable JSON library, one that you might use in Web projects. But let's suppose that you're a bit more entrepreneurial - what if you want to get other people to use your library? (Writing a library that others use is not necessarily an immediate road to fame and fortune, but it makes you feel good.) What do you need? At least three things:

First, you should identify some additional features you can add. For example, you might support ways to manipulate the parsed JSON, you might support additional input or output formats, or you might provide particularly good error reporting for incorrect input.

Second, you need to clearly document your code. The Javadoc should give detailed information on your procedures, but most users need a README or equivalent that gives an overview and perhaps describes some use cases. Assume that you are writing a library for next semester's students to use. (You may be.)

Third, you need to decide how to license your code. You can pick one of the open source licenses or you can choose a commercial license. (If you choose a commercial license, you must grant me royalty-free permission to use the work in the class this semester.)

For phase 2, you must extend your project with additional features, clear documentation, and a license.

In addition to putting these materials in the repository, you must send me a short essay describing key aspects of the project, including the following:

  • A short description of special features you decided to include and why you decided on those features.
  • An explanation of what criteria you used in deciding upon your license.
  • Comments on any other aspect of the project that you think I should know about.

Phase 3: Presentation

On Monday, 28 April 2014, each group will give a five-minute presentation to an audience of potential users. At the end of the presentations, I will ask each member of the audience to select the libraries they would be most likely to adopt. Groups whose libraries are selected by more audience members are likely to earn a bit of extra credit.

FAQ

What methods should I provide?

That depends a bit on which approach you are taking and what additional features you plan to add. Minimally, you need (1) a procedure that takes JSON as input and produces an object that represents that JSON and (2) a procedure that takes represented JSON as an input and returns a string as output.

If you choose the “use existing Java objects” model, I'd expect something like the following:

  /**
   * Parse a JSON string and return an object that corresponds to the
   * value described in that string.  See README.md for further
   * details.
   */
  public Object parse(String str)
  {
    ...;
  } // parse(String)

  /**
   * Given an object created by parse, generate the JSON string
   * that corresponds to the object.
   *
   * @exception Exception
   *   If the object cannot be converted, e.g., if it does not 
   *   correspond to something created by parse.
   */
  public String toJSONString(Object obj) throws Exception
  {
    ...
  } // toJSONString(Object)

Those methods might also be static. And perhaps you want to take a BufferedReader as input or write the output to a PrintWriter rather than returning a string. Part of the goal is that you think about design.

If you choose to define a general JSONObject interface, I'd expect something like the following.

/**
 * A representation of JSON objects.  See README.md for more details.
 */
public interface JSONObject
{
  /**
   * Convert back to a JSON string.
   */
  public String toJSONString();
} // interface JSONObject

/**
 * Various things related to JSON objects.
 */
public class JSON
{
  /**
   * Parse a string.  See README.md for more details.
   */
  public static JSONObject parse(String str)
  {
    ...;
  } // parse(String)

  ...
} // class JSON

Again, there are many approaches that you can take.

Do you have any hints on how we might implement the parser?

You're probably going to need recursion to parse the parts of an object or array of objects. And when you recurse, you'll be advancing through the string or BufferedReader or whatever you are using for the JSON input. So, you'll need a way for the recursive procedure to communicate back not only the result of parsing, but also how far it has advanced. If you use a string, that probably means you need to pass along some sort of “state” object that the recursive call can modify. If you use a BufferedReader, it probably has an implicit state.
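Here is a minimal sketch of the "state object" idea for string input. The names CursorSketch, Cursor, and parseDigits are hypothetical, and the helper handles only unsigned runs of digits. The point is that the helper returns the parsed value while the shared cursor records how far parsing has advanced.

```java
/** Hypothetical sketch of a mutable parse state for string input. */
class CursorSketch {
  /** The input plus the index of the next unread character. */
  static final class Cursor {
    final String input;
    int pos;

    Cursor(String input, int pos) {
      this.input = input;
      this.pos = pos;
    }
  }

  /** Read a run of digits at cursor.pos and leave cursor.pos just past it. */
  static int parseDigits(Cursor cursor) {
    int start = cursor.pos;
    while (cursor.pos < cursor.input.length()
        && cursor.input.charAt(cursor.pos) >= '0'
        && cursor.input.charAt(cursor.pos) <= '9') {
      cursor.pos++;
    }
    if (start == cursor.pos) {
      throw new IllegalArgumentException("Expected a digit at position " + start);
    }
    return Integer.parseInt(cursor.input.substring(start, cursor.pos));
  }
}
```

For example, with a Cursor over [212,"a"] positioned at index 1, parseDigits returns 212 and leaves pos at 4, so the caller can continue from the comma.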

If we parse a string and convert back to a string, should we get exactly the same string?

Not necessarily. For example, you might end up presenting the fields of an object in a different order. It's also okay if numbers show up slightly differently (assuming that they still are essentially the same number).

Can you give some simple examples?

Remember that a lot depends on the particular design you've chosen. Here are some possibilities.

  • "Hello" is a five-character string. It might be a String object, it might be a JSONString object, it might be something else you decide is appropriate. Note that for testing, you would probably write parse("\"Hello\"").
  • 24.2 is a decimal number. It might be a Number or a Double or a BigDecimal or a JSONNumber or a JSONReal or something else you decide is appropriate.
  • {"uid":"rebelsky","id":32154} is an object with two fields, one named uid and one named id. The uid field refers to a string. The id field refers to some kind of number (preferably some kind of integer because we want ids represented exactly). If you choose to use dictionaries (probably hash tables or hash maps) to represent objects, the dictionary will contain the two obvious key/value pairs.

I'm using str.split(":") to find key/value pairs. What do you think of that approach?

I'd need to see more details of your code, but it strikes me as dangerous. JSON is much more than a sequence of key/value pairs. I really do recommend that you iterate through the characters in the input.
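A small experiment shows one way the split approach can go wrong. The class and method names below are hypothetical; the sketch only wraps the call from the question so it can be tried on different inputs.

```java
/** Hypothetical demonstration of splitting JSON text on colons. */
class SplitPitfall {
  /** Split the text on every colon, as the question proposes. */
  static String[] naiveSplit(String json) {
    return json.split(":");
  }
}
```

Given the one-field object {"time":"10:30"}, naiveSplit returns three pieces rather than two, because the colon inside the string value is treated as a separator. Nested objects cause similar trouble.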

Can you walk me through an extended parse?

Let's try a short one. Consider the following string (including indices).

Index:  012345678901234567890123
String: [212,"a",{"id":32},null]
  • Position 0: We see an open bracket. That means we need to prepare to build an array. We recurse starting at position 1.
  • Position 1: We see a 2, which is a digit. We parse a number starting at position 1. That parse returns the number 212 and advances the parse state to position 4.
  • We add that returned object to our array.
  • Position 4: We see a comma. Conveniently, we are in a structure that permits a comma at this point (an array or object). We skip over the comma and recurse starting at position 5.
  • Position 5: We see a double-quotation mark. That signals a string. We read to the end of the string and return the object that corresponds to the one-character string. We also advance the state to position 8.
  • We add that returned object to our array
  • Position 8: We see a comma. Conveniently, we are in a structure that permits a comma at this point (an array or object). We skip over the comma and recurse starting at position 9.
  • Position 9: We see an open brace. That signals the start of an object. We skip over that brace and recurse starting at position 10.
    • Position 10: We see a double-quotation mark. That signals a string. We read to the end of the string and return the object that corresponds to the string. We are now at position 14.
    • We got a string, which we expected, since objects are sequences of key/value pairs. We remember the key.
    • Position 14: We skip over the expected colon.
    • We recurse starting at position 15 to get the value.
    • Position 15: We see a digit. We read the number and advance to position 17.
    • We now have a key/value pair, so we store it in the represented object.
    • Position 17: We see a right brace. That ends the object. We advance to position 18 and return the object.
  • We add the returned object to the array.
  • Position 18: Another comma. See above for details.
  • We recurse starting at position 19.
  • Position 19: We see a bare character. That should signal a special identifier. We read the identifier, advance to position 23, and return the identifier.
  • We add the returned object to the array.
  • Position 23: We see a right bracket. We end the array, advance to position 24, and return the array.
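The walkthrough above can be sketched as a small recursive-descent parser. This is one possible shape, not a reference solution: the class name WalkthroughParser is hypothetical, it uses the "standard Java objects" representation, and it assumes no whitespace, no escape sequences in strings, and integers only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser: no whitespace, no escapes, integers only. */
class WalkthroughParser {
  private final String input;
  private int pos = 0; // index of the next unread character

  private WalkthroughParser(String input) { this.input = input; }

  /** Parse a complete JSON text into standard Java objects. */
  static Object parse(String json) {
    WalkthroughParser parser = new WalkthroughParser(json);
    Object value = parser.parseValue();
    if (parser.pos != json.length()) throw parser.error("Trailing input");
    return value;
  }

  /** Look at the next character to decide what kind of value follows. */
  private Object parseValue() {
    char ch = peek();
    if (ch == '[') return parseArray();
    if (ch == '{') return parseObject();
    if (ch == '"') return parseString();
    if (ch == '-' || (ch >= '0' && ch <= '9')) return parseInteger();
    return parseConstant();
  }

  private List<Object> parseArray() {
    List<Object> items = new ArrayList<>();
    pos++; // skip '['
    if (peek() == ']') { pos++; return items; }
    while (true) {
      items.add(parseValue()); // recurse for each element
      char ch = next();
      if (ch == ']') return items;
      if (ch != ',') throw error("Expected ',' or ']'");
    }
  }

  private Map<String, Object> parseObject() {
    Map<String, Object> fields = new HashMap<>();
    pos++; // skip '{'
    if (peek() == '}') { pos++; return fields; }
    while (true) {
      if (peek() != '"') throw error("Expected a string key");
      String key = parseString();
      if (next() != ':') throw error("Expected ':'");
      fields.put(key, parseValue()); // recurse for the value
      char ch = next();
      if (ch == '}') return fields;
      if (ch != ',') throw error("Expected ',' or '}'");
    }
  }

  private String parseString() {
    int close = input.indexOf('"', pos + 1); // assumes no escaped quotes
    if (close < 0) throw error("Unterminated string");
    String text = input.substring(pos + 1, close);
    pos = close + 1;
    return text;
  }

  private Integer parseInteger() {
    int start = pos;
    if (peek() == '-') pos++;
    while (peek() >= '0' && peek() <= '9') pos++;
    return Integer.valueOf(input.substring(start, pos));
  }

  private Object parseConstant() {
    if (input.startsWith("null", pos)) { pos += 4; return null; }
    if (input.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
    if (input.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
    throw error("Unexpected character");
  }

  /** Next character without consuming it, or '\0' at end of input. */
  private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

  private char next() {
    if (pos >= input.length()) throw error("Unexpected end of input");
    return input.charAt(pos++);
  }

  private IllegalArgumentException error(String message) {
    return new IllegalArgumentException(message + " near position " + pos);
  }
}
```

Here the parse state is the pos field, so the recursive calls share it implicitly. This sketch represents JSON null as Java null, which ArrayList and HashMap accept; Hashtable rejects null values, so a Hashtable-based design would need some other stand-in.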

Can there be extraneous spaces in the input?

You can assume that there is no extraneous whitespace in the input. (If you want to support whitespace, that could be a feature.)
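If you do decide to support whitespace, one common approach is a small helper that the parser calls before reading each value or punctuation mark. The sketch below is an assumption about how that might look; the names are hypothetical. The four characters it skips are the ones the standard lists as whitespace: space, tab, line feed, and carriage return.

```java
/** Hypothetical helper for an optional whitespace feature. */
class WhitespaceSkipper {
  /** Return the index of the first non-whitespace character at or after pos. */
  static int skipWhitespace(String input, int pos) {
    while (pos < input.length()) {
      char ch = input.charAt(pos);
      if (ch != ' ' && ch != '\t' && ch != '\n' && ch != '\r') {
        break;
      }
      pos++;
    }
    return pos;
  }
}
```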

I don't understand how parse can return an interface value, such as JSONValue.

It's inheritance/polymorphism in action. If we say that a method returns an interface type, it can return an instance of any class that implements that interface.
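A small sketch may help. Everything here other than the JSONValue name is hypothetical: the method parseConstant is declared to return the interface, but each return statement hands back an instance of a concrete class that implements it.

```java
/** Hypothetical sketch of returning an interface type. */
class PolymorphismSketch {
  interface JSONValue {
    String toJSONString();
  }

  static final class JSONNull implements JSONValue {
    @Override
    public String toJSONString() { return "null"; }
  }

  static final class JSONBoolean implements JSONValue {
    private final boolean value;

    JSONBoolean(boolean value) { this.value = value; }

    @Override
    public String toJSONString() { return value ? "true" : "false"; }
  }

  /** Declared to return the interface; returns one of its implementations. */
  static JSONValue parseConstant(String text) {
    switch (text) {
      case "null":  return new JSONNull();
      case "true":  return new JSONBoolean(true);
      case "false": return new JSONBoolean(false);
      default:
        throw new IllegalArgumentException("Not a JSON constant: " + text);
    }
  }
}
```

The caller can write JSONValue v = parseConstant("true") and call v.toJSONString() without knowing which class was actually constructed.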

In some ways, this feels like our parenthesis matching assignment. Should we take advantage of ideas from that assignment?

The parenthesis matching ideas could be used in error checking. However, in doing parenthesis matching, you were managing an explicit stack. If you use recursion, you get an implicit stack. You can decide which is easier.
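For comparison, here is a sketch of the explicit-stack version used purely as an error check. The class name BracketChecker is hypothetical. It only verifies that brackets and braces outside of strings are properly nested; it does not check anything else about the input.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical pre-check that brackets and braces are properly nested. */
class BracketChecker {
  static boolean isBalanced(String json) {
    Deque<Character> open = new ArrayDeque<>(); // the explicit stack
    boolean inString = false;
    for (int i = 0; i < json.length(); i++) {
      char ch = json.charAt(i);
      if (inString) {
        if (ch == '\\') {
          i++; // skip the escaped character
        } else if (ch == '"') {
          inString = false;
        }
      } else if (ch == '"') {
        inString = true;
      } else if (ch == '[' || ch == '{') {
        open.push(ch);
      } else if (ch == ']' || ch == '}') {
        char expected = (ch == ']') ? '[' : '{';
        if (open.isEmpty() || open.pop() != expected) {
          return false;
        }
      }
    }
    // Everything opened must be closed, including any string.
    return open.isEmpty() && !inString;
  }
}
```

Note that brackets inside strings are ignored, so ["]"] counts as balanced, while [{]} and [1,2 do not.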

Given that we can't predict the order in which we get key/value pairs back from the hash table, how do we run tests?

Good question. Here are some options.

  • Option 1: You test the parse method by actually querying the underlying object, rather than converting it back into a string.
  • Option 2: Your toString method sorts the key/value pairs. (You can grab the keys, sort them, and then grab the key/value pair for each key.)
  • Option 3: You compare the result of the toString method to all possible permutations of the fields.
  • Option 4: You build the same object by hand and compare the results of toString.

You can probably come up with others.
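As an illustration of Option 2, the sketch below writes object fields in sorted key order by copying each map into a TreeMap before writing it. It assumes the "standard Java objects" representation; the class name SortedJsonWriter is hypothetical, and the quoting handles only the quotation mark and backslash.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical writer whose output order does not depend on hashing. */
class SortedJsonWriter {
  static String toJSONString(Object value) {
    if (value == null) return "null";
    if (value instanceof String) return quote((String) value);
    // Assumes finite numbers; NaN and infinities are not valid JSON.
    if (value instanceof Number || value instanceof Boolean) return value.toString();
    if (value instanceof List) {
      StringBuilder out = new StringBuilder("[");
      String separator = "";
      for (Object item : (List<?>) value) {
        out.append(separator).append(toJSONString(item));
        separator = ",";
      }
      return out.append("]").toString();
    }
    if (value instanceof Map) {
      // Copying into a TreeMap sorts the entries by key.
      Map<String, Object> sorted = new TreeMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        sorted.put((String) entry.getKey(), entry.getValue());
      }
      StringBuilder out = new StringBuilder("{");
      String separator = "";
      for (Map.Entry<String, Object> entry : sorted.entrySet()) {
        out.append(separator).append(quote(entry.getKey()))
           .append(":").append(toJSONString(entry.getValue()));
        separator = ",";
      }
      return out.append("}").toString();
    }
    throw new IllegalArgumentException("Cannot convert " + value.getClass());
  }

  private static String quote(String text) {
    return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }
}
```

With this in place, a test can compare against a single expected string: a map holding uid and id is always written with id first, whatever order the hash table happens to use.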

Citations

This assignment is roughly based on an assignment in dealing with Ushahidi's JSON output that I gave in a previous semester.

The requirement to decide upon a license and work on making a public piece of software was inspired by conversations with members of the POSSE (Professors' Open Source Summer Experience) community.

Copyright (c) 2013-14 Samuel A. Rebelsky.

Creative Commons License

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.