Section 1 Notes
CSCI E-259: XML with Java
by Blake Meike, former teaching fellow




The Homework: Some Context.

It helps me a bit, when undertaking a task, to get a "big picture". If you are like me, perhaps these three illustrations will help you understand the homework.

First, here's a picture of what happens when you run "java cscie259.project1.mf.Tester file.xml 1".

The source XML file and the output XML file should be nearly identical. At first this may seem pointless (there are much easier ways to copy the source to the output!). If you think about it, though, this really is an excellent test of a parser. Even though a parser is moderately complex and difficult to analyze, if your Parser/Serializer system exactly reproduces the input at the output, you have very good evidence that it is working correctly.

The hoop through which your program must jump is defined by the Java interface "ContentHandler". That interface is (as shown in the picture) the link between the Parser and the XMLSerializer. It is also one of the parts of the code that you may not modify. The first portion of the assignment consists of exactly this: building code that will pass the information in an XML document across that interface.

This is the large-scale structure of your code for questions 11, 13, and perhaps 17.
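
The parser-to-serializer link described above can be sketched in a few lines. This is only an illustration: SketchHandler and SketchSerializer are hypothetical, simplified stand-ins, not the homework's ContentHandler or XMLSerializer (whose real method signatures are not reproduced here), and fireEvents merely plays the role of a parser by reporting a hard-coded document.

```java
final class HandlerSketch {
    /** Hypothetical, simplified stand-in for the course's ContentHandler. */
    interface SketchHandler {
        void startElement(String name);
        void characters(String text);
        void endElement(String name);
    }

    /** Turns events back into markup, as a serializer would. */
    static final class SketchSerializer implements SketchHandler {
        private final StringBuilder out = new StringBuilder();

        public void startElement(String name) { out.append('<').append(name).append('>'); }
        public void characters(String text)   { out.append(text); }
        public void endElement(String name)   { out.append("</").append(name).append('>'); }

        String result() { return out.toString(); }
    }

    /** Plays the parser's role: reports what it "saw" in <a><b>hi</b></a>. */
    static void fireEvents(SketchHandler handler) {
        handler.startElement("a");
        handler.startElement("b");
        handler.characters("hi");
        handler.endElement("b");
        handler.endElement("a");
    }

    /** Sends the events straight to the serializer and returns its output. */
    static String roundTrip() {
        SketchSerializer serializer = new SketchSerializer();
        fireEvents(serializer);
        return serializer.result();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // prints <a><b>hi</b></a>
    }
}
```

The point of the sketch is that the event source knows only the interface. It never learns that a serializer is on the other side, which is what lets a different handler be substituted later.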

For question 15, you must do something substantially different: build a DOM. The test for question 15 is run with the command "java cscie259.project1.mf.Tester file.xml 2". Again the input should be nearly identical to the output and again (although it may seem silly) it is a perfect way to test that no information is lost or added by your system.

For question 15, the large-scale structure of your code is like this:

In addition to the obviously increased complexity of this picture, there is an important structural difference, which is the essence of the difference between the SAX interface and the DOM data structure. When you use Tester ... 2, instead of passing the XMLSerializer to your parser, you pass it the DOMBuilder. Since the DOMBuilder implements the same interface (ContentHandler), this should work perfectly. Unlike the previous test, though, when your parser finishes parsing the entire document, there is still no output. Instead, a data structure, the DOM, has been constructed in memory. That structure could hang around for minutes, hours or even weeks (as long as your program continues running), long after the last SAX event is gone.
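
One common way to build a tree from start/end events is to keep a stack of the elements that are currently open. The sketch below assumes that approach; it is not the homework's DOMBuilder. SketchHandler, Node and TreeBuilder are hypothetical names, and attributes and error handling are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BuilderSketch {
    /** Hypothetical, simplified stand-in for the course's ContentHandler. */
    interface SketchHandler {
        void startElement(String name);
        void characters(String text);
        void endElement(String name);
    }

    /** Hypothetical tree node: an element when text is null, otherwise a text node. */
    static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) { this.name = name; this.text = text; }
    }

    /** Builds a tree from events; nothing is written anywhere while parsing. */
    static final class TreeBuilder implements SketchHandler {
        private final Node root = new Node("#document", null);
        private final Deque<Node> open = new ArrayDeque<>();

        TreeBuilder() { open.push(root); }

        public void startElement(String name) {
            Node element = new Node(name, null);
            open.peek().children.add(element); // attach to the innermost open element
            open.push(element);                // the new element is now the innermost one
        }

        public void characters(String text) {
            open.peek().children.add(new Node("#text", text));
        }

        public void endElement(String name) {
            open.pop(); // assumes well-balanced events; no checking in this sketch
        }

        Node document() { return root; }
    }

    /** Feeds the events for <a><b>hi</b></a> to the builder and returns the tree. */
    static Node build() {
        TreeBuilder builder = new TreeBuilder();
        builder.startElement("a");
        builder.startElement("b");
        builder.characters("hi");
        builder.endElement("b");
        builder.endElement("a");
        return builder.document(); // the tree outlives the events that produced it
    }

    public static void main(String[] args) {
        Node a = build().children.get(0);
        System.out.println(a.name + " has " + a.children.size() + " child");
    }
}
```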

Tester ... 2 then runs another piece of code that you have modified, the DOMWalker. This code visits every single node in the DOM, in order, and produces SAX events. It shouldn't be surprising that, since it was possible to create the DOM in response to SAX events, it is possible to generate the identical set of SAX events from the information in the tree. If this were not possible, information would have been lost (or added). Of course, those SAX events can be sent to the XMLSerializer and, if all goes well, out comes the original XML.
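
Going the other way is a depth-first walk in document order: fire a start event on the way into an element, visit its children, and fire an end event on the way out. The sketch below assumes the same hypothetical, simplified types as before (SketchHandler and Node are illustrative names, not the homework's classes) and builds its small tree by hand so that it stands on its own.

```java
import java.util.ArrayList;
import java.util.List;

final class WalkerSketch {
    /** Hypothetical, simplified stand-in for the course's ContentHandler. */
    interface SketchHandler {
        void startElement(String name);
        void characters(String text);
        void endElement(String name);
    }

    /** Hypothetical tree node: an element when text is null, otherwise a text node. */
    static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text) { this.name = name; this.text = text; }
    }

    /** Visits every node in document order and reports it to the handler. */
    static void walk(Node node, SketchHandler handler) {
        if (node.text != null) {          // text node: a single event
            handler.characters(node.text);
            return;
        }
        handler.startElement(node.name);  // on the way in
        for (Node child : node.children) {
            walk(child, handler);
        }
        handler.endElement(node.name);    // on the way out
    }

    /** Walks a hand-built tree for <a><b>hi</b></a> and serializes the events. */
    static String demo() {
        Node a = new Node("a", null);
        Node b = new Node("b", null);
        b.children.add(new Node("#text", "hi"));
        a.children.add(b);

        StringBuilder out = new StringBuilder();
        walk(a, new SketchHandler() {
            public void startElement(String name) { out.append('<').append(name).append('>'); }
            public void characters(String text)   { out.append(text); }
            public void endElement(String name)   { out.append("</").append(name).append('>'); }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints <a><b>hi</b></a>
    }
}
```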

Homework: XML Tool Implementations

Here's a table that may help organize your thoughts about the homework:

Assignment           Implementation    Parser               ContentHandler       DOM Objects
                                                            (SAX Interface)
-------------------  ----------------  -------------------  -------------------  ----------------
My First XML Parser  You and CSCI      used in questions    used in questions    used in questions
                     E-259 staff       11, 13, 15 & 17      11, 13, 15 & 17      15 (& maybe 17)

                                       extended in          extended in 13       extended in 15
                                       11, 13 & 17

Attribute Converter  Xerces            used in question 22  used in question 22  later...

In this homework you use two distinct implementations of three different interfaces. The section above discusses the ways in which the various interfaces are used. This table addresses their implementation.

In all of the My First XML Parser questions you are using an implementation of the various interfaces, written by CSCI E-259 staff, which you extend. As of question 22, however, you will never use that parser, content handler or set of DOM objects again. Implementing them is a pedagogical device to help you understand how XML parsing works. From question 22 until the end of the class, you will use a standard implementation.

Under real circumstances it is very unlikely that you would change the implementation of an XML parser or the definitions or implementations of the SAX or DOM interfaces. (One of the chief points of using XML is that these things already exist in fast, reliable and standardized implementations!) In AttributeConverter (now that you understand how the parsing, SAX and DOM interfaces work internally) you move on to using a standard parser, Xerces, to accomplish a specific task. For this part you will want to refer to the JAXP API documentation (linked from the Resources page of the course website). You'll be using SAX interfaces again, but this time you'll be using the ones in the org.xml.sax package, which differ a bit from those used in My First XML Parser. Also, to instantiate a SAXParser, you'll need to have a look at the SAXParserFactory and SAXParser classes in the javax.xml.parsers package.
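
As a starting point, here is a minimal sketch of obtaining a SAXParser through JAXP and handing it a handler. It is not the AttributeConverter solution: the class name JaxpSketch, the method elementNames and the sample document are made up for illustration. Which parser SAXParserFactory returns depends on your JDK and classpath, so do not assume it is Xerces without checking your setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class JaxpSketch {
    /** Parses the given XML text and returns element names in document order. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();

        // DefaultHandler implements org.xml.sax.ContentHandler with empty
        // methods, so only the events of interest need to be overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep this sketch short.
            throw new IllegalStateException("could not parse sample document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b x='1'>hi</b></a>")); // prints [a, b]
    }
}
```

Note that the org.xml.sax startElement takes four arguments (namespace URI, local name, qualified name and an Attributes object), which is one of the places where the standard interface differs from a simplified teaching version.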