Thursday, September 20, 2007

Uniform XML Vocabularies on the Programmable Web

The question I've been thinking about recently is what difference uniform XML vocabularies (formats), as opposed to custom ones, make on the programmable web.

Specifically, while looking at the Atom Syndication Format and the Atom Publishing Protocol (APP), I'm trying to answer this question:

What difference does it make to the consumer that a given collection resource's state is represented as an Atom feed, with members represented as the feed's entries, as opposed to a custom XML format?

I think I'm starting to understand better why people advocate using the Atom format to capture typical collection/member representations. One of the most cited reasons is that a lot of client tools already understand Atom, and wide support is an important enough reason to choose a format. For example, GData-based client tools can be both, say, Google Calendar clients and simple Atom interpreters at the same time.

However, what I'm still not certain about is who the real audience of such uniform formats is.

Let's take one step back first. Many people asked: why do you use XML as opposed to a custom binary format? One of the answers was that there were already a lot of XML tools available on the market. When writing client applications, you don't need to pick up a new parser every time; you just use the same XML parser and the same favourite XML API, get to the data of interest, feed it into JAXB-generated classes or apply XPath expressions, and do something with it. This makes sense for all types of client applications, be they UI-based tools interfacing with humans or low-level applications passing the data along the chain for further processing.

Now, if we have two different XML formats representing the same collection resource, it's still XML. It's trivial to get to the data of interest and find the links to traverse the collection further, using XPath, WADL's ability to point at the data of interest, and so on. Using one format as opposed to the other does not help much in understanding the associated semantics, though.
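To make this concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. Both documents are made up for illustration: the example.org URLs and the custom library/book format are my own assumptions, not taken from any real service. The point is only that pulling titles and the "next" link out of either format takes about the same effort.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical Atom representation of a book collection.
ATOM_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Library</title>
  <link rel="next" href="http://example.org/books?page=2"/>
  <entry>
    <title>RESTful Web Services</title>
    <link rel="edit" href="http://example.org/books/1"/>
  </entry>
  <entry>
    <title>XML in a Nutshell</title>
    <link rel="edit" href="http://example.org/books/2"/>
  </entry>
</feed>"""

# Hypothetical custom representation of the same collection.
CUSTOM_LIBRARY = """<library next="http://example.org/books?page=2">
  <book href="http://example.org/books/1"><name>RESTful Web Services</name></book>
  <book href="http://example.org/books/2"><name>XML in a Nutshell</name></book>
</library>"""


def titles_from_atom(xml_text):
    """Return the title of every atom:entry in the feed."""
    root = ET.fromstring(xml_text)
    return [e.findtext("atom:title", namespaces=ATOM_NS)
            for e in root.findall("atom:entry", ATOM_NS)]


def next_link_from_atom(xml_text):
    """Return the href of the feed-level rel="next" link, if any."""
    root = ET.fromstring(xml_text)
    link = root.find("atom:link[@rel='next']", ATOM_NS)
    return link.get("href") if link is not None else None


def titles_from_custom(xml_text):
    """Return the name of every book in the custom format."""
    root = ET.fromstring(xml_text)
    return [b.findtext("name") for b in root.findall("book")]


def next_link_from_custom(xml_text):
    """Return the value of the root element's next attribute, if any."""
    return ET.fromstring(xml_text).get("next")


if __name__ == "__main__":
    print(titles_from_atom(ATOM_FEED), next_link_from_atom(ATOM_FEED))
    print(titles_from_custom(CUSTOM_LIBRARY), next_link_from_custom(CUSTOM_LIBRARY))
```

Either way, the client still has to know that these titles are books. Neither set of element names tells it that.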

Let's say I write an application based on a uniform format like Atom, say a Google Calendar application. If I get this application to consume an Atom collection representing a book library, then the only thing it can do with this collection is show it to the human user, even though the Atom book collection may contain hints as to how to deal with a given book entry.

This makes me think that the uniformity of XML vocabularies in itself does not matter much. What matters more is the associated processing model.

For example, let's take SOAP. Understanding the SOAP Envelope and Body does not help an application understand the semantics of the payload, but it does let the application act as a SOAP node.

The Atom Syndication Format describes how a feed may contain entries, but APP completes the picture by describing the processing model. This processing model is a sound RESTful model. But one can deal with collection resources RESTfully while defining custom XML formats too.
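As a rough sketch of what that processing model looks like from the client side, the snippet below only builds the HTTP requests, using Python's standard urllib.request, and does not send them. As I read APP, a member is created by POSTing an Atom entry to the collection URI (the server answers 201 Created with a Location header for the new member), and the member is then updated with PUT and removed with DELETE on its own URI. The URIs and the entry content are hypothetical.

```python
import urllib.request

# Hypothetical entry describing a book; id and updated are placeholders.
ATOM_ENTRY = b"""<entry xmlns="http://www.w3.org/2005/Atom">
  <title>RESTful Web Services</title>
  <id>urn:uuid:00000000-0000-0000-0000-000000000001</id>
  <updated>2007-09-20T00:00:00Z</updated>
  <author><name>Example Author</name></author>
  <content type="text">A book about REST.</content>
</entry>"""

ENTRY_MEDIA_TYPE = "application/atom+xml;type=entry"


def create_member_request(collection_uri, entry_xml):
    """POST an entry to the collection URI to create a new member."""
    return urllib.request.Request(
        collection_uri,
        data=entry_xml,
        method="POST",
        headers={"Content-Type": ENTRY_MEDIA_TYPE},
    )


def update_member_request(member_uri, entry_xml):
    """PUT a full entry to the member URI to replace its state."""
    return urllib.request.Request(
        member_uri,
        data=entry_xml,
        method="PUT",
        headers={"Content-Type": ENTRY_MEDIA_TYPE},
    )


def delete_member_request(member_uri):
    """DELETE the member URI to remove the member from the collection."""
    return urllib.request.Request(member_uri, method="DELETE")


if __name__ == "__main__":
    req = create_member_request("http://example.org/books", ATOM_ENTRY)
    print(req.get_method(), req.full_url)
```

Nothing in these three functions depends on the payload being Atom apart from the media type, which is the sense in which a custom XML format could follow the same RESTful model.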

I think it mostly matters to generic tools which let a user browse through an Atom collection, just as a browser lets a user browse through a collection resource whose state is represented as an XHTML page. A browser understands HTML links; an Atom-enabled tool can understand atom:link elements and let the user browse through any Atom-wrapped collection.
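Here is a small sketch of what I mean by such a generic tool, again with Python's xml.etree.ElementTree and two made-up feeds (the example.org URLs are hypothetical). The browse function knows only about atom:entry, atom:title and atom:link, so it can list a feed of books and a feed of fruits equally well, without knowing what either of them is.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Two hypothetical feeds from unrelated domains.
BOOKS = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Books</title>
  <entry>
    <title>RESTful Web Services</title>
    <link href="http://example.org/books/1"/>
  </entry>
</feed>"""

FRUITS = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Fruits</title>
  <entry>
    <title>Apple</title>
    <link rel="edit" href="http://example.org/fruits/apple"/>
  </entry>
</feed>"""


def browse(feed_xml):
    """List (title, [(rel, href), ...]) for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    listing = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="(untitled)")
        # A link without a rel attribute is treated as rel="alternate".
        links = [(link.get("rel", "alternate"), link.get("href"))
                 for link in entry.findall(ATOM + "link")]
        listing.append((title, links))
    return listing


if __name__ == "__main__":
    for feed in (BOOKS, FRUITS):
        for title, links in browse(feed):
            print(title, links)
```

All the tool can do with the result is present the titles and links for a human to follow.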

As I said, I'm not quite sure how other types of clients can benefit. If they need to handle books, then dealing with books represented as Atom entries won't help them magically start dealing with fruits that are also represented as Atom entries.

I may have got it wrong, but I'm still looking into it and I'll be interested to learn more about how uniform XML vocabularies can be applied. I recently read this entry on the Microsoft Astoria blog; it was interesting to see the reasoning which led Pablo Castro to decide in favour of supporting Atom.
