March 30, 2003

RSS and the RESTian Dilemma

There we were, thinking we had some agreement on the RSS format and had achieved some convergence and stability at the 2.0 level, and then this happens!

So, what have Don and Sam created? Is it RSS v4.0, or RSS 2.0+xhtml:body, or what? - and how exactly do we work out what it is and how to process it?

The problem this change demonstrates is what I call the "RESTian Dilemma" - we have no way to link the data we get back with the _exact_ details of the format of that data. My computer does not know that Don's RSS 2.0 feed uses xhtml:body while my RSS 2.0 feed currently uses content:encoded.
From the HTTP content type we know it is XML in RSS format, and the document element in the XML data tells us it is RSS v2.0, but nothing tells us we need to look for the full item entries in xhtml:body elements rather than content:encoded elements.
Sure, we can work in a degraded state (i.e. back to RSS v0.91 levels!), but that's not the point here.
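
To make the gap concrete, here is a minimal Python sketch of everything a reader can learn from the document itself. The function name and the sample feed are made up for illustration. The version attribute says "2.0", but the only way to find out which extension carries the full entry is to walk the items and look.

```python
import xml.etree.ElementTree as ET


def describe_feed(xml_text):
    """Return (rss_version, sorted list of namespaced item child tags)."""
    root = ET.fromstring(xml_text)
    version = root.get("version") if root.tag == "rss" else None
    extension_tags = set()
    for item in root.iter("item"):
        for child in item:
            # ElementTree spells namespaced tags as "{uri}localname".
            if child.tag.startswith("{"):
                extension_tags.add(child.tag)
    return version, sorted(extension_tags)


# Illustrative feed only; not copied from any real weblog.
SAMPLE = """<rss version="2.0" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <title>Example</title>
    <item>
      <title>Post</title>
      <description>Plain text summary</description>
      <xhtml:body><xhtml:p>Full <xhtml:b>rich</xhtml:b> entry</xhtml:p></xhtml:body>
    </item>
  </channel>
</rss>"""

version, extensions = describe_feed(SAMPLE)
print(version, extensions)
```

Nothing outside the item contents announces the xhtml:body choice, which is exactly the dilemma.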

I completely understand that both xhtml:body and content:encoded fragments are valid extensions to the base RSS v2.0 core format, and that this is a totally appropriate use of the namespace-based extensibility in XML (and RSS).
However, the question still remains: how do we version XML DOCUMENT FORMATS as a whole, rather than just individual schemas / namespaces?
I think that when we are talking about standardized payload document formats (like RSS needs to be, IMHO), we need to start using some form of "meta-schemas" that define an exact _fixed_ and _versioned_ format for the contents. Obviously the XML Schema language is more than able to handle this, but we need it applied here at the "meta-schema" level, defining that the format of this XML data is ( rss-v2.0 + dc + xhtml:body ) rather than ( rss-v2.0 + content:encoded ) or ( rss-v0.91 ).
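
As a rough illustration of the "meta-schema" idea, the Python sketch below treats each profile as a named, fixed set of extension namespaces layered on RSS 2.0. The profile identifiers and the conforms function are hypothetical, and how a feed would announce its profile is deliberately left open. The check is simplified to "no item uses a namespace outside the profile".

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"
XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical profile names; each fixes the extension namespaces allowed in items.
PROFILES = {
    "rss-v2.0+dc+xhtml:body": frozenset({DC, XHTML}),
    "rss-v2.0+content:encoded": frozenset({CONTENT}),
    "rss-v2.0": frozenset(),
}


def conforms(xml_text, profile_name):
    """True if the feed is RSS 2.0 and its items stay inside the profile."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss" or root.get("version") != "2.0":
        return False
    used = set()
    for item in root.iter("item"):
        for child in item:
            if child.tag.startswith("{"):
                # "{uri}local" -> "uri"
                used.add(child.tag[1:].split("}", 1)[0])
    return used <= PROFILES[profile_name]


# Illustrative feed only.
SAMPLE = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <item>
      <title>Post</title>
      <dc:creator>Someone</dc:creator>
      <xhtml:body><xhtml:p>Full entry</xhtml:p></xhtml:body>
    </item>
  </channel>
</rss>"""

print(conforms(SAMPLE, "rss-v2.0+dc+xhtml:body"))    # inside the profile
print(conforms(SAMPLE, "rss-v2.0+content:encoded"))  # outside the profile
```

A reader that knew the profile name up front could decide whether it can fully process the feed before touching a single item.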

I am not necessarily saying we should prohibit further extensibility in RSS feeds, just that for stable / standardized (and fully processable) payload formats we need to consider VERSIONED EXTENSIBILITY.

This is the next level of challenge for versioning with XML I believe, and was exactly the point I was making in my posts last week about needing to standardize the RSS format.

The impact of this change is obvious - Greg Reinacker has to update NewsGator to support grabbing the full item from the xhtml:body elements as well as the current content:encoded elements, then he needs to push an updated version out to all his customers (including me) so we can read Don's full blog entries while offline.
This is not in and of itself a problem, of course, but the fact that we still don't have a totally stable and agreed definition of the RSS format *IS*. A one-off change we can probably live with, but at the speed that Sam and Don iterate? ..... well, you know the answer already, don't you?
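
For a sense of what that reader-side change involves, here is a hedged Python sketch. It is not NewsGator's actual code. The function name and the sample feed are invented, and the choice to prefer xhtml:body over content:encoded when both are present is my own assumption.

```python
import xml.etree.ElementTree as ET

CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"
XHTML_BODY = "{http://www.w3.org/1999/xhtml}body"


def full_item_content(item):
    """Return (source, content) for one <item>, richest form first."""
    body = item.find(XHTML_BODY)
    if body is not None:
        # Keep the markup by serializing the whole xhtml:body element.
        return "xhtml:body", ET.tostring(body, encoding="unicode")
    encoded = item.findtext(CONTENT_ENCODED)
    if encoded is not None:
        return "content:encoded", encoded
    # Degraded state: plain description only.
    return "description", item.findtext("description", default="")


# Illustrative feed with one item per case.
FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <channel>
    <item>
      <description>Summary A</description>
      <xhtml:body><xhtml:p>Full entry A</xhtml:p></xhtml:body>
    </item>
    <item>
      <description>Summary B</description>
      <content:encoded><![CDATA[<p>Full entry B</p>]]></content:encoded>
    </item>
    <item>
      <description>Summary C</description>
    </item>
  </channel>
</rss>"""

results = [full_item_content(item) for item in ET.fromstring(FEED).iter("item")]
sources = [source for source, _ in results]
print(sources)
```

Every new full-content extension adds another branch like these, and another release to ship.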

RSS is currently demonstrating the "Doomsday Scenario" which will completely rule it out for use in most "mainstream" products IMHO - any experienced Product Architect (and I am one, so I do know!) will be looking for stability in a specification that is TWO orders of magnitude greater than what we currently have in RSS - i.e. change frequencies of years or months, rather than weeks or days.

So, let's have some dialog on how we (the industry) move forward on this one.....

Entry categories: RSS Standards XML
Posted by Jorgen Thelin at March 30, 2003 03:00 AM
I can't resist. ;-) Your 2.0 feed uses both link and guid. Why is that? Your 2.0 feed provides both a description and content:encoded. Why is that?
Posted by: Sam Ruby on March 30, 2003 04:01 AM
Good questions, Sam. Is it a test? ;-)
Reason (1) - Because it is all perfectly valid RSS 2.0 according to the RSS Validator.
Reason (2) - The link is the URL to the story on my web site, while the guid is just that - a unique ID - an opaque string value according to the RSS 2.0 spec. Completely different things. The fact that they both have the same value is just a convenient coincidence!
Reason (3) - description is plain text, while content:encoded preserves the hyperlinked version of the post (which works very well in NewsGator, BTW).
Reason (4) - Users of the feed get the _choice_ of which bits they use - plain text descriptions or encoded rich-text.
So, how'd I do, Sam?
Posted by: Jorgen Thelin on March 30, 2003 04:17 AM
Wow, where do I start? As you note, we're just working in the margins of RSS 2.0. What Sam, Greg, and I are doing is no different than having three people defining a new SOAP header and then supporting it in their software.
Anyone thinking of baking RSS into a "real" product needs to build flexibility/extensibility as a core feature. Many of us can rev our plumbing rapidly because we built on such a feature. This is goodness.
One of the reasons I jumped into the XML world back in 1998 was because XML enabled a handful of consenting adults to quickly get agreement and build apps. The notion that every start tag in the world needs to be blessed by a standards body is folly IMO.
The last thing we need is yet another working group to sort this out. The average bake time at the W3C is about 3 years with spotty results at best. RSS has flourished because a small band of implementers can iterate quickly. Nothing would kill this movement faster than bringing two professional standards wonks from every vendor together (see XML Schema for a great example of what that buys you).
So far, Darwin has been doing a good job of sorting things out. My advice is to chill out and let things run their course.
Posted by: Don Box on March 30, 2003 05:03 AM
Jorgen - you almost passed the test, all except for guid. guid isPermaLink="true" isn't opaque, it's what link means to most of us, but not to the people who like to point link to the third-party page they are linking to, rather than to their own weblog post. So if you are writing an RSS reader, and you want to have a link to the weblog post, you should first choose guid isPermaLink="true" if it exists, and link if it doesn't, just like you should choose xhtml:body or content:encoded over description if you want to display as much of the item as possible.
Don - Sam can claim that it's a no-harm experiment, since anyone who wants full content but doesn't want to rewrite their reader (if they can) can either switch to his RSS 1.0 feed with a reader that supports content:encoded, or to his 0.91 feed with pretty much any reader, but not only is your 0.91 feed not full-items, but until your newest entry, with what virtually every reader will interpret as an actual CDATA opening tag, falls out of the feed, it'll be rather brutally destructive. Got any plans to cater to the rest of us?
Posted by: Phil Ringnalda on March 30, 2003 06:00 AM
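
The link-selection rule in the comment above can be sketched in a few lines of Python. The function name and sample feed are hypothetical. The one detail taken from the RSS 2.0 spec is that isPermaLink is optional and defaults to true when omitted.

```python
import xml.etree.ElementTree as ET


def post_url(item):
    """Prefer a permalink guid; otherwise fall back to link."""
    guid = item.find("guid")
    if guid is not None and guid.text:
        # A missing isPermaLink attribute counts as "true" in RSS 2.0.
        if guid.get("isPermaLink", "true") == "true":
            return guid.text.strip()
    return item.findtext("link")


# Illustrative feed; example.org and example.com are placeholder hosts.
FEED = """<rss version="2.0">
  <channel>
    <item>
      <link>http://example.org/third-party-page</link>
      <guid isPermaLink="true">http://example.com/blog/1</guid>
    </item>
    <item>
      <link>http://example.com/blog/2</link>
      <guid isPermaLink="false">opaque-id-2</guid>
    </item>
    <item>
      <link>http://example.com/blog/3</link>
    </item>
  </channel>
</rss>"""

urls = [post_url(item) for item in ET.fromstring(FEED).iter("item")]
print(urls)
```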
Ah, the joys of a MAY in a spec! I have set guid isPermaLink="false" to unambiguously convey my intent, and be consistent with the way I use links in the 2.0 feed.
Posted by: Jorgen Thelin on March 30, 2003 11:43 AM
I am not sure I completely understand what you are saying about my 0.91 feed, Phil, but I have fixed the problem I think you are describing - the channel description contained encoded HTML in a CDATA block. Let me know if there is something else. I don't get many users of the 0.91 feed (see some stats), so I confess I haven't spent too much time on it.
Posted by: Jorgen Thelin on March 30, 2003 11:55 AM
Don't worry, Don - I am chilled! And eagerly awaiting a "real RSS standard" too ;-)
However, you and I both know that what is OK between "consenting adults" (and super-hero grade power-users at that!) is one thing, but Product Engineering is a whole different level of complexity and challenge, and has rather different needs and requirements. RSS can never go truly "mainstream" until we get there.
I think you are ducking the real issue here about "whole document versioning", and frankly I don't blame you because it is just darned hard! I am travelling quite a bit in the next couple of weeks, so I will have lots of time on planes to think on these subjects some more, and hopefully propose some answers rather than just questions.
Don't get me wrong, I don't disagree with the changes you have made, but I feel many _format versioning_ best practices have got trampled in the process. I also think you underestimate the effect you (and Sam) have in the blog community! It is slightly more significant than just "three people defining a new SOAP header [for their own use]". Did anyone consider the impact on the installed user base, for example? Classic Microsoft - force user to upgrade to a new version to continue service ;-)
Thinking out loud, and experimenting through blogs I have absolutely no problem with, but defining (and documenting) format standards through blogs? - Personally, I am not so comfortable with that! Do you think that would have worked with your "Proposed Infoset Addendum to SOAP Messages with Attachments" ideas, I wonder? - After all, that just defines some extensions to SOAP as self-contained elements and attributes in independent namespaces, doesn't it?
Of course, as you point out - going the W3C route with RSS would be like using a nuclear bomb to crack a nut! There must be some middle ground though - like an IETF Internet Draft / Standard perhaps?
Posted by: Jorgen Thelin on March 30, 2003 01:58 PM