Atom (formerly known as Pie, Echo and Necho) has been created as a cleaner and better-defined alternative to RSS 2.0, which is underspecified. (For example, the RSS 2.0 spec neither contains nor references the definition of “entity-encoded HTML”.) But is a reformulated version of RSS 2.0 really what we need? Or would a simple title+permalink feed with the aggregator fetching the actual pages be better?
Netscape’s original RSS 0.9 did not carry actual Web site content. It carried title+permalink pairs. It solved the problem of following site updates. The users got titles of the latest items on their sites of interest linked to the Web pages carrying the actual content.
Then came the description element. It was supposed to be metadata: a short description of the item. However, entering metadata is tedious and boring. What you are more likely to get is the first words of the item extracted by software.
I am not sure exactly how people started to include full content in description. However, my educated guess is this: a popular Web-based aggregator was sloppily programmed and simply inserted the character data content of the description element into its tag soup output. Someone discovered that this could be used to inject arbitrary tag soup into the aggregator output, and then someone figured that you might as well include the full content to save readers the trouble of following the link to the page that the “description” was about.
Then came three-pane “desktop aggregators” that were designed for reading full-content-in-description RSS feeds. Back when I started using NetNewsWire Lite, Macsanomat only had an RSS 0.9 feed. I quickly became annoyed with not having the content in the third pane, so I added a tag-soup-over-RSS-0.92 feed to Macsanomat. In April 2003, I even told a journalist, who contacted the Macsanomat team about RSS, that being able to read the content of many sites in a unified (three-pane) UI was really nice, even if you only got the raw content and missed the sites’ design.
The problem is that when the full content was put on the feed, it was done in order to get through a loophole instead of analyzing the use case and coming up with a solution. Now that there are full-content feeds and apps for reading them, many people take it for granted that you have to stick all your content in a single HTTP-accessible resource and that reading such HTTP-accessible resources with something called an “aggregator” is the greatest thing compared to manually going from site to site.
If you are designing a format for a three-pane “desktop aggregator” instead of trying to push your content through a Web aggregator loophole, all your content does not have to be in a single HTTP object. Instead, the aggregator could actually make multiple HTTP requests per site! You could deliver title+permalink pairs in one HTTP object like in the RSS 0.9 days and have the aggregator pull the content resources identified by the permalinks in separate HTTP requests. It would even save bandwidth compared to delivering each item over the network many times.
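As a rough illustration of the multiple-requests idea, here is a minimal Python sketch of an aggregator that reads a title+permalink feed and fetches each permalinked page only once. The function names, the in-memory cache and the pluggable `fetch` parameter are assumptions made for this example, not part of any existing aggregator; the namespace is the one used by RSS 0.9.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple
from urllib.request import urlopen

RSS09_NS = "http://my.netscape.com/rdf/simple/0.9/"


def parse_title_permalink_feed(feed_xml: bytes) -> List[Tuple[str, str]]:
    """Return (title, permalink) pairs for every item in an RSS 0.9 feed."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter(f"{{{RSS09_NS}}}item"):
        title = item.findtext(f"{{{RSS09_NS}}}title", default="").strip()
        link = item.findtext(f"{{{RSS09_NS}}}link", default="").strip()
        if link:
            pairs.append((title, link))
    return pairs


def http_get(url: str) -> bytes:
    """Plain HTTP GET; a real aggregator would add conditional requests."""
    with urlopen(url) as response:
        return response.read()


def fetch_new_items(
    feed_url: str,
    cache: Dict[str, bytes],
    fetch: Callable[[str], bytes] = http_get,
) -> List[Tuple[str, str]]:
    """Poll the feed and fetch only pages whose permalink is not cached yet.

    `cache` maps permalink -> page bytes (hypothetical storage; a real
    aggregator would persist it on disk).
    """
    new_items = []
    for title, link in parse_title_permalink_feed(fetch(feed_url)):
        if link not in cache:
            cache[link] = fetch(link)  # one request per item, ever
            new_items.append((title, link))
    return new_items
```

On every poll only the small feed document is transferred again; each content page crosses the network once, which is where the bandwidth saving would come from.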
Side note: I think mode="escaped" is the Appendix C of Atom: the ugly tag soup integration thing left in because current systems are expected to be too broken to go XML all the way.
One of the benefits of full-content RSS feeds is that you can get a chromeless version of the site. That is, you can get the main content without any templates wrapped around it. This may be useful if you want to analyze the entry content and ignore all the site navigation, blogrolls etc.
(X)HTML lacks markup for distinguishing between the site chrome and the main content. In fact, the markup structures provided by (X)HTML are mostly intended for describing the main content. There is no markup for common site chrome structures. Hence, the site chrome usually consists of meaningless divs and allegedly misused tables.
The problem could be solved by agreeing to mark up the content/chrome boundary. This would require author cooperation, but getting Atom feeds requires author cooperation as well. Because many people still want text/html instead of application/xhtml+xml, a special class name for div would be required instead of a special namespaced element. Magic class names are ugly but are still cleaner than using special namespaced elements in XHTML served as text/html.
(Note: Agreeing to use a special class attribute is a small requirement compared to formulating an entire XHTML page to serve as a feed, which probably would be too much to ask.)
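To make the magic class name idea concrete, here is a small Python sketch that pulls the text of the marked-up div out of a text/html page and ignores the chrome around it. The class name `maincontent` is purely hypothetical (no such convention has been agreed on), and the sketch returns plain text rather than markup to stay short.

```python
from html.parser import HTMLParser

MAIN_CONTENT_CLASS = "maincontent"  # hypothetical agreed-upon class name


class MainContentExtractor(HTMLParser):
    """Collect character data inside the div carrying the magic class."""

    def __init__(self, class_name: str = MAIN_CONTENT_CLASS):
        super().__init__(convert_charrefs=True)
        self.class_name = class_name
        self.depth = 0  # div nesting depth inside the marked div; 0 = outside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the main content
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.depth = 1  # entering the main content

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1  # reaching 0 means we left the main content

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_main_text(html: str, class_name: str = MAIN_CONTENT_CLASS) -> str:
    """Return the whitespace-normalized text of the main content div."""
    parser = MainContentExtractor(class_name)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Because the parser only counts div start and end tags, it tolerates tag soup elsewhere on the page; navigation, blogrolls and other chrome outside the marked div are simply never collected.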
Some design-oriented people complain that if you read the raw unstyled content in an aggregator, you lose the nice design of the HTML+CSS site. At the same time, aggregators are embedding browser engines in order to gain more advanced rendering abilities. (For example, FeedReader embeds Trident, and NetNewsWire is going to embed Apple’s version of KHTML.)
Instead of making aggregators more and more browser-like, would it not make more sense to add the ability to read feeds to browsers? The browser can pull an old title+permalink feed and load the pages referenced by the permalinks in the normal content area. The reader can track new entries to sites conveniently but see the sites in all their design glory. For this purpose, even RSS 0.9 will do.
The following is not a mock-up. It is an actual screenshot.
I think I would rather have the ability to follow feeds in Camino than have a browser engine in NetNewsWire Lite.
This is not a new idea, but blogger-driven, feeds, RSS, 2003, Mozilla and broadband is a nicer combination than Microsoft-driven, push, CDF, 1997, IE and dial-up.
Joe Gregorio’s AtomAPI draft shows this example of creating a new entry:
    POST /reilly HTTP/1.1
    Content-Type: application/x.atom+xml

    <?xml version="1.0" encoding='iso-8859-1'?>
    <entry xmlns="http://example.com/newformat#" >
      <title>My First Entry</title>
      <subtitle>In which a newbie learns to blog...</subtitle>
      <summary>A very boring entry...</summary>
      <author>
        <name>Bob B. Bobbington</name>
        <homepage>http://bob.name/</homepage>
        <weblog>http://bob.blog/</weblog>
      </author>
      <issued>2003-02-05T12:29:29</issued>
      <content type="application/xhtml+xml" xml:lang="en-us">
        <p xmlns="...">Hello, <em>weblog</em> world! 2 &lt; 4!</p>
      </content>
    </entry>
The format aims to be general so that you can use formats other than XHTML in the payload. However, designing for such generality may be unnecessary if it is accepted that the payload will be XHTML anyway. If the payload is XHTML anyway, then the Atom envelope is superfluous. Consider the following:
    POST /reilly HTTP/1.1
    Content-Type: application/xhtml+xml

    <?xml version="1.0"?>
    <html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml"
    ><head
    ><title>My First Entry</title
    ></head
    ><body
    ><p>Hello, <em>weblog</em> world! 2 &lt; 4!</p
    ></body
    ></html>
Things to observe
Most people are too lazy to write subtitles and summaries.
In order to avoid malicious posting, there needs to be an authentication method. Atom doesn’t need to specify how the lower protocol levels perform authentication. Both HTTP basic authentication and TLS certificate authentication allow the poster to be identified. The data about the author can be defaulted from the authentication.
The author is likely to consider the item creation time according to the server clock sufficient temporal metadata.
The minimal entry is a plain XHTML document.
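The observations above can be sketched from the client side. The following Python fragment builds (but does not send) a POST request whose body is nothing more than a minimal XHTML document, with HTTP basic authentication identifying the poster. The function names, the URL and the credentials are made up for illustration, and the sketch assumes the body fragment passed in is already well-formed XHTML.

```python
import base64
import urllib.request
from xml.sax.saxutils import escape, quoteattr

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_entry_document(title: str, body_xhtml: str, lang: str = "en-us") -> bytes:
    """Wrap a well-formed XHTML body fragment in a minimal XHTML document.

    No subtitle, summary, author or date: those are left for the server
    to default from the authentication and its own clock.
    """
    doc = (
        '<?xml version="1.0"?>\n'
        f'<html xml:lang={quoteattr(lang)} xmlns="{XHTML_NS}">'
        f"<head><title>{escape(title)}</title></head>"
        f"<body>{body_xhtml}</body></html>"
    )
    return doc.encode("utf-8")  # UTF-8 is the XML default encoding


def build_post_request(
    url: str, document: bytes, user: str, password: str
) -> urllib.request.Request:
    """Build the POST request; pass it to urllib.request.urlopen to send it."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=document,
        headers={
            "Content-Type": "application/xhtml+xml",
            "Authorization": "Basic " + credentials.decode("ascii"),
        },
        method="POST",
    )
```

Basic authentication is used here only because it is the shortest to show; TLS client certificates would identify the poster just as well and would not change the document being posted.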
If the author wishes to provide additional metadata, the metadata can go inside the XHTML entry as opposed to going outside it. XHTML already has a metadata wrapper: the head element that hosts the title element.
    POST /reilly HTTP/1.1
    Content-Type: application/xhtml+xml

    <?xml version="1.0"?>
    <html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:atom="http://example.com/newformat#"
    ><head
    ><title>My First Entry</title
    ><atom:subtitle>In which a newbie learns to blog...</atom:subtitle
    ><atom:summary>A very boring entry...</atom:summary
    ><atom:author
    ><atom:name>Bob B. Bobbington</atom:name
    ><atom:homepage>http://bob.name/</atom:homepage
    ><atom:weblog>http://bob.blog/</atom:weblog
    ></atom:author
    ><atom:issued>2003-02-05T12:29:29</atom:issued
    ></head
    ><body
    ><p>Hello, <em>weblog</em> world! 2 &lt; 4!</p
    ></body
    ></html>
The benefit of putting additional metadata inside XHTML instead of requiring a special envelope is that the minimal entry does not require any special wrapping: you just POST an XHTML document.
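On the receiving end, a server could treat the optional metadata and the minimal case uniformly. Here is a hedged Python sketch, assuming the placeholder namespace from the examples in this post and hypothetical names (`read_entry`, `authenticated_user`, `received_at`): anything missing from head falls back to the authenticated identity and the server clock.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

NS = {
    "x": "http://www.w3.org/1999/xhtml",
    "a": "http://example.com/newformat#",  # placeholder namespace from the examples
}


def read_entry(document: bytes, authenticated_user: str, received_at: datetime) -> dict:
    """Extract entry metadata from a posted XHTML document.

    Optional metadata in <head> wins; otherwise the author defaults to the
    authenticated user and the issue time to the server's receive time.
    """
    root = ET.fromstring(document)
    head = root.find("x:head", NS)
    if root.tag != "{%s}html" % NS["x"] or head is None:
        raise ValueError("not an XHTML document with a head element")

    def text(path: str) -> Optional[str]:
        value = head.findtext(path, namespaces=NS)
        return value.strip() if value and value.strip() else None

    return {
        "title": text("x:title"),
        "subtitle": text("a:subtitle"),
        "summary": text("a:summary"),
        "author": text("a:author/a:name") or authenticated_user,
        "issued": text("a:issued") or received_at.isoformat(),
    }
```

The same function handles both the plain XHTML document and the one with extra metadata in head, so the server would not need a separate code path for an envelope format.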