Is Atom What We Really Need?

Atom (formerly known as Pie, Echo and Necho) has been created as a cleaner and better-defined alternative to RSS 2.0, which is underspecified. (For example, the RSS 2.0 spec neither contains nor references the definition of “entity-encoded HTML”.) But is a reformulated version of RSS 2.0 really what we need? Or would a simple title+permalink feed with the aggregator fetching the actual pages be better?

Does it Make Sense to Put the Content in the Feed?

Netscape’s original RSS 0.9 did not carry actual Web site content. It carried title+permalink pairs, and it solved the problem of following site updates: users got the titles of the latest items on the sites they were interested in, each linked to the Web page carrying the actual content.

Then came the description element. It was supposed to be metadata: a short description of the item. However, entering metadata is tedious and boring. What you are more likely to get is the first words of the item extracted by software.

I am not sure exactly how people started to include full content in description. However, my educated guess is this: a popular Web-based aggregator was sloppily programmed and simply inserted the character data content of the description element into its tag soup output. Someone discovered that this could be used to inject arbitrary tag soup into the aggregator output, and then someone figured that you might as well include the full content to save readers the trouble of following the link to the page that the “description” was about.

Then came three-pane “desktop aggregators” that were designed for reading full-content-in-description RSS feeds. Back when I started using NetNewsWire Lite, Macsanomat only had an RSS 0.9 feed. I quickly became annoyed at not having the content in the third pane, so I added a tag soup over RSS 0.92 feed to Macsanomat. In April 2003, I even told a journalist who had contacted the Macsanomat team about RSS that being able to read the content of many sites in a unified (three-pane) UI was really nice, even if you only got the raw content and missed the sites’ design.

The problem is that the full content was put in the feed in order to get through a loophole, not as the result of analyzing the use case and coming up with a solution. Now that full-content feeds and apps for reading them exist, many people take it for granted that you have to stick all your content in a single HTTP-accessible resource, and that reading such resources with something called an “aggregator” is far better than manually going from site to site.

If you are designing a format for a three-pane “desktop aggregator” instead of trying to push your content through a Web aggregator loophole, your content does not all have to be in a single HTTP object. The aggregator could actually make multiple HTTP requests per site! You could deliver title+permalink pairs in one HTTP object, as in the RSS 0.9 days, and have the aggregator pull the content resources identified by the permalinks in separate HTTP requests. This would even save bandwidth: a full-content feed delivers each item over the network many times (every time the feed is fetched while the item remains in it), whereas each linked page only needs to be fetched once.
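A minimal sketch of this two-step approach, assuming an RSS 0.9-style feed in which each item carries only a title and a link. The first function extracts the title+permalink pairs; the second makes one separate HTTP request per permalink. The function names are hypothetical, and namespaces are deliberately ignored (only local element names are matched) so the same code reads RSS 0.9 items inside rdf:RDF as well as later RSS items inside channel.

```python
import urllib.request
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_title_permalink_feed(feed_xml):
    """Return (title, permalink) pairs from a title+permalink feed.

    Namespaces are ignored, so RSS 0.9 items (siblings of channel inside
    rdf:RDF) and later RSS items (inside rss/channel) are both found.
    """
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter():
        if local_name(item.tag) != "item":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in item}
        if fields.get("link"):
            pairs.append((fields.get("title", ""), fields["link"]))
    return pairs


def fetch_entry_pages(pairs):
    """Second step: one separate HTTP GET per permalink (requires network access)."""
    pages = {}
    for _title, link in pairs:
        with urllib.request.urlopen(link) as response:
            pages[link] = response.read()
    return pages
```

An aggregator that remembers which permalinks it has already fetched would call `fetch_entry_pages` only for new pairs, which is where the bandwidth saving would come from.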

Side note: I think mode="escaped" is the Appendix C of Atom—the ugly tag soup integration thing left in, because current systems are expected to be too broken to go XML all the way.

What if You Want to get the Chromeless Content?

One of the benefits of full-content RSS feeds is that you can get a chromeless version of the site. That is, you can get the main content without any templates wrapped around it. This may be useful if you want to analyze the entry content and ignore all the site navigation, blogrolls etc.

(X)HTML lacks markup for distinguishing between the site chrome and the main content. In fact, the markup structures provided by (X)HTML are mostly intended for describing the main content. There is no markup for common site chrome structures. Hence, the site chrome usually consists of meaningless divs and allegedly misused tables.

The problem could be solved by agreeing to mark up the content/chrome boundary. This would require author cooperation, but getting Atom feeds requires author cooperation as well. Because many people still want to serve text/html instead of application/xhtml+xml, a special class name for div would be required instead of a special namespaced element. Magic class names are ugly but are still cleaner than using special namespaced elements in XHTML served as text/html.
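As a sketch of how a client might use such a convention: the post does not name a class, so `content` below is a hypothetical placeholder for whatever name would be agreed upon. The extractor uses Python's standard `html.parser`, which tolerates tag soup served as text/html, keeps the markup inside the marked div, and drops everything outside it (navigation, blogrolls, etc.).

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical agreed-upon class name marking the main content div.
CONTENT_CLASS = "content"


class ContentExtractor(HTMLParser):
    """Collects the markup inside <div class="content"> and ignores the chrome."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # div nesting depth inside the content div; 0 means outside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if CONTENT_CLASS in classes:
                self.depth = 1  # entering the content; the marker div itself is not kept

    def handle_startendtag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                return  # closing tag of the marker div: leave the content
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            # Character references were decoded by the parser; re-escape them.
            self.parts.append(escape(data, quote=False))


def extract_content(html_text):
    """Return the chromeless content markup of a page."""
    parser = ContentExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Only div nesting is tracked, so unclosed non-div elements in tag soup do not confuse the boundary detection.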

(Note: Agreeing to use a special class attribute is a small requirement compared to formulating an entire XHTML page to serve as a feed, which probably would be too much to ask.)

Are Aggregators So Great After All?

Some design-oriented people complain that if you read the raw unstyled content in an aggregator, you lose the nice design of the HTML+CSS site. At the same time, aggregators are embedding browser engines in order to gain more advanced rendering abilities. (For example, FeedReader embeds Trident, and NetNewsWire is going to embed Apple’s version of KHTML.)

Instead of making aggregators more and more browser-like, would it not make more sense to add feed-reading capabilities to browsers? The browser can pull an old-style title+permalink feed and load the pages referenced by the permalinks in the normal content area. The reader can track new entries on sites conveniently but see the sites in all their design glory. For this purpose, even RSS 0.9 will do.
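The tracking half of this idea needs very little state. A minimal sketch, assuming the browser keeps a set of permalinks it has already shown; `EntryTracker` is a hypothetical name, and actually loading a page in the content area is left to the browser.

```python
class EntryTracker:
    """Remembers which permalinks have already been shown to the reader."""

    def __init__(self):
        self.seen = set()

    def new_entries(self, pairs):
        """Return the (title, permalink) pairs not seen before, in feed order,
        and mark them as seen."""
        fresh = []
        for title, link in pairs:
            if link not in self.seen:
                self.seen.add(link)
                fresh.append((title, link))
        return fresh
```

Each returned permalink would simply be opened in the normal content area, so the entry appears with the site's own design.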

The following is not a mock-up. It is an actual screenshot.

[Three-pane RSS reader in Mozilla Firebird with the content pane being the usual browser content area and showing the full Web page]

I think I would rather have the ability to follow feeds in Camino than have a browser engine in NetNewsWire Lite.

This is not a new idea, but blogger-driven, feeds, RSS, 2003, Mozilla and broadband is a nicer combination than Microsoft-driven, push, CDF, 1997, IE and dial-up.

Are Atom Entries Inside Out?

Joe Gregorio’s AtomAPI draft shows this example of creating a new entry:

POST /reilly HTTP/1.1
Content-Type: application/x.atom+xml

<?xml version="1.0" encoding='iso-8859-1'?>
<entry xmlns="http://example.com/newformat#" >  
    <title>My First Entry</title> 
    <subtitle>In which a newbie learns to blog...</subtitle> 
    <summary>A very boring entry...</summary> 
 
    <author> 
      <name>Bob B. Bobbington</name> 
      <homepage>http://bob.name/</homepage> 
      <weblog>http://bob.blog/</weblog> 
    </author> 

    <issued>2003-02-05T12:29:29</issued> 
 
    <content type="application/xhtml+xml" xml:lang="en-us"> 
      <p xmlns="...">Hello, <em>weblog</em> world! 2 &lt; 4!</p> 
    </content>  

</entry>

The format aims to be general so that you can use formats other than XHTML in the payload. However, designing for such generality may be unnecessary if it is accepted that the payload will be XHTML anyway. If the payload is XHTML anyway, then the Atom envelope is superfluous. Consider the following:

POST /reilly HTTP/1.1
Content-Type: application/xhtml+xml

<?xml version="1.0"?>
<html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml"
  ><head
    ><title>My First Entry</title
  ></head
  ><body
    ><p>Hello, <em>weblog</em> world! 2 &lt; 4!</p
  ></body 
></html>

Things to observe

Most people are too lazy to write subtitles and summaries.
In order to avoid malicious posting, there needs to be an authentication method. Atom doesn’t need to specify how the lower protocol levels perform authentication. Both HTTP basic authentication and TLS certificate authentication allow the poster to be identified. The data about the author can be defaulted from the authentication.
The author is likely to consider the item creation time according to the server clock sufficient temporal metadata.
The minimal entry is a plain XHTML document.
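The defaulting described in the list above could look roughly like this on the server side. This is a sketch under assumptions not taken from the AtomAPI draft: HTTP Basic authentication (TLS certificate authentication would work analogously), a hypothetical `AUTHORS` table mapping user names to author data, and the server clock supplying the issued time.

```python
import base64
from datetime import datetime, timezone

# Hypothetical server-side account data keyed by authenticated user name.
AUTHORS = {
    "bob": {"name": "Bob B. Bobbington", "homepage": "http://bob.name/"},
}


def basic_auth_user(authorization_header):
    """Return the user name from an HTTP Basic 'Authorization' header value.

    Password verification is deliberately omitted from this sketch.
    """
    scheme, _, credentials = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not HTTP Basic authentication")
    decoded = base64.b64decode(credentials).decode("utf-8")
    user, _, _password = decoded.partition(":")
    return user


def default_metadata(authorization_header, now=None):
    """Fill in the author and issued time that a minimal XHTML entry omits."""
    user = basic_auth_user(authorization_header)
    issued = now or datetime.now(timezone.utc)  # server clock by default
    return {"author": AUTHORS[user], "issued": issued.isoformat(timespec="seconds")}
```

The `now` parameter exists only so that the clock can be fixed when testing.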

If the author wishes to provide additional metadata, the metadata can go inside the XHTML entry as opposed to going outside it. XHTML already has a metadata wrapper: the head element that hosts the title element.

POST /reilly HTTP/1.1
Content-Type: application/xhtml+xml

<?xml version="1.0"?>
<html xml:lang="en-us" xmlns="http://www.w3.org/1999/xhtml"     
                       xmlns:atom="http://example.com/newformat#" 
  ><head
    ><title>My First Entry</title
    ><atom:subtitle>In which a newbie learns to blog...</atom:subtitle
    ><atom:summary>A very boring entry...</atom:summary
    ><atom:author
      ><atom:name>Bob B. Bobbington</atom:name
      ><atom:homepage>http://bob.name/</atom:homepage 
      ><atom:weblog>http://bob.blog/</atom:weblog 
    ></atom:author
    ><atom:issued>2003-02-05T12:29:29</atom:issued
  ></head
  ><body
    ><p>Hello, <em>weblog</em> world! 2 &lt; 4!</p
  ></body 
></html>

The benefit of putting additional metadata inside XHTML instead of requiring a special envelope is that the minimal entry does not require any special wrapping: you just POST an XHTML document.
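A receiving server could read such an entry with an ordinary XML parser: the title comes from the XHTML title element, any atom:* children of head are optional extras, and missing values fall back to defaults (for example, derived from authentication and the server clock). A minimal sketch using Python's `xml.etree.ElementTree`; the Atom namespace URI is the draft's placeholder `http://example.com/newformat#`, and `read_entry` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
ATOM = "{http://example.com/newformat#}"  # placeholder namespace from the draft


def read_entry(document, defaults):
    """Extract the title, optional Atom metadata and body of a POSTed XHTML entry.

    `defaults` supplies values (e.g. author, issued) for anything the head omits.
    """
    root = ET.fromstring(document)
    head = root.find(XHTML + "head")
    entry = dict(defaults)
    entry["title"] = head.findtext(XHTML + "title", default="").strip()
    for element in head:
        if not element.tag.startswith(ATOM):
            continue  # ordinary XHTML head content such as title
        name = element.tag[len(ATOM):]
        if name == "author":
            entry["author"] = {
                child.tag[len(ATOM):]: (child.text or "").strip()
                for child in element
                if child.tag.startswith(ATOM)
            }
        else:
            entry[name] = (element.text or "").strip()
    entry["body"] = root.find(XHTML + "body")  # the entry content as an Element
    return entry
```

A minimal entry with only a title yields just the title, the body and the defaults; each atom:* element present in head overrides or adds a field.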