XHTML—What’s the Point? (Draft, incomplete)

This document is an incomplete draft. I have made it available at this stage only so that I can refer to the parts already written in newsgroup posts.

On its “HTML Home Page” the W3C says: “XHTML 1.0 borrows the tags from W3C's earlier work on HTML 4, and can be interpreted by existing browsers, by following a few simple guidelines. This allows you to start using XHTML now!” But why would one want to start using XHTML now? The old browsers haven’t suddenly gotten any new abilities. What’s the point?

XHTML 1.0 Served as text/html

The XHTML 1.0 specification allows XHTML 1.0 to be served using the text/html media type if the HTML Compatibility Guidelines set forth in the specification are followed. However, XHTML 1.0 is just a reformulation of HTML 4.01 in XML. When an XHTML 1.0 document is served to a user agent that is prepared to receive HTML, the XHTML 1.0 document gets the same treatment as an HTML 4.01 document. To put it bluntly, there is no point in using XHTML when the XHTML document gets treated as old HTML.

Parsing on the User Agent Side

Old browsers just treat XHTML 1.0 sent as text/html as HTML-like tag soup—that’s what they do with HTML.

New browsers are more interesting. In theory, a user agent that has an XML parser could detect whether a document served as text/html is in fact XHTML and send such documents down the XML code path. However, real browsers don’t work that way. Mozilla, Opera, IE 5 for Windows and IE 5 for Mac treat an XHTML 1.0 document sent as text/html the same way they treat a valid HTML 4.01 document.

It has been suggested that new browsers should check whether a document begins with an XML declaration and use an XML parser if the XML declaration is found. The idea is that this way, XHTML documents could include new elements and still degrade gracefully in browsers that can’t properly handle the XML media types. The new elements could be Ruby Annotation elements or elements from other namespaces such as MathML and SVG. Also, in some environments, the use of proper XML media types is hindered by ignorance, laziness, bureaucracy or lack of time in configuring or using the HTTP server.
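The suggested check can be illustrated with a minimal Python sketch. This is not how any real browser works; the function names `starts_with_xml_declaration` and `choose_parser` are hypothetical, and the sketch assumes an ASCII-compatible encoding such as UTF-8 (a UTF-16 document would need extra handling).

```python
def starts_with_xml_declaration(body: bytes) -> bool:
    """Return True if the byte stream begins with an XML declaration.

    Hypothetical helper: assumes an ASCII-compatible encoding.
    """
    # Skip a UTF-8 byte order mark if one is present.
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]
    # The declaration must be the very first thing in the document:
    # the literal "<?xml" followed by white space. Requiring the white
    # space rules out processing instructions such as "<?xml-stylesheet".
    return body[:5] == b"<?xml" and body[5:6] in (b" ", b"\t", b"\r", b"\n")


def choose_parser(media_type: str, body: bytes) -> str:
    """Pick a code path for a document served as text/html (illustrative only)."""
    if media_type == "text/html" and starts_with_xml_declaration(body):
        return "xml"
    return "tag-soup"


xhtml = b'<?xml version="1.0" encoding="UTF-8"?>\n<html xmlns="http://www.w3.org/1999/xhtml"/>'
html = b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n<html></html>'

print(choose_parser("text/html", xhtml))
print(choose_parser("text/html", html))
```

Note that the check only looks at the first few bytes. It says nothing about whether the rest of the document is well-formed.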

Checking for the XML declaration would solve some problems, but it would be a slippery slope. If some authors included an XML declaration without making sure their documents are, in fact, well-formed XML, users could pressure browser writers to relax the well-formedness constraints, in which case XML would degenerate into tag soup that requires a complicated parser to deal with. On the other hand, not providing a method for graceful degradation that doesn’t involve a special server-side solution will likely slow down the adoption of XML-based markup languages. However, the XML declaration is known to cause problems in some old browsers, which defeats the purpose of graceful degradation. Other methods for detecting XMLness would be significantly more complicated.

I believe that small devices with the processing capabilities and memory for housing an XML parser but not a complicated HTML-like tag soup parser won’t try to detect XHTML sent as text/html, either. I think it is more likely that such devices will access text/html content through a proxy server that converts tag soup and HTML into XHTML.

The W3C HTML WG has rejected the idea of trying to detect whether a document sent as text/html is XHTML.


The XHTML 1.0 specification itself doesn’t describe any layout differences between HTML 4.01 and XHTML 1.0. In fact, ever since HTML 4.0, the vendor-introduced presentational features that were included in HTML 3.2 have been played down in favor of style sheets.

Some browsers implement two layout modes: one for dealing with old layout expectations and another for compliance with the CSS layout model. In those browsers the layout of XHTML 1.0 is handled in the same way as the layout of valid HTML 4.01 documents whose doctype declaration includes the URL pointing to the appropriate DTD.


Tooling is the one area where XHTML 1.0 served as text/html might actually have an advantage over valid HTML 4.01. If the document is edited using tools that only handle XHTML, it isn’t worthwhile to add a layer of conversion to HTML 4.01. A tool writer might choose to support only XHTML and not HTML 4.x, because parsing XHTML is easier than parsing valid HTML 4.x and immensely easier than parsing tag soup. However, if the tools in use output valid HTML 4.01, it doesn’t really make sense to convert the document into XHTML if it is served as text/html anyway.

XHTML Served Using an XML Media Type

XHTML served using an XML media type is a different story. That’s where the benefits come in. Documents sent using an XML media type either get parsed using a real XML parser in new browsers or bring up a download prompt in older browsers, just like any other unknown media type.
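The dispatch described above can be sketched roughly as follows. The text names no specific media types, so the set `XML_MEDIA_TYPES` is an assumption made for illustration, as are the function name `handling_for` and its `has_xml_parser` flag. Real browsers have far more elaborate content-type handling.

```python
# Assumed examples of XML media types; the text itself names none.
XML_MEDIA_TYPES = {"text/xml", "application/xml", "application/xhtml+xml"}


def handling_for(content_type: str, has_xml_parser: bool) -> str:
    """Describe how a hypothetical browser treats a response."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # XHTML served as text/html lands here, just like old HTML.
        return "tag soup parser"
    if media_type in XML_MEDIA_TYPES:
        # Older browsers without an XML parser see an unknown type.
        return "XML parser" if has_xml_parser else "download prompt"
    return "download prompt"


for browser, has_xml in (("new browser", True), ("old browser", False)):
    for ctype in ("text/html; charset=utf-8", "application/xhtml+xml"):
        print(browser, "|", ctype, "|", handling_for(ctype, has_xml))
```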

Lighter Parser

As mentioned above, parsing XHTML is easier than parsing HTML—let alone tag soup. Writing a parser that produces a document tree from a valid HTML document isn’t unduly difficult (if the most exotic SGML minimization features are omitted) but one still has to deal with special cases where the specification requires end tags or even whole elements to be inferred if they aren’t in the document explicitly.

However, dealing with real-world tag soup is a hopeless tar pit. If an HTML parser is to become a compatible tag soup parser, it has to contain special cases after special cases after special cases. This leads to implementor agony and bloated code.

In XML, and therefore in XHTML, all the elements that end up in the document tree show up explicitly in markup and there are no implicit end tags. Furthermore, well-formedness errors aren’t tolerated. As a result, an XML parser can be smaller and faster than a tag soup parser.
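The difference in strictness can be seen with Python’s standard library, used here purely as an illustration. `xml.etree.ElementTree` rejects input that is not well-formed, while `html.parser` is a lenient tokenizer that reports whatever tags it finds. It is not a browser-grade tag soup parser, and it does not infer missing end tags. The names `TagCollector` and `is_well_formed` are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start and end tags without judging the markup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Paragraphs without end tags: acceptable in HTML, fatal in XML.
soup = "<p>An unclosed paragraph<p>Another one"
xhtml = "<div><p>A closed paragraph</p><p>Another one</p></div>"

collector = TagCollector()
collector.feed(soup)
collector.close()

print(is_well_formed(soup))   # the XML parser refuses the unclosed paragraphs
print(is_well_formed(xhtml))  # explicit end tags make the input well-formed
print(collector.events)       # the lenient tokenizer reports two start tags
```

The XML side needs no recovery logic at all: the parser either produces a tree or stops with an error.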

Small Devices

Small and mobile devices usually place certain requirements on software. The piece of software has to be small, because housing a larger binary in ROM is more expensive. Likewise, lower RAM and processor cycle consumption is desirable. Considering those requirements, an XHTML Basic implementation is better suited for small mobile devices than a full-blown tag soup user agent.

But does this mean anything to the content providers? After all, with transcoding proxies, the content providers could serve valid HTML 4.01 Strict which would be converted to XHTML Basic by a proxy server.

I believe it is going to be a matter of control. If the content providers want to make sure that their files reach the reader intact without a proxy server tampering with the document, they will want to provide content in the XHTML Basic format themselves.

Browsers on Desktop Computers

The developers of the current Web browsers have put a great deal of effort into developing HTML support in their browsers. They aren’t going to throw all that away overnight. Hence, the size argument doesn’t work with desktop computers: a browser with an HTML parser and an XML parser is, of course, larger than a browser with only one parser.

With today’s computers, the perceived performance difference between an HTML parser and an XML parser isn’t big enough to motivate content providers to make the switch. The promise of pages displaying a tiny bit faster isn’t very attractive if the cost is a narrower potential audience.


So far, Ruby Annotation is the only new feature that is in XHTML but isn’t in HTML. Ruby is of interest to East Asian content providers in particular. However, Ruby alone might not be a good enough incentive for introducing higher browser requirements on sites.

Mixing Elements from Other Namespaces

Mixing non-XHTML content with XHTML is probably the most attractive possibility that the XMLness of XHTML provides. By using elements from other namespaces, mathematical formulae or vector graphics, for example, can be included in an XHTML document.
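As a rough illustration of what mixing namespaces means for a consumer, the following Python sketch parses a small XHTML document that embeds a MathML expression and counts the elements in each namespace. The sample document and the helper `namespace_of` are made up for this example. The namespace URIs are the ones defined by the XHTML 1.0 and MathML specifications.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A made-up sample: an XHTML paragraph containing a MathML formula.
document = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>The area of a circle is
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>&#x3C0;</mi><msup><mi>r</mi><mn>2</mn></msup>
      </math>
    </p>
  </body>
</html>"""


def namespace_of(element) -> str:
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(document)

# Count elements per namespace; a namespace-aware user agent would
# hand each subtree to the matching renderer instead.
counts = {}
for element in root.iter():
    ns = namespace_of(element)
    counts[ns] = counts.get(ns, 0) + 1

for ns, count in counts.items():
    print(ns, count)
```

Because every element carries its namespace, the parser can tell the `math` subtree apart from the surrounding XHTML without any guessing.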


MathML allows mathematical expressions to be encoded using an XML vocabulary. Having the browser render mathematical expressions beautifully is a far more attractive idea than using GIFs or PDF files.

I predict that MathML is going to make XHTML popular among people interested in mathematics, physics or other areas where mathematical notation is used. This will probably happen once universities start installing MathML-enabled browsers (likely Mozilla derivatives) as the default browsers.


I think SVG has the greatest potential to make XHTML popular outside academia. It will take time, though. Integrated SVG implementations need to exceed the coolness of Flash in the opinion of graphic designers.

The draft ends here. More content to come.

Copyright Henri Sivonen

Text last modified: 2001-11-13