HTML5 Parsing in Gecko: A Build

The effort of putting an HTML5 parser inside Gecko takes a step out of vaporware land. Here’s a very preliminary build (Mac Universal; not tested on PPC) with source tarballs (Mozilla source tree and Java sources for generating the HTML parser; the latter requires javaparser).

The level of quality is “It runs and some pages render!” This build is not at all suitable for normal browsing use. Please don’t use it with your usual Firefox profile. There are numerous known issues, starting with bogus memory management (leaking everything in the parser!), lack of fragment parsing support, always rendering in quirks mode, HTML elements being represented as DOM nodes that behave like XHTML elements, and inefficient integration with CSS layout. The baseline Gecko source isn’t synced with the trunk, so the other parts of Gecko don’t have all the latest patches. The parser doesn’t reflect the most recent spec changes. Neither meta element-based encoding declarations nor BOM sniffing works.

If a page doesn’t render, try reloading or navigating back and forth.

The parser lives in content/html/parser/ (for linkage reasons).

There’s a boolean pref called html5.enable, which defaults to true.
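
To compare against the old parser, the pref can be flipped in about:config. As a minimal sketch, assuming this build reads a user.js file from the profile directory the way regular Firefox builds do, the following Python helper appends the pref line to a throwaway profile. The helper name and the scratch directory are hypothetical; only the pref name html5.enable comes from the build.

```python
import tempfile
from pathlib import Path


def write_html5_pref(profile_dir, enabled):
    """Append a user_pref line for html5.enable to user.js in profile_dir.

    Assumption: the build honors user.js like a regular Firefox profile does.
    """
    value = "true" if enabled else "false"
    line = f'user_pref("html5.enable", {value});\n'
    user_js = Path(profile_dir) / "user.js"
    # Append so that any existing prefs in a scratch profile are kept.
    with user_js.open("a", encoding="utf-8") as f:
        f.write(line)
    return user_js


if __name__ == "__main__":
    # Hypothetical scratch profile; never point this at your usual profile.
    profile = tempfile.mkdtemp(prefix="html5-test-profile-")
    path = write_html5_pref(profile, enabled=False)
    print(path)
    print(path.read_text(encoding="utf-8"), end="")
```

The browser would then have to be started with that scratch directory as its profile, which keeps the experiment away from the usual Firefox profile.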

For background, please refer to a recent newsgroup posting of mine. (Summary: The parser core is mechanically translated from the Validator.nu HTML Parser.)

P.S. Remember to try some SVG and MathML in text/html.
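
For a quick way to try that, here is a small sketch that writes a test page with inline SVG and MathML. The file name and the markup are illustrative only, not part of the build. The page sticks to ASCII because encoding declarations don’t work in this build.

```python
from pathlib import Path

# A small text/html document with inline SVG and MathML.
# ASCII only, since meta-based encoding declarations are not honored yet.
TEST_PAGE = """<!DOCTYPE html>
<html>
<head>
<title>SVG and MathML in text/html</title>
</head>
<body>
<p>Inline SVG:</p>
<svg width="120" height="120" viewBox="0 0 120 120">
  <circle cx="60" cy="60" r="50" fill="orange" stroke="black"/>
</svg>
<p>Inline MathML:</p>
<math>
  <mfrac><mi>a</mi><mi>b</mi></mfrac>
</math>
</body>
</html>
"""


def write_test_page(path="svg-mathml-test.html"):
    """Write the test page to path (a hypothetical file name) and return it."""
    out = Path(path)
    out.write_text(TEST_PAGE, encoding="utf-8")
    return out


if __name__ == "__main__":
    print(write_test_page().resolve())
```

Open the resulting file in the build. If the parser handles the inline markup, the page should show an orange circle and a fraction rather than stray text.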

Update: Sam Ruby has instructions for building from version control.