The project of putting an HTML5 parser inside Gecko has progressed. I merged in code from the trunk in order to experiment with cool new stuff such as SVG filter effects for HTML.
There have been many unobvious changes since I last blogged (this list is not exhaustive):
Rewrote the <meta> encoding sniffer (for the Java version, too) and hooked it up to Gecko.
Made it possible for a late <meta> detected in the tree builder to restart the document load.
Hooked up the universal chardet to the HTML5 parser. (This is done very differently from the old way: chardet now sees only the first 512 bytes of the file. This may well be good enough, but I’m not sure.)
Hooked up doctype sniffing to the document mode exposed to CSS.
Plugged memory leaks.
Fixed known crashes in the HTML5 parser.
Fixed an issue with the doctype node randomly not showing up in the DOM.
Fixed an issue with the CSS frame tree and the DOM getting out of sync after the Adoption Agency Algorithm had run. (Now there’s a new bug that causes a mutation event to fire for a parser-performed tree change here.)
Updated the parser to match the latest spec—multiple times. (There are currently some deviations from the spec that are pending Hixie’s handling of feedback.)
Made the tree builder batch tree operations so that tree changes are actuated relatively seldom and in larger batches.
Made the tree operation actuation code batch append notifications.
Made scripted DOM operations never flush the tree builder.
Made Linux and Windows builds work.
Took trunk changes up to and including ae7ce7e47f5a (2009-03-11).
Etc.
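To illustrate the batching idea from the list above (the tree builder queues tree operations and they are applied seldom and in larger batches), here is a minimal Python sketch. It is not the Gecko code: the class name `BatchingTreeBuilder`, the `apply_batch` callback, the operation tuples, and the batch size are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical operation record: (operation name, arguments).
TreeOp = Tuple[str, tuple]


@dataclass
class BatchingTreeBuilder:
    """Queues tree operations and applies them in batches (illustrative only)."""

    apply_batch: Callable[[List[TreeOp]], None]  # receives one whole batch at a time
    batch_size: int = 4                          # arbitrary threshold for this sketch
    _queue: List[TreeOp] = field(default_factory=list)

    def enqueue(self, name: str, *args) -> None:
        """Record an operation; apply the queue only once it is large enough."""
        self._queue.append((name, args))
        if len(self._queue) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand all queued operations to the callback as a single batch."""
        if not self._queue:
            return
        batch, self._queue = self._queue, []
        self.apply_batch(batch)


# Usage: five operations with a batch size of three yield two batches.
applied_batches: List[List[TreeOp]] = []
builder = BatchingTreeBuilder(apply_batch=applied_batches.append, batch_size=3)
for tag in ["html", "head", "body", "p", "video"]:
    builder.enqueue("append_element", tag)
builder.flush()  # end of input: apply whatever is still queued
```

The point of the sketch is only that the consumer of the operations is called once per batch rather than once per operation, which is what makes it possible to batch notifications as well.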
Here’s a demo that uses SVG filter effects and a clipping path on the HTML5 <video> element without resorting to external file references. (The video is just some file I happened to have on the server already.) Here’s a screenshot.
You can get the latest HTML5 parsing-enabled builds for 32-bit Mac, Linux and Windows from the try server. (Look for the latest directory with “hsivonen” in the name.) A word of warning, though: The current builds are crashy.
When loading the HTML5 spec, the HTML5 parser-enabled builds exhibit the same weirdness as trunk builds: there’s a huge gray rectangle at the top of the page. I don’t know where it comes from (there’s only one <body> element in the DOM, and the gray rectangle is the bottom of <body>), but it’s not an HTML5 parsing problem, since the trunk has it, too.