The Joy of about:blank

about:blank is probably the hardest Web page to load. In fact, it is so hard that in order to turn the HTML5 parser on by default in Firefox last year, we decided to special-case about:blank to use the old parser in Firefox 4.

In Firefox, about:blank is sometimes parsed from a stream and sometimes its DOM is generated without running a parser. The problem that prevented us from using the HTML5 parser for about:blank is that a bunch of test cases assume that when about:blank is parsed from a stream, the whole operation happens as a single event loop task. These tests aren’t really testing about:blank behavior, but since the test cases are Gecko-specific they inevitably have accidental dependencies on delicate Geckoisms that real Web pages wouldn’t depend on.

The HTML5 parser parses streams off the main thread, so getting anything parsed involves at least two event loop tasks on the main thread and a spin in between: first a task for setup, and later another task for handling the data that the parser thread handed back to the main thread. This is OK for parsing streams from the network, because data from the network takes multiple event loop spins to arrive anyway. However, it was a major problem with data: URLs in test cases and it still is a problem with about:blank in test cases.
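To make the single-task assumption concrete, here is a toy model of the difference. It is only a sketch and not Gecko code: `ToyEventLoop`, `ToyDocument`, `loadInSingleTask` and `loadInTwoTasks` are hypothetical names invented for illustration, and the hand-off from the parser thread is simplified to the setup task posting the follow-up task.

```typescript
// A toy event loop: tasks are plain callbacks that run one at a time, in order.
type Task = () => void;

class ToyEventLoop {
  private queue: Task[] = [];

  post(task: Task): void {
    this.queue.push(task);
  }

  // Runs exactly one queued task. Returns false if the queue was empty.
  spinOnce(): boolean {
    const task = this.queue.shift();
    if (task === undefined) {
      return false;
    }
    task();
    return true;
  }
}

// Hypothetical stand-in for a document: body stays null until the tree is built.
interface ToyDocument {
  body: string | null;
}

type Loader = (loop: ToyEventLoop, doc: ToyDocument) => void;

// Single-task model: setup and tree building both happen in one task.
const loadInSingleTask: Loader = (loop, doc) => {
  loop.post(() => {
    doc.body = "body";
  });
};

// Two-task model: the first task only sets things up. The parsed data shows up
// in a second task (simplification: the setup task posts it directly).
const loadInTwoTasks: Loader = (loop, doc) => {
  loop.post(() => {
    loop.post(() => {
      doc.body = "body";
    });
  });
};

// A test-like check that assumes a single spin of the event loop is enough.
function bodyExistsAfterOneSpin(load: Loader): boolean {
  const loop = new ToyEventLoop();
  const doc: ToyDocument = { body: null };
  load(loop, doc);
  loop.spinOnce();
  return doc.body !== null;
}

const singleTaskResult = bodyExistsAfterOneSpin(loadInSingleTask);
const twoTaskResult = bodyExistsAfterOneSpin(loadInTwoTasks);
```

In this model `singleTaskResult` is true and `twoTaskResult` is false: a check that passes when everything happens in one task stops passing once the work is split across two tasks, even though the final document is the same.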

We want to remove the old HTML parser from the code base entirely after Firefox 4, so special-casing about:blank to use the old parser is not a reasonable long-term solution. Since Gecko’s behavior differs in subtle ways from other browsers, it’s probably a bad idea to implement a special pseudo-parser for replicating the old Gecko behavior exactly. Instead, it would make sense to see what other browsers are doing, standardize the least bizarre but still Web-compatible behavior and implement that.

Unfortunately, it’s not clear what the least bizarre but still Web-compatible behavior is. It seems that IE has had special behavior for about:blank in the window.open case practically forever while IE’s iframe behavior is refreshingly reasonable. Other browsers appear to have generalized the special behavior to apply to all browsing contexts. However, Gecko has done it differently from the others. WebKit used to handle things more like iframes in IE but has moved in a more complex direction.

I tested the behavior of loading about:blank on one hand and another (same-origin) URL on the other hand into an iframe and into a window.open-created browsing context (with pop-up blocker turned off). Since I was aware of significant implementation differences between the traditional window.open case that opens a window and browser prefs that target window.open into a new tab, I tested both cases.
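The synchronous part of this kind of probing can be sketched as follows. This is not the actual test harness I used. `FrameLike`, `DocumentLike`, `observeSynchronously` and `FakeFrame` are hypothetical names, and the interfaces are minimal stand-ins so that the sketch runs without a browser. The idea is to register a load listener, create the browsing context, and immediately look at what is there before returning to the event loop.

```typescript
// Minimal structural stand-ins (assumptions) so the sketch needs no DOM typings.
interface DocumentLike {
  body: object | null;
}

interface FrameLike {
  contentDocument: DocumentLike | null;
  addEventListener(type: "load", listener: () => void): void;
}

// What a script can see before control returns to the event loop.
interface SyncObservation {
  hasDocument: boolean;   // a document exists right after insertion
  hasBody: boolean;       // that document already has a body
  loadFiredSync: boolean; // load fired before insertion returned
}

function observeSynchronously(frame: FrameLike, insert: () => void): SyncObservation {
  let loadFired = false;
  frame.addEventListener("load", () => {
    loadFired = true;
  });
  insert(); // for example, appending the iframe to the page
  const doc = frame.contentDocument;
  return {
    hasDocument: doc !== null,
    hasBody: doc !== null && doc.body !== null,
    loadFiredSync: loadFired,
  };
}

// A fake frame for trying the probe outside a browser. The two flags choose
// whether insertion creates a body and whether load fires synchronously.
class FakeFrame implements FrameLike {
  contentDocument: DocumentLike | null = null;
  private listeners: Array<() => void> = [];

  constructor(
    private readonly withBody: boolean,
    private readonly syncLoad: boolean,
  ) {}

  addEventListener(_type: "load", listener: () => void): void {
    this.listeners.push(listener);
  }

  insert(): void {
    this.contentDocument = { body: this.withBody ? {} : null };
    if (this.syncLoad) {
      this.listeners.forEach((listener) => listener());
    }
  }
}

// A frame that gets a full document and a synchronous load event.
const syncLike = new FakeFrame(true, true);
const syncResult = observeSynchronously(syncLike, () => syncLike.insert());

// A frame that starts with a bodyless document and no synchronous load event.
const emptyLike = new FakeFrame(false, false);
const emptyResult = observeSynchronously(emptyLike, () => emptyLike.insert());
```

In a real page the same function would presumably take an actual iframe element, with `insert` appending it to the document. What happens in later tasks (a second document, a later load event) needs separate asynchronous checks that this sketch leaves out.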

I identified eight different behaviors. For ease of presentation, I first list the behaviors and then show a table of which browser has which behavior in which case.

Sync
An about:blank document is created synchronously into the browsing context. A load event is fired for it synchronously (or is not observable in the window.open case).
Sync plus single-task
An about:blank document is created synchronously into the browsing context. A load event is not fired for it. A task is queued for loading another about:blank document into the browsing context. This second document has its DOM built during the one task, so no bodyless state of the DOM is observable. A load event fires for this second about:blank.
Sync plus async other
An about:blank document is created synchronously into the browsing context. A load event is not fired for it. Later task queue tasks incrementally build the DOM for the non-about:blank document into the browsing context as data arrives from the network. A load event fires for this second document.
Sync plus async other no load
An about:blank document is created synchronously into the browsing context. A load event is not fired for it. Later task queue tasks incrementally build the DOM for the non-about:blank document into the browsing context as data arrives from the network. No load event fires for this second document in a way observable from the outside.
Empty plus async
The browsing context first has an empty (bodyless) document. Later task queue tasks build the DOM for the destination document into the browsing context. The load event fires after that.
Cache-dependent premature load
If the target document is not cached: The browsing context first has an empty (bodyless) document. Later task queue tasks build the DOM for the destination document into the browsing context. The load event fires after that. If the target is cached: The load event fires first. The browsing context has no document when that happens. After window.open returns, the browsing context already has the DOM of the target document in it.
Cache-dependent no load
If the target document is not cached: The browsing context first has an empty (bodyless) document. Later task queue tasks build the DOM for the destination document into the browsing context. The load event fires after that. If the target is cached: After window.open returns, the browsing context already has the DOM of the target document in it. No load event fires.
Racy
Most often the same behavior as “Sync”. Other times like “Empty plus async”.
Browser | iframe about:blank | iframe other URL | window.open about:blank into tab | window.open other URL into tab | window.open about:blank into window | window.open other URL into window
Firefox 4 | Sync plus single-task | Sync plus async other | Sync plus single-task | Sync plus async other | Sync plus single-task | Sync plus async other
Chrome 10 | Sync | Sync plus async other | Sync | Sync plus async other
Safari 5 | Sync | Empty plus async | Sync | Empty plus async | Sync | Empty plus async
Opera 11 | Sync | Sync plus async other | Sync | Sync plus async other no load | Sync | Sync plus async other no load
IE6 | Empty plus async | Empty plus async | Racy | Empty plus async
IE9 | Empty plus async | Empty plus async | Sync | Cache-dependent premature load | Racy | Cache-dependent no load

I hope this research can be used for assessing whether the current HTML5 draft makes sense on this topic. Even though the behavior “Sync” seems to be prevalent in the table, I’m rather worried about ever making the load event fire synchronously, so I’m hoping we don’t end up having to do that.

My own judgment is leaning towards the following:

I am not sure what to think of the case where about:blank is not the initial destination of navigation for an iframe. Creating a synchronous placeholder about:blank DOM would make the browsing context code the same as in the window.open case and would provide an early document.body that scripts could accidentally poke without poking null. On the other hand, setting up such a DOM first and then letting the DOM be blown away by a later network task means that the presence of element nodes in the DOM would be racy relative to the network and to script getting appended to the DOM in the same parser task that appended the iframe. In that sense, the IE/Safari behavior would be nicer.