I’m working on a smarter text zoom for Firefox. It is a UI feature, and usually such UI features are implemented in JavaScript. JavaScript is easier to write and less bug-prone than C++.
But there is a problem: my JavaScript implementation is slow. It is not too slow for normal Web pages; in fact, the algorithm runs fast enough on a random portal-style news page. However, since the UI will be locked while the script runs, I need to consider worse cases.
Today I’ve mostly been trying to figure out what to do about it. I’ve been running benchmarks and reading code on LXR.
I am using the Web Applications 1.0 spec as my real-world monster test case. (I realize that it is not a stable test case. However, today it was stable enough.)
My algorithm takes 24 seconds to run on the WA 1.0 spec on my dual core 2 GHz PowerMac G5! The code looks like this:
function countNonSpace(str) {
  var count = 0;
  // looks like JS does not support \p{Zs} etc. :-(
  // using characters below space plus Zs, Zp and Zl
  var pattern = /[\u0000-\u0020\u00A0\u1680\u180E\u2000-\u2009\u200A\u2028\u2029\u202F\u205F\u3000]/g;
  while (pattern.exec(str)) {
    count++;
  }
  return str.length - count;
}
function countTextSizeChars(node, buckets, view) {
  if (!node) {
    return;
  }
  var sizeStack = new Array(0);
  var styleStack = new Array(0);
  var current = node;
  var next = null;
  for (;;) {
    switch (current.nodeType) {
      case Node.TEXT_NODE:
      case Node.CDATA_SECTION_NODE:
        var len = countNonSpace(current.data);
        if (len > 0) {
          var size = sizeStack[sizeStack.length - 2];
          if (size == -1) {
            var style = styleStack[styleStack.length - 2];
            size = sizeInPx(style.getPropertyValue("font-size"));
            if (size > MAX_BUCKET) {
              size = MAX_BUCKET;
            }
            sizeStack[sizeStack.length - 2] = size;
          }
          buckets[size] += len;
        }
        break;
      case Node.ELEMENT_NODE:
        // svg not supported
        if (current.namespaceURI == "http://www.w3.org/2000/svg") {
          break;
        }
        // have to call getComputedStyle for every element to
        // see which ones are being rendered :-(
        var style = view.getComputedStyle(current, null);
        styleStack[styleStack.length - 1] = style;
        if ((style.getPropertyValue("display") == "none") ||
            (style.getPropertyValue("visibility") == "hidden")) {
          // this element is not being seen
          break;
        }
        var contentDocument = current.contentDocument;
        if (contentDocument) {
          // frame, object or similar
          var height = style.getPropertyValue("height");
          var width = style.getPropertyValue("width");
          if ((height != "auto") && (width != "auto") &&
              ((sizeInPx(height) < 130) ||
               (sizeInPx(width) < 130))) {
            // The frame or object is likely an ad or a tiny iframe for
            // some hack like remote DOM events. Staying out.
            break;
          } else {
            // descend recursively
            countTextSizeChars(contentDocument.documentElement,
                               buckets,
                               contentDocument.defaultView);
          }
        } else {
          // not a frame, object or similar
          // descend iteratively
          if (next = current.firstChild) {
            current = next;
            sizeStack.push(-1);
            styleStack.push(null);
            continue;
          }
        }
        break;
      case Node.DOCUMENT_FRAGMENT_NODE:
      case Node.DOCUMENT_NODE:
        if (next = current.firstChild) {
          current = next;
          sizeStack.push(-1);
          styleStack.push(null);
          continue;
        }
        break;
    }
    for (;;) {
      if (next = current.nextSibling) {
        current = next;
        break;
      }
      current = current.parentNode;
      sizeStack.pop();
      styleStack.pop();
      if (current == node) {
        return;
      }
    }
  }
}
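One micro-optimization worth trying in countNonSpace: instead of stepping through the matches one by one with exec, strip all the space characters in a single replace() call and take the length of what remains. This is only a sketch of the idea (countNonSpaceByReplace is a name made up here for illustration), reusing the same character class as above; whether it actually beats the loop would have to be benchmarked.

```javascript
// Sketch: count non-space characters by deleting all space characters
// in one replace() call and measuring the remainder. Same character
// class as countNonSpace above (controls, ASCII space, Zs, Zl, Zp).
function countNonSpaceByReplace(str) {
  var pattern = /[\u0000-\u0020\u00A0\u1680\u180E\u2000-\u2009\u200A\u2028\u2029\u202F\u205F\u3000]/g;
  return str.replace(pattern, "").length;
}
```

The result should be identical to countNonSpace; the difference is that the matching loop runs inside the regexp engine instead of in script.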
Let’s see if traversing the text content is the problem:
        // var len = countNonSpace(current.data);
        var len = 5;
Down to a bit over 17 seconds. Obviously, this part needs scrutiny, but it is not the whole story. (I spent time poking around in LXR. DOM Inspector seems to be using IsOnlyWhitespace on nsITextContent. I have not yet figured out how I can treat the contents of a text node as nsITextContent from JS.)
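Short of getting at nsITextContent from script, a cheap pre-test in plain JS could at least skip whitespace-only text nodes before paying for the full counting loop and the size-stack work. A sketch (isOnlyWhitespaceJS is a made-up name, not a platform API), reusing the same character class:

```javascript
// Sketch: a script-level analogue of IsOnlyWhitespace. One anchored
// test() tells us whether the node contains any countable characters
// at all; only if it does do we need to count them.
var SPACE_ONLY =
    /^[\u0000-\u0020\u00A0\u1680\u180E\u2000-\u2009\u200A\u2028\u2029\u202F\u205F\u3000]*$/;
function isOnlyWhitespaceJS(str) {
  return SPACE_ONLY.test(str);
}
```

A text node for which this returns true could be skipped without touching the stacks at all; inter-element whitespace is common enough that this might matter.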
But let’s remove everything but the traversal algorithm itself:
function countTextSizeChars(node, buckets, view) {
  if (!node) {
    return;
  }
  var current = node;
  var next = null;
  for (;;) {
    switch (current.nodeType) {
      case Node.TEXT_NODE:
      case Node.CDATA_SECTION_NODE:
        break;
      case Node.ELEMENT_NODE:
      case Node.DOCUMENT_FRAGMENT_NODE:
      case Node.DOCUMENT_NODE:
        if (next = current.firstChild) {
          current = next;
          continue;
        }
        break;
    }
    for (;;) {
      if (next = current.nextSibling) {
        current = next;
        break;
      }
      current = current.parentNode;
      if (current == node) {
        return;
      }
    }
  }
}
Still a bit over 9 seconds. Ouch. It’s pretty bad if the tree traversal itself takes that much time. To get an idea how well the hardware can do, I tried the exact same algorithm on the same document (albeit without comment nodes) in Java. (I happen to have suitable code ready. In fact, the JavaScript code was adapted from my previous Java code.)
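For what it's worth, timings like these are plain wall-clock measurements around the call. A minimal harness for that sort of measurement could look like this (timeRun is a hypothetical name, not my actual test setup):

```javascript
// Sketch of a trivial wall-clock harness. new Date().getTime()
// returns milliseconds since the epoch and works in all the
// browsers being compared here.
function timeRun(fn) {
  var start = new Date().getTime();
  fn();
  return new Date().getTime() - start;
}
```

For example: timeRun(function () { countTextSizeChars(document, buckets, window); }) gives the traversal time in milliseconds.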
Using the Xerces DOM and the GNU JAXP DOM, the traversal took a tad under 7 milliseconds on average. That is, less than one thousandth of what it took in Gecko using JS! With the Crimson DOM, the traversal took a bit over 22 milliseconds on average.
How can Crimson be so much slower? Actually, I expected that. With Xerces and GNU JAXP, the references to the first child and the next sibling are stored in each node. Crimson, on the other hand, stores an array of children.
Gecko also stores an array of children, but it is supposed to do some clever caching to make document order traversal using first child and next sibling fast, which is why I tried this algorithm in the first place even though I knew that the underlying implementation stores an array of children.
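To make the difference concrete, here is a toy model (mine, not Gecko's or Crimson's actual code) of the two storage strategies. With sibling links on each node, nextSibling is a single reference load; with a bare child array and no cached index, each nextSibling call first has to find the current node's position in its parent's array:

```javascript
// Toy model of the two DOM storage schemes discussed above.

// Linked storage (Xerces, GNU JAXP style): O(1) per step.
function nextSiblingLinked(node) {
  return node.next; // reference stored directly on the node
}

// Child-array storage without an index cache (a naive version of
// what Crimson stores): O(number of children) per step, because we
// must locate the current node in the parent's array first.
function nextSiblingArray(node) {
  var siblings = node.parent.children;
  for (var i = 0; i < siblings.length; i++) {
    if (siblings[i] === node) {
      return siblings[i + 1] || null;
    }
  }
  return null;
}
```

Gecko's trick, as I understand it, is to cache the index of the most recently accessed child, so that a sequential first-child/next-sibling walk behaves like the linked case instead of the naive array case.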
Obviously, traversing the content tree in C++ cannot be as slow as it is from JS, because Gecko displays the WA 1.0 spec reasonably fast. And merely being interpreted should not cost JS a thousand-fold penalty.
My guess for the moment was that crossing the XPConnect boundary between JavaScript and compiled code was really inefficient, so I decided to test in Opera and Safari.
To do that, I made a local copy of the spec and copied the script into it. The script took 1.3 seconds in Opera and 1.7 seconds in Safari. Then I had to test that version in Firefox: 2.2 seconds! I loaded it into the test harness I had been using earlier: still only 2.5 seconds! Then I loaded the page over HTTP and ran the script: 9.6 seconds.
Looks like I have been benchmarking the performance of the security principals for the whole day. Aaargh!