On Friday 2007-01-12, I went to listen to the thesis defense of Mikko Honkala. The title of his doctoral thesis is Web user interaction – a declarative approach based on XForms. The opponents were Jean Vanderdonckt and Dave Raggett. The custos was Petri Vuorimaa.
Here are some notes that I scribbled down. I got the heads-up about the thesis defense 40 minutes before showtime, so I haven’t read the thesis yet.
First, there was a presentation by Honkala.
Flexible UIs: automatically adaptable to different presentation modes.
Work done between 2001 and 2006.
Use cases for voice: while driving, cell phones. (Notably, blind users were not mentioned at all, which differs from the usual discourse around voice UIs.)
Google Docs & Spreadsheets and Google Video shown as examples of Web apps. (These are not XForms apps, BTW.)
Use case: You need the calendar in the car while driving. Google Calendar doesn’t speak to you while you drive.
A new richer solution needs to preserve the good sides of the Web.
An extremely simple code example (two input fields multiplied into an output field) was shown in XForms, Swing and “AJAX”. The “AJAX” example was, more precisely, Web Forms 1.0 and JavaScript. The XForms example was the shortest, but I’m pretty sure that a Web Forms 2.0 plus JavaScript implementation without a Web Forms 1.0 fallback (XForms doesn’t fall back to Web Forms 1.0, either) would have been even shorter.
The three main benefits of XForms were identified (at least these are what I distilled from the presentation):
The model is comprehensible to non-programmers.
A voice UI can be produced by the browser from the model.
Visual development tools of the future will be able to use XForms as their native format, so there will be interop on the form development tool side: one user can edit a form in one tool and another user can edit the same form in another tool.
The thesis work included extensions to XForms (mainly to integrate it with other languages).
XBL is used to encapsulate widget implementations specific to a presentation mode or medium.
This stuff is implementable. Proof by demonstration: X-Smiles.
Not suitable for particularly complex UIs like games.
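To make the “write once” idea from the presentation a bit more concrete, here is a minimal Python sketch. It is not XForms and not X-Smiles code; it only assumes the general shape described above: the form is written once as a data model plus abstract controls (using the two-inputs-multiplied example), and two hypothetical renderers, `render_visual` and `render_voice`, stand in for the modality-specific widget implementations that X-Smiles encapsulates with XBL. All class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Control:
    """Hypothetical abstract control: says what it is bound to, not how it looks or sounds."""
    ref: str                # name of the model field the control is bound to
    label: str              # modality-neutral label text
    readonly: bool = False  # computed outputs are read-only

@dataclass
class Form:
    instance: Dict[str, float]                                 # the data model
    calculate: Dict[str, Callable[[Dict[str, float]], float]]  # declaratively computed fields
    controls: List[Control]

    def recalculate(self) -> None:
        # Only one computed field here, so evaluation order does not matter.
        for name, expr in self.calculate.items():
            self.instance[name] = expr(self.instance)

def render_visual(form: Form) -> List[str]:
    # Visual mode: one "label: [value]" line per control.
    return [f"{c.label}: [{form.instance[c.ref]:g}]" + (" (read-only)" if c.readonly else "")
            for c in form.controls]

def render_voice(form: Form) -> List[str]:
    # Voice mode: prompt for inputs, announce read-only outputs.
    prompts = []
    for c in form.controls:
        if c.readonly:
            prompts.append(f"{c.label} is {form.instance[c.ref]:g}.")
        else:
            prompts.append(f"Please say the {c.label.lower()}.")
    return prompts

# The form is described once...
form = Form(
    instance={"a": 6.0, "b": 7.0, "product": 0.0},
    calculate={"product": lambda m: m["a"] * m["b"]},
    controls=[Control("a", "First number"), Control("b", "Second number"),
              Control("product", "Product", readonly=True)],
)
form.recalculate()
# ...and presented in two modes:
# render_visual(form) -> ['First number: [6]', 'Second number: [7]', 'Product: [42] (read-only)']
# render_voice(form)  -> ['Please say the first number.', 'Please say the second number.', 'Product is 42.']
```

The point of the split is that the author never writes the voice prompts; whether generated prompts like these are good enough without author hints is exactly the open “write once” question.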
Then there was the actual defense part. “H:”, “V:” and “R:” denote points attributed to Honkala, Vanderdonckt and Raggett, respectively. The points are not exact quotes and I hope I haven’t misrepresented anything. My notes are not comprehensive. I wrote down stuff selectively based on what I found interesting.
V: Honkala has presented his work at a conference (I forgot which) three years in a row. Impressive.
R: X-Smiles used the style attribute. What are the challenges posed by Selectors? H: The style attribute was used to get started in a standards-based way when a CSS engine was missing. There is a CSS engine in X-Smiles now.
H: The engine is like a spreadsheet engine. Algorithm by Knuth. Used in the first spreadsheet app.
H: Author-provided hints needed for speech presentation of repeating structures.
R: Scripting is said to be more expensive for developers. Is there quantitative evidence? H: This is not a computer science issue. It is in the field of usability and psychology. The declarative way may be more difficult for programmers, since programmers are used to thinking in procedural or object-oriented ways. There is a place for both declarative and procedural ways. Even hybrids.
H: There’s Google Web Toolkit for hard-core programmers.
R: Are there studies about the need for scripting? H: Not aware of any. Blogs provide instantaneous personal research that is not real research.
R: What will it take to make people switch from HTML to XForms? H: Might not even happen. XForms for cases where multimodality is needed. New technology needs to be many times more efficient than the existing technology. This is not the case for programmers with XForms.
R: (Something about XBL and device independence.) H: XBL brings back the control over details to the raised abstraction. (You lose detail when you raise the level of abstraction.)
V: Modeling takes more effort than just doing. Benefits? H: Have to model anyway. Benefit: tools. Hard to enforce. People only believe by trying and finding it useful.
V: Form-oriented only? H: Need scope. A general declarative system that is not scoped is a programming language in XML tags. V: What if your app is not form-oriented?
H: The visual mode is the master. No support for composite multimodality in X-Smiles. There is support for aural CSS.
R: Separation of content and style? H: Believes in the separation. R: Separate common content and modality-specific content.
V: XForms vs. XHTML+Voice? H: XForms: write once; X+V: detailed.
H: Integration of declarative and procedural gets ugly at the integration point. Does not believe in a 100% declarative future.
R: “Host” and “parasite” not nice terminology for compound documents. “Embedded” better than “parasite”. H: Seemed brilliant in 2002.
H: SMIL kind of died due to lack of an integration point with the Web.
H: Don’t do everything in XML.
H: Difficult to validate research findings.
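On Honkala’s remark that the engine is like a spreadsheet engine with an algorithm by Knuth: my notes don’t record which algorithm was meant, so the following is an assumption. A natural candidate is topological sorting as in Knuth’s Algorithm T (The Art of Computer Programming, Vol. 1, §2.2.3), which counts each node’s unresolved predecessors and repeatedly emits nodes whose count has dropped to zero. Here is a minimal Python sketch of dependency-ordered recalculation built that way; the cell names and formulas are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical computed cell: (names it depends on, formula over their values).
Formula = Tuple[List[str], Callable[..., float]]

def recalc_order(formulas: Dict[str, Formula]) -> List[str]:
    """Order computed cells so that every cell comes after the cells it reads."""
    # For each computed cell, count inputs that are themselves computed (not yet resolved).
    pending = {name: sum(dep in formulas for dep in deps) for name, (deps, _) in formulas.items()}
    # successors[x] lists the computed cells that read x.
    successors: Dict[str, List[str]] = {name: [] for name in formulas}
    for name, (deps, _) in formulas.items():
        for dep in deps:
            if dep in formulas:
                successors[dep].append(name)
    ready = deque(name for name, count in pending.items() if count == 0)
    order: List[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for succ in successors[name]:
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)
    if len(order) != len(formulas):
        # Some cells never reached zero pending inputs: they form a cycle.
        raise ValueError("circular dependency among computed cells")
    return order

def recalculate(values: Dict[str, float], formulas: Dict[str, Formula]) -> Dict[str, float]:
    """Evaluate every computed cell in dependency order on top of the plain input values."""
    result = dict(values)
    for name in recalc_order(formulas):
        deps, fn = formulas[name]
        result[name] = fn(*(result[d] for d in deps))
    return result

formulas: Dict[str, Formula] = {
    "total": (["subtotal", "tax"], lambda s, t: s + t),
    "tax": (["subtotal"], lambda s: s * 0.25),
    "subtotal": (["price", "qty"], lambda p, q: p * q),
}
values = recalculate({"price": 10.0, "qty": 3.0}, formulas)
# recalc_order(formulas) -> ['subtotal', 'tax', 'total']
# values['total'] -> 37.5
```

For brevity this recomputes every computed cell; an engine would typically restrict recalculation to the cells downstream of whatever input changed.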
What interests me the most is how well the “write once” promise works and how much modality-specific CSS and XBL is needed in practice. Also, an obvious question comes to mind: “What does this mean for Web Forms 2.0?” It would be interesting to hear about experiences implementing a voice client for Web Forms 2.0. However, as far as I know, this has not been tried yet. Like X-Smiles, Web Forms 2.0 user agents are also expected to support CSS and, eventually, XBL. The question becomes: Is UI engine access to a declarative data model essential for multimodality?