Henri Sivonen’s pages
- encoding_rs: a Web-Compatible Character Encoding Library in Rust
- encoding_rs is a high-decode-performance, low-legacy-encode-footprint and high-correctness implementation of the WHATWG Encoding Standard written in Rust.
- How I Wrote a Modern C++ Library in Rust
- Patterns that I used to make encoding_rs appear as a modern C++ library to C++ code.
- It’s Not Wrong that "🤦🏼♂️".length == 7
- IME Smoke Testing
- In early 2019, I found myself in a situation where I needed to check that I hadn’t broken IME integration code. Later in 2019, I needed to do it again and now I'm testing this again in 2020, so I’m writing this down.
- Why Supporting Unlabeled UTF-8 in HTML on the Web Would Be Problematic
- UTF-8 has won. Yet, Web authors have to opt in to having browsers treat HTML as UTF-8 instead of the browsers Just Doing the Right Thing by default. Why?
- Always Use UTF-8 & Always Label Your HTML Saying So
- To avoid having to deal with escapes (other than for <, >, &, and "), to avoid data loss in form submission, to avoid XSS when serving user-provided content, and to comply with the HTML Standard, always encode your HTML as UTF-8. Furthermore, in order to let browsers know that the document is UTF-8-encoded, always label it as such.
- Activating Browser Modes with Doctype
- A document about the essentials of the layout modes of newer browsers.
- HOWTO Avoid Being Called a Bozo When Producing XML
- Dos and don’ts about producing XML programmatically.
- The Sad Story of PNG Gamma “Correction”
- Why you might not want to use PNG images when you want image colors and CSS colors to match.
- An HTML5 Conformance Checker
- My master’s thesis
- Assembling Web Pages Using Document Trees
- A paper about a template engine that operates on XML document trees. (Source code available.)
- Tag Soup: How Mac IE 5 and Safari handle <x> <y> </x>
- What happens with the DOM in Safari and Mac IE 5 when the nesting of the markup is broken?
- Thoughts About a Print UI for Mozilla
- Some thoughts about printing from a Web browser.
- Digitaalisesta arkistoinnista
- Documents about archiving digital documents (in Finnish)
- Can Anti-DRM Clauses in Content Licenses be Free?
- Are anti-DRM clauses a good idea? Are the current clauses merely badly drafted, such that an anti-DRM clause in general could be free? Or is any anti-DRM clause inherently non-free?
- Älä käytä Creative Commons 1.0 -lisenssejä – käytä 2.5-sarjaa
- The Finland version of the Creative Commons suite of licenses is still at 1.0. The 1.0 series of CC licenses has three serious known bugs (in Finnish)
- A Web-Compatible Character Encoding Library in Rust. (Used in Firefox.)
- Validation 2.0.
- The Validator.nu HTML Parser
- An implementation of the HTML5 parsing algorithm in Java. (Used in Firefox by the means of automated translation to C++.)
- Photo and Metadata Backup for Flickr
- This is a photo and metadata backup utility for Flickr
written as a self-contained Java command line tool. The
metadata is written to an XML file whose format is an aggregation of
the response data from the Flickr API.
- Autozoom Extension for Firefox®
- When Autozoom is activated, the current document is analyzed for the
dominant font size and the view is zoomed by the factor that makes
the dominant size match your font size preference.
- Photo Group Feed
- Flickr doesn’t provide feeds
for private groups. It doesn’t provide feeds for comments on photos
in a group, either. It is reasonable to want such feeds, so here’s
a script that generates them on your HTTP server.
- View Original Bookmarklet
- It takes way too many clicks to get from a Flickr photo page to the original JPEG file. I wrote a bookmarklet that does it with just one click.
- Miscellaneous Java Code
- Utility code.
- CMS Stuff
- Papers and code related to a CMS project.
- SaxCompiler is a tool for recording SAX
events as Java code that can play back the events without parsing
- A two-player asteroid shooting network game written in Java.
- HTML Syntax Checker in PHP
- An HTML linter written in PHP.
- UTF-8 to Code Point Array Converter in PHP
- This package contains a PHP include file which provides two functions for converting between UTF-8 strings and arrays of ints representing Unicode code points.
- Rust 2020
- It’s again the time of year when the Rust Core Team calls for blog posts for input into the next year’s roadmap. This is my contribution.
- It’s Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++
- I think the C++ standard should adopt the approach of “Unicode-only internally” for new text processing facilities and should not support non-Unicode execution encodings in newly-introduced features. This allows new features to have less abstraction obfuscation for Unicode usage, avoids digging legacy applications deeper into non-Unicode commitment, and avoids the specification and implementation effort of adapting new features to make sense for non-Unicode execution encodings.
- Rust 2019
- The Rust team encouraged people to write blog posts reflecting on Rust in 2018 and proposing goals and directions for 2019. Here’s mine.
- Using cargo-fuzz to Transfer Code Review of Simple Safe Code to Complex Code that Uses
- I used model-based testing with coverage-guided fuzzing to gain confidence in the correctness of
- Rust 2018
- The Rust team encouraged people to write blog posts reflecting on Rust in 2017 and proposing goals and directions for 2018. Here’s mine.
- No Namespaces in JSON, Please
- I think that experience with Namespaces in XML should lead to the conclusion that the same (or almost the same) thing should not be repeated with JSON. I think the developer community as a whole should not pay the cost of the use cases of the part of the developer community that believes (whether rightly or wrongly is out of the scope of this post) that identifiers in data formats should fit into a global naming scheme and, more specifically, that the naming scheme should make every identifier into a URI. Instead, I think that the part of the developer community that believes that it needs to be able to merge data thanks to identifiers being URIs should bear the cost of doing whatever name mangling it needs to do upon data ingest, given the information of which format a given ingested piece of JSON was in.
- Julkisesti luotettu varmenne ikidomainille TLS:ää (SSL:ää) varten
- Previously, it was impractical to get a publicly trusted TLS certificate for an iki domain (e.g.
hsivonen.iki.fi). The new non-profit certificate authority Let’s Encrypt validates control
on a per-hostname basis, which makes it practical to get a publicly trusted
certificate for an iki domain. (In Finnish.)
- If You Want Software Freedom on Phones, You Should Work on Firefox OS, Custom Hardware and Web App Self-Hostability
- To achieve full-stack Software Freedom on mobile phones, I think it makes sense to focus on Firefox OS, commission custom hardware and develop self-hostable Free Software Web apps and an easy deployment platform for them.
- Character Encoding Menu in 2014
- This post is about a UI feature that I wish no one would have to use. Happily, it is indeed almost unused. Still, I made it more usable in the case when it is used. (The change was more driven by code removal than usability, though.)
- Thoughts on HTML5 Becoming a W3C Recommendation
- Since I’ve participated in the development of HTML5 for a decade now (since before it was commonly called “HTML5”), I’ve been asked for my thoughts about HTML5 becoming a W3C Recommendation. Hence, I figured I’d post something here.
- Four Finnish Banks Training Users to Give Banking Credentials to Another Site
- A person who turns to me for technical advice was logging in to a government service using banking credentials for a bank called Handelsbanken. However, the page that was asking for the Handelsbanken login credentials was not served from
https://*.handelsbanken.fi/! After investigating what was going on, I decided to review how other banks in Finland handle this. Here are my findings.
- What is EME?
- It was suggested at the Mozilla Summit that there isn’t good information around about what Encrypted Media Extensions (EME) actually is. Since I’m on the HTML working group and have been reading the email threads about EME there, I thought that I could provide an introduction that explains things that may not be apparent from the specification itself.
- Accept-Charset Is No More
- Now that Firefox 10 has been released, none of the major browsers send the Accept-Charset HTTP header.
- WebM-Enabled Browser Usage Share Exceeds H.264-Enabled Browser Usage Share on Desktop (in StatCounter Numbers)
- Looking at StatCounter stats, it occurred to me that they might not match the common narrative about H.264 market share. I decided to run some numbers using StatCounter stats.
- Vendor Prefixes Are Hurting the Web
- I think vendor prefixes are hurting the Web. I think we (people developing browsers and Web standards) should stop hurting the Web.
- HTML5 Parser-Based View Source Syntax Highlighting
- A new implementation of the View Source HTML and XML syntax highlighting has landed in Firefox.
- html5.parser.enable Pref is Gone
- Just a quick note to Firefox nightly testers and bug triagers: I pushed
a patch that makes Firefox no longer honor the
- Windows 8 App Support Matrix
- Over the last few days, there’s been quite a bit of speculation about whether Windows 8 on ARM will ship the desktop environment and allow recompiled code written to the legacy Win32 APIs to run.
- The Old HTML Fragment Parser is Gone
- Just a quick note to Firefox nightly testers and bug triagers.
- Schema.org and Pre-Existing Communities
- I have been reading tweets and blog posts expressing various
levels of disappointment and unhappiness about schema.org not using
RDFa, not using Microformats or not having been developed in the open
with the community. Since other people’s perspectives differ from
mine, I feel compelled to write
down my take.
- What Could Microsoft Do about IE6?
- Microsoft has started a campaign
to drive down the market share of IE6. Getting rid of IE6 is a
righteous goal. Microsoft’s proposed solution isn’t righteous, though.
- is probably the hardest Web page to load. In fact, it is so hard that
in order to turn the HTML5 parser on by default in Firefox last year,
we decided to special-case
to use the old parser in Firefox 4.
- Sergeant Semantics
- So the W3C launched a logo for HTML5. And not just for HTML5-the-spec but for HTML5-the-buzzword. Regardless of the logo itself or what it stands for, I find the choice of the ancillary visual elements weird.
- Vihreiden tekijänoikeuslinja ja teosten tekijöiden eläketurva
- The Finnish Green Party recently released a copyright policy paper.
It is positive that the party pays enough attention to the topic
to publish a separate policy paper on it. However, I’m unhappy
that the paper suggests that authors of copyrighted works should get
royalties for the commercial use of their works long after creating
them in order to have income at pension age. (In Finnish.)
- HTML5 Script Execution Changes in Firefox 4 Beta 7
- In Firefox 4 beta 7, script execution changed to be more
HTML5-compliant than before. This means that in some cases sites that
sniff for Firefox or Gecko may break. If your site/app works
cross-browser without browser sniffing, you don’t need to read
further. (However, if you triage bugs on bugzilla.mozilla.org, you might still want to read on.)
- spacer Element Is Gone
- Today, I landed a patch
that made the HTML5 parser in Gecko unaware of the HTML
- Apple took some of their Safari Technology Demos from their
developer site and published them at http://www.apple.com/html5/
as an “HTML5 Showcase”. Christopher
Blizzard's blog post about the subject says almost everything I'd
have to say, so please read Blizzard's post. I'm posting just my
- SVG and MathML in text/html in Firefox and Validator.nu
- I enabled SVG and MathML-related stuff recently on both
mozilla-central and on Validator.nu.
- HTML5 Parser Improvements
- As mentioned earlier, there is an ongoing project for replacing Gecko’s old HTML parser with an HTML5 parser. Significant improvements have landed lately, so if you’ve previously tried the HTML5 parser and turned it off due to crashiness or Web compatibility issues, now is a good time to turn it back on.
- Thou Shalt Not Spec a Feature that Might Inadvertently Compete with RDF when Used Contrary to How It Is Designed to Be Used
- From the minutes of the TAG meeting on November 2nd 2009.
- Speculative HTML5 Parsing Landed
- As mentioned earlier, there is an ongoing project for replacing Gecko’s old HTML parser with an HTML5 parser. Today, a significant milestone landed: off-the-main-thread speculative HTML5 parsing.
- Help Test HTML5 Parsing in Gecko
- The HTML5 parsing algorithm is meant to demystify HTML parsing and
make it uniform across implementations in a backwards-compatible way.
The algorithm has had “in the lab” testing, but so far it hasn’t
been tested inside a browser by a large number of people. You
can help change that now!
- An Unofficial Q&A about the Discontinuation of the XHTML2 WG
- Many of the comments on Zeldman’s
post indicate that there are people who are badly misinformed about
the matters surrounding this announcement. To help remedy that,
here’s some quick Q&A for getting informed.
- Browser Technology Stack
- I made a quick attempt at drawing a stack for Web browsing.
- The Last of the Parsing Quirks
- I implemented a single quirk for HTML5 parsing yesterday.
- Testing HTML5 Parsing
- I have been using a browser with an HTML5 parser for both my work
and leisure browsing for a bit over a week now. I think in-browser
HTML5 parsing is now ready to be tested by others as well.
- Extended Uncertainty
- I use myvidoop as my OpenID
delegate. They used to have an EV
certificate. Yesterday, they didn’t.
- Out of Context
- Last week on W3C mailing lists.
- A Lecture about HTML5
- I was invited to give a lecture about HTML5 on a course titled WWW Applications at the Department of Media Technology of Helsinki University of Technology.
- SVG Filter Effects in HTML without External References
- The project of putting an HTML5
parser inside Gecko has progressed. I merged in code from the
trunk in order to experiment with cool new stuff such as SVG
filter effects for HTML.
- HTML5 Parsing in Gecko: A Build
- The effort of putting an HTML5
parser inside Gecko takes a step out of the vaporware land.
- I Want an Affordable Snapshot-Saving Crypto-Backupping RAID NAS
- This week, I lost over one potential work day to HFS+. And it
wasn’t the first time I’ve lost time to HFS+. I want to
make arrangements to avoid losing time to HFS+ in the future.
- Access Blocked
- I followed a link from a message to a spec in the /TR/
space on www.w3.org.
- Not Part of the Technology Stack
- At XTech 2006, I got a W3C brochure entitled Leading the Web
to its Full Potential that had a diagram visualizing the W3C
- Browser Sniffing History in the Chrome UA String
- Google Chrome has the following cruft in the HTTP
- Introducing SAX Tree
- I chose to write yet another XML tree package.
- Lowering memory requirements by replacing Schematron
- For a long time, I’ve said that the Schematron schema in the HTML5 facet of Validator.nu was merely a rapid prototype that should be replaced with custom Java code.
- The Performance Cost of the HTML Tree Builder
- I’ve been thinking about the performance gap between the
Validator.nu HTML Parser and Xerces. What can be attributed to the
“extra fix-ups” that an HTML parser has to do and what can be
attributed to my code being worse than the Xerces code?
- Performance Mistake
- In the spirit of documenting one’s mistakes…
- Validator.nu Gets Out of the Java Trap
- This week, I upgraded the operating system on the Xen
virtual machine that powers
to Ubuntu Hardy.
- Validator.nu Downtime
- Validator.nu was down last week.
- NVDL Support in Validator.nu
- I enabled NVDL today.
- ARIA in HTML5 Integration: Document Conformance (Draft, Take Two)
- Now a runnable suggestion.
- Security Quote of the Day
- Cluelessness and incompetence of epic proportions.
- ARIA in HTML5 Integration: Document Conformance (Draft)
- This is not a spec and has not been endorsed by anyone.
- Reality Distortion Fields
- Where Joel Spolsky’s analysis of the IE version targeting issue goes wrong.
- Almost Precedent
- Why the Gecko Almost Standards Mode shouldn’t be used to justify IE engine version targeting.
- Regular Expressions, Computer Science and Practice
- Disregard of computer science can crash your app.
- Unimpressed by Leopard
- Sadly, Leopard is not a clear improvement over Tiger.
- Built-in Accessibility Roles in HTML5
- A quick table of WAI-ARIA roles and what HTML 5 provides natively for each role as of July 2007.
- Printing Web Apps 1.0
- This is a quick guide for getting a dead-tree version of the Web
Applications 1.0 spec.
- Speaking at XTech
- I’ll be speaking at XTech.
- IM Logs
- Quote of the week.
- EFFI’s Day in Court
- As mentioned earlier, Electronic
Frontier Finland (EFFI) was suspected of illegal fundraising. The
case was tried today. I went to the court house to observe the
- XHTML and Mobile Devices
- Pieters’ mobile XHTML test results need more publicity.
- Social Media Impression Management
- I asked if they had researched the
image formation of social media sites. They hadn’t.
- DTDs Don’t Work on the Web
- Last weekend, Slashdot linked
to an article
that observed that Netscape had removed the RSS
0.91 DTD. I hope this episode has a silver
lining and helps in making people realize that DTDs don’t belong on
- Thesis Defense on XForms
- On Friday 2007-01-12, I went to listen to the thesis defense of
- Maemo Source Code
- To save others the trouble of requesting the source, here are the contents of the package called “2.2006.39-14-srcs”.
- Validator Web Service Interface Ideas
- I am just writing this down so I don’t forget it.
- Three Styles
- Well, four styles if you count the original.
- Charmod Norm Checking
- Charmod Norm is
still in the Working Draft state, but if it were to become a
normative part of (X)HTML5, it would belong to the area of the
conformance checking service that I am working on now, so I
prototyped Charmod Norm enforcement as well.
- Charmod Checking
- Here’s how I have addressed the requirements of Charmod that
apply to content (marked as [C] in Charmod).
- Table Integrity Checker
- The first non-schema checker prototype is a table integrity checker.
- Openmind 2006
- I attended Openmind 2006
last week. Here are some notes.
- ISO Opens Up a Little
- It turns out that ISO now has some standards on the Web. That’s
good, but putting all of them there in a Web-friendly format would be
- Natural Hazards Again
- Looking across the street, I can see that there’s something
extra in the air between where I sit and the house on the other side
of the street.
- The Scientific Method According to Hixie
- Quote of the week from the topic of #developers on irc.mozilla.org
- What to Do with All These Photos?
- I have a lot of photos that aren’t shared properly, which makes
them less useful than they could be. Considering that it has been
possible to publish photos on the Web for over a decade, I find it
interesting and annoying how many unsolved problems there still are.
- Aula 2006
- Yesterday, I went to listen to the public speeches that were part
of Aula 2006 – Movement.
- HOWTO Establish a 100% Literacy Rate
- This is one of my favorite pieces of West
Wing script writing.
- Need a Taxi at a Taxi Station? You Lose!
- A taxi station is the worst place to be in Helsinki when you need a taxi (unless there’s one already there).
- XTech 2006
- I went to the XTech 2006
conference last week.
- Europe Day
- Tuesday 2006-05-09 was Europe Day. I traveled to Tampere for a show debate.
- So the Makasiinit burned today.
- Comedy is the Real News
- An observation I made last year when watching TV in the U.S.
- Unused Icons
- Unhelpful Microsoft wizardiness
- Lists in Attribute Values
- Whitespace-separation is good.
- How Not to Advertise an Election Candidate
- On Sunday and Monday elections were held at the local congregation in order to select a new vicar. I didn’t like the campaigning.
- Bureaucracy Meets the Web
- Three things from the past week happened to be related to bureaucracy and the Web…
- Who knows prefixed XHTML from a hole in the ground?
- Remember to test prefixed XHTML as well.
- Atom Feed
- I now have an Atom 1.0 feed.
- RFC 2119 Key Words in Management Textbooks
- Just a random observation about the vocabulary of management textbooks.
- Big Brother EU
- On Tuesday 2005-11-22, I went to a public discussion event titled “Big Brother EU”.
- Thoughts on Using SSL/TLS Certificates as the Solution to Phishing
- Comments on Staying Safe From
Phishing With Firefox.
- An Idea About Intermediate Language Trees and Web UI Generation
- An idea about Web UI generation I had when I was studying compiler technology.
- Natural Hazards: NA
- Thoughts about nuclear power plants in stormy situations.
- Names of Browser Engines
- A table of browser names, engine names and script engine names.
- HOWTO Spot a Wannabe Web Standards Advocate
- I have seen this too often. (Also available in French, German, and Polish.)
- ISO-8859-15 on haitallinen
- UTF-8 is the way to go. (In Finnish.)
- 10 Safari 1.0 issues
- Hyatt requested lists like this.
- Is Atom What We Really Need?
- Atom (formerly known as Pie, Echo and Necho) has been created as a cleaner and better-defined alternative to RSS 2.0, which is underspecified. But is a reformulated version of RSS 2.0 really what we need?
- Outlining the “Ultimate” Blogging Server
- I’ve been thinking what a really
good blogging system or a news site content management system would
be like. Here’s my attempt at outlining the “ultimate”
- Kesäkoodi Wrap-Up – 2006-09-19
- The last week of Kesäkoodi stretched to two sparse weeks.
- On Clipboard Formats – 2006-09-15
- This stuff is so underdocumented that it isn’t even funny. This
document is written so that others might find something when they
search the Web.
- Week 35
- The weekly report for week 35.
- Week 34
- The weekly report for week 34.
- Speaking Gig – 2006-08-28
- I have been booked to speak at the Openbyte pre-conference of the
Openmind 2006 event in Tampere
Hall on 2006-10-24.
- Week 33
- The weekly report for week 33.
- Week 32
- The weekly report for week 32.
- Week 31
- The weekly report for week 31.
- Week 30
- The weekly report for week 30.
- Week 27
- The weekly report for week 27.
- Builds, Take Two – 2006-07-07
- The builds
have been respun with fixes for interrupting Expat properly.
- Builds! – 2006-07-06
- Now there is something to test. I am providing builds with my
preliminary patches for four target platforms.
- Oops! I broke MathML – 2006-07-05
- Or, well, one could argue that it was already broken but my
content sink changes and a suitably crafted test case just exposed
the layout issues that were already there.
- Week 26
- The weekly report for week 26.
- The Content Sink Inheritance Diagram – 2006-06-30
- I have discovered that my previous diagram showed only a part of the inheritance graph below
nsIContentSink. There is more.
- Eclipse CDT – 2006-06-27
- After working in TextWrangler (and a bit in XCode) for a couple of
weeks, I really started to miss Eclipse.
- Week 25
- The weekly report for week 25.
- Week 24
- The weekly report for week 24.
- Week 23
- The weekly report for week 23.
- Planning the XML Content Sink Incrementalization Work – 2006-06-10
- I’ve been researching the problem area of bug
- Week 22
- The weekly report for week 22.
- Week 21
- The weekly report for week 21.
- DOM Traversal Performance – 2006-05-26
- Kesäkoodi Starting – 2006-05-23
- So what’s this Kesäkoodi
- A Rust Crate that Also Quacks Like a Modern C++ Library
- My RustFest Paris 2018 talk. (Slides about pointers in zero-length slices have been edited after RustFest to avoid spreading out-of-date information.) Video is available.
- An Introduction to Unicode
- PDF slides about Unicode.
- W3C DOM -esittely
- An introduction to the W3C DOM (in Finnish).
These documents are related to the amendments to the Copyright Act and the
Criminal code which were passed in order to implement the EUCD in Finland and have been dubbed
- Karpelan lukkovertaus ontuu
- Anti-circumvention legislation does not make sense, and it is fallacious to compare circumventing DRM to breaking into an apartment. (In Finnish)
- Mustaa valkoisella
- A document request to the Ministry of Education. (In Finnish)
Articles in Need of Updating
- Mac OS X Browser Comparison
- This document is a rough yes/no feature comparison of the Web browsers that run natively on Mac OS X. It does not cover browsers that run on the Classic VM or require an implementation of the X11 windowing system. Severely out of date. For historical reference only!
- Writing Structural Stylable Documents in Mozilla Editor
- The Mozilla Editor is designed around HTML 4 Transitional. If special steps aren’t taken, it is easy to produce presentational documents that lack stylable structure. This document describes some basic good authoring practices for the purpose of writing structural and stylable documents.
- About Points and Pixels as Units
- A document about points often being mistaken for pixel units. Points are not pixel units. Defining the font size in points on Web pages is considered harmful. This document needs to be updated.
- About the Hiragino Fonts with CSS
- A short document about a couple of observations on using the Hiragino fonts with CSS. (The Hiragino fonts come with Mac OS X.)
- XHTML—What’s the Point? (Draft, incomplete)
- This document is incomplete, but I put it on the Web in order to avoid retyping the same thing over and over again in newsgroup discussions.
- Things to Take into Account When Moving to Standards-Compliant HTML and CSS Authoring
- This is a mixed collection of a few issues that are worth taking into account when writing Web pages according to the W3C Recommendations.
- Imitating Reflective Caustics in POV-Ray
- A tutorial on imitating reflective caustics in the official distribution of POV-Ray
- Yet another ray tracing gallery page.