Copyright 2006 Henri Sivonen
Copyright 2008 Mozilla Foundation
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
This specification defines a RELAX NG datatype library that allows precise attribute datatyping in RELAX NG schemas for (X)HTML5.
This is a work in progress! In its current form, this document is intended to provide a way for the author to organize and communicate his thoughts. Even though this document is intended to develop into an implementable specification, you should not implement this draft spec. This spec has not been endorsed by anyone.
RELAX NG does not provide a built-in means for constraining the lexical space of attribute values (or the text content of elements) beyond enumerating permissible string literals (with or without whitespace trimming). However, RELAX NG provides extensibility via datatype libraries. RELAX NG validators are expected to provide an API for plugging in implementations of datatype libraries. This way, the conformance to a datatype specification can be checked using a Turing-complete programming language.
Typically RELAX NG validators have a built-in implementation of the XSD datatype library. The XSD library provides the datatypes from W3C XML Schemas for use in RELAX NG schemas. Most notably, the XSD datatype library provides regular expressions for constraining the lexical space of a datatype to a regular language.
The XSD datatype library is not adequate for developing accurate RELAX NG schemas for (X)HTML5. Hence, the library described in this specification is needed.
The datatypes defined herein do not check that the value contains only XML 1.0 characters. That task is left for another layer of software.
The ID-type of the datatypes of this datatype library is null.
Except for the string type, checking for value equality is not needed for these datatypes in order to be able to write RELAX NG schemas for (X)HTML5. However, in order for implementations of this datatype library to behave consistently under equality tests, the datatypes of this datatype library shall implement the equality test as the strict code point for code point string equality test (except for the string type).
The datatypes of this datatype library are independent of the namespace mapping context.
Whitespace characters are U+0020, U+0009, U+000D and U+000A. If this datatype library is used with the text/html serialization of HTML5, form feed should be mapped to a space before exposing a value to this library.
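The whitespace rule above can be sketched in a few lines of Python. This is an illustration only, not part of the specification; the names WHITESPACE, is_whitespace and map_form_feed are hypothetical, and the form feed mapping is assumed to run in the layer that feeds text/html values to the library.

```python
# The four whitespace characters listed in this specification.
WHITESPACE = frozenset("\u0020\u0009\u000D\u000A")


def is_whitespace(ch: str) -> bool:
    """Return True if ch is one of U+0020, U+0009, U+000D or U+000A."""
    return ch in WHITESPACE


def map_form_feed(value: str) -> str:
    """Map U+000C FORM FEED to U+0020 SPACE before handing a value
    to the datatype library (assumed text/html pre-processing step)."""
    return value.replace("\u000C", "\u0020")
```

Note that form feed itself is not in the whitespace set; it only becomes whitespace after the mapping.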
This specification states which values each datatype shall accept. The datatypes must reject values that they are not defined to accept.
In addition to matching the lexical format, an acceptable value for the date datatypes must be a valid date according to the proleptic Gregorian calendar. For example, 2006-02-29 is not a valid value for date, because 2006 is not a leap year. On the other hand, 1582-10-07 and 1752-09-07 must be treated as valid dates.
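As a rough illustration of the calendar check, Python's datetime.date already uses the proleptic Gregorian calendar, so constructing a date is enough to test validity. The function name is hypothetical, the lexical format is assumed to have been parsed into integers already, and Python's year range of 1 to 9999 is a limitation of this sketch rather than of the specification.

```python
import datetime


def is_valid_gregorian_date(year: int, month: int, day: int) -> bool:
    """Check a parsed year/month/day against the proleptic Gregorian calendar.

    datetime.date raises ValueError for impossible dates such as
    2006-02-29 and for years outside 1..9999.
    """
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True
```

Dates that fall into the historical Julian-to-Gregorian switchover gaps, such as 1582-10-07 and 1752-09-07, pass this check because no days are skipped in the proleptic calendar.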
Leap seconds are not allowed in times.
browsing-context-or-keyword
This datatype shall accept strings that constitute a valid browsing context name or keyword in HTML5.
browsing-context
This datatype shall accept strings that constitute a valid browsing context name in HTML5.
charset
This datatype shall accept strings that contain only characters allowed according to the Naming Requirements of RFC 2978.
Should this refer to the IANA charset registry instead? Or should this be an explicit list but not the IANA list?
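A character-level check for charset might look like the sketch below. The character set is taken from the mime-charset-chars production as this editor reads RFC 2978 (ASCII letters, digits and a fixed list of punctuation); that reading is an assumption and should be verified against the RFC. The names are hypothetical, and no registry lookup is performed.

```python
import string

# Assumed from the mime-charset-chars production in RFC 2978:
# ALPHA / DIGIT / ! # $ % & ' + - ^ _ ` { } ~
MIME_CHARSET_CHARS = frozenset(
    string.ascii_letters + string.digits + "!#$%&'+-^_`{}~"
)


def is_charset(value: str) -> bool:
    """Accept non-empty strings made only of the allowed characters."""
    return len(value) > 0 and all(ch in MIME_CHARSET_CHARS for ch in value)
```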
charset-list
Not done.
circle
This datatype shall accept strings that are valid values for the coords attribute in the circle state in HTML5.
date-or-time-content
This datatype shall accept strings that constitute a date or time string in content in HTML5.
date-or-time
This datatype shall accept strings that constitute a date or time string in attributes in HTML5.
date
This datatype shall accept strings that conform to the format specified for date inputs in Web Forms 2.0.
This datatype must not accept the empty string.
datetime
This datatype shall accept strings that conform to the format specified for datetime inputs in Web Forms 2.0.
This datatype must not accept the empty string.
datetime-local
This datatype shall accept strings that conform to the format specified for datetime-local inputs in Web Forms 2.0.
This datatype must not accept the empty string.
datetime-tz
This datatype shall accept strings that conform to the format specified for the datetime attribute of the ins and del elements in HTML5.
If the time zone designator is not “Z”, the absolute value of the time zone designator must not exceed 12 hours.
This datatype must not accept the empty string.
Note that allowing a numeric time zone designator is not the only difference from datetime. This type also requires seconds to be explicitly present.
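The 12-hour limit on the time zone designator can be illustrated as follows. The lexical form of the designator is defined by HTML5, not here; this sketch assumes it is either “Z” or a sign followed by HH:MM, and the function name is hypothetical.

```python
import re

# Assumed lexical form of a numeric designator: +HH:MM or -HH:MM.
_NUMERIC_TZ = re.compile(r"[+-]([0-9]{2}):([0-5][0-9])")


def tz_designator_in_range(designator: str) -> bool:
    """Return True for "Z" or for a numeric offset of at most 12 hours."""
    if designator == "Z":
        return True
    match = _NUMERIC_TZ.fullmatch(designator)
    if match is None:
        return False
    offset_minutes = int(match.group(1)) * 60 + int(match.group(2))
    return offset_minutes <= 12 * 60
```

Comparing the offset in minutes means that +12:00 and -12:00 are accepted while +12:01 is not.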
float
This datatype shall accept strings that constitute a valid floating point number in HTML5.
float-non-negative
This datatype shall accept strings that constitute a valid floating point number in HTML5 and whose parsed value is not negative (zero allowed).
float-positive
This datatype shall accept strings that constitute a valid floating point number in HTML5 and whose parsed value is positive (zero not allowed).
float-exp
This datatype shall accept strings that conform to the format specified for number inputs in Web Forms 2.0.
This datatype must not accept the empty string.
float-exp-positive
This datatype shall accept strings that conform to the format specified for number inputs in Web Forms 2.0 and whose value parses to a positive number (zero not allowed).
This datatype must not accept the empty string.
hash-name
This datatype shall accept strings that have U+0023 NUMBER SIGN (#) as the first character.
This datatype must not accept the empty string.
ID
This datatype shall accept any string that consists of one or more characters and does not contain any whitespace characters.
IDREF
This datatype shall accept any string that consists of one or more characters and does not contain any whitespace characters.
IDREFS
This datatype shall accept any string that consists of one or more characters and contains at least one character that is not a whitespace character.
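The ID, IDREF and IDREFS rules above reduce to simple scans over the whitespace set defined earlier in this specification. A minimal sketch, with hypothetical function names:

```python
# Whitespace as defined by this specification.
WHITESPACE = frozenset("\u0020\u0009\u000D\u000A")


def is_id(value: str) -> bool:
    """One or more characters, none of which is whitespace."""
    return len(value) > 0 and not any(ch in WHITESPACE for ch in value)


# IDREF has the same acceptance rule as ID.
is_idref = is_id


def is_idrefs(value: str) -> bool:
    """At least one character that is not whitespace."""
    return any(ch not in WHITESPACE for ch in value)
```

Because the ID-type of every datatype in this library is null, these checks are purely lexical; uniqueness and reference resolution are left to other layers.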
integer
This datatype shall accept strings that constitute a valid integer in HTML5.
integer-non-negative
This datatype shall accept strings that constitute a valid integer in HTML5 and whose parsed value is not negative (zero allowed).
integer-positive
This datatype shall accept strings that constitute a valid integer in HTML5 and whose parsed value is positive (zero not allowed).
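The three integer datatypes differ only in the range check applied after parsing. The sketch below assumes that a valid integer in HTML5 is an optional U+002D HYPHEN-MINUS followed by one or more ASCII digits; that lexical form comes from HTML5, not from this specification, and the function names are hypothetical.

```python
import re

# Assumed HTML5 lexical form: optional "-" then one or more ASCII digits.
_INTEGER = re.compile(r"-?[0-9]+")


def is_integer(value: str) -> bool:
    return _INTEGER.fullmatch(value) is not None


def is_integer_non_negative(value: str) -> bool:
    """Valid integer whose parsed value is zero or greater."""
    return is_integer(value) and int(value) >= 0


def is_integer_positive(value: str) -> bool:
    """Valid integer whose parsed value is greater than zero."""
    return is_integer(value) and int(value) > 0
```

Under this reading “-0” is accepted by integer-non-negative, because its parsed value is zero.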
iri
Need to turn these into charset-sensitive URLs.
This datatype shall accept any RFC 3987 IRI subject to constraints given below.
If the literal violates a “SHOULD”, it must be rejected. If the literal violates security-sensitive RFC language, it must be rejected. If the literal violates DNS-related constraints, it must be rejected.
Scheme-specific knowledge must be used for the following IRI schemes (as augmented by IDNA):
Scheme | Spec |
---|---|
http | RFC 2616 |
https | RFC 2818 |
ftp | RFC 1738 |
mailto | RFC 2368 |
file | RFC 1738 |
data | RFC 2397 |
Scheme-specific knowledge must not be used for other IRI schemes.
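The table above amounts to a lookup keyed on the scheme. The following sketch shows one way to decide whether scheme-specific knowledge applies to a literal; the scheme syntax (a letter followed by letters, digits, “+”, “-” or “.”) and its case-insensitivity are taken from RFC 3986, and the names SCHEME_SPECS and scheme_spec are hypothetical.

```python
import re
from typing import Optional

# Schemes for which scheme-specific knowledge must be used, per the table.
SCHEME_SPECS = {
    "http": "RFC 2616",
    "https": "RFC 2818",
    "ftp": "RFC 1738",
    "mailto": "RFC 2368",
    "file": "RFC 1738",
    "data": "RFC 2397",
}

_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def scheme_spec(iri: str) -> Optional[str]:
    """Return the governing spec for the IRI's scheme, or None if the
    literal has no scheme or the scheme is not in the table."""
    scheme, colon, _rest = iri.partition(":")
    if not colon or _SCHEME.fullmatch(scheme) is None:
        return None
    # Schemes contain only ASCII, so str.lower() is safe here.
    return SCHEME_SPECS.get(scheme.lower())
```

A None result means the literal is validated with generic IRI rules only.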
If the literal cannot be converted into a URI, the literal must be rejected. (For example, if scheme-specific knowledge tells which part is a host name and that part cannot be converted to a conforming Punycode DNS name.)
iri-ref
Need to turn these into charset-sensitive URLs.
This datatype shall accept all the values that the iri datatype is defined to accept and, additionally, relative IRIs. However, relative IRIs with a scheme that the iri datatype is defined to have knowledge about must be rejected (e.g. http:/foo).
language
This datatype shall accept strings that are conforming RFC 3066bis language tags. When a subtag value is not reserved for private use, this datatype shall only accept values that were registered at the time the implementation of this datatype was developed.
When the registry says that a language has a default (“suppressed”) script, this datatype must not accept the version that lists the default script explicitly. For example, “fi-Latn” must be rejected.
Note that the allowed ALPHA letters are A–Z and a–z, so U+0130 and U+0131 must not be accepted as case-insensitive versions of i and I. Likewise, “oß” is not a conforming language tag for Ossetian.
This datatype must not accept the empty string.
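The ASCII-only rule for ALPHA is easy to get wrong with Unicode-aware string functions, so a character-repertoire pre-check is sketched below. It covers only this one necessary condition (non-empty subtags of ASCII letters and digits separated by hyphens); it does not consult the registry, so a tag such as “fi-Latn” passes this check and must still be rejected by later steps. The function name is hypothetical.

```python
import string

# ALPHA is A-Z and a-z only; DIGIT is 0-9 only.
ASCII_ALNUM = frozenset(string.ascii_letters + string.digits)


def has_only_ascii_subtags(tag: str) -> bool:
    """Necessary (not sufficient) check: every hyphen-separated subtag
    is non-empty and consists solely of ASCII letters and digits."""
    return all(
        len(subtag) > 0 and all(ch in ASCII_ALNUM for ch in subtag)
        for subtag in tag.split("-")
    )
```

An explicit character set is used instead of str.isalnum(), which would also accept letters such as U+0131 and “ß”.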
Since registered language and country codes change over time, implementations should document when their internal snapshot of registered language and country codes was taken.
The IANA language subtag registry is not Free as in Free Software.
media-query
Media Queries have changed lately. This datatype needs to be reviewed against the new MQ spec.
meta-charset
This datatype shall accept strings that, compared ASCII case-insensitively, consist of the string “text/html;”, followed by any number of whitespace characters, followed by the string “charset=” and finally followed by a string accepted by the charset datatype.
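The meta-charset rule can be sketched as a sequence of prefix checks. The sketch below is self-contained, so it repeats a charset character check; that character set is this editor's reading of RFC 2978 and is an assumption, as are all function names.

```python
import string

WHITESPACE = "\u0020\u0009\u000D\u000A"
# Assumed charset repertoire (mime-charset-chars as read from RFC 2978).
MIME_CHARSET_CHARS = frozenset(
    string.ascii_letters + string.digits + "!#$%&'+-^_`{}~"
)


def ascii_lower(text: str) -> str:
    """Lowercase A-Z only, leaving every other code point untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def is_meta_charset(value: str) -> bool:
    prefix, key = "text/html;", "charset="
    if ascii_lower(value[:len(prefix)]) != prefix:
        return False
    # Skip any number of whitespace characters after the semicolon.
    rest = value[len(prefix):].lstrip(WHITESPACE)
    if ascii_lower(rest[:len(key)]) != key:
        return False
    charset = rest[len(key):]
    return len(charset) > 0 and all(c in MIME_CHARSET_CHARS for c in charset)
```

The hand-written ascii_lower avoids Unicode case folding, under which characters such as U+017F would compare equal to “s”.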
mime-type
This datatype shall accept strings that conform to the syntax of the value of the MIME Content-Type header, except that LWS is allowed only around the semicolon and after the whole value.
mime-type-list
The accept attribute on input type=file. This is still really buggy.
month
This datatype shall accept strings that conform to the format specified for month inputs in Web Forms 2.0.
This datatype must not accept the empty string.
pattern
This datatype shall accept the strings that are allowed as the value of the Web Forms 2.0 pattern attribute.
polyline
This datatype shall accept strings that are valid values for the coords attribute in the polygon state in HTML5.
ratio
This datatype shall accept the strings that do not cause the steps for finding one or two numbers of a ratio in a string to return an error.
rectangle
This datatype shall accept strings that are valid values for the coords attribute in the rectangle state in HTML5.
refresh
This datatype shall accept strings that are permitted in the content attribute of the meta element when the element is in the refresh state.
string
This datatype shall accept all strings.
The equality comparisons for this datatype must be code point for code point, except the ASCII letters A–Z must be treated as equal to the ASCII letters a–z.
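The two equality tests used by this library can be contrasted in a short sketch: strict code point equality for every datatype except string, and ASCII case-insensitive equality for string. The function names are hypothetical.

```python
def ascii_lower(text: str) -> str:
    """Lowercase the ASCII letters A-Z only; other code points are unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def strict_equal(a: str, b: str) -> bool:
    """Code point for code point equality (all datatypes except string)."""
    return a == b


def string_type_equal(a: str, b: str) -> bool:
    """Equality for the string datatype: A-Z are treated as equal to a-z."""
    return ascii_lower(a) == ascii_lower(b)
```

Neither test applies Unicode normalization or full case folding, so U+212A KELVIN SIGN is not equal to “K” and U+0131 is not equal to “I”.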
time
This datatype shall accept strings that conform to the format specified for time inputs in Web Forms 2.0.
This datatype must not accept the empty string.
week
This datatype shall accept strings that conform to the format specified for week inputs in Web Forms 2.0.
This datatype must not accept the empty string.
xml-name
This datatype shall accept the strings that match the Name production in XML 1.0 4th edition.