Abstract

This specification defines the 5th major revision of the core language of the World Wide Web, HTML. In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

If you wish to make comments regarding this document, please send them to public-html-comments@w3.org (subscribe, archives). All feedback is welcome.

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

The publication of this document by the W3C as a W3C Working Draft does not imply that all of the participants in the W3C HTML working group endorse the contents of the specification. Indeed, for any section of the specification, one can usually find many members of the working group or of the W3C as a whole who object strongly to the current text, the existence of the section at all, or the idea that the working group should even spend time discussing the concept of that section.

The W3C HTML Working Group is the W3C working group responsible for this specification's progress along the W3C Recommendation track. This specification is the 22 January 2008 First Public Working Draft.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Stability

Different parts of this specification are at different levels of maturity.

Some of the more major known issues are marked like this. There are many other issues that have been raised as well; the issues given in this document are not the only known issues! There are also some spec-wide issues that have not yet been addressed: case-sensitivity is a very poorly handled topic right now, and the firing of events needs to be unified (right now some bubble, some don't, they all use different text to fire events, etc). It would also be nice to unify the rules on downloading content when attributes change (e.g. src attributes): should they initiate downloads immediately, when the element is inserted in the document, when active scripts end, etc.? This matters, for example, if an attribute is set twice in a row (does it hit the network twice?).

1. Introduction

This section is non-normative.

The World Wide Web's markup language has always been HTML. HTML was primarily designed as a language for semantically describing scientific documents, although its general design and its adaptations over the years have enabled it to be used to describe a number of other types of documents.

The main area that has not been adequately addressed by HTML is a vague subject referred to as Web Applications. This specification attempts to rectify this, while at the same time updating the HTML specifications to address issues raised in the past few years.

1.1. Scope

This section is non-normative.

This specification is limited to providing a semantic-level markup language and associated semantic-level scripting APIs for authoring accessible pages on the Web ranging from static documents to dynamic applications.

The scope of this specification does not include addressing presentation concerns (although default rendering rules for Web browsers are included at the end of this specification).

The scope of this specification does not include documenting every HTML or DOM feature supported by Web browsers. Browsers support many features that are considered to be very bad for accessibility or that are otherwise inappropriate. For example, the blink element is clearly presentational and authors wishing to cause text to blink should instead use CSS.

The scope of this specification is not to describe an entire operating system. In particular, hardware configuration software, image manipulation tools, and applications that users would be expected to use with high-end workstations on a daily basis are out of scope. In terms of applications, this specification is targeted specifically at applications that would be expected to be used by users on an occasional basis, or regularly but from disparate locations, with low CPU requirements. Examples include online purchasing systems, searching systems, games (especially multiplayer online games), public telephone books or address books, communications software (e-mail clients, instant messaging clients, discussion software), and document editing software.

For sophisticated cross-platform applications, there already exist several proprietary solutions (such as Mozilla's XUL, Adobe's Flash, or Microsoft's Silverlight). These solutions are evolving faster than any standards process could follow, and the requirements are evolving even faster. These systems are also significantly more complicated to specify, and are orders of magnitude more difficult to achieve interoperability with, than the solutions described in this document. Platform-specific solutions for such sophisticated applications (for example the MacOS X Core APIs) are even further ahead.

1.1.1. Relationship to HTML 4.01, XHTML 1.1, DOM2 HTML

This section is non-normative.

This specification represents a new version of HTML4 and XHTML1, along with a new version of the associated DOM2 HTML API. Migration from HTML4 or XHTML1 to the format and APIs described in this specification should in most cases be straightforward, as care has been taken to ensure that backwards-compatibility is retained.

This specification will eventually supplant Web Forms 2.0 as well. [WF2]

1.1.2. Relationship to XHTML2

This section is non-normative.

XHTML2 [XHTML2] defines a new HTML vocabulary with better features for hyperlinks, multimedia content, annotating document edits, rich metadata, declarative interactive forms, and describing the semantics of human literary works such as poems and scientific papers.

However, it lacks elements to express the semantics of many of the non-document types of content often seen on the Web. For instance, forum sites, auction sites, search engines, online shops, and the like, do not fit the document metaphor well, and are not covered by XHTML2.

This specification aims to extend HTML so that it is also suitable in these contexts.

XHTML2 and this specification use different namespaces and therefore can both be implemented in the same XML processor.

1.1.3. Relationship to XUL, Flash, Silverlight, and other proprietary UI languages

This section is non-normative.

This specification is independent of the various proprietary UI languages that various vendors provide. As an open, vendor-neutral language, HTML provides a solution to the same problems without the risk of vendor lock-in.

1.2. Structure of this specification

This section is non-normative.

This specification is divided into the following important sections:

The DOM
    The DOM, or Document Object Model, provides a base for the rest of the specification.

The Semantics
    Documents are built from elements. These elements form a tree using the DOM. Each element also has a predefined meaning, which is explained in this section. User agent requirements for how to handle each element are also given, along with rules for authors on how to use the element.

Browsing Contexts
    HTML documents do not exist in a vacuum; this section defines many of the features that affect environments that deal with multiple pages, links between pages, and running scripts.

APIs
    The Editing APIs: HTML documents can provide a number of mechanisms for users to modify content, which are described in this section.
    The Communication APIs: Applications written in HTML often require mechanisms to communicate with remote servers, as well as communicating with other applications from different domains running on the same client.
    Repetition Templates: A mechanism to support repeating sections in forms.

The Language Syntax
    All of these features would be for naught if they couldn't be represented in a serialised form and sent to other people, and so this section defines the syntax of HTML, along with rules for how to parse HTML.

There are also a couple of appendices, defining shims for WYSIWYG editors, rendering rules for Web browsers, and listing areas that are out of scope for this specification.

1.2.1. How to read this specification

This specification should be read like all other specifications. First, it should be read cover-to-cover, multiple times. Then, it should be read backwards at least once. Then it should be read by picking random sections from the contents list and following all the cross-references.

1.3. Conformance requirements

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

This specification describes the conformance criteria for user agents (relevant to implementors) and documents (relevant to authors and authoring tool implementors).

There is no implied relationship between document conformance requirements and implementation conformance requirements. User agents are not free to handle non-conformant documents as they please; the processing model described in this specification applies to implementations regardless of the conformity of the input documents.

User agents fall into several (overlapping) categories with different conformance requirements.

Web browsers and other interactive user agents
    Web browsers that support XHTML must process elements and attributes from the HTML namespace found in XML documents as described in this specification, so that users can interact with them, unless the semantics of those elements have been overridden by other specifications.

    A conforming XHTML processor would, upon finding an XHTML script element in an XML document, execute the script contained in that element. However, if the element is found within an XSLT transformation sheet (assuming the UA also supports XSLT), then the processor would instead treat the script element as an opaque element that forms part of the transform.

    Web browsers that support HTML must process documents labelled as text/html as described in this specification, so that users can interact with them.

Non-interactive presentation user agents
    User agents that process HTML and XHTML documents purely to render non-interactive versions of them must comply with the same conformance criteria as Web browsers, except that they are exempt from requirements regarding user interaction.

    Typical examples of non-interactive presentation user agents are printers (static UAs) and overhead displays (dynamic UAs). It is expected that most static non-interactive presentation user agents will also opt to lack scripting support.

    A non-interactive but dynamic presentation UA would still execute scripts, allowing forms to be dynamically submitted, and so forth. However, since the concept of "focus" is irrelevant when the user cannot interact with the document, the UA would not need to support any of the focus-related DOM APIs.

User agents with no scripting support
    Implementations that do not support scripting (or which have their scripting features disabled) are exempt from supporting the events and DOM interfaces mentioned in this specification. For the parts of this specification that are defined in terms of an events model or in terms of the DOM, such user agents must still act as if events and the DOM were supported.

    Scripting can form an integral part of an application. Web browsers that do not support scripting, or that have scripting disabled, might be unable to fully convey the author's intent.

Conformance checkers
    Conformance checkers must verify that a document conforms to the applicable conformance criteria described in this specification. Conformance checkers are exempt from detecting errors that require interpretation of the author's intent (for example, while a document is non-conforming if the content of a blockquote element is not a quote, conformance checkers do not have to check that blockquote elements only contain quoted material).

    Conformance checkers must check that the input document conforms when scripting is disabled, and should also check that the input document conforms when scripting is enabled. (This is only a "SHOULD" and not a "MUST" requirement because it has been proven to be impossible. [HALTINGPROBLEM])

    The term "HTML5 validator" can be used to refer to a conformance checker that itself conforms to the applicable requirements of this specification.

    XML DTDs cannot express all the conformance requirements of this specification. Therefore, a validating XML processor and a DTD cannot constitute a conformance checker. Also, since neither of the two authoring formats defined in this specification is an application of SGML, a validating SGML system cannot constitute a conformance checker either.

    To put it another way, there are three types of conformance criteria:

    1. Criteria that can be expressed in a DTD.
    2. Criteria that cannot be expressed by a DTD, but can still be checked by a machine.
    3. Criteria that can only be checked by a human.

    A conformance checker must check for the first two. A simple DTD-based validator only checks for the first class of errors and is therefore not a conforming conformance checker according to this specification.

Data mining tools
    Applications and tools that process HTML and XHTML documents for reasons other than to either render the documents or check them for conformance should act in accordance with the semantics of the documents that they process.

    A tool that generates document outlines but increases the nesting level for each paragraph and does not increase the nesting level for each section would not be conforming.

Authoring tools and markup generators
    Authoring tools and markup generators must generate conforming documents. Conformance criteria that apply to authors also apply to authoring tools, where appropriate.

    Authoring tools are exempt from the strict requirements of using elements only for their specified purpose, but only to the extent that authoring tools are not yet able to determine author intent.

    For example, it is not conforming to use an address element for arbitrary contact information; that element can only be used for marking up contact information for the author of the document or section. However, since an authoring tool is likely unable to determine the difference, an authoring tool is exempt from that requirement.

    In terms of conformance checking, an editor is therefore required to output documents that conform to the same extent that a conformance checker will verify.

    When an authoring tool is used to edit a non-conforming document, it may preserve the conformance errors in sections of the document that were not edited during the editing session (i.e. an editing tool is allowed to round-trip erroneous content). However, an authoring tool must not claim that the output is conformant if errors have been so preserved.

    Authoring tools are expected to come in two broad varieties: tools that work from structure or semantic data, and tools that work on a What-You-See-Is-What-You-Get media-specific editing basis (WYSIWYG).

    The former is the preferred mechanism for tools that author HTML, since the structure in the source information can be used to make informed choices regarding which HTML elements and attributes are most appropriate.

    However, WYSIWYG tools are legitimate, and this specification makes certain concessions to WYSIWYG editors.

    All authoring tools, whether WYSIWYG or not, should make a best effort attempt at enabling users to create well-structured, semantically rich, media-independent content.

Some conformance requirements are phrased as requirements on elements, attributes, methods or objects. Such requirements fall into two categories: those describing content model restrictions, and those describing implementation behaviour. Requirements in the former category are requirements on documents and authoring tools. Requirements in the latter category are requirements on user agents.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

For compatibility with existing content and prior specifications, this specification describes two authoring formats: one based on XML (referred to as XHTML5), and one using a custom format inspired by SGML (referred to as HTML5). Implementations may support only one of these two formats, although supporting both is encouraged.

XHTML documents (XML documents using elements from the HTML namespace) that use the new features described in this specification and that are served over the wire (e.g. by HTTP) must be sent using an XML MIME type such as application/xml or application/xhtml+xml and must not be served as text/html. [RFC3023]

Such XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification.

According to the XML specification, XML processors are not guaranteed to process the external DTD subset referenced in the DOCTYPE. This means, for example, that using entities for characters in XHTML documents is unsafe (except for the five entities predefined by XML: lt, gt, amp, quot and apos). For interoperability, authors are advised to avoid optional features of XML.

HTML documents, if they are served over the wire (e.g. by HTTP) must be labelled with the text/html MIME type.

The language in this specification assumes that the user agent expands all entity references, and therefore does not include entity reference nodes in the DOM. If user agents do include entity reference nodes in the DOM, then user agents must handle them as if they were fully expanded when implementing this specification. For example, if a requirement talks about an element's child text nodes, then any text nodes that are children of an entity reference that is a child of that element would be used as well.
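
The expansion rule can be illustrated with a minimal sketch. It assumes plain objects standing in for DOM nodes: the node() helper, its field names, and childTextNodes() are hypothetical and are not APIs defined by this specification. Only the numeric node type values (1, 3, 4, and 5) come from DOM Core.

```javascript
// Node type values as defined by DOM Core.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;
const ENTITY_REFERENCE_NODE = 5;

// Hypothetical helper: builds a plain object standing in for a DOM node.
function node(nodeType, data, childNodes = []) {
  return { nodeType, data, childNodes };
}

// Returns the child text nodes of `element`, treating every entity
// reference child as if it had been replaced by its own children.
function childTextNodes(element) {
  const result = [];
  for (const child of element.childNodes) {
    if (child.nodeType === TEXT_NODE || child.nodeType === CDATA_SECTION_NODE) {
      result.push(child);
    } else if (child.nodeType === ENTITY_REFERENCE_NODE) {
      // Recurse, since an expanded entity may contain further references.
      result.push(...childTextNodes(child));
    }
  }
  return result;
}

// An element whose middle child is an unexpanded entity reference.
const paragraph = node(ELEMENT_NODE, null, [
  node(TEXT_NODE, "Caf"),
  node(ENTITY_REFERENCE_NODE, null, [node(TEXT_NODE, "\u00e9")]),
  node(TEXT_NODE, " menu"),
]);

const text = childTextNodes(paragraph).map((n) => n.data).join("");
console.log(text);
```

In this sketch the text node inside the entity reference is returned alongside the element's direct text children, which is the behaviour the requirement asks for when entity reference nodes are kept in the DOM.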

1.3.1. Common conformance requirements for APIs exposed to JavaScript

A lot of arrays/lists/collections in this spec assume zero-based indexes but use the term " index th" liberally. We should define those to be zero-based and be clearer about this.

Unless otherwise specified, if a DOM attribute that is a floating point number type (float) is assigned an Infinity or Not-a-Number value, a NOT_SUPPORTED_ERR exception must be raised.

Unless otherwise specified, if a DOM attribute that is a signed numeric type is assigned a negative value, a NOT_SUPPORTED_ERR exception must be raised.

Unless otherwise specified, if a method with an argument that is a floating point number type (float) is passed an Infinity or Not-a-Number value, a NOT_SUPPORTED_ERR exception must be raised.

Unless otherwise specified, if a method is passed fewer arguments than are defined for that method in its IDL definition, a NOT_SUPPORTED_ERR exception must be raised.

Unless otherwise specified, if a method is passed more arguments than are defined for that method in its IDL definition, the excess arguments must be ignored.

Unless otherwise specified, if a method is expecting, as one of its arguments, as defined by its IDL definition, an object implementing a particular interface X, and the argument passed is an object whose [[Class]] property is neither that interface X, nor the name of an interface Y where this specification requires that all objects implementing interface Y also implement interface X, nor the name of an interface that inherits from the expected interface X, then a TYPE_MISMATCH_ERR exception must be raised.
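
As an illustration only, the float and argument-count rules for methods might be sketched as a wrapper around a method implementation. Everything named here is hypothetical: DOMExceptionSketch stands in for the real exception object, withCommonChecks() and scale() are invented for the example, and the list of parameter types is assumed to be taken from the method's IDL definition. The sketch performs no implicit type conversion and does not cover the interface ([[Class]]) check.

```javascript
// Hypothetical stand-in for the exception raised by the rules above.
class DOMExceptionSketch extends Error {
  constructor(name) {
    super(name);
    this.name = name;
  }
}

// Wraps `impl` so that it enforces the common rules for a method whose
// IDL definition declares the parameter types listed in `paramTypes`.
function withCommonChecks(paramTypes, impl) {
  return function (...args) {
    // Fewer arguments than the IDL definition declares: raise an exception.
    if (args.length < paramTypes.length) {
      throw new DOMExceptionSketch("NOT_SUPPORTED_ERR");
    }
    // More arguments than declared: the excess arguments are ignored.
    const used = args.slice(0, paramTypes.length);
    used.forEach((value, i) => {
      // A float argument must be neither Infinity nor Not-a-Number.
      const isNumber = typeof value === "number";
      if (paramTypes[i] === "float" && isNumber && !Number.isFinite(value)) {
        throw new DOMExceptionSketch("NOT_SUPPORTED_ERR");
      }
    });
    return impl.apply(this, used);
  };
}

// Hypothetical method declared with two float parameters.
const scale = withCommonChecks(["float", "float"], (x, y) => x * y);

console.log(scale(2, 3));     // both arguments are used
console.log(scale(2, 3, 99)); // the third argument is ignored
```

Calling scale(2) or scale(NaN, 1) in this sketch raises the NOT_SUPPORTED_ERR stand-in rather than returning a value.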

Anything else? Passing the wrong type of object, maybe? Implied conversions to int/float?

1.3.2. Dependencies

This specification relies on several other underlying specifications.

XML
    Implementations that support XHTML5 must support some version of XML, as well as its corresponding namespaces specification, because XHTML5 uses an XML serialisation with namespaces. [XML] [XMLNAMES]

XML Base
    User agents must follow the rules given by XML Base to resolve relative URIs in HTML and XHTML fragments. That is the mechanism used in this specification for resolving relative URIs in DOM trees. [XMLBASE]

    It is possible for xml:base attributes to be present even in HTML fragments, as such attributes can be added dynamically using script.

DOM
    Implementations must support some version of DOM Core and DOM Events, because this specification is defined in terms of the DOM, and some of the features are defined as extensions to the DOM Core interfaces. [DOM3CORE] [DOM3EVENTS]

ECMAScript
    Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings for DOM Specifications specification, as this specification uses that specification's terminology. [EBFD]

This specification does not require support of any particular network transport protocols, style sheet language, scripting language, or any of the DOM and WebAPI specifications beyond those described above. However, the language described by this specification is biased towards CSS as the styling language, ECMAScript as the scripting language, and HTTP as the network protocol, and several features assume that those languages and protocols are in use.

This specification might have certain additional requirements on character encodings, image formats, audio formats, and video formats in the respective sections.

1.3.3. Features defined in other specifications

Some elements are defined in terms of their DOM textContent attribute. This is an attribute defined on the Node interface in DOM3 Core. [DOM3CORE]

Should textContent be defined differently for dir="" and <bdo>? Should we come up with an alternative to textContent that handles those and other things, like alt=""?

The interface is defined in DOM3 Core. [DOM3CORE]

The term activation behavior is used as defined in the DOM3 Events specification. [DOM3EVENTS] At the time of writing, DOM3 Events hadn't yet been updated to define that phrase.

The rules for handling alternative style sheets are defined in the CSS object model specification. [CSSOM]

See http://dev.w3.org/cvsweb/~checkout~/csswg/cssom/Overview.html?rev=1.35&content-type=text/html;%20charset=utf-8

Certain features are defined in terms of CSS <color> values. When the CSS value currentColor is specified in this context, the "computed value of the 'color' property" for the purposes of determining the computed value of the currentColor keyword is the computed value of the 'color' property on the element in question. [CSS3COLOR]

If a canvas gradient's addColorStop() method is called with the currentColor keyword as the color, then the computed value of the 'color' property on the canvas element is the one that is used.

1.4. Terminology

This specification refers to both HTML and XML attributes and DOM attributes, often in the same context. When it is not clear which is being referred to, they are referred to as content attributes for HTML and XML attributes, and DOM attributes for those from the DOM. Similarly, the term "properties" is used for both ECMAScript object properties and CSS properties. When these are ambiguous they are qualified as object properties and CSS properties respectively.

To ease migration from HTML to XHTML, UAs conforming to this specification will place elements in HTML in the http://www.w3.org/1999/xhtml namespace, at least for the purposes of the DOM and CSS. The term "elements in the HTML namespace", or "HTML elements" for short, when used in this specification, thus refers to both HTML and XHTML elements.

Unless otherwise stated, all elements defined or mentioned in this specification are in the http://www.w3.org/1999/xhtml namespace, and all attributes defined or mentioned in this specification have no namespace (they are in the per-element partition).

The term HTML documents is sometimes used in contrast with XML documents to mean specifically documents that were parsed using an HTML parser (as opposed to using an XML parser or created purely through the DOM).

Generally, when the specification states that a feature applies to HTML or XHTML, it also includes the other. When a feature specifically only applies to one of the two languages, it is called out by explicitly stating that it does not apply to the other format, as in "for HTML, ... (this does not apply to XHTML)".

This specification uses the term document to refer to any use of HTML, ranging from short static documents to long essays or reports with rich multimedia, as well as to fully-fledged interactive applications.

For readability, the term URI is used to refer to both ASCII URIs and Unicode IRIs, as those terms are defined by RFC 3986 and RFC 3987 respectively. On the rare occasions where IRIs are not allowed but ASCII URIs are, this is called out explicitly. [RFC3986] [RFC3987]

The term root element, when not qualified to explicitly refer to the document's root element, means the furthest ancestor element node of whatever node is being discussed, or the node itself if there is none. When the node is a part of the document, then that is indeed the document's root element. However, if the node is not currently part of the document tree, the root element will be an orphaned node.

An element is said to have been inserted into a document when its root element changes and is now the document's root element.
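
The two definitions above can be sketched as follows. This is a minimal illustration using plain objects in place of DOM nodes: makeNode(), append(), rootElementOf(), and isInDocument() are hypothetical helpers, and only the node type values 1 (Element) and 9 (Document) come from DOM Core.

```javascript
// Node type values as defined by DOM Core.
const ELEMENT_NODE = 1;
const DOCUMENT_NODE = 9;

// Hypothetical helpers that build a small tree out of plain objects.
function makeNode(nodeType, name) {
  return { nodeType, name, parentNode: null, childNodes: [] };
}
function append(parent, child) {
  child.parentNode = parent;
  parent.childNodes.push(child);
  return child;
}

// Furthest ancestor element of `node`, or `node` itself if it has none.
// The walk stops below the Document node, which is not an element.
function rootElementOf(node) {
  let current = node;
  while (current.parentNode && current.parentNode.nodeType === ELEMENT_NODE) {
    current = current.parentNode;
  }
  return current;
}

// True when the root element of `node` is the root element of `doc`.
function isInDocument(node, doc) {
  return rootElementOf(node).parentNode === doc;
}

const doc = makeNode(DOCUMENT_NODE, "#document");
const html = append(doc, makeNode(ELEMENT_NODE, "html"));
const body = append(html, makeNode(ELEMENT_NODE, "body"));

// A detached subtree: its root element is the orphaned div.
const div = makeNode(ELEMENT_NODE, "div");
const span = append(div, makeNode(ELEMENT_NODE, "span"));
const rootBefore = rootElementOf(span);

// Attaching the div changes the span's root element to the html element,
// so the span has now been inserted into the document.
append(body, div);
const rootAfter = rootElementOf(span);
console.log(rootBefore.name, rootAfter.name, isInDocument(span, doc));
```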

The term tree order means a pre-order, depth-first traversal of the DOM nodes involved (through the parentNode/childNodes relationship).
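
A minimal sketch of such a traversal is shown below. It assumes plain objects with a childNodes array in place of real DOM nodes; the treeOrder() generator and the name field are hypothetical and used only for illustration.

```javascript
// Yields `node` first, then each of its subtrees from first child to last
// (pre-order, depth-first).
function* treeOrder(node) {
  yield node;
  for (const child of node.childNodes) {
    yield* treeOrder(child);
  }
}

// A small tree built from plain objects standing in for DOM nodes.
const tree = {
  name: "html",
  childNodes: [
    { name: "head", childNodes: [{ name: "title", childNodes: [] }] },
    {
      name: "body",
      childNodes: [
        { name: "p", childNodes: [] },
        { name: "ol", childNodes: [] },
      ],
    },
  ],
};

const names = [...treeOrder(tree)].map((n) => n.name);
console.log(names.join(", ")); // html, head, title, body, p, ol
```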

When it is stated that some element or attribute is ignored, or treated as some other value, or handled as if it was something else, this refers only to the processing of the node after it is in the DOM. A user agent must not mutate the DOM in such situations.

When an XML name, such as an attribute or element name, is referred to in the form prefix:localName, as in xml:id or svg:rect, it refers to a name with the local name localName and the namespace given by the prefix, as defined by the following table:

    xml     http://www.w3.org/XML/1998/namespace
    html    http://www.w3.org/1999/xhtml
    svg     http://www.w3.org/2000/svg
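
The naming convention can be sketched as a simple lookup. The resolvePrefixedName() function and its error handling are hypothetical. Only the three prefix-to-namespace pairs are taken from the table.

```javascript
// Prefix-to-namespace mapping taken from the table above.
const NAMESPACES = new Map([
  ["xml", "http://www.w3.org/XML/1998/namespace"],
  ["html", "http://www.w3.org/1999/xhtml"],
  ["svg", "http://www.w3.org/2000/svg"],
]);

// Splits a name of the form prefix:localName and resolves its namespace.
function resolvePrefixedName(name) {
  const colon = name.indexOf(":");
  if (colon === -1) {
    throw new Error("Not a prefixed name: " + name);
  }
  const prefix = name.slice(0, colon);
  const localName = name.slice(colon + 1);
  const namespace = NAMESPACES.get(prefix);
  if (namespace === undefined) {
    throw new Error("Unknown prefix: " + prefix);
  }
  return { namespace, localName };
}

const rect = resolvePrefixedName("svg:rect");
console.log(rect.namespace, rect.localName);
```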

For simplicity, terms such as shown, displayed, and visible might sometimes be used when referring to the way a document is rendered to the user. These terms are not meant to imply a visual medium; they must be considered to apply to other media in equivalent ways.

Various DOM interfaces are defined in this specification using pseudo-IDL. This looks like OMG IDL but isn't. For instance, method overloading is used, and types from the W3C DOM specifications are used without qualification. Language-specific bindings for these abstract interface definitions must be derived in a way consistent with W3C DOM specifications. Some interface-specific binding information for ECMAScript is included in this specification.

The current situation with IDL blocks is pitiful. IDL is totally inadequate to properly represent what objects have to look like in JS; IDL can't say if a member is enumerable, what the indexing behaviour is, what the stringification behaviour is, what behaviour setting a member whose type is a particular interface should be (e.g. setting of document.location or element.className), what constructor an object implementing an interface should claim to have, how overloads work, etc. I think we should make the IDL blocks non-normative, and/or replace them with something else that is better for JS while still being clear on how it applies to other languages. However, we do need to have something that says what types the methods take as arguments, since we have to raise exceptions if they are wrong.

The construction "a Foo object", where Foo is actually an interface, is sometimes used instead of the more accurate "an object implementing the interface Foo".

A DOM attribute is said to be getting when its value is being retrieved (e.g. by author script), and is said to be setting when a new value is assigned to it.

If a DOM object is said to be live, then that means that any attributes returning that object must always return the same object (not a new object each time), and the attributes and methods on that object must operate on the actual underlying data, not a snapshot of the data.
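
The difference between a live object and a snapshot can be sketched as follows. ItemListSketch and ContainerSketch are hypothetical classes invented for this illustration and do not correspond to any interface defined in this specification.

```javascript
// A list view that reads from the owner's underlying array on every access.
class ItemListSketch {
  constructor(items) {
    this._items = items; // shared reference, not a copy
  }
  get length() {
    return this._items.length;
  }
  item(index) {
    return this._items[index];
  }
}

class ContainerSketch {
  constructor() {
    this._items = [];
    this._liveList = new ItemListSketch(this._items);
  }
  // Live: the same object is returned on every access.
  get items() {
    return this._liveList;
  }
  // Not live: a fresh copy of the data, shown only for contrast.
  get snapshot() {
    return [...this._items];
  }
  add(value) {
    this._items.push(value);
  }
}

const container = new ContainerSketch();
const live = container.items;
const copy = container.snapshot;
container.add("first");

console.log(container.items === live); // same object each time
console.log(live.length, copy.length); // the live list sees the new item
```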

The terms fire and dispatch are used interchangeably in the context of events, as in the DOM Events specifications. [DOM3EVENTS]

The term text node refers to any Text node, including CDATASection nodes (any Node with node type 3 or 4).

Some of the algorithms in this specification, for historical reasons, require the user agent to pause until some condition has been met. While a user agent is paused, it must ensure that no scripts execute (e.g. no event handlers, no timers, etc). User agents should remain responsive to user input while paused, however.

1.4.1. HTML vs XHTML

This section is non-normative.

This specification defines an abstract language for describing documents and applications, and some APIs for interacting with in-memory representations of resources that use this language.

The in-memory representation is known as "DOM5 HTML", or "the DOM" for short.

There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.

The first such concrete syntax is "HTML5". This is the format recommended for most authors. It is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, then it will be processed as an "HTML5" document by Web browsers.

The second concrete syntax uses XML, and is known as "XHTML5". When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is processed by Web browsers using an XML processor, and treated as an "XHTML5" document. Authors are reminded that the processing for XML and HTML differs; in particular, even minor syntax errors will prevent an XML document from being rendered fully, whereas they would be ignored in the "HTML5" syntax.
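
How the MIME type selects the concrete syntax might be sketched as below. The syntaxForMimeType() function is hypothetical. The test for an XML MIME type (text/xml, application/xml, or any type ending in +xml) is an assumption based on the conventions of RFC 3023 rather than a rule stated in this section, and a real user agent would also take the namespace of the document's elements into account.

```javascript
// Returns "HTML5", "XHTML5", or null for a Content-Type value.
function syntaxForMimeType(mimeType) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const essence = mimeType.split(";")[0].trim().toLowerCase();
  if (essence === "text/html") {
    return "HTML5";
  }
  // Assumption: XML MIME types as conventionally identified under RFC 3023.
  const isXml =
    essence === "text/xml" ||
    essence === "application/xml" ||
    essence.endsWith("+xml");
  return isXml ? "XHTML5" : null;
}

console.log(syntaxForMimeType("text/html; charset=utf-8"));
console.log(syntaxForMimeType("application/xhtml+xml"));
console.log(syntaxForMimeType("image/png"));
```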

The "DOM5 HTML", "HTML5", and "XHTML5" representations cannot all represent the same content. For example, namespaces cannot be represented using "HTML5", but they are supported in "DOM5 HTML" and "XHTML5". Similarly, documents that use the noscript feature can be represented using "HTML5", but cannot be represented with "XHTML5" and "DOM5 HTML". Comments that contain the string " --> " can be represented in "DOM5 HTML" but not in "HTML5" and "XHTML5". And so forth.

2. The Document Object Model

The Document Object Model (DOM) is a representation — a model — of a document and its content. [DOM3CORE] The DOM is not just an API; the conformance criteria of HTML implementations are defined, in this specification, in terms of operations on the DOM.

This specification defines the language represented in the DOM by features together called DOM5 HTML. DOM5 HTML consists of DOM Core Document nodes and DOM Core Element nodes, along with text nodes and other content.

Elements in the DOM represent things; that is, they have intrinsic meaning, also known as semantics.

For example, an ol element represents an ordered list.

In addition, documents and elements in the DOM host APIs that extend the DOM Core APIs, providing new features to application developers using DOM5 HTML.

2.1. Documents

Every XML and HTML document in an HTML UA is represented by a Document object. [DOM3CORE]

Document objects are assumed to be XML documents unless they are flagged as being HTML documents when they are created. Whether a document is an HTML document or an XML document affects the behaviour of certain APIs, as well as a few CSS rendering rules. [CSS21]

A Document object created by the createDocument() API on the DOMImplementation object is initially an XML document, but can be made into an HTML document by calling document.open() on it.

All Document objects (in user agents implementing this specification) must also implement the HTMLDocument interface, available using binding-specific methods. (This is the case whether or not the document in question is an HTML document or indeed whether it contains any HTML elements at all.) Document objects must also implement the document-level interface of any other namespaces found in the document that the UA supports. For example, if an HTML implementation also supports SVG, then the Document object must implement HTMLDocument and SVGDocument .

Because the HTMLDocument interface is now obtained using binding-specific casting methods instead of simply being the primary interface of the document object, it is no longer defined as inheriting from Document .

Since the HTMLDocument interface holds methods and attributes related to a number of disparate features, the members of this interface are described in various different sections.

2.1.1. Security

User agents must raise a security exception whenever any of the members of an HTMLDocument object are accessed by scripts whose origin is not the same as the Document 's origin.

2.1.2. Resource metadata management

The URL attribute must return the document's address .

The domain attribute must be initialised to the document's domain, if it has one, and null otherwise. On getting, the attribute must return its current value. On setting, if the new value is an allowed value (as defined below), the attribute's value must be changed to the new value. If the new value is not an allowed value, then a security exception must be raised instead.

A new value is an allowed value for the document.domain attribute if it is equal to the attribute's current value, or if the new value, prefixed by a U+002E FULL STOP ("."), exactly matches the end of the current value. If the current value is null, new values other than null will never be allowed.
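The allowed-value rule above can be sketched as a small predicate. This is only an illustration under stated assumptions: isAllowedDomainValue is a hypothetical helper name, and domains are modelled as plain strings or null.

```javascript
// Sketch of the "allowed value" check for document.domain described above.
// `isAllowedDomainValue` is a hypothetical helper, not part of any API.
function isAllowedDomainValue(currentValue, newValue) {
  // A value equal to the current value is always allowed (including null === null).
  if (newValue === currentValue) return true;
  // If the current value is null, no value other than null is allowed.
  if (currentValue === null || newValue === null) return false;
  // Otherwise the new value, prefixed by ".", must match the end of the current value.
  return currentValue.endsWith("." + newValue);
}
```

For example, with a current value of "www.example.com" the sketch accepts "example.com" and rejects "ample.com". Read literally, the rule would also accept "com".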

If the Document object's address is hierarchical and uses a server-based naming authority, then its domain is the <host>/<ihost> part of that address. Otherwise, it has no domain.

The domain attribute is used to enable pages on different hosts of a domain to access each other's DOMs, though this is not yet defined by this specification.

we should handle IP addresses here

The referrer attribute must return either the URI of the page which navigated the browsing context to the current document (if any), or the empty string if there is no such originating page, or if the UA has been configured not to report referrers, or if the navigation was initiated for a hyperlink with a noreferrer keyword.

In the case of HTTP, the referrer DOM attribute will match the Referer (sic) header that was sent when fetching the current page.

The cookie attribute must, on getting, return the same string as the value of the Cookie HTTP header it would include if fetching the resource indicated by the document's address over HTTP, as per RFC 2109 section 4.3.4. [RFC2109]

On setting, the cookie attribute must cause the user agent to act as it would when processing cookies if it had just attempted to fetch the document's address over HTTP, and had received a response with a Set-Cookie header whose value was the specified value, as per RFC 2109 sections 4.3.1, 4.3.2, and 4.3.3. [RFC2109]

Since the cookie attribute is accessible across frames, the path restrictions on cookies are only a tool to help manage which cookies are sent to which parts of the site, and are not in any way a security feature.

The lastModified attribute, on getting, must return the date and time of the Document 's source file's last modification, in the user's local timezone, in the following format:

1. The month component of the date.
2. A U+002F SOLIDUS character ('/').
3. The day component of the date.
4. A U+002F SOLIDUS character ('/').
5. The year component of the date.
6. A U+0020 SPACE character.
7. The hours component of the time.
8. A U+003A COLON character (':').
9. The minutes component of the time.
10. A U+003A COLON character (':').
11. The seconds component of the time.

All the numeric components above, other than the year, must be given as two digits in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE representing the number in base ten, zero-padded if necessary.

The Document 's source file's last modification date and time must be derived from relevant features of the networking protocols used, e.g. from the value of the HTTP Last-Modified header of the document, or from metadata in the filesystem for local files. If the last modification date and time are not known, the attribute must return the string 01/01/1970 00:00:00 .
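As an illustration of the format described above, the following sketch builds the string from an ECMAScript Date using its local-time getters. The function name formatLastModified is hypothetical, and treating a missing or invalid Date as "not known" is an assumption of this sketch.

```javascript
// Hypothetical helper: formats a Date as MM/DD/YYYY hh:mm:ss in local time.
function formatLastModified(date) {
  // Unknown modification time: return the fixed string required by the text above.
  if (!(date instanceof Date) || Number.isNaN(date.getTime())) {
    return "01/01/1970 00:00:00";
  }
  // Every component other than the year is zero-padded to two digits.
  const two = (n) => String(n).padStart(2, "0");
  return (
    two(date.getMonth() + 1) + "/" +
    two(date.getDate()) + "/" +
    date.getFullYear() + " " +
    two(date.getHours()) + ":" +
    two(date.getMinutes()) + ":" +
    two(date.getSeconds())
  );
}
```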

The compatMode DOM attribute must return the literal string "CSS1Compat" unless the document has been set to quirks mode by the HTML parser, in which case it must instead return the literal string "BackCompat". The document can also be set to limited quirks mode (also known as "almost standards" mode). By default, the document is set to no quirks mode (also known as "standards mode").

As far as parsing goes, the quirks I know of are:

Comment parsing is different.

p can contain table

Safari and IE have special parsing rules for <% ... %> (even in standards mode, though clearly this should be quirks-only).

2.2. Elements

The nodes representing HTML elements in the DOM must implement, and expose to scripts, the interfaces listed for them in the relevant sections of this specification. This includes XHTML elements in XML documents, even when those documents are in another context (e.g. inside an XSLT transform).

The basic interface, from which all the HTML elements' interfaces inherit, and which must be used by elements that have no additional requirements, is the HTMLElement interface.

As with the HTMLDocument interface, the HTMLElement interface holds methods and attributes related to a number of disparate features, and the members of this interface are therefore described in various different sections of this specification.

2.2.1. Reflecting content attributes in DOM attributes

Some DOM attributes are defined to reflect a particular content attribute. This means that on getting, the DOM attribute returns the current value of the content attribute, and on setting, the DOM attribute changes the value of the content attribute to the given value.

If a reflecting DOM attribute is a DOMString attribute whose content attribute is defined to contain a URI, then on getting, the DOM attribute must return the value of the content attribute, resolved to an absolute URI, and on setting, must set the content attribute to the specified literal value. If the content attribute is absent, the DOM attribute must return the default value, if the content attribute has one, or else the empty string.

If a reflecting DOM attribute is a DOMString attribute whose content attribute is defined to contain one or more URIs, then on getting, the DOM attribute must split the content attribute on spaces and return the concatenation of each token URI, resolved to an absolute URI, with a single U+0020 SPACE character between each URI; and on setting, must set the content attribute to the specified literal value. If the content attribute is absent, the DOM attribute must return the default value, if the content attribute has one, or else the empty string.
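The getter for a multi-URI attribute can be sketched as follows. The helper name reflectUriList is hypothetical, the set of space characters is an assumption (space, tab, LF, FF, CR), and resolution is delegated to the URL constructor available in current ECMAScript environments, which may normalise more than the rules referenced here. An absent content attribute is modelled as null.

```javascript
// Assumption: "space characters" means space, tab, LF, FF and CR.
const SPACE_CHARACTERS = /[ \t\n\f\r]+/;

// Hypothetical helper modelling the getter of a multi-URI reflecting attribute.
function reflectUriList(contentValue, baseUri, defaultValue = "") {
  // Absent content attribute: return the default value, or the empty string.
  if (contentValue === null) return defaultValue;
  return contentValue
    .split(SPACE_CHARACTERS)
    .filter((token) => token !== "")
    .map((token) => new URL(token, baseUri).href) // resolve to an absolute URI
    .join(" "); // a single U+0020 SPACE between URIs
}
```

The sketch lets the URL constructor throw for a token that cannot be resolved; how such a token should be handled is not covered by the paragraph above.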

If a reflecting DOM attribute is a DOMString whose content attribute is an enumerated attribute, and the DOM attribute is limited to only known values , then, on getting, the DOM attribute must return the value associated with the state the attribute is in (in its canonical case), or the empty string if the attribute is in a state that has no associated keyword value; and on setting, if the new value case-insensitively matches one of the keywords given for that attribute, then the content attribute must be set to that value, otherwise, if the new value is the empty string, then the content attribute must be removed, otherwise, the setter must raise a SYNTAX_ERR exception.

If a reflecting DOM attribute is a DOMString but doesn't fall into any of the above categories, then the getting and setting must be done in a transparent, case-preserving manner.

If a reflecting DOM attribute is a boolean attribute, then the DOM attribute must return true if the attribute is set, and false if it is absent. On setting, the content attribute must be removed if the DOM attribute is set to false, and must be set to have the same value as its name if the DOM attribute is set to true. (This corresponds to the rules for boolean content attributes.)
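A minimal sketch of boolean reflection, assuming an element is modelled as an object holding a Map of content attributes. FakeElement and reflectBoolean are hypothetical names used only for illustration.

```javascript
// Minimal stand-in for an element: just a map of content attributes.
class FakeElement {
  constructor() {
    this.attributes = new Map();
  }
}

// Defines a DOM attribute on `element` that reflects a boolean content attribute.
function reflectBoolean(element, name) {
  Object.defineProperty(element, name, {
    // True if the content attribute is set, false if it is absent.
    get() {
      return this.attributes.has(name);
    },
    // True sets the attribute to its own name; false removes it.
    set(value) {
      if (value) this.attributes.set(name, name);
      else this.attributes.delete(name);
    },
  });
}
```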

If a reflecting DOM attribute is a signed integer type ( long ) then the content attribute must be parsed according to the rules for parsing signed integers first. If that fails, or if the attribute is absent, the default value must be returned instead, or 0 if there is no default value. On setting, the given value must be converted to a string representing the number as a valid integer in base ten and then that string must be used as the new content attribute value.

If a reflecting DOM attribute is an unsigned integer type ( unsigned long ) then the content attribute must be parsed according to the rules for parsing unsigned integers first. If that fails, or if the attribute is absent, the default value must be returned instead, or 0 if there is no default value. On setting, the given value must be converted to a string representing the number as a valid non-negative integer in base ten and then that string must be used as the new content attribute value.

If a reflecting DOM attribute is an unsigned integer type ( unsigned long ) that is limited to only positive non-zero numbers , then the behavior is similar to the previous case, but zero is not allowed. On getting, the content attribute must first be parsed according to the rules for parsing unsigned integers, and if that fails, or if the attribute is absent, the default value must be returned instead, or 1 if there is no default value. On setting, if the value is zero, the user agent must raise an INDEX_SIZE_ERR exception. Otherwise, the given value must be converted to a string representing the number as a valid non-negative integer in base ten and then that string must be used as the new content attribute value.
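The positive non-zero case can be sketched like this. The rules for parsing unsigned integers are defined elsewhere, so parseUnsignedInteger below is a simplified stand-in (leading space characters, then one or more digits) and should be read as an assumption. All function names are hypothetical, and an absent content attribute is modelled as null.

```javascript
// Assumption: a simplified stand-in for the "rules for parsing unsigned integers".
function parseUnsignedInteger(text) {
  const match = /^[ \t\n\f\r]*([0-9]+)/.exec(text);
  return match ? parseInt(match[1], 10) : null; // null signals a parse failure
}

// Getter: fall back to the default value, or to 1 when there is no default.
function getPositiveLong(contentValue, defaultValue = 1) {
  if (contentValue === null) return defaultValue;
  const parsed = parseUnsignedInteger(contentValue);
  return parsed === null ? defaultValue : parsed;
}

// Setter: zero is rejected; anything else is stored as a base-ten string.
function setPositiveLong(attributes, name, value) {
  if (value === 0) {
    const error = new Error("INDEX_SIZE_ERR");
    error.name = "INDEX_SIZE_ERR";
    throw error;
  }
  attributes.set(name, String(value));
}
```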

If a reflecting DOM attribute is a floating point number type ( float ) and the content attribute is defined to contain a time offset, then the content attribute must be parsed according to the rules for parsing time offsets first. If that fails, or if the attribute is absent, the default value must be returned instead, or the not-a-number value (NaN) if there is no default value. On setting, the given value must be converted to a string using the time offset serialisation rules, and that string must be used as the new content attribute value.

If a reflecting DOM attribute is of the type DOMTokenList , then on getting it must return a DOMTokenList object whose underlying string is the element's corresponding content attribute. When the DOMTokenList object mutates its underlying string, the attribute must itself be immediately mutated. When the attribute is absent, then the string represented by the DOMTokenList object is the empty string; when the object mutates this empty string, the user agent must first add the corresponding content attribute, and then mutate that attribute instead. DOMTokenList attributes are always read-only. The same DOMTokenList object must be returned every time for each attribute.

If a reflecting DOM attribute has the type HTMLElement , or an interface that descends from HTMLElement , then, on getting, it must run the following algorithm (stopping at the first point where a value is returned):

1. If the corresponding content attribute is absent, then the DOM attribute must return null.
2. Let candidate be the element that the document.getElementById() method would find if it was passed as its argument the current value of the corresponding content attribute.
3. If candidate is null, or if it is not type-compatible with the DOM attribute, then the DOM attribute must return null.
4. Otherwise, it must return candidate.

On setting, if the given element has an id attribute, then the content attribute must be set to the value of that id attribute. Otherwise, the content attribute must be set to the empty string.

2.3. Common DOM interfaces

2.3.1. Collections

The HTMLCollection , HTMLFormControlsCollection , and HTMLOptionsCollection interfaces represent various lists of DOM nodes. Collectively, objects implementing these interfaces are called collections .

When a collection is created, a filter and a root are associated with the collection.

For example, when the HTMLCollection object for the document.images attribute is created, it is associated with a filter that selects only img elements, and rooted at the root of the document.

The collection then represents a live view of the subtree rooted at the collection's root, containing only nodes that match the given filter. The view is linear. In the absence of specific requirements to the contrary, the nodes within the collection must be sorted in tree order.

The rows list is not in tree order.

An attribute that returns a collection must return the same object every time it is retrieved.

2.3.1.1. HTMLCollection

The HTMLCollection interface represents a generic collection of elements.

interface HTMLCollection { readonly attribute unsigned long length; Element item(in unsigned long index); Element namedItem(in DOMString name); };

The length attribute must return the number of nodes represented by the collection.

The item( index ) method must return the index th node in the collection. If there is no index th node in the collection, then the method must return null.

The namedItem( key ) method must return the first node in the collection that matches the following requirements:

It is an a, applet, area, form, img, or object element with a name attribute equal to key, or,

It is an HTML element of any kind with an id attribute equal to key. (Non-HTML elements, even if they have IDs, are not searched for the purposes of namedItem().)

If no such elements are found, then the method must return null.
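The matching rules for namedItem() can be sketched over a plain array. Nodes are modelled as objects of the hypothetical shape { tag, isHTML, id, name }. Treating the name-bearing elements as HTML elements only is an assumption of this sketch.

```javascript
// Elements whose name attribute is considered by namedItem().
const NAMEABLE = new Set(["a", "applet", "area", "form", "img", "object"]);

// Returns the first node in the collection matched by name or by id, else null.
function namedItem(collection, key) {
  for (const node of collection) {
    const byName = node.isHTML && NAMEABLE.has(node.tag) && node.name === key;
    const byId = node.isHTML && node.id === key; // non-HTML elements are skipped
    if (byName || byId) return node;
  }
  return null;
}
```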

In ECMAScript implementations, objects that implement the HTMLCollection interface must also have a [[Get]] method that, when invoked with a property name that is a number, acts like the item() method would when invoked with that argument, and when invoked with a property name that is a string, acts like the namedItem() method would when invoked with that argument.

2.3.1.2. HTMLFormControlsCollection

The HTMLFormControlsCollection interface represents a collection of form controls.

interface HTMLFormControlsCollection { readonly attribute unsigned long length; HTMLElement item(in unsigned long index); Object namedItem(in DOMString name); };

The length attribute must return the number of nodes represented by the collection.

The item( index ) method must return the index th node in the collection. If there is no index th node in the collection, then the method must return null.

The namedItem( key ) method must act according to the following algorithm:

1. If, at the time the method is called, there is exactly one node in the collection that has either an id attribute or a name attribute equal to key, then return that node and stop the algorithm.
2. Otherwise, if there are no nodes in the collection that have either an id attribute or a name attribute equal to key, then return null and stop the algorithm.
3. Otherwise, create a NodeList object representing a live view of the HTMLFormControlsCollection object, further filtered so that the only nodes in the NodeList object are those that have either an id attribute or a name attribute equal to key. The nodes in the NodeList object must be sorted in tree order.
4. Return that NodeList object.
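The three outcomes of this algorithm (a single node, null, or a list) can be sketched as below. Nodes are plain objects with optional id and name properties, the function name is hypothetical, and the returned array is a static snapshot standing in for the live NodeList, which this sketch does not model.

```javascript
// Sketch of namedItem() for a form controls collection held in tree order.
function formControlsNamedItem(collection, key) {
  const matches = collection.filter(
    (node) => node.id === key || node.name === key
  );
  if (matches.length === 1) return matches[0]; // exactly one match: the node itself
  if (matches.length === 0) return null; // no match
  return matches; // several matches: stand-in for the NodeList, in collection order
}
```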

In the ECMAScript DOM binding, objects implementing the HTMLFormControlsCollection interface must support being dereferenced using the square bracket notation, such that dereferencing with an integer index is equivalent to invoking the item() method with that index, and such that dereferencing with a string index is equivalent to invoking the namedItem() method with that index.

2.3.1.3. HTMLOptionsCollection

The HTMLOptionsCollection interface represents a list of option elements.

interface HTMLOptionsCollection { attribute unsigned long length; HTMLOptionElement item(in unsigned long index); Object namedItem(in DOMString name); };

On getting, the length attribute must return the number of nodes represented by the collection.

On setting, the behaviour depends on whether the new value is equal to, greater than, or less than the number of nodes represented by the collection at that time. If the number is the same, then setting the attribute must do nothing. If the new value is greater, then n new option elements with no attributes and no child nodes must be appended to the select element on which the HTMLOptionsCollection is rooted, where n is the difference between the two numbers (new value minus old value). If the new value is lower, then the last n nodes in the collection must be removed from their parent nodes, where n is the difference between the two numbers (old value minus new value).

Setting length never removes or adds any optgroup elements, and never adds new children to existing optgroup elements (though it can remove children from them).
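The length setter can be sketched on a small tree model. Which option elements belong to the collection is defined elsewhere; this sketch assumes they are the option children of the select element plus the option children of its optgroup children. Node shapes and function names are hypothetical.

```javascript
// Collects the options of a select in tree order, remembering each parent.
function collectOptions(select) {
  const found = [];
  for (const child of select.children) {
    if (child.tag === "option") {
      found.push({ node: child, parent: select });
    } else if (child.tag === "optgroup") {
      for (const grandchild of child.children) {
        if (grandchild.tag === "option") found.push({ node: grandchild, parent: child });
      }
    }
  }
  return found;
}

// Sketch of setting HTMLOptionsCollection.length.
function setOptionsLength(select, newLength) {
  const options = collectOptions(select);
  if (newLength > options.length) {
    // Append empty option elements directly to the select, never to an optgroup.
    for (let i = options.length; i < newLength; i++) {
      select.children.push({ tag: "option", children: [] });
    }
  } else {
    // Remove the last nodes of the collection from whichever parent holds them.
    // When the lengths are equal the slice is empty and nothing happens.
    for (const { node, parent } of options.slice(newLength)) {
      parent.children.splice(parent.children.indexOf(node), 1);
    }
  }
}
```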

The item( index ) method must return the index th node in the collection. If there is no index th node in the collection, then the method must return null.

The namedItem( key ) method must act according to the following algorithm:

1. If, at the time the method is called, there is exactly one node in the collection that has either an id attribute or a name attribute equal to key, then return that node and stop the algorithm.
2. Otherwise, if there are no nodes in the collection that have either an id attribute or a name attribute equal to key, then return null and stop the algorithm.
3. Otherwise, create a NodeList object representing a live view of the HTMLOptionsCollection object, further filtered so that the only nodes in the NodeList object are those that have either an id attribute or a name attribute equal to key. The nodes in the NodeList object must be sorted in tree order.
4. Return that NodeList object.

In the ECMAScript DOM binding, objects implementing the HTMLOptionsCollection interface must support being dereferenced using the square bracket notation, such that dereferencing with an integer index is equivalent to invoking the item() method with that index, and such that dereferencing with a string index is equivalent to invoking the namedItem() method with that index.

We may want to add add() and remove() methods here too because IE implements HTMLSelectElement and HTMLOptionsCollection on the same object, and so people use them almost interchangeably in the wild.

2.3.2. DOMTokenList

The DOMTokenList interface represents an interface to an underlying string that consists of an unordered set of unique space-separated tokens.

Which string underlies a particular DOMTokenList object is defined when the object is created. It might be a content attribute (e.g. the string that underlies the classList object is the class attribute), or it might be an anonymous string (e.g. when a DOMTokenList object is passed to an author-implemented callback in the datagrid APIs).

interface DOMTokenList { readonly attribute unsigned long length; DOMString item(in unsigned long index); boolean has(in DOMString token); void add(in DOMString token); void remove(in DOMString token); boolean toggle(in DOMString token); };

The length attribute must return the number of unique tokens that result from splitting the underlying string on spaces.

The item( index ) method must split the underlying string on spaces, sort the resulting list of tokens by Unicode codepoint , remove exact duplicates, and then return the index th item in this list. If index is equal to or greater than the number of tokens, then the method must return null.

In ECMAScript implementations, objects that implement the DOMTokenList interface must also have a [[Get]] method that, when invoked with a property name that is a number, acts like the item() method would when invoked with that argument.

The has( token ) method must run the following algorithm:

1. If the token argument contains any spaces, then raise an INVALID_CHARACTER_ERR exception and stop the algorithm.
2. Otherwise, split the underlying string on spaces to get the list of tokens in the object's underlying string.
3. If the token indicated by token is one of the tokens in the object's underlying string then return true and stop this algorithm.
4. Otherwise, return false.

The add( token ) method must run the following algorithm:

1. If the token argument contains any spaces, then raise an INVALID_CHARACTER_ERR exception and stop the algorithm.
2. Otherwise, split the underlying string on spaces to get the list of tokens in the object's underlying string.
3. If the given token is already one of the tokens in the DOMTokenList object's underlying string then stop the algorithm.
4. Otherwise, if the last character of the DOMTokenList object's underlying string is not a space character, then append a U+0020 SPACE character to the end of that string.
5. Append the value of token to the end of the DOMTokenList object's underlying string.

The remove( token ) method must run the following algorithm:

1. If the token argument contains any spaces, then raise an INVALID_CHARACTER_ERR exception and stop the algorithm.
2. Otherwise, remove the given token from the underlying string.

The toggle( token ) method must run the following algorithm:

1. If the token argument contains any spaces, then raise an INVALID_CHARACTER_ERR exception and stop the algorithm.
2. Otherwise, split the underlying string on spaces to get the list of tokens in the object's underlying string.
3. If the given token is already one of the tokens in the DOMTokenList object's underlying string then remove the given token from the underlying string, and stop the algorithm, returning false.
4. Otherwise, if the last character of the DOMTokenList object's underlying string is not a space character, then append a U+0020 SPACE character to the end of that string.
5. Append the value of token to the end of the DOMTokenList object's underlying string.
6. Return true.

In the ECMAScript DOM binding, objects implementing the DOMTokenList interface must stringify to the object's underlying string representation.
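The DOMTokenList algorithms above can be sketched as a class over a plain string. Several points are assumptions of this sketch: the class name TokenListSketch is hypothetical, the space characters are taken to be space, tab, LF, FF and CR, removal is simplified to re-joining the remaining tokens with single spaces, and no separator is added when the underlying string is empty.

```javascript
const TOKEN_SPACES = /[ \t\n\f\r]+/; // assumption: the space characters

function invalidCharacter() {
  const error = new Error("INVALID_CHARACTER_ERR");
  error.name = "INVALID_CHARACTER_ERR";
  return error;
}

// Compares two strings by Unicode code point rather than by UTF-16 code unit.
function byCodePoint(a, b) {
  const x = Array.from(a);
  const y = Array.from(b);
  for (let i = 0; i < Math.min(x.length, y.length); i++) {
    const d = x[i].codePointAt(0) - y[i].codePointAt(0);
    if (d !== 0) return d;
  }
  return x.length - y.length;
}

class TokenListSketch {
  constructor(value = "") {
    this.value = value; // the underlying string
  }
  tokens() {
    return this.value.split(TOKEN_SPACES).filter((t) => t !== "");
  }
  sortedUnique() {
    return [...new Set(this.tokens())].sort(byCodePoint);
  }
  check(token) {
    if (/[ \t\n\f\r]/.test(token)) throw invalidCharacter();
  }
  get length() {
    return this.sortedUnique().length;
  }
  item(index) {
    const list = this.sortedUnique();
    return index < list.length ? list[index] : null;
  }
  has(token) {
    this.check(token);
    return this.tokens().includes(token);
  }
  add(token) {
    this.check(token);
    if (this.tokens().includes(token)) return;
    // Append a separator only if the string does not already end with a space.
    if (this.value !== "" && !/[ \t\n\f\r]$/.test(this.value)) this.value += " ";
    this.value += token;
  }
  remove(token) {
    this.check(token);
    // Assumption: removal is simplified to re-joining the remaining tokens.
    this.value = this.tokens().filter((t) => t !== token).join(" ");
  }
  toggle(token) {
    this.check(token);
    if (this.tokens().includes(token)) {
      this.remove(token);
      return false;
    }
    this.add(token);
    return true;
  }
  toString() {
    return this.value; // stringifies to the underlying string
  }
}
```

Note that add() leaves existing duplicates in the underlying string untouched, while length and item() only ever see the de-duplicated, sorted view.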

2.3.3. DOM feature strings

DOM3 Core defines mechanisms for checking for interface support, and for obtaining implementations of interfaces, using feature strings. [DOM3CORE]

A DOM application can use the hasFeature( feature , version ) method of the DOMImplementation interface with parameter values "HTML" and "5.0" (respectively) to determine whether or not this module is supported by the implementation. In addition to the feature string "HTML", the feature string "XHTML" (with version string "5.0") can be used to check if the implementation supports XHTML. User agents should respond with a true value when the hasFeature method is queried with these values. Authors are cautioned, however, that UAs returning true might not be perfectly compliant, and that UAs returning false might well have support for features in this specification; in general, therefore, use of this method is discouraged.

The values "HTML" and "XHTML" (both with version "5.0") should also be supported in the context of the getFeature() and isSupported() methods, as defined by DOM3 Core.

The interfaces defined in this specification are not always supersets of the interfaces defined in DOM2 HTML; some features that were formerly deprecated, poorly supported, rarely used or considered unnecessary have been removed. Therefore it is not guaranteed that an implementation that supports "HTML" "5.0" also supports "HTML" "2.0".

2.4. DOM tree accessors

The html element of a document is the document's root element, if there is one and it's an html element, or null otherwise.

The head element of a document is the first head element that is a child of the html element, if there is one, or null otherwise.

The title element of a document is the first title element that is a child of the head element, if there is one, or null otherwise.

The title attribute must, on getting, run the following algorithm:

1. If the root element is an svg element in the "http://www.w3.org/2000/svg" namespace, and the user agent supports SVG, then the getter must return the value that would have been returned by the DOM attribute of the same name on the SVGDocument interface.
2. Otherwise, it must return a concatenation of the data of all the child text nodes of the title element, in tree order, or the empty string if the title element is null.

On setting, the following algorithm must be run:

1. If the root element is an svg element in the "http://www.w3.org/2000/svg" namespace, and the user agent supports SVG, then the setter must defer to the setter for the DOM attribute of the same name on the SVGDocument interface. Stop the algorithm here.
2. If the head element is null, then the attribute must do nothing. Stop the algorithm here.
3. If the title element is null, then a new title element must be created and appended to the head element.
4. The children of the title element (if any) must all be removed.
5. A single Text node whose data is the new value being assigned must be appended to the title element.
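The non-SVG branches of the title getter and setter can be sketched on a toy tree. Elements are modelled as { tag, children } and text nodes as { text }; these shapes and all function names are hypothetical, and the SVG case is deliberately left out.

```javascript
// Returns the first child element of `parent` with the given tag, or null.
const firstChild = (parent, tag) =>
  (parent && parent.children.find((c) => c.tag === tag)) || null;

// The head element: first head child of the html root element, or null.
function headElement(doc) {
  const html = doc.root && doc.root.tag === "html" ? doc.root : null;
  return firstChild(html, "head");
}

function getTitle(doc) {
  const title = firstChild(headElement(doc), "title");
  if (title === null) return "";
  // Concatenate only the child text nodes, in tree order.
  return title.children
    .filter((c) => typeof c.text === "string")
    .map((c) => c.text)
    .join("");
}

function setTitle(doc, value) {
  const head = headElement(doc);
  if (head === null) return; // no head element: do nothing
  let title = firstChild(head, "title");
  if (title === null) {
    title = { tag: "title", children: [] };
    head.children.push(title);
  }
  // Drop existing children and append a single text node with the new value.
  title.children = [{ text: value }];
}
```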

The title attribute on the HTMLDocument interface should shadow the attribute of the same name on the SVGDocument interface when the user agent supports both HTML and SVG.

The body element of a document is the first child of the html element that is either a body element or a frameset element. If there is no such element, it is null. If the body element is null, then when the specification requires that events be fired at "the body element", they must instead be fired at the Document object.

The body attribute, on getting, must return the body element of the document (either a body element, a frameset element, or null). On setting, the following algorithm must be run:

1. If the new value is not a body or frameset element, then raise a HIERARCHY_REQUEST_ERR exception and abort these steps.
2. Otherwise, if the new value is the same as the body element, do nothing. Abort these steps.
3. Otherwise, if the body element is not null, then replace that element with the new value in the DOM, as if the root element's replaceChild() method had been called with the new value and the incumbent body element as its two arguments respectively, then abort these steps.
4. Otherwise, the body element is null. Append the new value to the root element.

The images attribute must return an HTMLCollection rooted at the Document node, whose filter matches only img elements.

The links attribute must return an HTMLCollection rooted at the Document node, whose filter matches only a elements with href attributes and area elements with href attributes.

The forms attribute must return an HTMLCollection rooted at the Document node, whose filter matches only form elements.

The anchors attribute must return an HTMLCollection rooted at the Document node, whose filter matches only a elements with name attributes.

The getElementsByName( name ) method takes a string name, and must return a live NodeList containing all the a, applet, button, form, iframe, img, input, map, meta, object, select, and textarea elements in that document that have a name attribute whose value is equal to the name argument.

The getElementsByClassName( classNames ) method takes a string that contains an unordered set of unique space-separated tokens representing classes. When called, the method must return a live NodeList object containing all the elements in the document that have all the classes specified in that argument, having obtained the classes by splitting a string on spaces. If there are no tokens specified in the argument, then the method must return an empty NodeList .

The getElementsByClassName() method on the HTMLElement interface must return a live NodeList with the nodes that the HTMLDocument getElementsByClassName() method would return when passed the same argument(s), excluding any elements that are not descendants of the HTMLElement object on which the method was invoked.

HTML, SVG, and MathML elements define which classes they are in by having an attribute in the per-element partition with the name class containing a space-separated list of classes to which the element belongs. Other specifications may also allow elements in their namespaces to be labelled as being in specific classes. UAs must not assume that all attributes of the name class for elements in any namespace work in this way, however, and must not assume that such attributes, when used as global attributes, label other elements as being in specific classes.

Given the following XHTML fragment:

<div id="example">
 <p id="p1" class="aaa bbb"/>
 <p id="p2" class="aaa ccc"/>
 <p id="p3" class="bbb ccc"/>
</div>

A call to document.getElementById('example').getElementsByClassName('aaa') would return a NodeList with the two paragraphs p1 and p2 in it.

A call to getElementsByClassName('ccc bbb') would only return one node, however, namely p3. A call to document.getElementById('example').getElementsByClassName('bbb ccc ') would return the same thing.

A call to getElementsByClassName('aaa,bbb') would return no nodes; none of the elements above are in the "aaa,bbb" class.
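The matching behaviour in this example can be reproduced with a small sketch that runs outside a browser. Elements are modelled as { id, className, children } objects, a hypothetical shape, and the function returns a static array rather than a live NodeList.

```javascript
// Splits a string on space characters (assumed: space, tab, LF, FF, CR).
const splitOnSpaces = (text) =>
  text.split(/[ \t\n\f\r]+/).filter((t) => t !== "");

// Returns the descendants of `root` that have all the requested classes.
function getElementsByClassName(root, classNames) {
  const wanted = splitOnSpaces(classNames);
  const result = [];
  if (wanted.length === 0) return result; // no tokens: empty list
  const visit = (element) => {
    for (const child of element.children) {
      const classes = splitOnSpaces(child.className || "");
      if (wanted.every((name) => classes.includes(name))) result.push(child);
      visit(child); // depth-first walk yields tree order
    }
  };
  visit(root);
  return result;
}

// The fragment from the example above, as plain objects.
const example = {
  id: "example",
  className: "",
  children: [
    { id: "p1", className: "aaa bbb", children: [] },
    { id: "p2", className: "aaa ccc", children: [] },
    { id: "p3", className: "bbb ccc", children: [] },
  ],
};
const ids = (list) => list.map((e) => e.id).join(",");
```

With this model, ids(getElementsByClassName(example, "aaa")) gives "p1,p2", both "ccc bbb" and "bbb ccc " give "p3", and "aaa,bbb" gives an empty list.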

The dir attribute on the HTMLDocument interface is defined along with the dir content attribute.

2.5. Dynamic markup insertion

The document.write() family of methods and the innerHTML family of DOM attributes enable script authors to dynamically insert markup into the document.

bz argues that innerHTML should be called something else on XML documents and XML elements. Is the sanity worth the migration pain?

Because these APIs interact with the parser, their behaviour varies depending on whether they are used with HTML documents (and the HTML parser) or XHTML in XML documents (and the XML parser). The following table cross-references the various versions of these APIs.

Regardless of the parsing mode, the document.writeln(...) method must call the document.write() method with the same argument(s), and then call the document.write() method with, as its argument, a string consisting of a single line feed character (U+000A).
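As a rough illustration of this requirement, the following Python sketch uses a hypothetical SketchDocument class that merely records what is written to it. The class and the way write() joins its arguments are assumptions made for the example; only the two-call shape of writeln() comes from the text above.

```python
class SketchDocument:
    """Hypothetical stand-in for a document; it only records writes."""

    def __init__(self):
        self.written = []

    def write(self, *args):
        # Assumption: the arguments are concatenated into one string.
        self.written.append("".join(args))

    def writeln(self, *args):
        self.write(*args)  # first, the same argument(s)
        self.write("\n")   # then a single U+000A LINE FEED character
```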

2.5.1. Controlling the input stream

The open() method comes in several variants with different numbers of arguments.

When called with two or fewer arguments, the method must act as follows:

1. Let type be the value of the first argument, if there is one, or "text/html" otherwise.
2. Let replace be true if there is a second argument and it has the value "replace", and false otherwise.
3. If the document has an active parser that isn't a script-created parser, and the insertion point associated with that parser's input stream is not undefined (that is, it does point to somewhere in the input stream), then the method does nothing. Abort these steps and return the Document object on which the method was invoked.
   This basically causes document.open() to be ignored when it's called in an inline script found during the parsing of data sent over the network, while still letting it have an effect when called asynchronously or on a document that is itself being spoon-fed using these APIs.
4. onbeforeunload, onunload
5. If the document has an active parser, then stop that parser, and throw away any pending content in the input stream.
   what about if it doesn't, because it's either like a text/plain, or Atom, or PDF, or XHTML, or image document, or something?
6. Remove all child nodes of the document.
7. Create a new HTML parser and associate it with the document. This is a script-created parser (meaning that it can be closed by the document.open() and document.close() methods, and that the tokeniser will wait for an explicit call to document.close() before emitting an end-of-file token).
8. Mark the document as being an HTML document (it might already be so-marked).
9. If type does not have the value "text/html", then act as if the tokeniser had emitted a pre element start tag, then set the HTML parser's tokenisation stage's content model flag to PLAINTEXT.
10. If replace is false, then:
    1. Remove all the entries in the browsing context's session history after the current entry in its Document's History object.
    2. Remove any earlier entries that share the same Document.
    3. Add a new entry just before the last entry that is associated with the text that was parsed by the previous parser associated with the Document object, as well as the state of the document at the start of these steps. (This allows the user to step backwards in the session history to see the page before it was blown away by the document.open() call.)
11. Finally, set the insertion point to point at just before the end of the input stream (which at this point will be empty).
12. Return the Document on which the method was invoked.

We shouldn't hard-code text/plain there. We should do it some other way, e.g. hand off to the section on content-sniffing and handling of incoming data streams, the part that defines how this all works when stuff comes over the network.

When called with three or more arguments, the open() method on the HTMLDocument object must call the open() method on the Window interface of the object returned by the defaultView attribute of the DocumentView interface of the HTMLDocument object, with the same arguments as the original call to the open() method, and return whatever that method returned. If the defaultView attribute of the DocumentView interface of the HTMLDocument object is null, then the method must raise an INVALID_ACCESS_ERR exception.

The close() method must do nothing if there is no script-created parser associated with the document. If there is such a parser, then, when the method is called, the user agent must insert an explicit "EOF" character at the insertion point of the parser's input stream.

2.5.2. Dynamic markup insertion in HTML

In HTML, the document.write(...) method must act as follows:

In HTML, the innerHTML DOM attribute of all HTMLElement and HTMLDocument nodes returns a serialisation of the node's children using the HTML syntax. On setting, it replaces the node's children with new nodes that result from parsing the given value. The formal definitions follow.

On getting, the innerHTML DOM attribute must return the result of running the HTML fragment serialisation algorithm on the node.

On setting, if the node is a document, the innerHTML DOM attribute must run the following algorithm:

1. If the document has an active parser, then stop that parser, and throw away any pending content in the input stream.
   what about if it doesn't, because it's either like a text/plain, or Atom, or PDF, or XHTML, or image document, or something?
2. Remove the children nodes of the Document whose innerHTML attribute is being set.
3. Create a new HTML parser, in its initial state, and associate it with the Document node.
4. Place into the input stream for the HTML parser just created the string being assigned into the innerHTML attribute.
5. Start the parser and let it run until it has consumed all the characters just inserted into the input stream. (The Document node will have been populated with elements and a load event will have fired on its body element.)

Otherwise, if the node is an element, then setting the innerHTML DOM attribute must cause the following algorithm to run instead:

1. Invoke the HTML fragment parsing algorithm, with the element whose innerHTML attribute is being set as the context and the string being assigned into the innerHTML attribute as the input. Let new children be the result of this algorithm.
2. Remove the children of the element whose innerHTML attribute is being set.
3. Let target document be the ownerDocument of the Element node whose innerHTML attribute is being set.
4. Set the ownerDocument of all the nodes in new children to the target document.
5. Append all the new children nodes to the node whose innerHTML attribute is being set, preserving their order.

script elements inserted using innerHTML do not execute when they are inserted.

2.5.3. Dynamic markup insertion in XML

In an XML context, the document.write() method must raise an INVALID_ACCESS_ERR exception.

The innerHTML attribute, by contrast, is usable in an XML context.

In an XML context, the innerHTML DOM attribute on HTMLElements and HTMLDocuments, on getting, must return a string in the form of an internal general parsed entity that is XML namespace-well-formed, the string being an isomorphic serialisation of all of that node's child nodes, in document order. User agents may adjust prefixes and namespace declarations in the serialisation (and indeed might be forced to do so in some cases to obtain namespace-well-formed XML). [XML] [XMLNS]

If any of the following cases are found in the DOM being serialised, the user agent must raise an INVALID_STATE_ERR exception:

A DocumentType node that has an external subset public identifier or an external subset system identifier that contains both a U+0022 QUOTATION MARK ('"') and a U+0027 APOSTROPHE ("'").

A node with a prefix or local name containing a U+003A COLON (":").

A Text node whose data contains characters that are not matched by the XML Char production. [XML]

A CDATASection node whose data contains the string "]]>".

A Comment node whose data contains two adjacent U+002D HYPHEN-MINUS (-) characters or ends with such a character.

A ProcessingInstruction node whose target name is the string "xml" (case insensitively).

A ProcessingInstruction node whose target name contains a U+003A COLON (":").

A ProcessingInstruction node whose data contains the string "?>".

These are the only ways to make a DOM unserialisable. The DOM enforces all the other XML constraints; for example, trying to set an attribute with a name that contains an equals sign (=) will raise an INVALID_CHARACTER_ERR exception.
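Several of the cases listed above are simple string tests, and could be sketched in Python as shown here. The function names are hypothetical, each function checks a single string in isolation rather than walking a DOM, and the character ranges are those of the Char production of XML 1.0.

```python
def is_xml_char(ch):
    """True if ch is matched by the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def doctype_identifier_is_serialisable(identifier):
    # Unserialisable only if it contains both kinds of quote character.
    return not ('"' in identifier and "'" in identifier)

def text_is_serialisable(data):
    return all(is_xml_char(ch) for ch in data)

def cdata_is_serialisable(data):
    return "]]>" not in data

def comment_is_serialisable(data):
    # No two adjacent hyphens, and no trailing hyphen.
    return "--" not in data and not data.endswith("-")

def processing_instruction_is_serialisable(target, data):
    return (target.lower() != "xml"
            and ":" not in target
            and "?>" not in data)
```

A user agent would raise INVALID_STATE_ERR as soon as any such check failed for a node being serialised; the sketch leaves the exception and the tree traversal out.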

On setting, in an XML context, the innerHTML DOM attribute on HTMLElements and HTMLDocuments must run the following algorithm:

1. The user agent must create a new XML parser.
2. If the innerHTML attribute is being set on an element, the user agent must feed the parser just created the string corresponding to the start tag of that element, declaring all the namespace prefixes that are in scope on that element in the DOM, as well as declaring the default namespace (if any) that is in scope on that element in the DOM.
3. The user agent must feed the parser just created the string being assigned into the innerHTML attribute.
4. If the innerHTML attribute is being set on an element, the user agent must feed the parser the string corresponding to the end tag of that element.
5. If the parser found a well-formedness error, the attribute's setter must raise a SYNTAX_ERR exception and abort these steps.
6. The user agent must remove the children nodes of the node whose innerHTML attribute is being set.
7. If the attribute is being set on a Document node, let new children be the children of the document, preserving their order. Otherwise, the attribute is being set on an Element node; let new children be the children of the document's root element, preserving their order.
8. If the attribute is being set on a Document node, let target document be that Document node. Otherwise, the attribute is being set on an Element node; let target document be the ownerDocument of that Element.
9. Set the ownerDocument of all the nodes in new children to the target document.
10. Append all the new children nodes to the node whose innerHTML attribute is being set, preserving their order.

script elements inserted using innerHTML do not execute when they are inserted.

2.6. APIs in HTML documents

For HTML documents, and for HTML elements in HTML documents, certain APIs defined in DOM3 Core become case-insensitive or case-changing, as sometimes defined in DOM3 Core, and as summarised or required below. [DOM3CORE]

This does not apply to XML documents or to elements that are not in the HTML namespace despite being in HTML documents.

Element.tagName, Node.nodeName, and Node.localName
   These attributes return tag names in all uppercase and attribute names in all lowercase, regardless of the case with which they were created.

Document.createElement()
   The canonical form of HTML markup is all-lowercase; thus, this method will lowercase the argument before creating the requisite element. Also, the element created must be in the HTML namespace.
   This doesn't apply to Document.createElementNS(). Thus, it is possible, by passing this last method a tag name in the wrong case, to create an element that claims to have the tag name of an element defined in this specification, but doesn't support its interfaces, because it really has another tag name not accessible from the DOM APIs.

Element.setAttributeNode()
   When an Attr node is set on an HTML element, it must have its name lowercased before the element is affected.
   This doesn't apply to Element.setAttributeNodeNS().

Element.setAttribute()
   When an attribute is set on an HTML element, the name argument must be lowercased before the element is affected.
   This doesn't apply to Element.setAttributeNS().

Document.getElementsByTagName() and Element.getElementsByTagName()
   These methods (but not their namespaced counterparts) must compare the given argument case-insensitively when looking at HTML elements, and case-sensitively otherwise.
   Thus, in an HTML document with nodes in multiple namespaces, these methods will be both case-sensitive and case-insensitive at the same time.

Document.renameNode()
   If the new namespace is the HTML namespace, then the new qualified name must be lowercased before the rename takes place.

3. Semantics and structure of HTML elements

3.1. Introduction

This section is non-normative.

An introduction to marking up a document.

3.2. Common microsyntaxes

There are various places in HTML that accept particular data types, such as dates or numbers. This section describes what the conformance criteria for content in those formats are, and how to parse them.

Need to go through the whole spec and make sure all the attribute values are clearly defined either in terms of microsyntaxes or in terms of other specs, or as "Text" or some such.

3.2.1. Common parser idioms

The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).

Some of the micro-parsers described below follow the pattern of having an input variable that holds the string being parsed, and having a position variable pointing at the next character to parse in input.

For parsers based on this pattern, a step that requires the user agent to collect a sequence of characters means that the following algorithm must be run, with characters being the set of characters that can be collected:

1. Let input and position be the same variables as those of the same name in the algorithm that invoked these steps.
2. Let result be the empty string.
3. While position doesn't point past the end of input and the character at position is one of the characters, append that character to the end of result and advance position to the next character in input.
4. Return result.

The step skip whitespace means that the user agent must collect a sequence of characters that are space characters. The step skip Zs characters means that the user agent must collect a sequence of characters that are in the Unicode character class Zs. In both cases, the collected characters are not used. [UNICODE]
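These idioms might be sketched in Python along the following lines. The function names are hypothetical, and because Python strings are immutable the sketch returns the new position instead of sharing a position variable with its caller. The set of collectable characters is passed as a predicate so that the same helper serves both skip steps.

```python
import unicodedata

# The space characters defined at the start of this section.
SPACE_CHARACTERS = " \t\n\x0b\x0c\r"

def collect_sequence(text, position, accept):
    """Collect characters for which accept() is true, starting at position.

    Returns (result, new_position).
    """
    result = ""
    while position < len(text) and accept(text[position]):
        result += text[position]
        position += 1
    return result, position

def skip_whitespace(text, position):
    """Skip space characters; the collected characters are discarded."""
    return collect_sequence(text, position,
                            lambda ch: ch in SPACE_CHARACTERS)[1]

def skip_zs_characters(text, position):
    """Skip characters in the Unicode character class Zs."""
    return collect_sequence(text, position,
                            lambda ch: unicodedata.category(ch) == "Zs")[1]
```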

3.2.2. Boolean attributes

A number of attributes in HTML5 are boolean attributes. The presence of a boolean attribute on an element represents the true value, and the absence of the attribute represents the false value.

If the attribute is present, its value must either be the empty string or the attribute's canonical name, exactly, with no leading or trailing whitespace, and in lowercase.
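A conformance check for this rule could be sketched in Python as below. The function name is hypothetical, and the sketch assumes that the canonical name passed in is already the lowercase attribute name (for instance "disabled").

```python
def is_valid_boolean_attribute_value(canonical_name, value):
    """True if value is allowed for a present boolean attribute.

    Assumption: canonical_name is the attribute's lowercase name.
    """
    return value == "" or value == canonical_name
```

Note that this concerns only whether the markup is conforming. Whether the attribute represents true or false depends solely on its presence, so a value such as "false" is non-conforming but still means true.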

3.2.3. Numbers

3.2.3.1. Unsigned integers

A string is a valid non-negative integer if it consists of one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).

The rules for parsing non-negative integers are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will either return zero, a positive integer, or an error. Leading spaces are ignored. Trailing spaces and indeed any trailing garbage characters are ignored.

1. Let input be the string being parsed.
2. Let position be a pointer into input, initially pointing at the start of the string.
3. Let value have the value 0.
4. Skip whitespace.
5. If position is past the end of input, return an error.
6. If the next character is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9), then return an error.
7. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):
   1. Multiply value by ten.
   2. Add the value of the current character (0..9) to value.
   3. Advance position to the next character.
   4. If position is not past the end of input, return to the top of step 7 in the overall algorithm (that's the step within which these substeps find themselves).
8. Return value.
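The steps above can be sketched in Python as follows. The function name is hypothetical, the loop stands in for the "return to the top of step 7" jump, and None is used here to represent the error result.

```python
SPACE_CHARACTERS = " \t\n\x0b\x0c\r"
DIGITS = "0123456789"

def parse_non_negative_integer(text):
    """Return the parsed integer, or None for the algorithm's error result."""
    position = 0
    value = 0
    # Skip whitespace.
    while position < len(text) and text[position] in SPACE_CHARACTERS:
        position += 1
    # Error if nothing is left, or if the next character is not a digit.
    if position >= len(text) or text[position] not in DIGITS:
        return None
    # Accumulate digits; trailing garbage simply ends the loop.
    while position < len(text) and text[position] in DIGITS:
        value = value * 10 + DIGITS.index(text[position])
        position += 1
    return value
```

For example, parse_non_negative_integer("  42abc") yields 42, while an empty string, a string of only spaces, or "-5" yields the error result.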

3.2.3.2. Signed integers

A string is a valid integer if it consists of one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally prefixed with a U+002D HYPHEN-MINUS ("-") character.

The rules for parsing integers are similar to the rules for non-negative integers, and are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will either return an integer or an error. Leading spaces are ignored. Trailing spaces and trailing garbage characters are ignored.

1. Let input be the string being parsed.
2. Let position be a pointer into input, initially pointing at the start of the string.
3. Let value have the value 0.
4. Let sign have the value "positive".
5. Skip whitespace.
6. If position is past the end of input, return an error.
7. If the character indicated by position (the first character) is a U+002D HYPHEN-MINUS ("-") character:
   1. Let sign be "negative".
   2. Advance position to the next character.
   3. If position is past the end of input, return an error.
8. If the next character is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9), then return an error.
9. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):
   1. Multiply value by ten.
   2. Add the value of the current character (0..9) to value.
   3. Advance position to the next character.
   4. If position is not past the end of input, return to the top of step 9 in the overall algorithm (that's the step within which these substeps find themselves).
10. If sign is "positive", return value, otherwise return 0-value.
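A Python sketch of the signed variant is given below, under the same conventions as one might use for the unsigned case: the function name is hypothetical, a loop replaces the jump back to step 9, and None represents the error result.

```python
SPACE_CHARACTERS = " \t\n\x0b\x0c\r"
DIGITS = "0123456789"

def parse_integer(text):
    """Return the parsed integer, or None for the algorithm's error result."""
    position = 0
    value = 0
    negative = False
    # Skip whitespace.
    while position < len(text) and text[position] in SPACE_CHARACTERS:
        position += 1
    if position >= len(text):
        return None
    # An optional leading HYPHEN-MINUS sets the sign.
    if text[position] == "-":
        negative = True
        position += 1
        if position >= len(text):
            return None
    # At least one digit must follow.
    if text[position] not in DIGITS:
        return None
    while position < len(text) and text[position] in DIGITS:
        value = value * 10 + DIGITS.index(text[position])
        position += 1
    return -value if negative else value
```

Note that only a hyphen is recognised as a sign: a leading "+" is an error, as is a hyphen followed by a space.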

3.2.3.3. Real numbers

A string is a valid floating point number if it consists of one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally with a single U+002E FULL STOP (".") character somewhere (either before these numbers, in between two numbers, or after the numbers), all optionally prefixed with a U+002D HYPHEN-MINUS ("-") character.

The rules for parsing floating point number values are as given in the following algorithm. As with the previous algorithms, when this one is invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will either return a number or an error. Leading spaces are ignored. Trailing spaces and garbage characters are ignored.

1. Let input be the string being parsed.
2. Let position be a pointer into input, initially pointing at the start of the string.
3. Let value have the value 0.
4. Let sign have the value "positive".
5. Skip whitespace.
6. If position is past the end of input, return an error.
7. If the character indicated by position (the first character) is a U+002D HYPHEN-MINUS ("-") character:
   1. Let sign be "negative".
   2. Advance position to the next character.
   3. If position is past the end of input, return an error.
8. If the next character is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9) or U+002E FULL STOP ("."), then return an error.
9. If the next character is U+002E FULL STOP ("."), but either that is the last character or the character after that one is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9), then return an error.
10. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):
    1. Multiply value by ten.
    2. Add the value of the current character (0..9) to value.
    3. Advance position to the next character.
    4. If position is past the end of input, then if sign is "positive", return value, otherwise return 0-value.
    5. Otherwise return to the top of step 10 in the overall algorithm (that's the step within which these substeps find themselves).
11. Otherwise, if the next character is not a U+002E FULL STOP ("."), then if sign is "positive", return value, otherwise return 0-value.
12. The next character is a U+002E FULL STOP ("."). Advance position to the character after that.
13. Let divisor be 1.
14. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):
    1. Multiply divisor by ten.
    2. Add the value of the current character (0..9) divided by divisor, to value.
    3. Advance position to the next character.
    4. If position is past the end of input, then if sign is "positive", return value, otherwise return 0-value.
    5. Otherwise return to the top of step 14 in the overall algorithm (that's the step within which these substeps find themselves).
15. Otherwise, if sign is "positive", return value, otherwise return 0-value.
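The floating point rules might be sketched in Python as follows. The function name is hypothetical, None represents the error result, and the two loops replace the jumps back to steps 10 and 14. The fractional part is accumulated digit by digit as the steps describe, so the result is subject to ordinary binary floating point rounding.

```python
SPACE_CHARACTERS = " \t\n\x0b\x0c\r"
DIGITS = "0123456789"

def parse_float(text):
    """Return the parsed number, or None for the algorithm's error result."""
    position, value, negative = 0, 0, False
    end = len(text)
    # Skip whitespace.
    while position < end and text[position] in SPACE_CHARACTERS:
        position += 1
    if position >= end:
        return None
    # Optional sign.
    if text[position] == "-":
        negative = True
        position += 1
        if position >= end:
            return None
    ch = text[position]
    if ch not in DIGITS and ch != ".":
        return None
    # A leading full stop must be followed by a digit.
    if ch == "." and (position + 1 >= end or text[position + 1] not in DIGITS):
        return None
    # Integer part.
    while position < end and text[position] in DIGITS:
        value = value * 10 + DIGITS.index(text[position])
        position += 1
    # Optional fractional part.
    if position < end and text[position] == ".":
        position += 1
        divisor = 1
        while position < end and text[position] in DIGITS:
            divisor *= 10
            value += DIGITS.index(text[position]) / divisor
            position += 1
    return -value if negative else value
```

As in the steps, a trailing full stop is tolerated ("1." gives 1), whereas a lone "." or "-" is an error.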

3.2.3.4. Ratios

The algorithms described in this section are used by the progress and meter elements.

A valid denominator punctuation character is one of the characters from the table below. There is a value associated with each denominator punctuation character , as shown in the table below.

Denominator Punctuation Character        Value
U+0025 PERCENT SIGN               %      100
U+066A ARABIC PERCENT SIGN        ٪      100
U+FE6A SMALL PERCENT SIGN         ﹪      100
U+FF05 FULLWIDTH PERCENT SIGN     ％     100
U+2030 PER MILLE SIGN             ‰      1000
U+2031 PER TEN THOUSAND SIGN      ‱      10000

The steps for finding one or two numbers of a ratio in a string are as follows:

1. If the string is empty, then return nothing and abort these steps.
2. Find a number in the string according to the algorithm below, starting at the start of the string.
3. If the sub-algorithm in step 2 returned nothing or returned an error condition, return nothing and abort these steps.
4. Set number1 to the number returned by the sub-algorithm in step 2.
5. Starting with the character immediately after the last one examined by the sub-algorithm in step 2, skip any characters in the string that are in the Unicode character class Zs (this might match zero characters). [UNICODE]
6. If there are still further characters in the string, and the next character in the string is a valid denominator punctuation character, set denominator to that character.
7. If the string contains any other characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE, but denominator was given a value in step 6, return nothing and abort these steps.
8. Otherwise, if denominator was given a value in step 6, return number1 and denominator and abort these steps.
9. Find a number in the string again, starting immediately after the last character that was examined by the sub-algorithm in step 2.
10. If the sub-algorithm in step 9 returned nothing or an error condition, return nothing and abort these steps.
11. Set number2 to the number returned by the sub-algorithm in step 9.
12. If there are still further characters in the string, and the next character in the string is a valid denominator punctuation character, return nothing and abort these steps.
13. If the string contains any other characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE, return nothing and abort these steps.
14. Otherwise, return number1 and number2.

The algorithm to find a number is as follows. It is given a string and a starting position, and returns either nothing, a number, or an error condition.

1. Starting at the given starting position, ignore all characters in the given string until the first character that is either a U+002E FULL STOP or one of the ten characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE.
2. If there are no such characters, return nothing and abort these steps.
3. Starting with the character matched in step 1, collect all the consecutive characters that are either a U+002E FULL STOP or one of the ten characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE, and assign this string of one or more characters to string.
4. If string contains more than one U+002E FULL STOP character then return an error condition and abort these steps.
5. Parse string according to the rules for parsing floating point number values, to obtain number. This step cannot fail (string is guaranteed to be a valid floating point number).
6. Return number.
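The two ratio algorithms can be sketched together in Python. Everything named here is hypothetical, and the sketch makes several labelled assumptions: "the last character examined" is taken to be the last collected character, "any other characters" is taken to mean the digits remaining after the point reached so far, Python's float() stands in for the floating point rules, and a collected string consisting of a lone "." is treated as an error condition.

```python
import unicodedata

DIGITS = "0123456789"
NUMBER_CHARACTERS = DIGITS + "."
# Values of the denominator punctuation characters, from the table above.
DENOMINATOR_VALUES = {
    "\u0025": 100, "\u066a": 100, "\ufe6a": 100, "\uff05": 100,
    "\u2030": 1000, "\u2031": 10000,
}

def find_number(text, start):
    """Return (kind, number, end); kind is "nothing", "error" or "number"."""
    position = start
    while position < len(text) and text[position] not in NUMBER_CHARACTERS:
        position += 1
    if position >= len(text):
        return "nothing", None, position
    end = position
    while end < len(text) and text[end] in NUMBER_CHARACTERS:
        end += 1
    collected = text[position:end]
    # Assumption: a lone "." is reported as an error condition.
    if collected.count(".") > 1 or collected == ".":
        return "error", None, end
    return "number", float(collected), end

def find_ratio(text):
    """Return ("denominator", n, char), ("pair", n1, n2), or None."""
    if text == "":
        return None
    kind, number1, end1 = find_number(text, 0)
    if kind != "number":
        return None
    position = end1
    while position < len(text) and unicodedata.category(text[position]) == "Zs":
        position += 1
    if position < len(text) and text[position] in DENOMINATOR_VALUES:
        if any(ch in DIGITS for ch in text[position + 1:]):
            return None
        return "denominator", number1, text[position]
    kind, number2, end2 = find_number(text, end1)
    if kind != "number":
        return None
    if end2 < len(text) and text[end2] in DENOMINATOR_VALUES:
        return None
    if any(ch in DIGITS for ch in text[end2:]):
        return None
    return "pair", number1, number2
```

The value associated with a returned denominator character can then be looked up in DENOMINATOR_VALUES.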

3.2.3.5. Percentages and dimensions

valid positive non-zero integers rules for parsing dimension values (only used by height/width on img, embed, object — lengths in css pixels or percentages)

3.2.3.6. Lists of integers

A valid list of integers is a number of valid integers separated by U+002C COMMA characters, with no other characters (e.g. no space characters). In addition, there might be restrictions on the number of integers that can be given, or on the range of values allowed.
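The validity rule can be sketched in Python as a single regular expression. The function name is hypothetical, and requiring at least one integer is an assumption of the sketch, since "a number of valid integers" does not say whether an empty list is allowed. Any further restrictions on count or range are left out.

```python
import re

# One valid integer, then zero or more ",integer" groups, and nothing else.
VALID_LIST_OF_INTEGERS = re.compile(r"-?[0-9]+(?:,-?[0-9]+)*")

def is_valid_list_of_integers(value):
    """True if value is comma-separated valid integers with no other characters."""
    return VALID_LIST_OF_INTEGERS.fullmatch(value) is not None
```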

The rules for parsing a list of integers are as follows:

1. Let input be the string being parsed.
2. Let position be a pointer into input, initially pointing at the start of the string.
3. Let numbers be an initially empty list of integers. This list will be the result of this algorithm.
4. If there is a character in the string input at position position, and it is either a U+002C COMMA character or a U+0020 SPACE character, then advance position to the next character in input, or to beyond the end of the string if there are no more characters.
5. If position points to beyond the end of input, return numbers and abort.
6. If the character in the string input at position position is a U+002C COMMA character or a U+0020 SPACE character, return to step 4.
7. Let negated be false.
8. Let value be 0.
9. Let multiple be 1.
10. Let started be false.
11. Let finished be false.
12. Let bogus be false.
13. Parser: If the character in the string input at position position is:

    A U+002D HYPHEN-MINUS character
       Follow these substeps:
       1. If finished is true, skip to the next step in the overall set of steps.
       2. If started is true or if bogus is true, let negated be false.
       3. Otherwise, if started is false and if bogus is false, let negated be true.
       4. Let started be true.

    A character in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE
       Follow these substeps:
       1. If finished is true, skip to the next step in the overall set of steps.
       2. Let n be the value of the digit, interpreted in base ten, multiplied by multiple.
       3. Add n to value.
       4. If value is greater than zero, multiply multiple by ten.
       5. Let started be true.

    A U+002C COMMA character
    A U+0020 SPACE character
       Follow these substeps:
       1. If started is false, return the numbers list and abort.
       2. If negated is true, then negate value.
       3. Append value to the numbers list.
       4. Jump to step 4 in the overall set of steps.

    A U+002E FULL STOP character
       Follow these substeps:
       1. Let finished be true.

    Any other character
       Follow these substeps:
       1. If finished is true, skip to the next step in the overall set of steps.
       2. Let negated be false.
       3. Let bogus be true.
       4. If started is true, then return the numbers list, and abort. (The value in value is not appended to the list first; it is dropped.)

14. Advance position to the next character in input, or to beyond the end of the string if there are no more characters.
15. If position points to a character (and not to beyond the end of input), jump to the big Parser step above.
16. If negated is true, then negate value.
17. If started is true, then append value to the numbers list, return that list, and abort.
18. Return the numbers list and abort.

3.2.4. Dates and times

In the algorithms below, the number of days in month month of year year is: 31 if month is 1, 3, 5, 7, 8, 10, or 12; 30 if month is 4, 6, 9, or 11; 29 if month is 2 and year is a number divisible by 400, or if year is a number divisible by 4 but not by 100; and 28 otherwise. This takes into account leap years in the Gregorian calendar. [GREGORIAN]
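This definition translates directly into a short Python function. The name days_in_month is hypothetical, and the sketch assumes month has already been checked to be in the range 1 to 12.

```python
def days_in_month(month, year):
    """Number of days in the given month (1..12) of the given Gregorian year."""
    if month in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if month in (4, 6, 9, 11):
        return 30
    # month is 2: leap years are divisible by 400, or by 4 but not by 100.
    if year % 400 == 0 or (year % 4 == 0 and year % 100 != 0):
        return 29
    return 28
```

For instance, February has 29 days in 2000 and 2008, but 28 days in 1900 and 2007.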

3.2.4.1. Specific moments in time

A string is a valid datetime if it has four digits (representing the year), a literal hyphen, two digits (representing the month), a literal hyphen, two digits (representing the day), optionally some spaces, either a literal T or a space, optionally some more spaces, two digits (for the hour), a colon, two digits (the minutes), optionally the seconds (which, if included, must consist of another colon, two digits (the integer part of the seconds), and optionally a decimal point followed by one or more digits (for the fractional part of the seconds)), optionally some spaces, and finally either a literal Z (indicating the time zone is UTC), or a plus sign or a minus sign followed by two digits, a colon, and two digits (for the sign, the hours and minutes of the timezone offset respectively); with the month-day combination being a valid date in the given year according to the Gregorian calendar, the hour values (h) being in the range 0 ≤ h ≤ 23, the minute values (m) in the range 0 ≤ m ≤ 59, and the second value (s) being in the range 0 ≤ s < 60. [GREGORIAN]

The digits must be characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), the hyphens must be U+002D HYPHEN-MINUS characters, the T must be a U+0054 LATIN CAPITAL LETTER T, the colons must be U+003A COLON characters, the decimal point must be a U+002E FULL STOP, the Z must be a U+005A LATIN CAPITAL LETTER Z, the plus sign must be a U+002B PLUS SIGN, and the minus sign must be a U+002D HYPHEN-MINUS (the same character as the hyphen).

The following are some examples of dates written as valid datetimes.

"0037-12-13 00:00 Z"
   Midnight UTC on the birthday of Nero (the Roman Emperor).

"1979-10-14T12:00:00.001-04:00"
   One millisecond after noon on October 14th 1979, in the time zone in use on the east coast of North America during daylight saving time.

"8592-01-01 T 02:09 +02:09"
   Midnight UTC on the 1st of January, 8592. The time zone associated with that time is two hours and nine minutes ahead of UTC.

Several things are notable about these dates:

Years with fewer than four digits have to be zero-padded. The date "37-12-13" would not be a valid date.

To unambiguously identify a moment in time prior to the introduction of the Gregorian calendar, the date has to be first converted to the Gregorian calendar from the calendar in use at the time (e.g. from the Julian calendar). The date of Nero's birth is the 15th of December 37, in the Julian Calendar, which is the 13th of December 37 in the Gregorian Calendar.

The time and timezone components are not optional.

Dates before the year 0 or after the year 9999 can't be represented as a datetime in this version of HTML.

Time zones differ based on daylight saving time.

Conformance checkers can use the algorithm below to determine if a datetime is a valid datetime or not.

To parse a string as a datetime value, a user agent must apply the following algorithm to the string. This will either return a time in UTC, with associated timezone information for round tripping or display purposes, or nothing, indicating the value is not a valid datetime. If at any point the algorithm says that it "fails", this means that it returns nothing.

1. Let input be the string being parsed.
2. Let position be a pointer into input, initially pointing at the start of the string.
3. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly four characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the year.
4. If position is beyond the end of input or if the character at position is not a U+002D HYPHEN-MINUS character, then fail. Otherwise, move position forwards one character.
5. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the month.
6. If month is not a number in the range 1 ≤ month ≤ 12, then fail.
7. Let maxday be the number of days in month month of year year.
8. If position is beyond the end of input or if the character at position is not a U+002D HYPHEN-MINUS character, then fail. Otherwise, move position forwards one character.
9. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the day.
10. If day is not a number in the range 1 ≤ day ≤ maxday, then fail.
11. Collect a sequence of characters that are either U+0054 LATIN CAPITAL LETTER T characters or space characters. If the collected sequence is zero characters long, or if it contains more than one U+0054 LATIN CAPITAL LETTER T character, then fail.
12. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the hour.
13. If hour is not a number in the range 0 ≤ hour ≤ 23, then fail.
14. If position is beyond the end of input or if the character at position is not a U+003A COLON character, then fail. Otherwise, move position forwards one character.
15. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the minute.
16. If minute is not a number in the range 0 ≤ minute ≤ 59, then fail.
17. Let second be a string with the value "0".
18. If position is beyond the end of input, then fail.
19. If the character at position is a U+003A COLON, then:
   1. Advance position to the next character in input.
   2. If position is beyond the end of input, or at the last character in input, or if the next two characters in input starting at position are not two characters both in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then fail.
   3. Collect a sequence of characters that are either characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9) or U+002E FULL STOP characters. If the collected sequence has more than one U+002E FULL STOP character, or if the last character in the sequence is a U+002E FULL STOP character, then fail. Otherwise, let the collected string be second instead of its previous value.
20. Interpret second as a base ten number (possibly with a fractional part). Let that number be second instead of the string version.
21. If second is not a number in the range 0 ≤ second < 60, then fail. (The values 60 and 61 are not allowed: leap seconds cannot be represented by datetime values.)
22. If position is beyond the end of input, then fail.
23. Skip whitespace.
24. If the character at position is a U+005A LATIN CAPITAL LETTER Z, then:
   1. Let timezone hours be 0.
   2. Let timezone minutes be 0.
   3. Advance position to the next character in input.
25. Otherwise, if the character at position is either a U+002B PLUS SIGN ("+") or a U+002D HYPHEN-MINUS ("-"), then:
   1. If the character at position is a U+002B PLUS SIGN ("+"), let sign be "positive". Otherwise, it's a U+002D HYPHEN-MINUS ("-"); let sign be "negative".
   2. Advance position to the next character in input.
   3. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the timezone hours.
   4. If timezone hours is not a number in the range 0 ≤ timezone hours ≤ 23, then fail.
   5. If sign is "negative", then negate timezone hours.
   6. If position is beyond the end of input or if the character at position is not a U+003A COLON character, then fail. Otherwise, move position forwards one character.
   7. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the timezone minutes.
   8. If timezone minutes is not a number in the range 0 ≤ timezone minutes ≤ 59, then fail.
   9. If sign is "negative", then negate timezone minutes.
26. If position is not beyond the end of input, then fail.
27. Let time be the moment in time at year year, month month, day day, hours hour, minute minute, second second, subtracting timezone hours hours and timezone minutes minutes. That moment in time is a moment in the UTC timezone.
28. Let timezone be timezone hours hours and timezone minutes minutes from UTC.
29. Return time and timezone.
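
As a non-normative illustration, the algorithm above could be sketched in Python roughly as follows. The names parse_datetime, _Cursor, _Fail and SPACES are hypothetical and not part of this specification. The sketch makes several assumptions: the set of space characters is the one listed in SPACES, seconds are held as a floating point number (so precision is limited), an input with no timezone character after skipped whitespace is treated as a failure, and years that Python's datetime type cannot represent (such as year 0000) are reported as failures even though the algorithm itself does not exclude them.

```python
from datetime import datetime, timedelta, timezone

DIGITS = "0123456789"
SPACES = " \t\n\x0b\x0c\r"  # assumption: the set of "space characters"


class _Fail(Exception):
    """Raised wherever the algorithm says that it "fails"."""


def _days_in_month(year, month):
    # Proleptic Gregorian calendar (an assumption of this sketch).
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


class _Cursor:
    """Holds input and position; hypothetical helper for this sketch."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def at_end(self):
        return self.pos >= len(self.text)

    def peek(self):
        return "" if self.at_end() else self.text[self.pos]

    def collect(self, allowed):
        # "Collect a sequence of characters" drawn from `allowed`.
        start = self.pos
        while not self.at_end() and self.text[self.pos] in allowed:
            self.pos += 1
        return self.text[start:self.pos]

    def number(self, length, low, high):
        # Fixed-width base ten integer followed by its range check.
        seq = self.collect(DIGITS)
        if len(seq) != length or not low <= int(seq) <= high:
            raise _Fail
        return int(seq)

    def expect(self, char):
        if self.peek() != char:
            raise _Fail
        self.pos += 1


def parse_datetime(text):
    """Return (time in UTC, timezone offset), or None on failure."""
    try:
        c = _Cursor(text)
        year = c.number(4, 0, 9999)
        c.expect("-")
        month = c.number(2, 1, 12)
        c.expect("-")
        day = c.number(2, 1, _days_in_month(year, month))
        separator = c.collect("T" + SPACES)
        if not separator or separator.count("T") > 1:
            raise _Fail
        hour = c.number(2, 0, 23)
        c.expect(":")
        minute = c.number(2, 0, 59)
        second = "0"
        if c.at_end():
            raise _Fail
        if c.peek() == ":":
            c.pos += 1
            two = c.text[c.pos:c.pos + 2]
            if len(two) != 2 or any(ch not in DIGITS for ch in two):
                raise _Fail
            second = c.collect(DIGITS + ".")
            if second.count(".") > 1 or second.endswith("."):
                raise _Fail
        second = float(second)
        if not 0 <= second < 60:
            raise _Fail
        if c.at_end():
            raise _Fail
        c.collect(SPACES)  # skip whitespace
        if c.peek() == "Z":
            tz_hours = tz_minutes = 0
            c.pos += 1
        elif c.peek() in ("+", "-"):
            sign = 1 if c.peek() == "+" else -1
            c.pos += 1
            tz_hours = sign * c.number(2, 0, 23)
            c.expect(":")
            tz_minutes = sign * c.number(2, 0, 59)
        else:
            raise _Fail  # assumption: no timezone means failure
        if not c.at_end():
            raise _Fail
        offset = timedelta(hours=tz_hours, minutes=tz_minutes)
        local = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
        return local + timedelta(seconds=second) - offset, offset
    except (_Fail, ValueError, OverflowError):
        # ValueError and OverflowError cover dates that Python's
        # datetime type cannot represent (a limit of this sketch).
        return None
```

Under these assumptions, parse_datetime("2008-01-22T10:30+01:00") would yield 09:30 UTC on 22 January 2008 together with an offset of one hour, while parse_datetime("2008-02-30T10:30Z") would yield None because February 2008 has only 29 days.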

3.2.4.2. Vaguer moments in time

This section defin