Markup specialists and their predecessors have wasted decades creating works that open possibilities in the short run but close them in the long run. The continuous headaches of versioning and differentiating vocabularies are a symptom of our failure, of the brittleness we have so enthusiastically embraced.

Seven years ago, speaking on an XML panel at a Web conference, I told attendees to go experiment with vocabularies and try new paths. The browser universe was too constrained, I said, too bound up with ideas about validation, whether HTML or XML or something else. No one seemed enthusiastic about that advice, and I had startled myself by recommending it so seriously.

There was no escape, though - it was the right advice, and continues to be.

Much of the markup world has actually turned to experimenting, building very different structures around their work. They mix social organization that distributes decision-making more widely with technical approaches, many of them applying old tools but reflecting enhancements to processing environments.

However, as the limitations and intrusions of XML Schema became clearer, alternate approaches appeared. RELAX NG was in many ways a simpler and better thought-out approach to creating comprehensive schemas, while Schematron's rule-based approach offered a very different kind of testing. Examplotron built on Schematron's ideas to create a very different model of a schema, re-establishing the value of using sample documents for conversations about document interchange.
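
Schematron's rule-based flavor is easy to sketch: instead of a grammar describing everything a document may contain, you write assertions that must hold in particular contexts. The following is only an illustration of that style, not Schematron itself - it uses Python's standard library and an invented invoice vocabulary; a real Schematron schema would express the same rule with `<sch:rule context="invoice">` and `<sch:assert test="total">`.

```python
import xml.etree.ElementTree as ET

# An invented sample document: one invoice is missing its <total>.
DOC = """
<orders>
  <invoice id="a1"><total>30.00</total></invoice>
  <invoice id="a2"/>
</orders>
"""

def check_invoices(xml_text):
    """Apply one Schematron-style rule: every invoice must carry a total.

    Returns a list of human-readable violations rather than a yes/no
    answer, mirroring Schematron's diagnostic reports.
    """
    root = ET.fromstring(xml_text)
    failures = []
    for inv in root.iter("invoice"):
        if inv.find("total") is None:
            failures.append(f"invoice {inv.get('id')}: missing <total>")
    return failures

print(check_invoices(DOC))  # ['invoice a2: missing <total>']
```

The point of the style is visible even in this toy: the document can contain anything else it likes, and only the stated rule is tested.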

As schema proponents had hoped, the technology took off, becoming a central component in a rapidly growing "Web Services" ecosystem, largely built with tools that made it (relatively) easy to bind XML data to program structures using the type information provided by schemas. Schemas served not only as documentation and validation tools but as tools for structuring code. (They also served as configuration for XML editors of various kinds.)
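
The data-binding pattern those tools automated can be shown in miniature. In the sketch below, the type information a schema toolchain would normally generate is simply hard-coded; the `price` element and the `Price` class are invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Price:
    """A program structure a binding tool might generate from a schema
    declaring decimal content and a currency attribute."""
    currency: str
    amount: float

def bind_price(xml_text):
    # Where a real binding framework would consult the schema's type
    # annotations, the conversions here are written out by hand.
    el = ET.fromstring(xml_text)
    return Price(currency=el.get("currency"), amount=float(el.text))

p = bind_price('<price currency="USD">19.95</price>')
print(p)  # Price(currency='USD', amount=19.95)
```

Generated at scale from WSDL and XSD files, this is the move that made "Web Services" development feel routine: XML in, typed objects out.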

XML Schema had a key place in TimBL's own vision of the Semantic Web, as shown at [Semantic Web Architecture]. Perhaps namespaces were more important to him, given their foundation in URIs, but XML Schema also helped drive namespaces deeper into XML with qname content. Over time, however, Berners-Lee has largely lost interest in XML, and the Semantic Web has regularly looked elsewhere for syntax.

It wasn't just that XML Schemas would "make good on the promises of extensibility and power", but, as later specifications demonstrated, that they would provide a foundation for further work in processing XML as strongly typed data. The Post-Schema Validation Infoset, effectively a type-annotated version of documents that passed validation, became the foundation on which XSLT 2.0 and XQuery would build. XML itself didn't require that you use schemas of any kind, but the core toolset incorporated more and more assumptions based on schema capabilities, without any separation of concerns. "XML" practice clearly incorporated XML schema practice.

By bringing datatypes to XML, XML Schema increases XML's power and utility to the developers of electronic commerce systems, database authors and anyone interested in using and manipulating large volumes of data on the Web. By providing better integration with XML Namespaces, it makes it easier than it has ever been to define the elements and attributes in a namespace, and to validate documents which use multiple namespaces defined by different schemas.

"XML Schema makes good on the promises of extensibility and power at the heart of XML," said Tim Berners-Lee, W3C Director. "In conjunction with XML Namespaces, XML Schema is the language for building XML applications."

When XML is used to exchange technical information in a multi-vendor environment, schemas will allow software to distinguish data governed by industry-standard and vendor-specific schemas....

For electronic commerce, schemas can be used to define business transactions within markets and between parties, and to provide rules for validating business documents.

Databases must, for example, communicate detailed information about the legal values of particular fields in the data being exchanged.

While press releases are not usually a great place to learn about the details of specifications, they are an excellent place to learn about what those specifications are meant to do. In the case of XML Schema, there are even a few to choose from. At the outset, 1999's press release was excited about the potential for establishing standards for many kinds of transactions:

The W3C responded with XML Schema, a pair of specifications defining a language for specifying deterministic document structures and content, supporting associated processing. Like its DTD predecessor, XML Schema's validation process modified the document, adding default values for attributes (the classic case). Going beyond what DTDs had done, it also annotated the infoset of the reported document with type information accessible to later processing.
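
That classic case is easy to demonstrate with a DTD internal subset. Python's standard expat parser, though non-validating, applies attribute defaults declared there, so the reported document gains an attribute its author never typed; the `note` vocabulary below is invented.

```python
import xml.parsers.expat

# The instance document never writes priority="normal"; the DTD's
# ATTLIST declaration supplies it as a default during parsing.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ATTLIST note priority CDATA "normal">
]>
<note/>"""

seen = {}

def start_element(name, attrs):
    seen[name] = attrs  # attrs includes defaulted attributes

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.Parse(DOC, True)
print(seen)  # {'note': {'priority': 'normal'}}
```

Validation, in other words, is not a pure yes/no check: the parsed result differs depending on whether the declarations were consulted.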

As more and more developers took Bosak and Bray's promises seriously, XML reached field after field, migrating far beyond the document-centered territory SGML had considered its home. While the ideas of agreement and validation were popular in many areas, DTDs did not feel like an appropriate answer to many developers. Many developers' expectations had been shaped by databases, strongly typed languages, and fields that wanted more intricate specifications of content.

Architectural forms were a limited set of transformations, still deeply intertwined with the contractual expectations of DTDs. The stomping they received at the W3C, however, was a sign of static expectations to come.

While there was some work done to bring architectural forms to XML - most visibly David Megginson's XML Architectural Forms - architectural forms lost a bitter battle inside the W3C. While confidentiality makes it difficult to tell precisely what happened, public sputtering suggests that architectural forms' vision of adaptation to local schemas did not fit the W3C Director's intention of building globally-understood vocabularies identified by URIs. Instead, the XML world was given a mandate to create vocabularies with globally unique identifiers.

Architectural forms permit DTD writers to use their own element type names for HyTime structures. Not only is the architectural form notion fundamental to HyTime, it is a new and useful SGML coding technique that can, if used wisely, ease the standardization of tagging practices by steering a route between the Scylla of excessive rigidity and the Charybdis of excessive freedom that threaten such standards when they must serve large, disparate groups of users.

As the SGML community had ventured into hypertext, they too had found difficulties in sharing structures across vocabularies and recognizing those structures. They lacked pretensions of building a single global system, however, and had proposed a very different route: architectural forms.

The world of agreements wasn't enough for XML's keepers at the W3C. Tim Berners-Lee's Semantic Web visions required globally unique identifiers for vocabularies. In the period when Berners-Lee considered XML a key foundation for that work, that meant spackling URIs into markup to create globally unique markup identifiers and build them into vocabularies.

Businesses and developers took up that challenge, and thousands of committees blossomed. XML "solved" the syntactic layer, and it was time to invest millions in defining structure.

Such agreements will be made, because the proliferation of incompatible computer systems has imposed delays, costs and confusion on nearly every area of human activity. People want to share ideas and do business without all having to use the same computers; activity-specific interchange languages go a long way toward making that possible. Indeed, a shower of new acronyms ending in "ML" testifies to the inventiveness unleashed by XML in the sciences, in business, and in the scholarly disciplines.

What XML does is less magical but quite effective nonetheless. It lays down ground rules that clear away a layer of programming details so that people with similar interests can concentrate on the hard part—agreeing on how they want to represent the information they commonly exchange. This is not an easy problem to solve, but it is not a new one, either.

Phrasing things more positively, Jon Bosak and Tim Bray described the work ahead and why people were eager to do it:

The best remedy is to codify private ontologies that serve to identify the active context of any document. This is the ideal role for a well-tempered DTD. Consider two newspapers with specific in-house styles for bylines, captions, company names, and so on. Where they share stories on a wire service, for example, they can identify it as their story, or convert it according to an industry-wide stylebook.

For any document to communicate successfully from author to readers, all parties concerned must agree that words all choose them to mean. [sic] Semantics can only be interpreted within the context of a community. For example, millions of HTML users worldwide agree that <B> means bold text, or that <H1> is a prominent top-level document heading...

Those battles - or perhaps it is nicer to say negotiations - carried over into the XML world. Although XML allowed documents to go forth boldly naked without a DTD, the expected approach for its use still involved prior agreement. Citing Lewis Carroll's Humpty Dumpty, the W3C's Dan Connolly warned of the dangers of diverse semantics:

But while processing should not be constrained, document structure must be. SGML documents must contain a DOCTYPE declaration, which must in turn reference (or include) a DTD, and those DTDs rapidly became the battleground over which the users of SGML fought.

Indeed, there are publishing situations where an SGML application can be useful with no processing specifications at all (not even application-specific ons [sic]), because each user will specify unique processing in a unique system environment. The historical explanation for this phenomenon is that in publishing (unlike, say, word processing before the laser printer), the variety of potential processing is unlimited and should not be constrained.

4.279 SGML Application: Rules that apply SGML to a text processing application. An SGML application includes a formal specification of the markup constructs used in the application, expressed in SGML. It can also include a non-SGML definition of semantics, application conventions, and/or processing.

SGML required documents to come with a declaration of their structure. Brittleness was built into the system. Documents were not meant to exist by themselves, but rather as part of an SGML application, defined as such:

Markup made mistakes early. The culture of agreements first, processing later, appeared in the earliest standards, indeed in the ISO approach to standardization. As Len Bullard described the attitude more recently:

System-B is also the worldview that dominates computing. There are occasional counterculture moments and corners that resist System-B. However, even in a software world that at least seems like it should be more flexible than its explicitly industrial hardware side, the march toward the mass production of identical and mildly configurable products (and the standards that facilitate them) continues inexorably.

In Alexander's telling, System-B grows from mass production and the ideologies of classicism and industrialism that Ruskin and Morris blasted generations before. System-B has spread from the Victorian factories to every aspect of building construction (and computing as well). System-A, Alexander's preferred system, is older but has largely been discarded in the race to industrialize everything. His more detailed telling reveals another dimension to the problem of System-B: it is not only profit-seeking, but its adherents have been so surrounded by it that they have a difficult time imagining that anything else could work.

The pressure to use such a system comes mainly from the desire to make a profit, and from the desire to do it at the highest possible speed.

System-B is, on the contrary, dedicated to an overwhelmingly machinelike philosophy. The components and products are without individual identity and most often alienating in their psychological effect.

System-A is a system of production in which local adaptation is primary. Its processes are governed by methods that make each building, and each part of each building, unique and uniquely crafted to its context.

Changing this is not easy, as Alexander learns repeatedly. His books have become more biting over time, reflecting the hostility of established practice to his different approaches. A world in which inspectors demand to approve signed drawings before allowing construction is not especially compatible with an approach that defines itself in terms of conversation and techniques. His latest book, The Battle for the Life and Beauty of the Earth, makes that conflict explicit, describing it as a "necessary confrontation" between two incompatible approaches to building: System-A and System-B.

It has been little understood how vast the effect of this has been on housing: how enormous the degree of control achieved, unintentionally, by these components and the demands of their assembly. Yet, as anyone who has intimate knowledge of building knows, these components are merciless in their demands. They control the arrangement of details. They prohibit variation. They are inflexible with respect to ornament, or whimsy, or humor, or any little human touch a person might like to make.

Today's systems of housing production almost all rely, in one form or another, on standardized building components. These components may be very small (electrical boxes, for instance), or intermediate (2x4 studs), or very large (precast concrete rooms); but regardless of their size, buildings are understood to be assembled out of these components. In this sense then, the actual construction phase of the housing production process has become an assembly phase: an occasion where prefabricated components are assembled, on site, to produce the complete houses.

While reliance on drawings is one aspect of the problem, the industrial model of component construction adds an entirely new level of potential trouble. Standards limit choices, reinforcing the mistakes created by separation of concerns:

And the opposite is true also. In modern times, the contractor and his crew are deeply and sadly alienated from the buildings they produce. Since the buildings are seen as "products," no more than that, and since they are specified to the last nail by the architect, the process of building itself becomes alienated, desiccated, a machine assembly process, with no love in it, no feeling, no warmth, and no humanity.

The great complexity needed by a human settlement cannot be transmitted via paper; and the separation of functions between architect and builder is therefore out of the question. The complexity can only be preserved if the architect and contractor are one. All this makes clear that the architect must be the builder.

What has created that wasteland? While Alexander's early writings focus mostly on the positive he hopes to encourage, his later works, works from the field, cannot avoid dealing with a world structured to make his style of work difficult, if not impossible. As the scale of construction has grown, specialization and centralization have led to ever-more detached and standardized approaches that cannot help but produce bad work.

Alexander's approach obliterates the traditional separation between designers and builders, refusing to cooperate with a model he believes creates a "breakdown of the physical environment... It is hardly possible to experience a profound relationship with these places. So the landscape of our era has become, and continues to become, a wasteland." [Alexander 2012, page 80]

it is axiomatic, for us that the people who build the houses must be active, mentally and spiritually, while they are building, so that of course they must have the power to make design decisions while they are building, and must have an active relation to the conception of the building, not a passive one. This makes it doubly clear that the builders must be architects.

The conversation doesn't end when construction starts, either. Patterns apply at all levels of development, from regional planning to finished details. Construction is an opportunity to make changes along the way, as the reality of earlier decisions becomes clear. This approach not only involves the users of the building, but transforms the role of the architect.

For the Eishin Campus, Alexander's team and a large group of administrators, teachers, and students developed 110 patterns specific to that project, informed by the broader list in A Pattern Language but moving beyond it.

Once we have learned to take a reading of people's true desires and feelings, we can then describe the patterns that are needed to generate a profound campus environment. The system, or "language" of these patterns can give the community the beautiful world they need and want, the physical organization that will make their world practical, beautiful, life-enhancing, and truly useful.

However, in other projects, he had better luck developing pattern languages based on input from the community:

Under normal circumstances, the architect-builder of a particular area would also modify and refine these patterns, according to local custom. In this particular project, we were so occupied by the demands of construction that we had little time to undertake work of this sort.

the families became enthusiastic about the project as they began to see the richness inherent in the patterns. However, our efforts to get them to modify the language, to contribute other patterns of their own, were disappointing.

It didn't all go smoothly, however, as one additional angle, the creation of new patterns, didn't materialize in the earlier work to shape the overall cluster of houses:

this pattern language... allowed us to produce a variety of houses, each one essentially a variant of a fundamental house "type" (defined by the twenty-one patterns together), and yet each one personal and unique according to the special character of the family who used it.

this language has the amazing capacity to unify the generic needs which are felt by every family, and which make a house functional and sensible, with the unique idiosyncrasies that make every family different, and thus to produce a house which is unique and personal, but also one which satisfies the basic needs of a good house.

In order to get a reasonable house which works well and which nevertheless expresses the uniqueness of each family, the families all used an instrument we call the pattern language... The particular pattern language contained twenty-one patterns...

Unlike most models for including users in design, Alexander's process keeps the conversation going throughout the creation of works, and includes the workers and the users in that conversation. He has learned (perhaps from Ruskin) that treating workers as automata imposes deep costs, and recognized the quality of construction that people have achieved over centuries, even in financially poor environments, without the aid of architects. His building process allows for the layering of detail to respond to particular circumstances rather than laying out rules which must be applied in all cases.

What Alexander actually offers is not a "design process deterministic and repeatable," but tools for conversation. The "level of expertise" is partially aesthetic, but in many ways social. He takes seriously "all the people in society" from the prior quote, and the job of the architect is less to design and more to facilitate. A Pattern Language is not a set of fixed rules that has emerged from practice, but a constantly evolving and shifting foundation that must combine with other local patterns to be of use.

It is hard to imagine a more bizarre misreading of Alexander: a projection of top-down design assumptions applied to a text whose primary purpose is to overturn them. Gamma, Helm, et al. provide an unfortunately perfect demonstration of how developers borrow badly from architecture. (Alexander recognized the misreading in Alexander 1996.)

When Alexander claims you can design a house simply by applying his patterns one after another, he has goals similar to those of object-oriented design methodologists who give step-by-step rules for design. Alexander doesn't deny the need for creativity; some of his patterns require understanding the living habits of people who will use the building, and his belief in the "poetry" of design implies a level of expertise beyond the pattern language itself. But his description of how patterns generate designs implies that a pattern language can make the design process deterministic and repeatable.

The conclusion of Design Patterns, unfortunately, repeats and amplifies its error, in a way that perhaps only computer programmers could:

It is shown [in The Timeless Way of Building] that towns and buildings will not be able to become alive, unless they are made by all the people in society, and unless these people share a common pattern language, within which to make these buildings, and unless this pattern language is alive itself.

However, Alexander argues that the conversation must be broader. At the top of that same page x, one finds:

Even at this stage, however, they have already over-simplified Alexander's approach to patterns, seeing a top-down approach that isn't there. "A solution to a problem in context" is the goal of most non-fiction writing, but the writers of Design Patterns have forgotten the critical question of who solves those problems and how. They seem to have assumed that since Alexander is an architect, these patterns are meant to be applied by architects.

Christopher Alexander says, "Each pattern describes a problem which occurs over and over in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice." [AIS+ 77, page x] Even though Alexander was talking about patterns in buildings and towns, what he says is true about object-oriented design patterns. Our solutions are expressed in terms of objects and interfaces instead of walls and doors, but at the core of both kinds of patterns is a solution to a problem in context.

Programmers have indeed bought A Pattern Language, but for mostly the wrong reasons. The classic text, Design Patterns, cites Alexander as its inspiration and brought him to the wide attention of the computing community:

Christopher Alexander has the strange distinction of being an architect more revered and imitated in computing than in his own field. When, during an XML conference in Philadelphia, I stopped at the American Institute of Architects to buy A Pattern Language and The Timeless Way of Building, the cashier informed me that I must be a programmer, because "only programmers buy those books. Architects don't."

Architecture has largely ignored these concerns, and computing, alas, has taken its lessons from that ignorance. Let us instead take Ruskin's worker free of alienation - valuing savageness and supporting changefulness - as a goal worth achieving.

For the lesson which Ruskin here teaches us is that art is the expression of man's pleasure in labour; that it is possible for man to rejoice in his work, for, strange as it may seem to us to-day, there have been times when he did rejoice in it; and lastly, that unless man's work once again becomes a pleasure to him, the token of which change will be that beauty is once again a natural and necessary accompaniment of productive labour, all but the worthless must toil in pain, and therefore live in pain. So that the result of the thousands of years of man's effort on the earth must be general unhappiness and universal degradation; unhappiness and degradation, the conscious burden of which will grow in proportion to the growth of man's intelligence, knowledge, and power over material nature.

To some of us when we first read it, now many years ago, it seemed to point out a new road on which the world should travel. And in spite of all the disappointments of forty years, and although some of us, John Ruskin amongst others, have since learned what the equipment for that journey must be, and how many things must be changed before we are equipped, yet we can still see no other way out of the folly and degradation of Civilization.

Ruskin's friend William Morris, in his Preface to The Nature of Gothic, warns of the costs of that style of work.

The concerns which apply to architecture also apply to markup. The markup process, often deliberately, parallels contemporary design practice. Define a problem. Develop a shared vision for solving it. Hire experts who create plans specifying that vision in detail, and hand those plans to the "workmen" to build.

In the virtual world, markup creates the spaces in which we interact. It creates bazaars, agoras, government buildings, and even churches. Markup builds the government office, the sales floor, the loading dock. Markup offers us decorations and distractions. Markup is our architecture.

But our higher instincts are not deceived. We take no pleasure in the building provided for us, resembling that which we take in a new book or a new picture. We may be proud of its size, complacent in its correctness, and happy in its convenience. We may take the same pleasure in its symmetry and workmanship as in a well-ordered room, or a skillful piece of manufacture. And this we suppose to be all the pleasure that architecture was ever intended to give us.

Our architects gravely inform us that, as there are four rules of arithmetic, there are five orders of architecture; we, in our simplicity, think that this sounds consistent, and believe them. They inform us also that there is one proper form for Corinthian capitals, another for Doric, and another for Ionic. We, considering that there is also a proper form for the letters A, B, and C, think that this also sounds consistent, and accept the proposition. Understanding, therefore, that one form of the capitals is proper and no other, and having a conscientious horror of all impropriety, we allow the architect to provide us with the said capitals, of the proper form, in such and such a quantity, and in all other points to take care that the legal forms are observed; which having done, we rest in forced confidence that we are well housed.

Experience, I fear, teaches us that accurate and methodical habits in daily life are seldom characteristic of those who either quickly perceive or richly possess, the creative powers of art; there is, however, nothing inconsistent between the two instincts, and nothing to hinder us from retaining our business habits, and yet fully allowing and enjoying the noblest gifts of Invention. We already do so, in every other branch of art except architecture, and we only do not so there because we have been taught that it would be wrong.

Wherever the workman is utterly enslaved, the parts of the building must of course be absolutely like each other; for the perfection of his execution can only be reached by exercising him in doing one thing, and giving him nothing else to do. The degree in which the workman is degraded may be thus known at a glance, by observing whether the several parts of the building are similar or not; and if, as in Greek work, all the capitals are alike, and all the mouldings unvaried, then the degradation is complete; if, as in Egyptian or Ninevite work, though the manner of executing certain figures is always the same, the order of design is perpetually varied, the degradation less total; if, as in Gothic work, there is perpetual change both in design and execution, the workman must have been altogether set free....

I have already enforced the allowing independent operation to the inferior workman, simply as a duty to him, and as ennobling the architecture by rendering it more Christian. We have now to consider what reward we obtain for the performance of this duty, namely, the perpetual variety of every feature of the building.

...Enough, I trust, has been said to show the reader that the rudeness or imperfection which at first rendered the term “Gothic” one of reproach is indeed, when rightly understood, one of the most noble characters of Christian architecture, and not only a noble but an essential one. It seems a fantastic paradox, but it is nevertheless a most important truth, that no architecture can be truly noble which is not imperfect. And this is easily demonstrable. For since the architect, whom we will suppose capable of doing all in perfection, cannot execute the whole with his own hands, he must either make slaves of his workmen in the old Greek, and present English fashion, and level his work to a slave’s capacities, which is to degrade it; or else he must take his workmen as he finds them, and let them show their weaknesses together with their strength, which will involve the Gothic imperfection, but render the whole work as noble as the intellect of the age can make it.

We have much studied and much perfected, of late, the great civilized invention of the division of labor; only we give it a false name. It is not, truly speaking, the labor that is divided; but the men:—Divided into mere segments of men—broken into small fragments and crumbs of life; so that all the little piece of intelligence that is left in a man is not enough to make a pin, or a nail, but exhausts itself in making the point of a pin, or the head of a nail. Now it is a good and desirable thing, truly, to make many pins in a day; but if we could only see with what crystal sand their points were polished,—sand of human soul, much to be magnified before it can be discerned for what it is,—we should think there might be some loss in it also.

Let me not be thought to speak wildly or extravagantly. It is verily this degradation of the operative into a machine, which, more than any other evil of the times, is leading the mass of the nations everywhere into vain, incoherent, destructive struggling for a freedom of which they cannot explain the nature to themselves....

...go forth again to gaze upon the old cathedral front, where you have smiled so often at the fantastic ignorance of the old sculptors: examine once more those ugly goblins, and formless monsters, and stern statues, anatomiless and rigid; but do not mock at them, for they are signs of the life and liberty of every workman who struck the stone; a freedom of thought, and rank in scale of being, such as no laws, no charters, no charities can secure; but which it must be the first aim of all Europe at this day to regain for her children.

SAVAGENESS. I am not sure when the word "Gothic" was first generically applied to the architecture of the North but I presume that, whatever the date of its original usage, it was intended to imply reproach, and express the barbaric character of the nations among whom that architecture arose... As far as the epithet was used scornfully, it was used falsely; but there is no reproach in the word, rightly understood; on the contrary, there is a profound truth, which the instinct of mankind almost unconsciously recognizes. It is true, greatly and deeply true, that the architecture of the North is rude and wild - but it is not true, that, for this reason, we are to condemn it, or despise. Far otherwise: I believe it is in this very character that it deserves our profoundest reverence.

Writing a century before computing's emergence, John Ruskin's The Nature of Gothic (a chapter of The Stones of Venice) offered a very different view of the proper relationship between craftsman and architect. This is long, but even Ruskin's asides have parallels to computing practice, so please pause to study this quote more than its Victorian prose might otherwise tempt you to do:

New fields like to flatter themselves by styling themselves after older ones. While computing is often engineering (or plumbing) minus the structured training, it more typically compares itself to architecture. Like architecture, it hopes to achieve some form of grace in both its visible and invisible aspects, creating appealing structures that will remain standing. While we may have calmed down a bit from the heroic architect model of Howard Roark in The Fountainhead, markup language creators still expect to be able to lay things out as plans and have them faithfully executed by others who will live up to our specifications.

John Ruskin in the 19th century and Christopher Alexander in the 20th offer an alternative to industrial models, an opportunity to humanize practice. Unsurprisingly for work centered on human capabilities, conversation is a key tool. Ruskin extends the building conversation to include the lowliest workers, while Alexander pushes further to include current and future users of buildings and structures.

Markup emerged from the industrial mind-set common to even the most idealized computing models. Its creators had grown up in a world dominated by industrial models of production, and computers themselves matched that command-and-control drive toward efficiency. Despite the general triumph of the industrial model, it has never really bothered to answer its critics. It hasn't had to - material plenty and the race to keep up have distracted us - but those critics still have things to teach us, even about markup.

The model of prior agreement, of prior structure, isn't unique to markup. It emerged from bureaucratic models that had grown in both commerce and government during the Industrial Revolution, a period that mixed social tumult with insistence on standardization of products and process. "Top-down" approaches became the norm in a world where manufacturing and engineering reorganized themselves around design and calculation.

New Magic, from Clark and Crockford to the Present

While much of the markup world is infused with the System-B concepts that Alexander encourages us to reject, there are corners, influences, and opportunities that can help us cast aside markup's traditional model of design-first-and-then-execute. None of these pieces by itself is exactly a revolution, and some may even seem contradictory. Some of them are indeed side effects of the schema-based approach. While they may seem familiar, combining them offers a path to a different approach and new conversations. All of them point to opportunities for a shift in markup culture.

Not Required Ever since XML 1.0 permitted documents without a DOCTYPE declaration, it has at least been possible to work with XML in the absence of a schema. While in most of my travels I have only found people using this freedom for experimental purposes or for very small-scale work, conversation on xml-dev did turn up some people who simply avoid using schemas. They do, however, seem to use standard formats, but test and document them by other means. These practices are often criticized, especially when the content leaves those often closed systems [Beck 2011]. Even in the best of these cases, though, throwing off schemas is not enough if the expectations of fixed vocabularies remain behind, as Walter Perry warned over a decade ago in [Perry 2002].
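Working schema-free doesn't mean working test-free. As a rough illustration (the element names and checks here are invented, not drawn from any real format), a handful of plain assertions can document and test exactly the expectations a consumer actually has, without freezing the rest of the vocabulary:

```python
import xml.etree.ElementTree as ET

DOC = """<order id="o-17">
  <customer>Ferris</customer>
  <item sku="widget-9" qty="2"/>
</order>"""

def check_order(xml_text):
    """Schema-free QA: test only the few things we actually rely on,
    and stay silent about everything else."""
    root = ET.fromstring(xml_text)  # no DTD, no schema required
    problems = []
    if root.tag != "order":
        problems.append("root element is not <order>")
    if root.find("customer") is None:
        problems.append("missing <customer>")
    for item in root.findall("item"):
        if not item.get("qty", "").isdigit():
            problems.append("item %s has non-numeric qty" % item.get("sku"))
    return problems

problems = check_order(DOC)
```

Unlike a schema, this check says nothing about elements it doesn't care about, so the format remains free to grow around it.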

Leaving Gaps Even the most obsessively controlling schema vocabularies allow developers to leave some space for growth. ANY in DTDs and xs:any in XML Schema are the classics, and variations allow a mixture of precision and openness. Support for these gaps, however, varies wildly in practice. Both tools and culture push back against the open models. Tools that came from the strictly defined expectations of object definitions have a difficult time dealing with "underspecified" markup. The culture of interoperability testing often encourages maximum agreement. Do open spaces make it easier to create new options, or do they just create new headaches when it's time to move from one version of a defined vocabulary to another? Gaps create tension with many supposed best practices.
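One way to live with gaps is to consume only the components you know and carry the rest along untouched, rather than rejecting it. A hypothetical sketch (the person vocabulary and its namespaced extension element are invented for illustration):

```python
import xml.etree.ElementTree as ET

DOC = """<person>
  <name>Ada</name>
  <email>ada@example.com</email>
  <x-flair xmlns="urn:example:ext">sparkles</x-flair>
</person>"""

KNOWN = {"name", "email"}  # the part of the vocabulary this processor understands

def read_person(xml_text):
    root = ET.fromstring(xml_text)
    person, extras = {}, []
    for child in root:
        if child.tag in KNOWN:
            person[child.tag] = (child.text or "").strip()
        else:
            extras.append(child)  # preserved, untouched, for some later processor
    return person, extras

person, extras = read_person(DOC)
```

The extension content is neither an error nor discarded; it simply waits for a processor that cares about it.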

Generic Parsers, not Parser Generators Although it is certainly possible to generate parsers that lock tightly onto a particular vocabulary and parse nothing else, it happens less often than might seem likely. There are tools for creating parser generators, like [XML Booster], and there are certainly cases where it is more efficient or more secure than processing the results of a generic parser. However, judging from job listings and general conversation, parser generation has had a much smaller role in XML than it has had, for example, in ASN.1. (I've had a difficult time in ASN.1 conversations even convincing developers that generic parsers were possible and useful.) Data binding tools, of course, can produce tight bonds even when run on top of a generic parser, but XML's explicit support for generic parsing has at least created the opportunity for looser coupling.
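The looser coupling is easy to see in code: one generic parser handles any well-formed document, and binding to program structures is a separate, optional layer on top. A toy sketch (the Recipe class and bind helper are illustrative inventions, not any real data-binding API):

```python
import xml.etree.ElementTree as ET

# One generic parser handles any well-formed document ...
recipe_doc = ET.fromstring("<recipe><title>Soup</title></recipe>")
invoice_doc = ET.fromstring("<invoice total='12.50'/>")

# ... while binding to program structures is a separate, optional layer.
def bind(element, cls):
    """Toy binding: copy child-element text into attributes of a new cls instance."""
    obj = cls()
    for child in element:
        setattr(obj, child.tag, child.text)
    return obj

class Recipe:
    title = None

recipe = bind(recipe_doc, Recipe)
```

A generated, vocabulary-locked parser would have refused the invoice outright; the generic parser hands back a tree and lets the application decide how tightly to couple.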

Peace Through Massive Overbuilding Some vocabularies have taken an "everything but the kitchen sink" approach. Two of the most popular document vocabularies, DocBook and the Text Encoding Initiative (TEI), both include gigantic sets of components. While both provide support through tools for those components, many organizations use subsets like the DocBook subset used for this paper. While subsets vary, having a common foundation generally makes transformations easy and fallback to generic tools an option. The TEI pizza chef [TEI Pizza Chef], which served up custom DTDs, typically a TEI subset, stands out as a past highlight of this approach. Building a vocabulary so large that most people work with it only through subsets may seem excessive, but it opens the way to conversation among users of different subsets. In many ways, this is similar to (though operating in the reverse direction of) Rick Jelliffe's suggestion that: "In particular, rather than everyone having to adopt the same schema for the same content type, all that is necessary is for people to revise (or create) each schema so that they are dialects (in the sense above) of the same language. That "language" is close to being the superset information model." — Jelliffe 2012. So long as the superset model is broad enough, peace can be maintained.
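The subset relationship can be sketched abstractly: so long as every subset is drawn from one superset, moving a document between subsets reduces to keeping the intersection and deciding what to do with the remainder. The vocabularies below are invented stand-ins, not real DocBook or TEI subsets:

```python
# Two subsets of one superset vocabulary: interchange means keeping the
# intersection and handling the rest generically rather than failing.
SUPERSET = {"title", "para", "note", "sidebar", "equation"}
SUBSET_A = {"title", "para", "note"}
SUBSET_B = {"title", "para", "sidebar"}

def to_subset(elements, subset):
    """Project a stream of element names onto a subset vocabulary."""
    assert subset <= SUPERSET, "a subset must stay inside the superset"
    kept = [e for e in elements if e in subset]
    dropped = [e for e in elements if e not in subset]
    return kept, dropped

doc = ["title", "para", "note", "para"]     # written against SUBSET_A
kept, dropped = to_subset(doc, SUBSET_B)    # consumed by a SUBSET_B user
```

Because both subsets live inside one superset, nothing in the document is ever meaningless to a generic tool; at worst it is set aside.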

Peace Through Sprinkling Rather than building supersets, some groups have focused on building the smallest thing that could possibly work for their use cases. Dublin Core [DCMI] is probably the most famous of these, though a variety of annotations, from [WAI-ARIA] to [HTML5 Data Attributes], follow the same pattern. These approaches offer a range of techniques for adding a portion of information used by a certain kind of processor to other vocabularies. They allow multiple processors to see different pieces of a document, though frequently there is still a unified vision of the document managed by those who sprinkle in these components.
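Sprinkled annotations work because each processor filters for its own layer and ignores everyone else's. A small sketch using real attribute conventions (the Dublin Core namespace, role, and data-* attributes) on an invented document:

```python
import xml.etree.ElementTree as ET

DOC = """<article xmlns:dc="http://purl.org/dc/elements/1.1/"
         dc:creator="Simon" data-rating="5">
  <h1 role="heading">Gothic markup</h1>
</article>"""

DC = "{http://purl.org/dc/elements/1.1/}"
root = ET.fromstring(DOC)

# A metadata processor sees only the Dublin Core sprinkles ...
dc_fields = {k.replace(DC, "dc:"): v
             for k, v in root.attrib.items() if k.startswith(DC)}

# ... an accessibility checker sees only role attributes ...
roles = [el.get("role") for el in root.iter() if el.get("role")]

# ... and a script sees only the data-* attributes.
data = {k: v for k, v in root.attrib.items() if k.startswith("data-")}
```

Three processors, three disjoint views of one document, with no prior agreement among them beyond well-formedness.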

Peace Through Conflict While I cited Connolly 1997 above, I halted the quote at a convenient point. There is more to that argument - still schema (DTD) focused, but acknowledging conversation beyond the original creation point: As competing DTDs are shared among the community, semantics are clarified by acclamation [15]. Furthermore, as DTDs themselves are woven into the Web, they can be discovered dynamically, further accelerating the evolution of community ontologies. — Connolly 1997, page 121. While there has been more visible competition among schema languages than among vocabularies specified with schemas, there are constant overlaps among vocabulary projects as well as some direct collisions. "Acclamation" may be too strong a word, as steady erosion seems a more typical process, but there is certainly motion.

Accepting Failure Resisting System-B is easiest, perhaps perversely, in a corner of the software universe that has long hoped to make System-B's "design by specification" possible: functional and declarative programming. These styles of software development remove features that add instability to imperative approaches, often in the pursuit of mathematical provability, reliability and massive scale. These design constraints, though intended to maximize industrial-scale processing of information, also make possible a wide range of more flexible approaches to handling information. The paradigmatic application of these tools in the markup world lies in the technologies we call stylesheets, or style sheets, depending on who is editing at any given moment. While Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) were frequently seen as competitors when XML first arrived, both offer similar capabilities in this regard. They both are (or at least can be) excellent at tolerating failure, with little harm done. The key to that tolerance is pattern matching - selectors for CSS, XPath for XSLT. If patterns don't match, they don't match, and the process goes on. XSLT offers many more features for modifying results, and is more malleable, but neither of them worries much if a document matches their expectations. At worst they produce empty results. XSLT is capable of operating more generically, and of working with content it didn't match explicitly. The XSLT toolset can support reporting and transformation that goes beyond the wildest dreams of schema enthusiasts - and can do much more useful work than validation and annotation along the way. Pattern matching is also central to a number of explicitly functional languages. While they were built for things like mathematical provability, "nine nines" reliability, and structured management of state, those constraints actually give them the power needed to go beyond XSLT's ability to process individual documents.
Erlang's "let it crash" philosophy, for example, makes it (relatively) easy to build robust programs that can handle flamingly unexpected situations without grinding to a halt. Failures can be picked up and given different processing, discarded or put in a queue for different handling. A calm response to the unexpected opens many new possibilities.
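Both habits - patterns that quietly fail to match, and failures that are routed elsewhere rather than halting everything - can be sketched briefly. This is a Python analogy to what XSLT does declaratively and Erlang supervisors do for processes; the rules and document are invented:

```python
import xml.etree.ElementTree as ET

DOC = "<doc><para>one</para><sidebar>two</sidebar><para>three</para></doc>"
root = ET.fromstring(DOC)

# Stylesheet-like rules: each pattern either matches or quietly doesn't.
rules = {
    ".//para":    lambda el: "<p>%s</p>" % el.text,
    ".//title":   lambda el: "<h1>%s</h1>" % el.text,   # matches nothing: no harm
    ".//sidebar": lambda el: "<aside>%s</aside>" % el.text,
}

output, failures = [], []
for pattern, template in rules.items():
    for el in root.findall(pattern):
        try:
            output.append(template(el))
        except Exception as exc:             # "let it crash" locally ...
            failures.append((pattern, exc))  # ... and queue for different handling
```

The unmatched title rule produces nothing, the rest of the transformation proceeds, and anything that does crash lands in a queue instead of taking the whole run down.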

Valuing Errors Years ago, Walter Perry said in a talk that often the most interesting communications were the ones that broke the rules. They might be mistakes, but they might also be signs of changing conditions, efforts to game the system, or an indication that the system itself was flawed. Errors and flaws have become more popular since. While much of the effort poured into test-driven development is about making sure they don't happen, a key side effect of that work is new approaches to providing meaningful error messages when they do happen. "Test failed" is useful but incomplete. In distributed systems, errors aren't necessarily just bugs, an instant path to the discard bin. While the binary pass/fail of many testing approaches encourage developers to stomp out communications that aren't quite right, turning instead to the meaningful error messages (and error handling) side of that conversation can be much more fruitful.
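A validator built in this spirit returns a report rather than a verdict. The record shape and severity labels below are invented for illustration; the point is that a negative quantity is flagged as interesting, not simply stomped out:

```python
def validate(record):
    """Return a report, not a boolean: every issue, with enough detail to
    decide whether it is a bug, a gamed rule, or a sign of changed conditions."""
    issues = []
    if "id" not in record:
        issues.append({"field": "id", "problem": "missing", "severity": "reject"})
    qty = record.get("qty")
    if isinstance(qty, int) and qty < 0:
        issues.append({"field": "qty", "problem": "negative (%d)" % qty,
                       "severity": "review"})  # worth a look, not an instant discard
    return issues

report = validate({"id": "a1", "qty": -3})
```

A pass/fail wrapper is trivially recoverable (`not report`), but the reverse is not: once a validator collapses to a boolean, the interesting communications are gone.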

Mechanical Turks After decades of trying to isolate computing processes from human intervention, some developers are now including humans in the processing chain. After all, it's not difficult to treat such conversations as just another asynchronous call, especially in an age of mobile devices. Not everything has to be processed instantly. Amazon developed the [Amazon Mechanical Turk] service, named after an 18th-century chess-playing "machine" that turned out to have a person inside of it. It looked like brilliant technology, and was, if humans count. Amazon adds digital management to the approach, distributing "Human Intelligence Tasks" across many anonymous workers. Facebook uses similar if more centralized approaches to censor photos [Facebook Censorship]. The Mechanical Turk model has led to some dire work situations [Cushing 2012] in which humans are treated as cheap cogs in a computing machine, as a System B industrial approach seeks cheap labor to maximize profit. Horrible as some of these approaches are, they make it very clear that even large-scale digital systems can pause to include humans in the decision-making process. It isn't actually that difficult. Connecting these services to markup processing, however, requires interfaces for letting people specify what should be done with unexpected markup. "Keep it, it's okay" with a note, or an option to escalate to something stronger (perhaps even human-to-human conversation) may be an acceptable start.
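Treating a person as just another asynchronous service can be as simple as a queue. This sketch is purely illustrative - the element names, the review options, and the deferred/handled labels are all invented:

```python
import queue

review_queue = queue.Queue()  # stands in for a human review service

def process(element_name, known=("para", "title")):
    """Handle known markup immediately; defer the unexpected to a person."""
    if element_name in known:
        return "handled:" + element_name
    # Unknown markup: pause and ask a human instead of failing outright.
    review_queue.put({"element": element_name,
                      "options": ["keep", "reject", "escalate"]})
    return "deferred:" + element_name

results = [process(e) for e in ("para", "blink", "title")]
decision_request = review_queue.get()  # a reviewer (eventually) answers this
```

Processing of everything recognized continues at machine speed; only the surprising element waits for a human answer.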

JSON Shakes it Up While XML seemed to be conquering the communications universe, even finally reaching the Web as the final X in AJAX, many developers dreamed of an escape from its strange world of schemas, transformations, and seemingly endless debates about data representation. Douglas Crockford found an answer uniquely well-suited to the Web, extracted in fact from the JavaScript programming language itself. [JSON] (JavaScript Object Notation) rapidly became popular with JavaScript developers. JSON had an innate advantage in that it could bypass same-origin requirements, but its use has spread far beyond those situations. JSON uses a different syntax, but much more importantly, the nature of the conversation also shifted. Perhaps because it comes from a free-wheeling JavaScript background, expectations of structure have always been loose. Coordination can happen, but reuse and modification is a more common pattern than formal structuring. Many JSON formats are created by single information hubs, rather than across groups of providers, and conversion to internal formats is just a normal fact of life for JSON data consumption. JSON, while somewhat less readable to humans than markup, was both easy to work with in a JavaScript context and compatible with (actually a subset of) the [YAML] data serialization supported by a few other languages. JSON was just a data format, a means for developers to pass information from one program to another. Although JSON schemas and JSON transformation tools exist, they are relatively minor corners of JSON culture. Despite those glaring absences, JSON use continues to expand rapidly. It replaced XML as a default format in Ruby on Rails, and dominates current Ajax usage. Perhaps more striking, it is becoming more common in public use, exactly the territory where prior agreement was deemed most important [Bye XML]. It hasn't replaced XML in that space yet, but is claiming a larger and larger share. 
Documentation and samples, it seems, are enough. So why stick with markup and not just leap to JSON's more open approach? Mostly because of the tools described in the previous section. Markup understands transformation and decoration better than JSON. Despite its largely schema-free world, JSON is still primarily about tight binding to program structures. The schemas are invisible, often unspecified, but they still exist when a document is loaded. However, JSON programmers do plenty of transformation internally. They base their expectations of schemas more on sources than on documents, and some have gone so far as to establish simple source-testing regimens that warn them of change. Source-based versioning is also common in the world of JSON APIs. Rather than changing a namespace URI or the details of a schema, new versions of APIs are often simply hosted at new URLs, with the changed content coming from a new location to give developers time to adapt. JSON's curly braces may earn sneers from XML developers who prefer their angle brackets, but JSON is doing more with less.
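A typical JSON consumption pattern, then, is immediate conversion to an internal structure plus a small source test that warns when the feed's shape drifts. The payload shape here is invented, not any real API:

```python
import json

payload = '{"user": {"login": "ada", "id": 7}, "stars": 42}'

def to_internal(raw):
    """Convert the wire format to our own structure right away;
    keys we never copy are keys we never depend on."""
    doc = json.loads(raw)
    return {"name": doc["user"]["login"], "stars": doc["stars"]}

def source_still_looks_right(raw):
    """A tiny source test: does the feed still have the shape we expect?
    Run against each source, not against a schema."""
    doc = json.loads(raw)
    return isinstance(doc.get("user"), dict) and "login" in doc["user"]

record = to_internal(payload)
```

The invisible "schema" lives entirely in these two functions, which is exactly why documentation and samples suffice for coordination.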

Polyfills XML was supposed to reach the browser. Mostly, it didn't, but the current state of the browser has much to teach XML. Some of that is ugly, of course. Even without a hard focus on schemas, the power of the browser vendors and the continuing insistence on standardization have limited possibilities. Here we see that it is not only the formalization of schemas but the cultural values surrounding them that create brittleness and stifle experiments. However, that very brokenness, failed versioning, and the lingering hangovers of old software led to the development of a new software pattern, the polyfill [Sharp 2010]. Polyfills are JavaScript code (sometimes combined with HTML and CSS) that quietly extend a browser to support JavaScript libraries that are missing or markup that it doesn't understand. Building cross-browser polyfills is tricky [Osmani 2011], but not that much more difficult than creating cross-browser frameworks. Even the limited world of HTML5-specific polyfills is vast [Polyfills List]. While the HTML+CSS+JavaScript architecture is extremely flexible, there are some barriers, largely created by browser makers' efforts to optimize bandwidth and processing. Efforts to create a picture element - combining concerns of responsive design and human interaction - have faced challenges around timing, pre-loading in particular. Media processing is (as usual) one of the hardest challenges in working with the browser environment. Establishing communications and respect between vendors and those using their products is perhaps even more difficult. The W3C and browser vendors are working to address those challenges as well as create new frameworks that make polyfills easier to build and more efficient. The Shadow DOM [Shadow DOM] and Web Components [Web Components] work both aim directly at making polyfills a more standard part of web development environments. Google's recent work on [Polymer] is an example of a browser vendor pushing hard in this space.
Separately, the Extensible Web Manifesto [Manifesto 2013] encourages this work as a way to shift much work done in JavaScript to work done in markup: We want web developers to write more declarative code, not less. This calls for eliminating the standards bottleneck to introducing new declarative forms, and giving library and framework authors the tools to create them. While XML processing models are typically different, offering no tools to extend processors at the document creator's discretion, this approach could prove useful in situations where processors have chosen to place more trust in users. It also clearly offers an option to developers tired of waiting for the W3C, the WHATWG, and the browser vendors to add markup functionality to HTML5.

Relational Decline Structured data has had a difficult few decades in general. While XML's schemas defined structure, relational database purists (most notably Fabian Pascal) heaped scorn on XML's daring to step outside the sharply defined boundaries of RDBMS tables and joins. Much of the pressure for XML Schema's insistence on deterministic structures and strongly typed data came from communities who considered the constraints in 1990s RDBMS practice to be a good thing - but XML's very success was a key factor in making clear that the relational model was not the only possible story for data. The challenges of scaling within the constraints of Atomicity, Consistency, Isolation, and Durability (ACID) led to several rapid generations of change in the database community. While there are probably more relational databases deployed today than there were when XML appeared, the NoSQL movement has ended the era when developers chose among relational databases by default unless their project was extremely unusual. This shift has little direct effect on markup processing, but it does reduce the cultural pressures to only create data structures conforming to a well-known schema.

RESTfulness REST is a communications style based on HTTP's limited number of methods, treating those constraints as a virtue teaching us to build with few verbs and many nouns. There is nothing in REST-based work specific to schemas - schemas (or the lack thereof) are a detail of the work that happens inside the local processors. However, in contrast to the more RPC-based predecessors that emerged from the CORBA and object-oriented worlds, this lack of specification is still a significant opening. A minimal set of verbs makes it much easier to process a much larger set of nouns, with fewer expectations set up front.
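The few-verbs-many-nouns idea fits in a few lines: dispatch happens on the method, and any number of resources ride along as paths. A toy in-memory sketch, not a real HTTP server; the paths and payloads are invented:

```python
# Four verbs, any number of nouns: the contract is the method set,
# not a per-resource interface definition.
store = {}

def handle(method, path, body=None):
    """Dispatch on the verb; every noun (path) gets the same treatment."""
    if method == "PUT":
        store[path] = body
        return 200, body
    if method == "GET":
        return (200, store[path]) if path in store else (404, None)
    if method == "DELETE":
        return 200, store.pop(path, None)
    if method == "POST":
        store[path] = body
        return 201, body
    return 405, None  # anything else is simply not part of the contract

handle("PUT", "/recipes/soup", "<recipe/>")
status, doc = handle("GET", "/recipes/soup")
```

An RPC-style design would need a new verb (and prior agreement on it) for every operation; here new nouns cost nothing.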

Strictly Local Uses of Schemas Some developers and organizations see schemas as a limited-use tool, applied primarily in a local context to reflect local quality assurance and document creation needs. Since the late 1990s, I've suggested that my consulting customers think of a schema not as an integral part of XML document/data exchange, but as a special auxiliary file that can fill up to two supporting roles: 1. A special stylesheet that renders a boolean value (valid/not valid) for QA. 2. A template for rare and highly specialized structured-authoring applications. If you don't need at least one of those, then you probably don't need a schema. — David Megginson [Correspondence 2013] So long as authoring applications expect schemas as input, schemas will be necessary. There are other ways to do quality assurance, of course, but schemas are common. Will developers resist the temptation to apply schemas more broadly than these cases, when tools and practice point them that direction?

Transition Components - DSDL and MCE Much of the best thinking in markup schemas has worked under the banner of DSDL. RELAX NG, Schematron, and some more obscure pieces demonstrate more flexible alternatives to the W3C's XML Schema. Namespace-based Validation Dispatching Language (NVDL) finally offers tools for mixing validation approaches based on the namespace qualifiers applied to content. Document Semantics Renaming Language (DSRL) offers a simple transformation approach for mapping documents to local schemas. These parts are still tightly bound to schema validation approaches, but they at least add flexibility and more options. Markup Compatibility and Extensibility (MCE), coming out of the Office Open XML work, finally asks hard questions about different degrees of "understanding" a document: Attributes in the Markup Compatibility namespace shall be either Ignorable, ProcessContent, ExtensionElements, or MustUnderstand. Elements of the Markup Compatibility namespace shall be either AlternateContent, Choice, or Fallback. As Rick Jelliffe describes it: This is a kind of having your cake and eating it too, you might think; the smart thing that gives it a hope of working is that MCE also provides some attributes PreserveElements and PreserveAttributes which let you (the standards writer or the extended document developer) list the elements that do not need to be stripped when modifying some markup. I think standards developers who are facing the cat-herding issue of multiple implementations and the need for all sorts of extensions should seriously consider the MCE approach. — Jelliffe 2009