If you think that you need professional help to build a static HTML Web site, tell yourself "The abused 10-year-old got his site to work; I think I can, too."

You May Already Have Won $1 Million

My Samoyed is really hairy.

That is a perfectly acceptable HTML document. Type it up in a text editor, save it as index.html, and put it on your Web server. A Web server can serve it. A user with Netscape Navigator can view it. A search engine can index it.

Suppose you want something more expressive. You want the word really to be in italic type:

My Samoyed is <I>really</I> hairy.

HTML stands for Hypertext Markup Language. The <I> is markup. It tells the browser to start rendering words in italics. The </I> closes the <I> element and stops the italics. If you want to be more tasteful, you can tell the browser to emphasize the word really:

My Samoyed is <EM>really</EM> hairy.

Most browsers use italics to emphasize, but some use boldface, and browsers for ancient ASCII terminals (e.g., Lynx) have to ignore this tag or come up with a clever rendering method. A picky user with the right browser program can even customize the rendering of particular tags.

There are a few dozen more tags in HTML. You can learn them by choosing View Source from a Web browser when visiting sites whose formatting you admire. You can also work through a comprehensive HTML guide, e.g., http://www.w3schools.com/html/html_reference.asp (Web) and HTML & XHTML: The Definitive Guide by Musciano and Kennedy (O'Reilly, 2002; print).

Document Structure

Another structure issue is that you should try to make sure that you close every element that you open. So if your document has a <BODY> it should have a </BODY> at the end. If you start an HTML table with a <TABLE> and don't have a </TABLE>, a Web browser may display nothing. Elements can be nested, but you should close the most recently opened one before the rest, e.g., for something both boldface and italic:

My Samoyed is <B><I>really</I></B> hairy.
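The close-the-most-recent-first rule is mechanical enough to check with a short script. Here is a minimal sketch in Python, built on the standard library's html.parser module. The names NestingChecker and check_nesting are made up for this illustration, and the set of tags excused from closing is an assumption that covers only the common cases (it includes <P>, whose closing tag is optional).

```python
from html.parser import HTMLParser

# Assumption: tags we never expect to see closed. <p> is here because its
# closing tag is optional; the others never take one.
NO_CLOSE_NEEDED = {"br", "hr", "img", "meta", "link", "p"}


class NestingChecker(HTMLParser):
    """Records tags that are closed out of order or never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements (lowercased)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in NO_CLOSE_NEEDED:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in NO_CLOSE_NEEDED:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        self.problems.append(
            f"</{tag}> does not close the most recently opened element")
        # Forget the mismatched element so one mistake is reported only once.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break


def check_nesting(html_text):
    """Return a list of complaints; an empty list means the nesting is clean."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    unclosed = [f"<{tag}> is never closed" for tag in checker.open_tags]
    return checker.problems + unclosed


if __name__ == "__main__":
    print(check_nesting("My Samoyed is <B><I>really</I></B> hairy."))
    print(check_nesting("My Samoyed is <B><I>really</B></I> hairy."))
```

This is a sanity check, not a validator: it knows nothing about which elements are legal inside which others.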

Something that confuses a lot of new users is that the <P> element used to surround a paragraph has an optional closing tag </P>. Browsers by convention assume that an open <P> element is implicitly closed by the next <P> element. This leads a lot of publishers (including lazy old me) to use <P> elements as paragraph separators.

Here's the HTML template from which documents at philip.greenspun.com start out:

<html>
<head>
<title>New Doc</title>
</head>
<body bgcolor=white text=black>
<h2>New Doc</h2>
by <a href="/">Philip Greenspun</a>, revised April 1, 2003
<hr>
introductory text
<h3>First Subhead</h3>
more text
<p>
yet more text
<h3>Second subhead</h3>
concluding text
<hr>
<a href="mailto:philg@mit.edu">
<address>philg@mit.edu</address>
</a>
</body>
</html>

Let's go through this document piece by piece (see the accompanying figure for how it looks rendered by a browser).

The <HTML> element at the top says "I'm an HTML document". Note that this tag is closed at the end of the document. It turns out that this tag is unnecessary. We've saved the document in the file "basic.html". When a user requests this document, the Web server looks at the file's ".html" extension and adds a MIME header to tell the user's browser that this document is of type "text/html".
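The extension-to-MIME-type lookup can be tried out without a Web server. The sketch below uses Python's standard mimetypes module as a stand-in for the server's own table; a real server consults its own configuration, so treat this as an illustration of the idea only. The function name content_type_for and the fallback type are assumptions made for this example.

```python
import mimetypes


def content_type_for(filename):
    """Guess the type a server might announce, judging by the extension alone."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Assumption: unknown extensions fall back to a generic binary type.
    return mime_type or "application/octet-stream"


if __name__ == "__main__":
    print(content_type_for("basic.html"))
```

Note that nothing inside the file is examined. The decision rests entirely on the name, which is why the <HTML> element is not what tells the browser it is looking at HTML.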

The <HEAD> element's primary purpose in this document is so that one can legally use the <TITLE> element to give this document a name. Whatever text is placed between <TITLE> and </TITLE> will appear at the top of the user's browser window, on the menu that pops up when the user clicks on the Back button, and in his bookmarks menu should he bookmark this page. After closing the head with a </HEAD>, the body of the document is opened with a <BODY> element, to which are added some optional parameters to set the background to white and the text to black. Some Web browsers default to a gray background, and the resulting lack of contrast between background and text is sufficiently offensive that it may be worth changing the default colors. This is a violation of some of the principles articulated in this book because it potentially introduces an inconsistency in the user's experience of the Web. However, one need not feel too guilty about it because (1) a lot of browsers use a white background by default, (2) enough other publishers set a white background that white pages won't seem inconsistent, and (3) it doesn't affect the core user interface the way that setting custom link colors would.

Just below the body, there is a headline, size 2, wrapped in an <H2> element. This will be displayed to the user at the top of the page. One could alternatively use <H1> but browsers typically render that in a ridiculously huge font. Underneath the headline, it makes sense to indicate authorship, link to a parent work, and specify the revision date. The authorship link shows that someone is taking responsibility for the content. The link to the parent work, e.g., a book table of contents if the file is one chapter, helps users who've landed on this page from a public search engine. The revision date is important because Web pages often linger forgotten by the author but still available to the public long after they are obsolete. Notice in this example that the authorship phrase "Philip Greenspun" is a hypertext anchor which is why it is wrapped in an A element. The <A HREF= says "this is a hyperlink." If the reader clicks anywhere from here up to the </A> the browser should send him to the root page on the server ("/").

After the headline, author, and optional navigation, the template adds a horizontal rule tag: <HR>. Don't overuse these big lines across the window: Real graphic designers use whitespace for separation. This template uses <H3> headlines in the text to separate sections and <HR>s at the very top to separate the document contents from the headline information and at the very bottom to separate the document contents from the author's email link.

Underneath the last <HR>, the document is signed with "philg@mit.edu". The <ADDRESS> element usually results in an italics rendering. Readers expect that they can scroll to the bottom of a browser window and find out who is responsible for what they've just read. Note that this one is wrapped in an anchor tag. If the user clicks on the anchor text (my email address), the browser will pop up a "send mail to philg@mit.edu" window. It is generally a good idea to wrap every email address on a Web page in a "mailto" tag. Sadly, in the Age of Spam it may not be a good idea to put any email address on a Web page. An alternative to the author's personal email address would be a form that a reader could use to send a message to the author or editor.

Tarting Up Your Pages

Older browsers on PCs will ignore them; every browser knows how to render a headline, level 3. Not every browser understands a directive to use a specific Microsoft font that ships with the Windows operating system.

Newer browsers will ignore them; mobile phones and palmtops are some of the most interesting devices attached to the Web and they only understand basic HTML.

When you change your graphic designer, you have to edit 10,000 .html documents.

body {margin-left: 3% ; margin-right: 3%}
P { margin-top: 0pt; text-indent : 0.2in }
P.stb { margin-top: 12pt }
P.mtb { margin-top: 24pt; text-indent : 0in}
P.ltb { margin-top: 36pt; text-indent : 0in}
p.marginnote { background-color: #E0E0E0 }
p.paperonly { background-color: #E0E0E0 }
li.separate { margin-top: 12pt }

How does one use this style sheet? Park it somewhere on a Web server in a file with the extension ".css". This extension will tell the Web server program to MIME-type it "text/css". Inside each document that uses the cascading style sheet, put the following LINK element inside the document HEAD, just above the TITLE:

<LINK REL=STYLESHEET HREF="/books/philg.css" TYPE="text/css">

Okay, now the browser knows where to get the style sheet and that a small thematic break should be rendered with an extra bit of whitespace. How do we tell the browser that a particular paragraph is "of class stb"? Instead of "<P>", we use

<P CLASS="stb">

Book designers have all kinds of clever ways of setting off margin notes, body notes, and footnotes. Not being a book designer or especially clever, I simply defined a couple of styles that get rendered with a gray background ("p.marginnote { background-color: #E0E0E0 }"). This alerts readers that margin notes aren't part of the main text.

The final new subclass ("li.separate { margin-top: 12pt }") is directed at making lists with whitespace between each bullet item. It worked nicely in Microsoft Internet Explorer circa 1998 but failed in Netscape Navigator (if you're under the age of 20, ask your parents about Netscape), so the book doesn't use it (instead the chapters use two line-break tags, <BR><BR>).

For a complete guide to all the Cascading Style Sheet directives, look in HTML & XHTML: The Definitive Guide and Cascading Style Sheets: The Definitive Guide (Meyer 2000; O'Reilly).

Now That You Know How to Write HTML, Don't

"Owing to the neglect of our defences and the mishandling of the German problem in the last five years, we seem to be very near the bleak choice between War and Shame. My feeling is that we shall choose Shame, and then have War thrown in a little later, on even more adverse terms than at present."

-- Winston Churchill in a letter to Lord Moyne, 1938 (Churchill: A Life; Gilbert 1991)

Eventually the Web will work like a naïve user would expect it to. You ask your computer to find you the cheapest pair of blue jeans being hawked on the World Wide Web and ten seconds later you're staring at a photo of the product and being asked to confirm the purchase. You see an announcement for a concert and click a button on your Web browser to add the date to your calendar; the information gets transferred automatically. More powerful formatting isn't far off, either. Eventually there will be browser-independent ways to render the average novel readably.

None of this will happen without radical changes to HTML, however. We'll need semantic tags so that publishers can say, in a way that a computer can understand, "This page sells blue jeans," and "The price of these jeans is $25 U.S." Whether we need them or not, we are sure to get new formatting tags with every new generation of browser. (Personally I can't wait to be able to caption photographs and figures, a common feature of word processing programs in the 1960s.)

Back in 1994, a lowly graduate student wrote a paper titled "We have Chosen Shame and Will Get War" (http://philip.greenspun.com/research/shame-and-war) presenting a scheme for embedding semantic markup in HTML documents so that it wouldn't break old browsers (e.g., NCSA Mosaic!). More importantly, the paper suggested that we needed to develop a common set of document classes, e.g., "advertisement", "novel", "daily-newspaper-article", so that programmers could write software to make life easier for authors and readers.

This paper was rejected from the Web Consortium's 1994 conference, apparently because the idea was too brilliant, radical, and forward-looking for its time. The idea of semantic markup in documents had barely been tested. Charles Goldfarb, Raymond Lorie, and Edward Mosher tried it out in 1969 with Generalized Markup Language (GML). They got their company to use it for about 90 percent of its document production. But this was only at one little company so not too many Web standards experts would have noticed. Oh yes, the company name was "International Business Machines."

The American National Standards Institute (ANSI) published its first draft of Standard Generalized Markup Language (SGML) in 1980. A few small organizations, such as the United States Department of Defense, the Internal Revenue Service, and the Securities and Exchange Commission, began using the semantic markup features of the new language.

The most bizarre thing about HTML is that it borrows the (uninteresting) syntax of SGML:

<element> ... stuff being marked up ... </element>

To me, and to the Web publishers and Web users who read "We have Chosen Shame and Will Get War", it seemed natural that the folks who set the Web standards would see the importance of semantic markup and machine processing of documents on behalf of users. A student of Max Weber, however, would not have been surprised that this paper was rejected and that the whole semantic markup issue was ignored for six years. People who write Web standards and go to Web conferences are not doing it because they have a passion for Web publishing or Web surfing. They have a passion for sitting on conference committees, sitting on standards committees, and escaping the boredom of their hometowns by going on company-paid trips to wherever these committee meetings happen to be taking place. The people who are passionate about publishing are busy building on-line applications. The people who are passionate about surfing are at home with their cable modems.

It has been nine years since the "Shame and War" paper was published. Has there been any progress since then? Yes and no. The Extensible Markup Language (XML) has been standardized by the World Wide Web Consortium (W3C). Described by Dan Lyke as "the subset of SGML that Microsoft's developers could understand", XML addresses the need for semantic markup but not the requirement that publishers agree on a common set of classes for semantic markup to be useful. With XML, each publisher or community of publishers can agree on some new document types and concomitant sets of tags. Internet Explorer can render XML. A variety of server-side tools are available for parsing XML, generating XML from databases, converting XML to HTML, and authoring XML. What do all of these XML tags mean, though? More or less nothing, which is why there is another project at the Web Consortium: The Semantic Web (http://www.w3.org/2001/sw/).

If you are publishing structured data, does it make sense to use HTML or XML files? Neither. XML will let you store and exchange structured data. But that doesn't mean it addresses the same problems as database management systems. With XML, you can certainly keep a catalog of products for sale in a file system directory and easily write a computer program to pull out the price of an item. But you can't easily build an index to facilitate rapid retrieval of all the blue items or all the items available in size 6. XML lets you store how many items are left in inventory but you won't get any support for writing a program that subtracts 1 when an order is placed (and, more importantly, making sure that 10 simultaneous subtractions from different users won't collide). An XML document is like one record or a series of records in a database management system. XML is therefore useful if you want to ship a record from one database to another, but it doesn't really help you build the entire database.
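To make the contrast concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The catalog, its element names, and its attribute names are all invented for this illustration; no real document type is implied. Pulling out one price is easy. Finding all the blue items is also possible, but only by reading every record, because the file gives you no index.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: every element and attribute name here is made up.
CATALOG_XML = """
<catalog>
  <item sku="j-100">
    <name>jeans</name><color>blue</color><size>6</size>
    <price currency="USD">25.00</price>
  </item>
  <item sku="s-200">
    <name>shirt</name><color>red</color><size>8</size>
    <price currency="USD">15.00</price>
  </item>
</catalog>
"""


def price_of(catalog_xml, sku):
    """Return the price of one item as a string, or None if the SKU is absent."""
    root = ET.fromstring(catalog_xml)
    for item in root.iter("item"):
        if item.get("sku") == sku:
            return item.findtext("price")
    return None


def items_with_color(catalog_xml, color):
    """Scan every item; with no index, each query costs a pass over the file."""
    root = ET.fromstring(catalog_xml)
    return [item.get("sku") for item in root.iter("item")
            if item.findtext("color") == color]


if __name__ == "__main__":
    print(price_of(CATALOG_XML, "j-100"))
    print(items_with_color(CATALOG_XML, "blue"))
```

Nothing in this sketch addresses the inventory problem either: two programs rewriting the file at the same moment would simply overwrite each other.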

For most publishers it is most sensible to keep their information in whatever database management system they're accustomed to and write scripts to generate either HTML or XML pages. With such an architecture a change in language standards or publishing requirements could be met by rewriting a couple of scripts rather than editing thousands of XML or HTML files. A "database"? Does that mean a relational database management system as discussed later in this book? No. If you aren't updating your data in real-time, an ordinary text file is fine.

For example, suppose that you are putting a company phone directory on the Web. You can define a structured format like this:

first name|last name|department|office number|home number|location

There is one line for each person in the directory. Fields are separated by vertical bars. So a file at MIT might look like this:

Philip|Greenspun|eecs|253-8574|864-6832|ne43-414
Rajeev|Surati|athletics|253-8581|555-1212|dupont gym
...

A public Web service offering names and office phone numbers for everyone at the university

A public Web page for each department showing names and office phone numbers

A private Web page for each department showing names and home phone numbers
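The department pages listed above can come from one short script reading the vertical-bar file. What follows is a sketch in Python under stated assumptions: the function names parse_directory and department_page are invented here, the page layout loosely follows the template shown earlier in this chapter, and the decision to show home numbers only on the private page mirrors the list above.

```python
import html

FIELDS = ["first_name", "last_name", "department",
          "office_number", "home_number", "location"]

# The two sample records from the text.
DIRECTORY = (
    "Philip|Greenspun|eecs|253-8574|864-6832|ne43-414\n"
    "Rajeev|Surati|athletics|253-8581|555-1212|dupont gym\n"
)


def parse_directory(text):
    """Turn vertical-bar-separated lines into a list of dicts, one per person."""
    people = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        people.append(dict(zip(FIELDS, line.split("|"))))
    return people


def department_page(people, department, private=False):
    """Render one department's listing; home numbers appear only if private."""
    phone_field = "home_number" if private else "office_number"
    rows = []
    for person in people:
        if person["department"] != department:
            continue
        name = html.escape(f'{person["first_name"]} {person["last_name"]}')
        phone = html.escape(person[phone_field])
        rows.append(f"<li>{name}: {phone}")
    title = html.escape(f"{department} phone directory")
    return (
        f"<html>\n<head>\n<title>{title}</title>\n</head>\n"
        f"<body bgcolor=white text=black>\n<h2>{title}</h2>\n"
        "<ul>\n" + "\n".join(rows) + "\n</ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(department_page(parse_directory(DIRECTORY), "eecs"))
```

If the page design changes, only department_page needs rewriting; the data file is left alone.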

When the XML wave has finally broken on the beach and someone comes up with a document type for phone listings, you can generate a set of private and public XML files containing names and phone numbers. People downloading an XML file will be able to tell their computer to dial the phone number automatically, since the number will be encased in a VOICE_PHONE_NUMBER element.
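Since no such document type exists yet, any XML output today is guesswork, but the shape of the script is easy to sketch. In the Python below, only VOICE_PHONE_NUMBER comes from the text; PHONE_DIRECTORY, PERSON, NAME, and DEPARTMENT are hypothetical element names, and directory_to_xml is a made-up function name.

```python
import xml.etree.ElementTree as ET


def directory_to_xml(lines, private=False):
    """Build a phone-listing document from vertical-bar-separated records."""
    root = ET.Element("PHONE_DIRECTORY")  # hypothetical root element
    for line in lines:
        first, last, department, office, home, _location = line.split("|")
        person = ET.SubElement(root, "PERSON")
        ET.SubElement(person, "NAME").text = f"{first} {last}"
        ET.SubElement(person, "DEPARTMENT").text = department
        number = ET.SubElement(person, "VOICE_PHONE_NUMBER")
        # Private files carry home numbers; public files carry office numbers.
        number.text = home if private else office
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(directory_to_xml(["Philip|Greenspun|eecs|253-8574|864-6832|ne43-414"]))
```

When a real document type appears, the element names in this one function are all that would need to change.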

The high-level message here is that you should think about the structure of the information you are publishing first. Then think about the best way to build an investment in that structure and preserve it. Finally, devote a bit of time to the formatting of the final HTML or XML that you generate and ship to users over the Web.

It's Hard to Mess Up a Simple Page

CD-ROMs are faster, cheaper, more reliable, and a more engaging audio/visual experience than the Web. Why then do they sit on the shelf while users greedily surf the slow, unreliable, expensive Web? Stability of user interface.

There are many things wrong with HTML. It is primitive as a formatting language and it is almost worthless for defining document structure. Nonetheless, the original Web/HTML model has one big advantage: All Web pages look and work more or less the same. You see something black, you read it. You see something gray, that's the background. You see something blue (or underlined), you click on it.

When you use a set of traditional Web sites, you don't have to learn anything new. Every CD-ROM, on the other hand, has a sui generis user interface. Somebody thought it would be cute to put a little navigation cube at the bottom right of the screen. Somebody else thought it would be neat if you clicked on the righthand page of an open book to take you to the next page. Meanwhile, you sit there for 15 seconds feeling frustrated, with no clue that you are supposed to do anything with that book graphic on the screen. The CD-ROM goes back on the shelf.

The beauty of the browsers built since 1995 is that they allow the graphic designers behind Web sites to make their sites just as opaque and hard to use as CD-ROMs. Graphic designers are not user interface designers. If you read a book such as Macintosh Human Interface Guidelines (Apple Computer, Inc.; Addison-Wesley, 1993), you will appreciate what kind of thought goes into a well-designed user interface. Most of it has nothing to do with graphics and appearance. Pull-down menus are not better than pop-up menus because they look prettier; they are better because you always know exactly where to find the Print command.

Some of the bad things a graphic designer can do with a page were possible even way back in the days of Netscape 1.1. A graphic designer might note that most of the text on a page was hyperlinks and decide just to make all the text black (text=#000000, link=#000000, vlink=#000000). Alternatively, he or she might choose a funky color for a background and then three more funky colors for text, links, and visited links. Either way, users have no way of knowing what is a hyperlink and what isn't. Often designers get bored and change these colors even for different pages on the same site.

There is probably a place in this world for Web sites that are pretty rather than functional. Nonetheless it is worth weighing the prettiness of a new design against the cold shock of an unfamiliar user interface that greets the user.



Java and Flash -- The BLINK Tag Writ Large

"Glad to hear that your company is so profitable," We responded. "Since you're able to hire 50 tech support people for your Web site then you must be raking in the bucks."

"What do you mean 50 tech support people?!?!" he asked.

"If people can't get a plug-in to work on their Windows machine or can't figure out how to record from a microphone on their PCs then surely you must have a plan for dealing with all the support emails that they'll be sending to your webmaster," we said.

"Uh... well, I guess we have to think about that...," he mumbled as he wandered off.

Before you spend money on animation, Java, or authoring content for a plug-in, think about whether you couldn't buy the on-line rights to an interesting book on your Web site's subject. Remember that search engines don't recognize animations, Java applets, or graphic design. Search engines index text (Google does photos too but that is a separate and comparatively seldom-used application). Therefore an on-line book is going to pull a tremendous number of people into your site.

Maybe you have infinite money and can buy the book plus a raft of multimedia authors. It still might be worth remembering what brought users to the Web in the first place: control and depth. Software such as Java and Flash enables you to lead users around by the nose. Flash them a graphic here, play them a sound there, roll the credits, and so on. But is that really why they came to your site? If they want to be passive, how come they aren't watching TV or going to a lecture?

This may seem like an obvious point, but it is worth mentioning because there are so many tools to convert PowerPoint presentations into Web sites. The whole point of a PowerPoint-style presentation is that you have a room full of people all of whose thoughts are to be herded in a common direction by the speaker. Ideas are condensed to the barest bones because there is such limited time and space available and because the speaker is going to embroider them. The whole point of the Web is that each reader finds his own path through a site. There is unlimited time and space for topics in which the reader has a burning interest.

A Java applet can make a good site great in the following situations:

You need a richer user interface than you can get with HTML forms.

You need to respond to user input without network delays--mouse movements, for example.

You need to give the user real-time updates.

Richer User Interface

Don't get too excited by the possibility of offering a rich custom user interface with Java. Adobe Photoshop has a beautiful user interface but it took Adobe hundreds of person-years to perfect it. It takes Adobe hundreds of person-years to test each new version. It costs Adobe millions of dollars to write documentation and prepare tutorials. It takes users hours to learn how to use the program. You don't have a huge staff of programmers to concentrate on a single application. You don't have a full-time quality assurance staff. You don't have a budget for writing documentation and tutorials. Even if you did have all of those things, your users don't have extra hours to spend learning how to use the Web site that you build. Either they are experienced Web users and they want something that works like other sites or they are naïve users who want their effort in learning to use their browser and your site to pay off when they visit other sites.

Real-time Response

You would not want to use a drawing tool that needed to go out to the network to add a line. An HTML forms-based game might be fun for your brain but it probably won't have the visceral excitement of a first-person shooter game on Xbox. Anything remotely like a video game requires code executing on the user's local processor.

Real-time Updates

<META HTTP-EQUIV=REFRESH CONTENT="60; URL=update.cgi">

The user's browser will fetch "update.cgi" 60 seconds after grabbing the page with this element.
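On the server side, "update.cgi" can be any program that prints a fresh page carrying the same element, so that the browser keeps coming back. Here is a rough sketch in Python of what such a program might emit. The function name update_page, the page title, and the timestamp format are assumptions made for this example; only the META element and the "update.cgi" name come from the text.

```python
import html
from datetime import datetime, timezone


def update_page(status_text, now=None, refresh_seconds=60,
                refresh_url="update.cgi"):
    """Return a CGI response: header line, blank line, then the HTML page."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S UTC")
    refresh = f"{refresh_seconds}; URL={html.escape(refresh_url)}"
    return (
        "Content-type: text/html\n\n"
        "<html>\n<head>\n"
        f'<META HTTP-EQUIV=REFRESH CONTENT="{refresh}">\n'
        "<title>Status</title>\n</head>\n<body>\n"
        f"{html.escape(status_text)} (as of {stamp})\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(update_page("All quiet"), end="")
```

Each response schedules the next request, so the user sees new data once a minute without touching the Reload button. Anything faster than that is where this approach runs out of steam.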

In 1995 some folks at Boston Children's Hospital built a Web-based real-time patient monitoring system using Java applets. Data from instruments attached to people in the intensive care unit (ICU) would be streamed back to doctors who could be in another area of the ICU, working at another institution, or relaxing at home. See "A real time patient monitoring system on the World Wide Web" by K. Wang, I. Kohane, et al. in Proceedings of the AMIA Annual Fall Symposium, 1996, pp. 729-32.

Oh yes, it will crash the user's browser

"Java on the client doesn't work, and we at Netscape have done an about-turn on client-side Java in recent months."

-- Marc Andreessen, VP Products at Netscape (Quoted in a trade journal, July 1998)

Why Graphic Designers Just Don't Get It

Creating Killer Web Sites

Siegel is making some implicit assumptions: that there are no users with text-only browsers; that users are willing to wait several minutes before getting to the content of a site; that there is some obvious place to put these tunnels on a site with thousands of pages. Even if all of those things are true, if the internal pages do indeed contain any content, the public search engines will roar through and wreck everything. People aren't going to enter the site by typing in "http://www.greedy.com" and then let themselves be led around by the nose by a designer. They will find the site by using a search engine and typing a query string that is of interest to them. Google does not think a Dave Siegel (TM) "entry tunnel" is "killer". In fact, it might not even bother to index a page that is just one image (search engines can't read text that is inside GIF or JPEG image files).

If you intend to get radical by putting actual content on your Web server, then it is probably a good idea to make each URL stand on its own. Making a URL stand on its own has implications for site navigation design. Each document will need a link to the page's author, the service home page, and the next page in a sequence if the document is part of a linear work. Remember, the Web is not there so that publishers can impose what they think is cool on readers. Each reader has his own view of the Web. Maybe that view is returned by a search engine in response to a query string. Maybe that view is links from a friend's home page. Maybe that view is a link from a personalization service that sweeps the Internet every night to find links and stories that fit the reader's interest profile.
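If pages are generated by script, these navigation links can be stamped onto every document automatically. A small sketch in Python follows; the function name navigation_links, the link labels, and the separator are illustrative assumptions, not a prescribed layout.

```python
import html


def navigation_links(author_name, author_url, home_url, next_url=None):
    """Return the links that let a page stand on its own when reached cold."""
    links = [
        f'by <a href="{html.escape(author_url)}">{html.escape(author_name)}</a>',
        f'<a href="{html.escape(home_url)}">home</a>',
    ]
    if next_url is not None:
        # Only documents that belong to a linear work get a "next" link.
        links.append(f'<a href="{html.escape(next_url)}">next</a>')
    return " | ".join(links)


if __name__ == "__main__":
    print(navigation_links("Philip Greenspun", "/", "/books/", "chapter2.html"))
```

A reader dropped onto the page by a search engine then has somewhere to go besides the Back button.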

Our task as Web publishers is to produce works that will fit seamlessly not into the Web as we see it, but into the many Webs that our readers see.

An Information Designer Who Got It

The screen should contain information, not navigation or administration icons. The information should become the interface, i.e., clicking on a word that was itself informational should take you to a screen with more detailed information.

Give users broad flat overviews of the information (e.g., tables of contents) rather than forcing them through sequential screens of choices.

Organize your data according to expected user interest rather than mimicking the internal structure of your organization [see the university research lab example in Chapter 1].

Why use icons for navigation when words are clearer and take up less screen space?

The Alexander Nevsky of the Long-Suffering Users

"Time how long it takes to download the home page."

63 seconds. "Now time how long it takes to get the first results back from a search."

90 seconds.

"What do you guys plan to do about this?" asked the president.

"Uh... Well.. we could get a faster server," responded the Web expert.

"Great. Thanks. You're all fired."

Our friend was specifically asked by the president to do a site with no animation and no Java. The focus would be entirely on a fast search of a server-side database.

Multi-Page Design and Flow

Let's look at general design principles that can be applied to different kinds of Web applications.

One of the things that users love about the Web is the way in which computation is discretized. A desktop application is generally a complex miasma in which the state of the project is only partially visible. Despite software vendors having added multiple-level Undo commands to many popular desktop programs, the state of those programs remains opaque to users.

The first general principle is Don't break the browser's Back menu. Users should be able to go forward and back at any time in their session with a site. For example, consider the following flow of pages:

choose a book

enter shipping address

enter credit card number

confirm

thank-you

The second general principle is Have users pick the object first and then the verb. For example, consider the customer service area of an ecommerce site. Assume that Jane Consumer has already identified herself to the server. The merchant can show Jane a list of all the items that she has ever purchased. Jane clicks on an item (picking the object) and gets a page with a list of choices, e.g., "return for refund" or "exchange". Jane clicks on "exchange" (picking the verb) and gets a page with instructions on how to schedule a pickup of the unwanted item and pages offering replacement goods.

How original is this principle? It is lifted straight from the Apple Macintosh circa 1984 and is explicated clearly in Macintosh Human Interface Guidelines (Apple Computer, Inc.; Addison-Wesley, 1993). Originality is valorized in modern creative culture but it was not a value for medieval authors and it does not help users. The Macintosh was enormously popular to begin with and then Microsoft went on to monopolize the desktop with a copy of the Macintosh. Web publishers can be sure that the vast majority of their users will be intimately familiar with the "pick the object then the verb" style of interface. Sticking with a familiar user interface cuts down on user time and confusion at a site.

What happens when publishers ignore these guidelines?

Date: Wed, 5 Aug 1998 23:20:33 -0400 (EDT)
From: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
To: philg@MIT.EDU
Subject: Another bad Web user interface example

I thought it was a really great idea when BankBoston [ed: now Fleet] replaced their clunky modem-only terminal-based home banking system with one that works over the Internet. Not only is the user interface painfully slow (downloading images that are different for every page over a 33.6 modem and an encrypted connection), but it totally disregards the perfectly good user interface built into my browser. In particular, if you try to actually navigate anywhere and then do something, you get this:

Error Description
2300004 - Screen Error
You only need to click once on your selection. Please do not double-click, use the buttons on your browser, or open a second window while logged on to BankBoston. You can use the buttons on the screen or Short Cut to move around the system easily.

Oops! What's worse, once you get it into this sort of a state, it is totally unable to unwedge itself, and going back to the login screen gives only the helpful message:

Error Description
1501002 - Invalid Card Number
This Card Number is currently logged on. Please make sure you have logged off the system and try again.

Um, hello?! The only way to communicate with them is through the feedback function, which I can't use because their system won't talk to me. (Of course, the whole thing is run under Windows NT, judging by the file names. I think I should probably be very concerned about that.)

Eventually, I was able to ``log in'' again, and got to the `feedback' section to send them a message. I spent about ten minutes composing my diatribe, hit the ``images'' button to figure out which inline image was hiding the ``send'' function, and then hit it. It comes back with another error message -- the date I had given it did not fit its simple-minded notions of what a date should look like, so it gave me another error message.

Of course, I didn't want to lose my carefully composed diatribe, so I hit Meta-Back to get back to the form. (I then added another paragraph about what kind of idiot would give users a blank text field to enter a date without any indication that the simple-minded program would only accept one form.) You can of course guess what happened: I changed the format of the date to the one it wanted, hit ``send'', and it gave me the original error message again. Oops... better find something to waste ten minutes doing until it times out again!

All in all, it took me a good hour to finally send my message, and I never did manage to pay my bills.

According to the July 27, 1997 issue of PC Week, BankBoston's Web service [now Fleet Bank] was developed by Sapient, a consulting firm, using virtually the entire panoply of technologies that were fashionable in corporate IT departments at the time: WebObjects, Windows NT, and a C++ and CORBA middleware layer.

Summary

Learning basic HTML shouldn't take more than a few minutes.

The more HTML you know, the uglier and harder to use your site is likely to be.

HTML is not powerful enough to express the most interesting structural characteristics of your documents.

XML is powerful enough to represent structure, but XML documents represent records in a database, not a database management system.

You may want to keep your content in a database management system of some kind instead and generate HTML and XML pages programmatically.

If you have a limited budget, spend it on content that search engines can index rather than style and flash.

Don't forget that using Java applets or plug-ins will get you into the business of educating and supporting users.

Because a search engine can send users to any document at your site, every document on your site should have navigation links to the rest of your content.

philg@mit.edu

Reader's Comments

The point about a consistent user interface is very important. As a university student I have come into contact with WebCT Vista. This program is a web based program that is supposed to enable students and teachers to interact and to enable students to find information (at least I think that is what it is meant to do). It is a god awful program that uses Java and Javascript and other nasties, breaks the user interface of the web browser (for example the back button) and is not internally consistent. An example of an internal inconsistency (regarding the user interface), there are two separate home buttons that are not distinguishable , one that links to the students home on WebCT Vista, and one to the home of the subject. I hate the program and have only talked to one person who likes it. Rather they tolerate it 'cause they are blind and have been taught to use it.



-- Anonymous Smith, January 20, 2007

I'm interested in marking-up existing HTML documents for my own use for my research.--highlight, add comment, etc. I can't afford acrobat, and have also tried to save the page as a pdf and import to word. Any suggestions?



-- Ellen Frick, February 6, 2007