Read HTML The Definitive Guide Online
Authors: Chuck Musciano and Bill Kennedy
2.5 The Flesh on an HTML Document
Except for the , , , and
2.5.1 Comments
Like computer-programming source code, a raw HTML document, with all its embedded tags, can quickly become nearly unreadable. We strongly encourage you to use HTML comments to guide your composing eye.
Although it's part of your document, nothing in a comment, including the body of your comment that goes between the special starting tag "<!--" and the ending tag "-->", gets included in the browser display of your document. Now you see a comment in the source, as in our simple HTML example, and now you don't on the display, as evidenced by our comment's absence in Figure 2.1. Anyone can download the source text of the HTML document and read the comments, though, so be careful what you write. [Comments, 3.4.3]
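To see the split between source and display for yourself, here is a minimal sketch using Python's standard html.parser module. It is not how any browser is implemented; it only separates comment bodies from ordinary text. The CommentSplitter class and the sample fragment are our own illustrative inventions, not the book's simple example.

```python
from html.parser import HTMLParser


class CommentSplitter(HTMLParser):
    """Collects comment bodies and ordinary text separately (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.comments = []  # text found between <!-- and -->
        self.text = []      # text a reader would actually see

    def handle_comment(self, data):
        self.comments.append(data.strip())

    def handle_data(self, data):
        # Skip chunks that are only whitespace between tags.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical document fragment, assumed for this sketch.
SOURCE = """<body>
<!-- Reminder: update this greeting before publishing -->
Greetings from the web!
</body>"""

splitter = CommentSplitter()
splitter.feed(SOURCE)
splitter.close()

print(splitter.comments)  # ['Reminder: update this greeting before publishing']
print(splitter.text)      # ['Greetings from the web!']
```

The comment is plainly there in the source string, which is why anyone who downloads the document can read it, yet it never lands in the list of displayable text.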
2.5.2 Text
If it isn't a tag or a comment, it's text. The bulk of content in most of your HTML documents - the part readers see on their browser displays - is text. Special tags give the text structure, such as headings, lists, and tables. Others advise the browser how the content should be formatted and displayed.
2.5.3 Multimedia
What about images and other multimedia elements we see and hear as part of our web browser displays? Aren't they part of the HTML document? No. The data that make up digital images, movies, sounds, and other multimedia elements that may be included in the browser display are in documents separate from the HTML document. You include references to those multimedia elements via special tags in the HTML document. The browser uses the references to load and integrate other types of documents with your HTML text.
We didn't include any special multimedia references in the previous example simply because they are separate, nontext documents you can't just type into a text processor. We do, however, talk about and give examples of how to integrate images and other multimedia in your HTML documents later in this chapter, as well as in extensive detail in subsequent chapters.
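The idea that an HTML document holds only references to multimedia, never the multimedia data itself, can be sketched as follows. This is a rough illustration under our own assumptions: the image tag and its src attribute stand in for the "special tags", and the file names and the ReferenceCollector class are hypothetical.

```python
from html.parser import HTMLParser


class ReferenceCollector(HTMLParser):
    """Lists the separate files that <img> tags point to (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.references.append(value)


# Hypothetical fragment; the file names are made up for this sketch.
SOURCE = """<body>
Here is our logo: <img src="pics/logo.gif" alt="Logo">
and a photo: <img src="pics/kumquat.jpg" alt="A kumquat">
</body>"""

collector = ReferenceCollector()
collector.feed(SOURCE)
collector.close()

# The document names two other files; it contains no image data itself.
print(collector.references)  # ['pics/logo.gif', 'pics/kumquat.jpg']
```

A browser would go on to fetch each referenced file and merge it into the display; the sketch stops at listing what would need to be fetched.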
2.6 HTML and Text
Text-related HTML tags comprise the richest set of all in the standard language. That's because HTML emerged as a way to enrich the structure and organization of text.
HTML came out of academia. What was and still is important to those early developers was the ability of their mostly academic, text-oriented documents to be scanned and read without sacrificing their ability to distribute documents over the Internet to a wide diversity of computer display platforms. (ASCII text is the only universal format on the global Internet.) Multimedia integration is something of an appendage to HTML, albeit an important one.
And page layout is secondary to structure in HTML. We humans visually scan and decide textual relationships and structure based on how they look; machines can only read encoded markings. Because HTML documents have encoded tags that relate meaning, they lend themselves very well to computer-automated searches and recompilation of content - features very important to researchers.
It's not so much how something is said in HTML as what is being said.
Accordingly, HTML is not a page-layout language. In fact, given the diversity of user-customizable browsers as well as the diversity of computer platforms for retrieval and display of electronic documents, all HTML strives to accomplish is to advise, not dictate, how the document might look when rendered by the browser. You cannot force the browser to display your document in any certain way. You'll hurt your brain if you insist otherwise.
2.6.1 Appearance of Text
For instance, you cannot predict what font and what absolute size - 8- or 40-point Helvetica, Geneva, Subway, or whatever - will be used for a particular user's text display. Okay, so the latest browsers now support HTML style sheets and other desktop publishing-like features that let you control the layout and appearance of your documents. But users may change their browser's display characteristics and override your carefully laid plans at will; quite a few of the older browsers out there don't support these new layout features; and some browsers are text-only with no nice fonts at all. What to do? Concentrate on content. Cool pages are a flash in the pan. Deep content will bring people back for more and more.
Nonetheless, style does matter for readability, and it is good to include it where you can, as long as it doesn't interfere with content presentation. You can attach common style attributes to your text with physical style tags like the italic tag in the simple example. More importantly and truer to the language's original purpose, HTML has content-based style tags that attach meaning to various text passages. And you can alter text display characteristics, such as font style and size, color, and so on, with Cascading Style Sheets.
All of today's graphical browsers recognize the physical and content-related text style tags and change the appearance of their related text passage to visually convey meaning or structure. You just can't predict exactly what that change will look like.
2.6.1.1 Content-based text styles
Content-based style tags indicate to the browser that a portion of your HTML text has a specific usage or meaning. The <cite> tag in our simple example, for instance, means the enclosed text is some sort of citation - the document's author, in this case. Browsers commonly, although not universally, display the citation text in italic, not as regular text. [Content-based Style Tags, 4.4]
While it may or may not be obvious to the current reader that the text is a citation, someday, someone might create a computer program that searches a vast collection of HTML documents for embedded <cite> tags and compiles a special list of citations from the enclosed text. Similar software agents already scour the Internet for HTML-embedded information to compile listings, such as the infamous Webcrawler and the AltaVista database of web sites.
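A citation-compiling program of the kind imagined above might look something like this sketch, which assumes the citation tag is <cite> and again leans on Python's standard html.parser module. The CitationCollector class, the compile_citations function, and the three tiny documents are all hypothetical; a real agent would also have to fetch documents and cope with malformed markup.

```python
from html.parser import HTMLParser


class CitationCollector(HTMLParser):
    """Gathers the text enclosed by <cite> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.citations = []
        self._depth = 0   # greater than 0 while inside a <cite> element
        self._parts = []  # text pieces of the citation being read

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "cite" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.citations.append(" ".join("".join(self._parts).split()))
                self._parts = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def compile_citations(documents):
    """Return every citation found in a collection of HTML source strings."""
    found = []
    for source in documents:
        collector = CitationCollector()
        collector.feed(source)
        collector.close()
        found.extend(collector.citations)
    return found


# Hypothetical stand-ins for a "vast collection" of documents.
DOCUMENTS = [
    "<p>Written by <cite>A. Author</cite> for the course.</p>",
    "<p>See <cite>An <i>Imaginary</i> Handbook</cite> for details.</p>",
    "<p>No citations here, only <i>italic</i> text.</p>",
]

print(compile_citations(DOCUMENTS))  # ['A. Author', 'An Imaginary Handbook']
```

Note that the third document's italic text is ignored even though a browser might render it exactly like a citation. The program keys on the tag's meaning, not on how the text looks.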