CHAPTER 1 SGML/XML:

TERMS, DEFINITIONS,
AND INFORMATION



By Ivan Bajic
For
The Education Center on
Computational Science and Engineering







1. Markup Languages
2. SGML
3. XML
4. DTD
5. "Valid" and "Well-Formed" Documents
6. Style Sheets
7. CSS
8. CSS1, CSS2
9. CSS Positioning
10. DHTML
11. DSSSL
12. XSL
13. The W3C
14. Bibliography





MARKUP LANGUAGES

Markup languages define tags that are inserted into electronic documents. Most markup tags describe structure and content, such as <poemtitle>, but style tags are sometimes used too, such as HTML's <B> bold-text tag.

SGML (STANDARD GENERALIZED MARKUP LANGUAGE)

SGML is not really a markup language, but rather a "metalanguage": it is the international standard for creating markup languages. "A basic design goal of SGML was to ensure that documents encoded according to its provisions should be transportable from one hardware and software environment to another without loss of information." **[ REF 1 ]**

SGML is independent of computer type (it is "platform independent"). It is an international standard (ISO 8879), not a privately owned spec for electronic documents. It has high archival value. It is "extensible," meaning that it can accommodate change and growth. Unlike HTML, it allows users to specify the complex hierarchies associated with database schema or "object-oriented" work.

SGML and XML derive from a philosophy that "data belongs to its creators and that content providers are best served by a data format that does not bind them to particular script languages, authoring tools, and delivery engines but provides a standardized, vendor-independent, level playing field upon which different authoring and delivery tools may freely compete." **[ REF 2 ]**

SGML and XML make possible "the ability to capture and transmit semantic and structural data." For example, "An installation sheet that carries warnings in multiple languages can be made to show just the ones in the language selected by the user." Or a document that is displayed sorted by last name can instantly be displayed sorted by first name. "A document containing many annotations can be switched from a mode that shows only the text, to a mode that shows only the annotations, to a mode that shows both, just by making a menu selection." (This is particularly useful in humanities research.) **[ REF 2 ]**

Large lists of SGML projects can be found at the following addresses:

    Academia (including many humanities computing projects): http://www.sil.org/sgml/acadapps.html
    Government and industry: http://www.sil.org/sgml/gov-apps.html
    General projects and applications: http://www.sil.org/sgml/gen-apps.html
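
The language-selection example quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the <sheet>, <warning>, and <step> element names, the lang attribute, and the warnings_for helper are all invented here and do not come from REF 2. The sketch uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical installation sheet; element and attribute names are invented
# for this illustration and are not taken from any real document type.
SHEET = """
<sheet>
  <warning lang="en">Disconnect power before opening the case.</warning>
  <warning lang="fr">Coupez le courant avant d'ouvrir le boitier.</warning>
  <warning lang="de">Vor dem Oeffnen den Netzstecker ziehen.</warning>
  <step>Remove the four screws on the rear panel.</step>
</sheet>
"""

def warnings_for(xml_text, lang):
    """Return the text of every <warning> whose lang attribute matches."""
    root = ET.fromstring(xml_text)
    return [w.text for w in root.iter("warning") if w.get("lang") == lang]

# Show only the warnings in the language the user selected.
print(warnings_for(SHEET, "fr"))
```

Because the language is recorded in the markup rather than in the layout, the same file can serve every reader; only the selection step changes.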

XML (EXTENSIBLE MARKUP LANGUAGE)

XML is an "abbreviated" version of SGML, created because the full SGML spec is somewhat intimidating. XML makes it easy for people to use, understand, and create markup languages, particularly for Web-based documents. XML is developed by a working group of the World Wide Web Consortium (W3C). The W3C gave "Recommendation" status to the XML (version 1.0) spec in February 1998.

"For example, XML will allow online booksellers to use tags such as 'price,' 'number of pages,' and 'author.' A customer with an XML browser then can use these specific criteria to sort through the inventory and arrange the results on the desktop. Combined with Java programs as well as new techniques known collectively as 'dynamic HTML,' XML will make browsers much more flexible." "XML was created and developed by the W3C XML Working Group, which includes key industry players such as Adobe, DataChannel, Hewlett-Packard, Microsoft, Netscape Communications, and Sun Microsystems, as well as experts in structured documents and electronic publishing." "Several of those companies have released or announced plans [early 1998] to release Internet software that reads XML." **[ REF 3 ]**

"The XML solution is system-independent, vendor-independent, and proven by over a decade of SGML implementation experience. XML merely extends this proven approach, to document interchange over the Web." **[ REF 2 ]**

"XML provides a standard way for information providers to add custom markup to information-rich documents, so that complex documents can be rendered (and published) in a dynamic way." "XML provides the means to publish and receive any information, regardless of format or origin, in any way desired." **[ REF 4 ]**

Example: A hypothetical XML-based "NPACI Markup Language" (NPACI-ML) document might look as follows. Notice the high degree of similarity between an NPACI-ML document and an HTML document.

    <?xml version="1.0"?>
    <!DOCTYPE NPACI SYSTEM "npaci.dtd">
    <NPACI>
      <EDCENTER>
        <PERSON>
          <DIRECTOR>Dr. Kris Stewart</DIRECTOR>
        </PERSON>
        <PERSON>
          <ASSISTANT>Dolores Candelario</ASSISTANT>
        </PERSON>
      </EDCENTER>
    </NPACI>

If no DTD were associated with this document, the first line would be replaced with

    <?xml version="1.0" standalone="yes"?>

and the second line would be omitted.

In addition to users being able to view NPACI-ML documents on the Web, NPACI-ML could also be used to "instantly" determine the entire status of the NPACI project -- e.g., an XML parser could quickly find and count the number of directors and assistants, or even the number and types of computer systems. Or the <PERSON> elements might contain short <BIOGRAPHY> information, which could be called up with just a click of the mouse. Much better illustrations of SGML/XML's power can be found in IATH's humanities archives, which will be discussed in Chapter 4 of this Project #1 write-up.
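
The counting idea mentioned above can be tried with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the count_roles helper is a name made up for this illustration, and the XML declaration and DOCTYPE lines are left out so that the snippet does not depend on an npaci.dtd file being present.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The element content of the NPACI-ML example, without the declaration
# and DOCTYPE lines.
NPACI_DOC = """<NPACI>
  <EDCENTER>
    <PERSON><DIRECTOR>Dr. Kris Stewart</DIRECTOR></PERSON>
    <PERSON><ASSISTANT>Dolores Candelario</ASSISTANT></PERSON>
  </EDCENTER>
</NPACI>"""

def count_roles(xml_text):
    """Count the role elements (the children of each PERSON) by tag name."""
    root = ET.fromstring(xml_text)
    return Counter(role.tag
                   for person in root.iter("PERSON")
                   for role in person)

counts = count_roles(NPACI_DOC)
print("directors:", counts["DIRECTOR"], "assistants:", counts["ASSISTANT"])
```

The program never looks at how the names are displayed; it relies only on the structure that the tags record.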

DTD (DOCUMENT TYPE DEFINITION)

"SGML introduces the notion of a 'document type', and hence a 'document type definition.' Documents are regarded as having types. The type of a document is formally defined by its constituent parts and their structure." **[ REF 1 ]**

The best-known document type is the "HTML" document. SGML's and XML's specs allow users to create their own document types. For example, one humanities researcher created an XML-based "PLAY" document type for Shakespeare's plays. Typical markup tags embedded into the text of the plays include <persona>, <scene>, <speaker> and <speech>.

SGML requires that each document be associated with a DTD. For XML documents, DTDs are optional; a document without a DTD must still be "well formed" (this will be defined shortly). DTDs specify allowable tags, such as <poemtitle>, and allowable nesting of tags. The purpose of a DTD is to formally define a new document type, and to make it possible for software to properly parse a document's content.

Example: The following DTD is for a hypothetical XML-based "NPACI Markup Language." The DTD can be embedded in-line in an NPACI document. Or the DTD can be saved in a file "npaci.dtd", and the NPACI document can link to it remotely (as shown in the "Example" document in the "XML" Section above). The content models are written so that the example document is valid: an EDCENTER holds one or more PERSON elements, and each PERSON is either a DIRECTOR or an ASSISTANT.

    <!DOCTYPE NPACI [
      <!ELEMENT NPACI     (EDCENTER)*>
      <!ELEMENT EDCENTER  (PERSON)+>
      <!ELEMENT PERSON    (DIRECTOR | ASSISTANT)>
      <!ELEMENT DIRECTOR  (#PCDATA)>
      <!ELEMENT ASSISTANT (#PCDATA)>
    ]>

Example: The following DTD is for a sample SGML-based literary document type which is given in *[REF 1]*.

    <!DOCTYPE anthology [
      <!ELEMENT anthology      - - (poem+) >
      <!ELEMENT poem           - - (title?, stanza+) >
      <!ELEMENT stanza         - O (line+) >
      <!ELEMENT (title | line) - O (#PCDATA) >
    ]>

"VALID" AND "WELL-FORMED" DOCUMENTS

SGML requires that each document be associated with a DTD. DTDs are optional for XML documents; every XML document must, however, be "well formed." (This relaxed rule is one of the things that makes XML easier to use than SGML.)

An XML document is "WELL-FORMED" if
..... every starting tag <tag> has a matching ending tag </tag>
..... attribute values are in quotes
..... empty element tags such as <ParagraphBreak> are written either as <ParagraphBreak/> or with the corresponding ending tag </ParagraphBreak> added immediately after them
..... tags are nested properly, not overlapping
..... it follows a few other, more obscure rules (for example, the whole document must sit inside a single root element).

A document is "VALID" if it is associated with a DTD, adheres to the DTD's markup rules, and is well-formed.

"A well-formed XML document is unambiguous, so that a browser or editor can read the tags and create a tree of the hierarchical structure without having to read its Document Type Definition." **[ REF 4 ]**
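
The difference between the two terms can be seen by running a few test strings through a parser. The sketch below is a rough illustration and not a real validator. It assumes Python's standard xml.etree.ElementTree module, which checks well-formedness only. The "validity" half is approximated by a hand-written table of allowed child elements, transcribed from the hypothetical NPACI DTD in the "DTD" Section. The table ignores the order and repetition rules of the DTD, and the function names are invented here.

```python
import xml.etree.ElementTree as ET

# Allowed child elements per parent, transcribed by hand from the NPACI DTD.
# Assumption: order and repetition are ignored, so this check is far weaker
# than real DTD validation.
ALLOWED_CHILDREN = {
    "NPACI": {"EDCENTER"},
    "EDCENTER": {"PERSON"},
    "PERSON": {"DIRECTOR", "ASSISTANT"},
    "DIRECTOR": set(),   # text only (#PCDATA)
    "ASSISTANT": set(),  # text only (#PCDATA)
}

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def nesting_errors(xml_text):
    """List elements that are undeclared or nested where the table forbids."""
    errors = []
    for parent in ET.fromstring(xml_text).iter():
        if parent.tag not in ALLOWED_CHILDREN:
            errors.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors

samples = {
    "proper document":
        "<NPACI><EDCENTER><PERSON><DIRECTOR>Dr. Kris Stewart</DIRECTOR>"
        "</PERSON></EDCENTER></NPACI>",
    "overlapping tags": "<NPACI><EDCENTER></NPACI></EDCENTER>",
    "unquoted attribute": "<NPACI id=1></NPACI>",
    "well-formed, wrong nesting":
        "<NPACI><PERSON><DIRECTOR>Dr. Kris Stewart</DIRECTOR></PERSON></NPACI>",
}

for label, text in samples.items():
    if not is_well_formed(text):
        print(label, "-> not well-formed")
    else:
        print(label, "->", nesting_errors(text) or "nesting matches the table")
```

The last sample shows why the two terms are kept apart: the parser can build a tree from it, yet the tree breaks the nesting rules of the document type.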

STYLE SHEETS

A style sheet is a file or set of instructions that tells a Web browser how a document's elements are to be displayed -- e.g., that <header> text should be displayed in green and in an 80-point Times-Roman font.

Example:

    H1 { font-size:80pt; color:green; font-weight:bold; }

When properly embedded in a file, or properly linked to as a remote file, this rule will cause <H1> header text to be displayed in an 80-point, green, bold font.
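
To make the parts of such a rule explicit, the sketch below splits the H1 rule into its selector and its property/value pairs. It is only an illustration: parse_rule is a name invented here, it handles one simple rule at a time, and it is not a real CSS parser (comments, grouped selectors, and multiple rules are not handled).

```python
def parse_rule(rule):
    """Split 'selector { prop: value; ... }' into (selector, {prop: value})."""
    selector, _, body = rule.partition("{")
    body = body.rsplit("}", 1)[0]          # drop the closing brace
    declarations = {}
    for part in body.split(";"):
        if ":" in part:                     # skip the empty piece after the last ';'
            name, value = part.split(":", 1)
            declarations[name.strip()] = value.strip()
    return selector.strip(), declarations

selector, decls = parse_rule("H1 { font-size:80pt; color:green; font-weight:bold; }")
print(selector)   # the element the rule applies to
print(decls)      # the display properties a browser would apply
```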

CSS (CASCADING STYLE SHEETS)

CSS are the style sheets generally associated with HTML documents, but they are also used with XML: the XML support in Microsoft's Internet Explorer uses CSS rather than XSL style sheets. Netscape Navigator/Communicator 5.0's XML support, due out later this year, will also use CSS to convert XML data into HTML display, but Netscape says that it is working to integrate XSL into the browser as soon as possible.

Style sheets are supported only in the 4+ browsers (Netscape Navigator 4 and later, and Microsoft Internet Explorer 4 and later). These are the "CSS-capable" browsers.

"The HTML4 spec states that CSS can be written inline or specified as an external file." **[ REF 5 ]**

"As the term cascading style sheets implies, more than one style sheet can be used on the same document, with different levels of importance. If you define conflicting styles for the same HTML tag, the innermost definition -- the one closest to the tag -- wins." **[ REF 6 ]**

Appendix 1 (of this Project #1 write-up) is a primer on writing CSS.
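
The "innermost definition wins" rule quoted above can be modelled very roughly as follows. This sketch assumes just three layers (an external file, a style block embedded in the page, and an inline style on the tag) and simply lets each inner layer overwrite the outer ones. The resolve function and the sample values are invented for illustration; the full cascade defined by the W3C specs weighs more factors than this.

```python
def resolve(external, embedded, inline):
    """Merge declarations so that the definition closest to the tag wins."""
    final = {}
    for layer in (external, embedded, inline):   # outermost to innermost
        final.update(layer)                       # inner values overwrite outer
    return final

# Conflicting declarations for the same tag, from three hypothetical sources.
external = {"color": "green", "font-size": "80pt"}   # linked style sheet file
embedded = {"color": "blue"}                          # style block in the page
inline = {"font-size": "24pt"}                        # style attribute on the tag

print(resolve(external, embedded, inline))
```

Here the embedded sheet decides the color and the inline style decides the font size, while any property that only the external file defines would pass through unchanged.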

CSS1, CSS2

"Cascading Style Sheets, Level 1" and "CSS, Level 2." CSS2 is the newest spec for style sheets.

"In December 1996, the CSS1 specification became a W3C Recommendation." **[ REF 7 ]** The term "cascading" implies that "more than one style sheet can 'cascade' together to produce the final look of the document: individual style sheets from different sources can be combined." **[ REF 7 ]**

"In May 1998, the CSS2 specification was released as a W3C Recommendation. CSS2 gives content creators, designers and readers the powerful tools they need to realize the full potential of their HTML and XML documents. CSS2 includes all the power of CSS1, and adds enhancements in several areas to make the Web more appealing for both content providers and users. Although originally developed for HTML, CSS has been designed to allow you to style XML documents also." "The CSS2 Recommendation is based upon CSS1 and is a prerequisite for the Document Object Model (DOM), W3C's platform- and language-neutral interface, which allows programs and scripts to dynamically access and update the content, structure, and style of documents." **[ REF 7 ]**

CSS1 is supported by the 4+ browsers (Navigator 4+ and MSIE 4+). However, "there are some disparities where CSS1 features have not been fully or correctly implemented." **[ REF 7 ]**

A condensed version of W3C's CSS1 spec can be found in *[ REF 6 ]*. It lists the most important information about each CSS property and its available values. This "CSS Reference Table" is located at http://builder.cnet.com/Authoring/CSS/table.html

CSS POSITIONING

"An extension to CSS1 that lets you control an object's precise position on the page. You can also specify whether an object is visible or hidden, and even layer objects on the page. Although not yet a W3C-recommended standard, CSS positioning is a working draft, stable enough for public discussion, although it may still change before it's final." **[ REF 6 ]**

"CSS positioning is already supported by both Netscape Navigator 4.0 and Microsoft Internet Explorer 4.0." **[ REF 6 ]**

"It works like this. When a browser renders an object on the page with CSS positioning, it places it into an invisible rectangle, called a bounding box. You can set the box's exact distance from the top and/or left edges of the browser window, or you can offset the box from other elements on the page. You can also specify the height and width of the box. You can even layer objects on top of one another. And since objects can overlap, CSS positioning includes clipping features that let you cut off an area of an element -- for example, to have a top object reveal another one beneath it. Finally, you can make entire objects visible or invisible." **[ REF 6 ]**

Example:

    H1 { position: absolute; top: 150px; width: 200px; height: 200px }

This places <H1> header text inside a box which is 150 pixels from the top of the page and is 200 pixels wide by 200 pixels high. **[ REF 6 ]**

Example: A better example uses in-line styles and <DIV> and <SPAN> tags.

    <DIV style="position:absolute; top:150px; width:200px; height:200px; background-color:red">This is text in a red 200-by-200-pixel box that is 150 pixels from the top of the window.</DIV>

**[ REF 6 ]**

DHTML (DYNAMIC HTML)

"Combines CSS, CSS positioning and -- most critically -- the document object model (DOM) to let you create dynamic and interactive pages. Of these, only CSS1 is a W3C-recommended standard, so right now a dynamic page designed for IE won't work in Communicator, and vice versa. Standardization of the DOM -- the interface that allows developers to control individual objects on a Web page -- is under way but by no means complete." **[ REF 6, late 1997 ]**

DSSSL (DOCUMENT STYLE SEMANTICS AND SPECIFICATION LANGUAGE)

The international standard for style sheets for SGML documents. SGML uses DSSSL the same way that HTML uses CSS.

XSL (EXTENSIBLE STYLE SHEET LANGUAGE)

The syntax for style sheets for XML documents. "We expect that CSS will be used to display simply-structured XML documents, and XSL will be used where more powerful formatting capabilities are required or for formatting highly structured information such as XML structured data or XML documents that contain structured data." **[ REF 8, 1997 ]**

XSL is much more powerful than CSS. However, according to the W3C, whose Working Groups are developing XSL, "the syntax for XSL is still under development and subject to change." **[ REF 7, June 1998 ]**

Note: W3C published the first "Working Draft" of the XSL 1.0 spec on August 18, 1998. (After attaining "Working Draft" status, a W3C spec must then attain "Proposed Recommendation" and "Recommendation" status.)

THE WORLD WIDE WEB CONSORTIUM (W3C)

The W3C (http://www.w3.org) develops the "recommended" specs for many Web technologies, such as XML, XSL, and CSS.

"The W3C was founded in October 1994 to lead the World Wide Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. We are an international industry consortium, jointly hosted by the MIT Laboratory for Computer Science (in the U.S.), the Institut National de Recherche en Informatique et en Automatique (in Europe), and the Keio University Shonan Fujisawa Campus (in Japan).

"Services provided by the Consortium include: a repository of information about the World Wide Web for developers and users; reference code implementations to embody and promote standards; and various prototype and sample applications to demonstrate use of new technology.

"Initially, the W3C was established in collaboration with CERN, where the Web originated, with support from DARPA and the EC.

"The Consortium is led by Tim Berners-Lee, Director and creator of the World Wide Web, and Jean-Francois Abramatic, Chairman.

"W3C is funded by Member organizations, and is vendor neutral, working with the global community to produce specifications and reference software that is made freely available throughout the world."

"The Consortium attempts to find common specifications for the Web so that through dramatic and rapid evolution, many organizations can work in their own fields and build on top of the global information space which is the web." **[ REF 9 ]**

"Specifications developed within the Consortium must be formally approved by the Membership. Consensus is reached after a specification has proceeded through the review stages of Working Draft, Proposed Recommendation, and Recommendation." **[ REF 10 ]**

BIBLIOGRAPHY

REF 1: "A Gentle Introduction to SGML." http://www-tei.uic.edu/orgs/tei/sgml/teip3sg/index.html
REF 2: "XML, Java, and the future of the Web." By Jon Bosak, Sun Microsystems. http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm
REF 3: "W3C Makes XML a Standard." By Erich Luening, CNET NEWS.COM. http://www.news.com/News/Item/0,4,19008,00.html
REF 4: "A Brief Introduction to XML." http://?????????
REF 5: "WebDeveloper.com's Guide to Cascading Style Sheets." By Scott Clark. http://www.webdeveloper.com/categories/html/html_css_1.html
REF 6: "Get Started With Cascading Style Sheets." By Matt Rotter and Charity Kahn. http://www.cnet.com/Content/Builder/Authoring/CSS/
REF 7: "Style Sheets Activity." By the W3C, June 1998.
REF 8: "A Proposal for XSL." By Microsoft Corp., ArborText, and Inso Corp., 1997.
REF 9: "About The World Wide Web Consortium." By the World Wide Web Consortium. http://www.w3.org/Consortium/
REF 10: "W3C Backgrounder : Recommendation Process." By the World Wide Web Consortium. http://www.w3.org/Press/Backgrounder.html

Current page was last modified/altered on: 8-31-98