CHAPTER 3 SGML/XML:

WHY SGML/XML?









WHY SGML/XML?


As mentioned in Chapter 2, the SGML and XML specs allow people in any field or academic subject to create their own document types and markup languages, so that they can better exchange information specific to their field.

XML is a simplified subset (an "abbreviated" version) of SGML, designed to simplify the creation and presentation of Web-based documents.



[ The following is reprinted from Chapter 1 ]

"A basic design goal of SGML was to ensure that documents encoded according to its provisions should be transportable from one hardware and software environment to another without loss of information."
**[ REF 1 ]**

SGML is independent of computer type (it is "platform independent"). It is an international standard (ISO 8879), not a privately-owned spec for electronic documents. It has high archival value. It is "extensible," meaning that it allows and can accommodate change and growth. Unlike HTML, it allows users to specify the complex hierarchies associated with database schemas or "object-oriented" work.

SGML and XML derive "from a philosophy that data belongs to its creators and that content providers are best served by a data format that does not bind them to particular script languages, authoring tools, and delivery engines but provides a standardized, vendor-independent, level playing field upon which different authoring and delivery tools may freely compete."
**[ REF 2 ]**

SGML and XML make possible "the ability to capture and transmit semantic and structural data."
For example, "An installation sheet that carries warnings in multiple languages can be made to show just the ones in the language selected by the user." Or a document that is displayed sorted by last name can instantly be redisplayed sorted by first name.
"A document containing many annotations can be switched from a mode that shows only the text, to a mode that shows only the annotations, to a mode that shows both, just by making a menu selection." (This is particularly useful in humanities research.)
**[ REF 2 ]**
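
The language-selection and re-sorting behaviors described above can be sketched in a few lines. This is only an illustration, assuming Python's standard xml.etree.ElementTree module; the element and attribute names (sheet, warning, lang, person, first, last) are hypothetical and are not drawn from any real document type.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every tag and attribute name here is invented for the sketch.
SHEET = """
<sheet>
  <warning lang="en">Disconnect power before installing.</warning>
  <warning lang="fr">Coupez le courant avant l'installation.</warning>
  <warning lang="de">Vor der Installation den Strom abschalten.</warning>
  <person><first>William</first><last>Blake</last></person>
  <person><first>Dante</first><last>Alighieri</last></person>
  <person><first>Dante Gabriel</first><last>Rossetti</last></person>
</sheet>
"""

root = ET.fromstring(SHEET)


def warnings_for(root, lang):
    """Return only the warnings written in the requested language."""
    return [w.text for w in root.iter("warning") if w.get("lang") == lang]


def people_sorted_by(root, field):
    """Return (first, last) pairs ordered by the named child element."""
    people = [(p.findtext("first"), p.findtext("last")) for p in root.iter("person")]
    index = {"first": 0, "last": 1}[field]
    return sorted(people, key=lambda pair: pair[index])


print(warnings_for(root, "fr"))
print(people_sorted_by(root, "last"))
print(people_sorted_by(root, "first"))
```

Because the markup records what each piece of text is (a warning in a given language, a first name, a last name), the same document can be filtered or reordered without being re-authored.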

"XML provides a standard way for information providers to add custom markup to information-rich documents, so that complex documents can be rendered (and published) in a dynamic way."
"XML provides the means to publish and receive any information, regardless of format or origin, in any way desired."
**[ REF 4 ]**








EXAMPLES

[The following is reprinted from Chapter 2]

A powerful example of SGML's capability can be found in IATH's "William Blake Archive," which features SGML encodings of the writings and pictorial works of English poet and painter William Blake (1757-1827). The SGML markup allows users to run advanced database queries against the Archive's contents (both written works and pictorial works), with the results returned quickly to the user. Advanced database capability is another benefit of SGML's content markup tags.

IATH's "Rossetti Archive" features SGML encodings of the writings and paintings of 19th-century poet and painter D.G. Rossetti. (Pictorial works are marked up with tags that specify medium, dimensions, frame, etc.) The SGML markup includes full scholarly annotations and notes. The Archive's web site even features a simple "virtual reality" model of Rossetti's studio, allowing students to "walk around" inside the studio (assuming, of course, that a VRML plug-in has been added to the user's web browser). The Rossetti Archive's web page features a search function, which relies on the Archive's SGML markup.

IATH's "The World of Dante" web site features an advanced structured search capability for Dante's Inferno. The site also features a virtual reality (VRML) wire-frame model of Dante's Inferno, with colored triangles representing persons, creatures, etc. The user chooses which types of inhabitants are to be displayed, even specifying subclasses (e.g., mythical persons, historical persons, etc.). The SGML markup allows these choices to be located quickly in the text, and the appropriate inhabitants are then injected into the virtual reality model. The user can then "fly around" inside the model. This allows students to get a visual feel for how the inhabitants being studied are situated and arranged in Dante's work.

(The above sites will be examined in detail in Chapter 4.)








HOW XML BROWSERS WORK

XML's specs allow users and developers to create customized content-labeling tags suited to a particular type of electronic document (poetry, chemistry, etc.).

An XML document is viewed with an XML-capable Web browser.
The browser uses an "XML parser" program to locate the document's markup tags. From the tags, the browser establishes how the content elements are nested within one another, and from that nesting it builds a hierarchical "tree" of the document's contents.
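
The parsing step can be sketched as follows. This is a minimal illustration, assuming Python's standard xml.etree.ElementTree parser rather than any particular browser's parser; apart from <poemtitle>, which appears later in this chapter, the tag names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical poetry markup; <poem>, <stanza>, and <line> are invented tag names.
POEM = (
    "<poem>"
    "<poemtitle>The Tyger</poemtitle>"
    "<stanza>"
    "<line>Tyger Tyger, burning bright,</line>"
    "<line>In the forests of the night;</line>"
    "</stanza>"
    "</poem>"
)


def outline(element, depth=0):
    """Return one indented line per element, showing how the tags are nested."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_root = ET.fromstring(POEM)  # the parser locates the tags and builds the tree
print("\n".join(outline(tree_root)))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document's tree.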

Once the document's contents have been parsed in this way, the proper styling and layout of each element is determined by the "style sheet" associated with the document. For example, a style sheet might specify that all <poemtitle> text be centered and set in large print.

The browser then displays the XML document's contents in the format specified by the style sheet (or style sheets).
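
The styling and display steps can be sketched in the same way. This is an assumption-laden illustration: a plain Python dictionary stands in for a real style sheet language, upper case stands in for "large print" on a plain-text display, and the WIDTH value and all tag names other than <poemtitle> are hypothetical.

```python
import xml.etree.ElementTree as ET

POEM = (
    "<poem>"
    "<poemtitle>The Tyger</poemtitle>"
    "<line>Tyger Tyger, burning bright,</line>"
    "<line>In the forests of the night;</line>"
    "</poem>"
)

WIDTH = 40  # assumed display width, in characters

# Stand-in "style sheet": maps a tag name to a formatting rule.
# Upper case approximates "large print" on a plain-text display.
STYLE_SHEET = {
    "poemtitle": lambda text: text.upper().center(WIDTH).rstrip(),
    "line": lambda text: "    " + text,
}


def render(root, style_sheet):
    """Walk the parsed tree and format the text of each styled element."""
    output = []
    for element in root.iter():
        rule = style_sheet.get(element.tag)
        if rule is not None and element.text:
            output.append(rule(element.text))
    return output


rendered = render(ET.fromstring(POEM), STYLE_SHEET)
print("\n".join(rendered))
```

Note that the document itself says nothing about centering or type size; swapping in a different dictionary changes the presentation without touching the marked-up content.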








SOME MARKUP LANGUAGES

HTML is the best-known markup language.

MathML (or MML, the Mathematical Markup Language) is an XML application that is still being improved and fine-tuned. The same is true of CML (Chemical Markup Language, another XML-based markup language). MathML and CML browsers are currently in their infancy; e.g., JUMBO is a pilot XML browser geared toward CML, but it is said to contain many bugs.

(SGML/XML software is still in its infancy, since the specs for XML, Style Sheets, etc., are so new. For instance, XML capability in Netscape's Communicator 5.0 is stalled because the XSL spec is not yet finalized. Standardization of specs and implementations is expected by the end of this year or early next year.)

Other markup languages include JSML (Java Speech Markup Language), which was developed at Sun Microsystems, and STML (Spoken Text Markup Language), which was developed at Bell Labs.

SGML is useful for more than Web-based presentation. As mentioned earlier, SGML's specs are being used to facilitate information storage and exchange.
For example, a Medical Markup Language is being developed to exchange medical records quickly, efficiently, and -- most importantly -- in a platform-independent (computer-independent) manner. SGML's content tags are particularly useful here, since tags such as <allergy> neatly tag critical medical information.
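
As a rough illustration of why content tags help here, the sketch below pulls every allergy out of a small record. It assumes Python's standard xml.etree.ElementTree module; the record layout is hypothetical (only the <allergy> tag comes from the text above) and does not represent any actual Medical Markup Language.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record; only <allergy> comes from the text above.
RECORD = """
<patient>
  <name>Jane Doe</name>
  <allergy>penicillin</allergy>
  <medication>ibuprofen</medication>
  <allergy>latex</allergy>
</patient>
"""


def allergies(record_xml):
    """Return the text of every <allergy> element, in document order."""
    root = ET.fromstring(record_xml)
    return [a.text.strip() for a in root.iter("allergy") if a.text]


print(allergies(RECORD))
```

The receiving system needs to know only the tag name, not the layout or software that produced the record, which is the platform independence described above.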








SGML AND XML SOFTWARE

Microsoft's Internet Explorer 4.0 and 5.0 have basic XML capability, and Netscape's Communicator 5.0 (due later this year) will have XML capability. WordPerfect 8 includes extensive SGML software, and many major companies are currently developing SGML/XML products.

Some of IATH's humanities SGML software is listed in Chapter 4.

Chapter 5 will list important URLs, including URLs for SGML/XML software and resources.





Current page was last modified/altered on: 8-31-98