CHAPTER 3 SGML/XML:
WHY SGML/XML?
As mentioned in Chapter 2, SGML and XML specs allow people from all fields
or academic subjects to create their own document types and markup languages,
to better exchange information specific to their field.
XML is an "abbreviated" version (a streamlined subset) of SGML, designed to
simplify the creation and presentation of Web-based documents.
[ The following is reprinted from Chapter 1 ]
"A basic design goal of SGML was to ensure that documents encoded according
to its provisions should be transportable from one hardware and software
environment to another without loss of information."
**[ REF 1 ]**
SGML is independent of computer type (it is "platform independent"). It
is an international standard (ISO 8879), not a privately owned spec for
electronic documents. It has high archival value. It is "extensible,"
meaning that it allows and can accommodate change and growth. Unlike HTML,
it allows users to specify the complex hierarchies associated with database
schemas or "object-oriented" work.
SGML and XML derive "from a philosophy that data belongs to its creators
and that content providers are best served by a data format that does not
bind them to particular script languages, authoring tools, and delivery
engines but provides a standardized, vendor-independent, level playing
field upon which different authoring and delivery tools may freely compete."
**[ REF 2 ]**
SGML and XML make possible "the ability to capture and transmit semantic
and structural data."
For example, "An installation sheet that carries warnings in multiple
languages can be made to show just the ones in the language selected by
the user." Or a document that is displayed sorted by last name can
instantly be re-displayed sorted by first name.
"A document containing many annotations can be switched from a mode that
shows only the text, to a mode that shows only the annotations, to a mode
that shows both, just by making a menu selection." (This is particularly
useful in humanities research.)
**[ REF 2 ]**
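The multilingual installation sheet described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the
<sheet> and <warning> element names and the "lang" attribute are invented
for the example and do not come from any of the quoted sources.

```python
import xml.etree.ElementTree as ET

# Hypothetical installation sheet; element and attribute names are invented.
SHEET = """
<sheet>
  <warning lang="en">Disconnect power before opening the case.</warning>
  <warning lang="fr">Coupez le courant avant d'ouvrir le boitier.</warning>
  <warning lang="de">Vor dem Oeffnen des Gehaeuses Strom abschalten.</warning>
</sheet>
"""

def warnings_for(xml_text, lang):
    """Return the text of every <warning> whose lang attribute matches."""
    root = ET.fromstring(xml_text)
    return [w.text for w in root.iter("warning") if w.get("lang") == lang]

# Show only the warnings in the language the user selected.
print(warnings_for(SHEET, "fr"))
```

Because the language is recorded in the markup rather than in the layout,
the same document can serve every reader; only the selection step changes.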
"XML provides a standard way for information providers to add custom markup
to information-rich documents, so that complex documents can be rendered
(and published) in a dynamic way."
"XML provides the means to publish and receive any information, regardless
of format or origin, in any way desired."
**[ REF 4 ]**
EXAMPLES
[The following is reprinted from Chapter 2]
A powerful example of SGML's capability can be found in IATH's "William
Blake Archive," which features SGML-encodings of the writings and pictorial
works of English poet and painter William Blake (1757-1827). The SGML markup
allows users to create advanced database queries of the Archive's contents
(both written works and pictorial works), with the results returned quickly
to the user. Advanced database capability is another benefit of SGML's
content markup tags.
IATH's "Rossetti Archive" features SGML-encodings of the writings and
paintings of 19th-century poet and painter D.G. Rossetti. (Pictorial works
are marked up with tags that specify medium, dimensions, frame, etc.) The
SGML markup includes full scholarly annotations and notes. The Archive's web
site even features a simple "virtual reality" model of Rossetti's studio,
allowing students to "walk around" inside the studio (assuming, of course,
that a VRML plug-in has been added to the user's web browser). The Rossetti
Archive's web page features a search function, which relies on the Archive's
SGML markup.
IATH's "The World of Dante" web site features an advanced structured
search capability for Dante's Inferno. The site also features a
virtual reality (VRML) wire-frame model of Dante's Inferno, with
colored triangles representing persons, creatures, etc. The user chooses
which types of inhabitants are to be displayed, even specifying subclasses
(e.g., mythical persons, historical persons, etc.). The SGML markup allows
these choices to be located quickly in the text, and the appropriate
inhabitants are then injected into the virtual reality model. The user can
then "fly around" inside the model. This allows students to get a visual
feel for how the inhabitants being studied are situated and arranged in
Dante's work.
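The kind of structured search described for The World of Dante can be
approximated with a small Python sketch. The markup below is hypothetical:
the <inhabitant> element and its "class" and "subclass" attributes are
assumptions made for illustration, not the site's actual encoding, and the
sample text is placeholder prose rather than a quotation.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; the real site's element names are not given here.
SAMPLE = """
<passage>
  <line>Here the text names <inhabitant class="creature" subclass="mythical">Minos</inhabitant>.</line>
  <line>Here the text names <inhabitant class="person" subclass="mythical">Helen</inhabitant>.</line>
  <line>Here the text names <inhabitant class="person" subclass="historical">Cleopatra</inhabitant>.</line>
</passage>
"""

def find_inhabitants(xml_text, cls, subclass=None):
    """Return names of tagged inhabitants matching a class and optional subclass."""
    root = ET.fromstring(xml_text)
    hits = []
    for el in root.iter("inhabitant"):
        if el.get("class") != cls:
            continue
        if subclass is not None and el.get("subclass") != subclass:
            continue
        hits.append(el.text)
    return hits

print(find_inhabitants(SAMPLE, "person"))                # all persons
print(find_inhabitants(SAMPLE, "person", "historical"))  # one subclass only
```

A list like the one returned here is the sort of result that could then be
handed to a display step, such as placing markers in a 3-D model.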
(The above sites will be examined in detail in Chapter 4.)
HOW XML BROWSERS WORK
XML's specs allow users and developers to create customized
content-labeling tags suited to a particular type of electronic document
(poetry, chemistry, etc.).
An XML document is viewed with an XML-capable Web browser. The browser
uses an "XML parser" program to locate the document's markup tags. From
these, the browser establishes how the content tags are nested within one
another and builds a hierarchical "tree" of the document's contents.
Once the document's contents have been parsed in this way, the proper
styling/layout of the elements is determined by the "style sheet" associated
with the document. For example, a style sheet might specify that all
<poemtitle> text be centered and in large print.
The browser then displays the XML document's contents in the format
specified by the style sheet (or style sheets).
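The three steps described in this section (parse the tags, build the tree,
apply the style sheet) can be sketched in Python. This is a simplified
illustration, not how any particular browser is implemented. The Python
dictionary is a stand-in for a real style sheet language, and upper-case
text stands in for "large print" since the output is plain text. The
<poem>, <stanza>, and <line> tags are invented to accompany <poemtitle>.

```python
import xml.etree.ElementTree as ET

POEM = """
<poem>
  <poemtitle>The Tyger</poemtitle>
  <stanza>
    <line>Tyger Tyger, burning bright,</line>
    <line>In the forests of the night;</line>
  </stanza>
</poem>
"""

WIDTH = 40  # assumed display width, in characters

# Stand-in for a style sheet: element name -> formatting rule.
STYLE_SHEET = {
    "poemtitle": lambda text: text.upper().center(WIDTH),
    "line": lambda text: "    " + text,
}

def outline(element, depth=0):
    """Return one indented row per element, showing how the tags nest."""
    rows = ["  " * depth + element.tag]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

def render(root, style_sheet):
    """Format the text of each element that has a rule in the style sheet."""
    out = []
    for el in root.iter():  # visits elements in document order
        rule = style_sheet.get(el.tag)
        text = (el.text or "").strip()
        if rule is not None and text:
            out.append(rule(text))
    return out

root = ET.fromstring(POEM)            # step 1: the parser locates the tags
tree_rows = outline(root)             # step 2: the nesting forms a tree
rendered = render(root, STYLE_SHEET)  # step 3: the style sheet sets the layout

print("\n".join(tree_rows))
print("\n".join(rendered))
```

Swapping in a different dictionary changes the presentation without
touching the document, which is the point of keeping content tags and
styling separate.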
SOME MARKUP LANGUAGES
HTML is the best-known markup language.
MathML (or MML, the Mathematical Markup Language) is an XML application
that is still being refined. The same is true of CML (Chemical Markup
Language, another XML-based markup language). MathML and CML browsers are
currently in their infancy; e.g., JUMBO is a pilot XML browser geared
toward CML, but it reportedly contains many bugs.
(SGML/XML software is still in its infancy, since the specs for XML,
style sheets, etc., are so new. For instance, XML capability in Netscape's
Communicator 5.0 is being held up because XSL's spec is not yet finalized.
Standardization of specs and implementations is expected by the end of this
year, or early next year.)
Other markup languages include JSML (Java Speech Markup Language), which
was developed at Sun Microsystems, and STML (Spoken Text Markup Language),
which was developed at Bell Labs.
SGML is useful for more than Web-based presentation. As mentioned earlier,
SGML's specs are being used to facilitate information storage and exchange.
For example, a Medical Markup Language is being developed to exchange
medical records quickly, efficiently, and -- most importantly -- in a
platform-independent (computer-independent) manner. SGML's content tags are
particularly useful here, since tags such as <allergy> clearly label
critical medical information.
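To show why a content tag such as <allergy> is useful, here is a minimal
Python sketch that pulls every allergy out of a record regardless of where
it appears. The record layout (<patient>, <name>, <note>) is hypothetical
and is not taken from any actual Medical Markup Language draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record; only <allergy> is named in the text above.
RECORD = """
<patient>
  <name>Jane Doe</name>
  <allergy>penicillin</allergy>
  <note>Reports seasonal hay fever.</note>
  <allergy>latex</allergy>
</patient>
"""

def allergies(xml_text):
    """Return the contents of every <allergy> element, in document order."""
    root = ET.fromstring(xml_text)
    return [a.text.strip() for a in root.iter("allergy") if a.text]

print(allergies(RECORD))
```

Any system that can read the tags can run the same lookup, whatever
computer or software produced the record.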
SGML AND XML SOFTWARE
Microsoft's Internet Explorer 4.0 and 5.0 have basic XML capability;
Netscape's Communicator 5.0 (due later this year) will have XML capability;
and WordPerfect 8 includes extensive SGML software. Many major companies
are currently developing SGML/XML products.
Some of IATH's humanities SGML software is listed in Chapter 4.
Chapter 5 will list important URLs, including URLs for SGML/XML software
and resources.
Current page was last modified/altered on: 8-31-98