SGML document introducing you to SGML
Please scroll down to read the "Document Instance" section of this page, first.
To see what the document instance looks like when passed through a simple reformatting process, which translates the marked up SGML text into a normal web page, click here.
To read another brief essay introducing the idea of SGML and why it is important: ...*Darwin Among the Machines* (Susanne Langer and SGML), click here.
In addition to the billions of HTML pages on the World Wide Web, there are also some native SGML pages on the web (e.g., American Civil War period literary archives at The University of North Carolina). Until about August 1998, SoftQuad Corporation freely distributed trial versions of a Netscape SGML viewer plug-in, which enabled anyone to view native SGML Internet pages in a Netscape web browser (read the historic document in which Yuri Rubinsky announced this software!).
Then SoftQuad sold its SGML product line to Interleaf, and I don't know whether there is now any way to get free software to view native SGML Internet pages. If you can find a copy of Yuri Rubinsky and Murray Maloney's book, SGML on the WEB: Small steps beyond HTML (Prentice Hall, 1997), it includes a compact disc (CD) with Panorama Pro 2.0 software, which not only displays existing SGML pages but also enables you to create and format your own. This software is worth far more than the price of the book, but the book seems at present (October, 1999) to be out of print. The actual content of the book, how to build SGML pages for web publication, is also very important; unfortunately, it describes a great vision of a future for the Internet which now will never happen. (For a little more information about SoftQuad's divestiture of its SGML products, click here.)
For various reasons, SGML seems to have become what one of the SGML community's own newsletters has called "a dead language, like Latin". But SGML has been reborn as XML (eXtensible Markup Language), with an enthusiasm in the computer industry that SGML itself never achieved. To read some thoughts about XML, including a polemic against XML, and second thoughts I had after attending the GCA XML 98 conference (14-20 Nov 98, Chicago), click here. (To examine some XML pages I have developed, which can be viewed with Microsoft Internet Explorer 5 or newer, click here.)
Best wishes! Thank you.
<!-- Note: this line is an SGML comment -->
<!-- Note that all SGML markup is human-readable: no unprintables -->
<!DOCTYPE heuristic [
<!ELEMENT heuristic - - (purpose, inote?, realization, docinfo)>
<!-- The "top level" element for a -->
<!-- document structured according -->
<!-- to this DTD is: "heuristic". -->
<!-- The "heuristic" element must -->
<!-- contain exactly one "purpose" -->
<!-- element, optionally followed -->
<!-- by an "inote", obligatorily -->
<!-- followed by one "realization" -->
<!-- element, followed by one -->
<!-- "docinfo" element, and -->
<!-- nothing else! -->
<!ATTLIST heuristic
version CDATA #FIXED "1.52"
security (unrestricted | internal.use | confidential |
need.to.know) #REQUIRED >
<!ELEMENT purpose - - (#PCDATA) >
<!-- "#PCDATA" means: parsed -->
<!-- character data, i.e., text -->
<!-- in which the "<" character -->
<!-- is treated as a start-of- -->
<!-- tag character, etc. -->
<!ELEMENT inote - - (#PCDATA) >
<!ELEMENT realization - - (point+) > <!-- a "realization" -->
<!-- consists of 1 or more "points"-->
<!ELEMENT point - - (title, text+) > <!-- a "point" -->
<!-- must contain one "title", -->
<!-- followed by 1 or more "text" -->
<!-- elements -->
<!ELEMENT title - - (#PCDATA) > <!-- a "title" element -->
<!-- can contain only characters, -->
<!-- with no further embedded -->
<!-- elements -->
<!ELEMENT text - - (#PCDATA) +(quote | institution | ruler |
list | product | link | country) >
<!-- a text element can contain -->
<!-- zero or any number of quote, -->
<!-- institution, ruler, list... -->
<!-- elements, as well as -->
<!-- character text.... -->
<!ELEMENT docinfo - - (url, author, auth.eaddr, rev.date) >
<!ELEMENT url - - (server, account, page.id) >
<!ELEMENT server - - (#PCDATA) >
<!ELEMENT account - - (#PCDATA) >
<!ELEMENT page.id - - (#PCDATA) >
<!ELEMENT author - - (given.name,sur.name,credential*) >
<!ELEMENT given.name - - (#PCDATA) >
<!ELEMENT sur.name - - (#PCDATA) >
<!ELEMENT credential - - (#PCDATA) >
<!ELEMENT auth.eaddr - - (#PCDATA) >
<!ELEMENT rev.date - - (rev.yr, rev.mo, rev.da) >
<!ELEMENT rev.yr - - (#PCDATA) >
<!ELEMENT rev.mo - - (#PCDATA) >
<!ELEMENT rev.da - - (#PCDATA) >
<!ELEMENT quote - - (#PCDATA) +(quote | institution |
product | ruler | country) >
<!ATTLIST quote
type (copied.text | hearsay | memory | ipse.dixit |
idiom | fictive | text.as.object |
other) #REQUIRED
source CDATA #IMPLIED >
<!ELEMENT link - - (#PCDATA) >
<!ATTLIST link
tgt CDATA #REQUIRED >
<!ELEMENT institution - - (#PCDATA) >
<!ATTLIST institution
type (corp | school | other) #REQUIRED >
<!ELEMENT product - - (#PCDATA) >
<!ELEMENT ruler - - (#PCDATA) >
<!ATTLIST ruler
type (king | prince | queen | president |
prime.minister | other) #REQUIRED >
<!ELEMENT country - - (#PCDATA) >
<!ELEMENT list - - (item+) >
<!ELEMENT item - - (#PCDATA) +(quote | institution |
product | ruler | list) >
<!ELEMENT i - - (#PCDATA) >
<!ENTITY % THINK "include" >
<!ENTITY % TAGS "ignore" >
<!ENTITY % NOTAGS "include" >
<!ENTITY lt "<" >
<?STYLESPEC "Style1" "heuristic.ssh" >
]>
<!-- Note: this line is an SGML comment -->
<heuristic security="unrestricted">
<purpose>This page aims to introduce you to <![%THINK;[SGML]]> </purpose>
<!-- Note: the slash ('/') is SGML's "end-of-element" -->
<!-- indicator. E.g., the /purpose tag ends the -->
<!-- purpose element, and the text between the -->
<!-- two tags is the content of the purpose element -->
<![%NOTAGS;[<inote>When this document is formatted for display as an HTML web page, with all the SGML structure-describing tags reduced to HTML style-descriptors (with, in some cases, the tagging just being discarded and not translated into anything...), it looks nicer, but information is lost.</inote>]]>
<realization>
<point>
<title>What is the idea of SGML?</title>
<text>The idea of SGML is to thematize the structure of text, by adding explicit structure descriptors -- markup tags -- to the text. For example, if we wish to communicate something about a product, instead of saying something like: "Netscape's Navigator is not related to Portugal's Henry the Navigator", we can say: <quote type="fictive"><institution type="corp">Netscape</institution>'s <product>Navigator</product> is not related to <country>Portugal</country>'s <ruler type="prince">Henry the Navigator</ruler>.</quote> <![%NOTAGS;[Note that, in the preceding sentence, when the structure tags are reduced to simple HTML page formatting, the tagged up sentence looks pretty much the same as the untagged plain text version.]]></text>
<text>Texts, like physical buildings, can either be products of habit and custom (<quote type="idiom">vernacular architecture</quote>), or be products of conscious, self-accountable creativity and critique (<institution type="school">The Bauhaus</institution>, structural engineering, etc.). I hypothesize SGML may portend a <quote type="memory" source="Henry Adams">change of phase</quote> in our relation to language, such as previously was effected by alphabetic writing and uniform printed editions.
Among other things, SGML is what <quote type="idiom">diagramming sentences</quote> in high-school English should have been but wasn't: SGML realizes, as effective and proactive social activity, and not merely as social-scientists' dissociated theorizing, a notion of <quote type="idiom">generative grammar</quote>! </text></point>
<point>
<title>What is the value of adding explicit structure descriptors to text?</title>
<text>The rewards of adding explicit structure descriptors to text are many, including:
<list>
<item>Stimulating reflection on the logical structure of what one is saying, leading to:
<list>
<item>Better style: Making what one is trying to say clearer to the persons who will read and try to understand it</item>
<item>Clearer thinking: Forcing oneself to become clearer about what one is trying to say</item>
<item>Discovery: Discovering things about both the content and the form of expression which one had not previously thought of, through the process of articulating what one thinks one wants to say</item>
</list></item>
<item>Making it easier for computer programs to process the text (it's much easier for a computer program to find all the references to rulers of countries if they are all labelled something like: <quote type="text.as.object">&lt;ruler type="..."> ... &lt;/ruler></quote>, than if their names simply appear -- like the reference to Henry the non-Netscape Navigator in this sentence -- as undistinguished sub-strings of homogeneous character strings).</item>
</list> </text></point>
<point>
<title>Once you've "tagged up" your text with these structural descriptors, what can you do with it?</title>
<text>Adding structural descriptors to text is the first step of what, in general, is an at least two-step process. Persons do not generally directly read the tagged text<![%TAGS;[ (e.g., what you are here reading now...)]]>.
What generally happens is that one also writes a <quote type="idiom">style sheet</quote>, which a computer program reads, along with the tagged text file: The computer program then generates a formatted document, as paper printout, online web pages [<link tgt="heuristic.html">see example</link>], spoken text for the blind, etc.</text>
<text>One of the benefits of SGML is that, once you have produced your tagged text document, which, in a sense, has no format (other than being a human- and computer-readable ASCII text file), anyone can generate output in any desired form, by simply writing different style sheets for appropriate processing programs. Example: In a hospital, insurance providers, oversight groups, doctors and nurses, patients themselves, and others perhaps not yet foreseen, may wish to examine patient records in all sorts of different ways. If the hospital maintains its patient records in SGML, then everybody can put the data into the form they want, without the hospital having to do anything except make the raw source data available. If a doctor wishes to see patient information on a palm-top computer, the data can be transformed by a web browser (or browser plug-in, such as <institution type="corp">SoftQuad</institution>'s <product>Panorama Viewer</product>), into display pages, even selecting only the particular kinds of data of interest to the doctor (e.g., the course of symptoms, but not billing information). An insurance provider, on the other hand, can download the billing information over a high-speed data link, and either process it as-is, or run it through a program which loads it into their database management system (DBMS). </text></point>
<point>
<title>What is an SGML document like, exactly?</title>
<text>You have been looking at an SGML document all the time you have been reading this. An SGML document consists of two parts:
<list>
<item>A document type definition (DTD). A DTD defines the pattern for a certain kind of document.
It specifies what kinds of elements can exist in a document coded according to its logical form, which of those elements can be contained in which other elements, and the order in which different elements must occur. The present document's DTD [<link tgt="#DTD">see above</link>] specifies that quotes can appear in text elements, but not (e.g.) in titles or in the document's meta-document information (the <quote type="text.as.object">docinfo</quote> element). On the other hand, an author element can only occur inside a docinfo element, there must be one and only one author element, and it must immediately follow a url element in the docinfo block....</item>
<item>The document instance itself, i.e.: the text, tagged to explicitly articulate its structure according to the specified document type definition.</item>
</list> Note that a given document type definition (DTD) can be used as a template for an indefinite number of document instances (to facilitate this, the DTD is generally kept in a separate computer file, apart from the document instances which use it).</text>
<text>Also note that, if, as is generally done, one edits one's document using an SGML-aware text editing program (such as <institution type="corp">ArborText</institution>'s <product>AdeptEditor</product>), the text editor can make sure the document instance conforms to its type, by not permitting entry of elements (tags) except where they are allowed. Conversely, such an editor can aid the writing process -- possibly even helping overcome writer's block -- by telling the writer what kinds of items can be entered at any place in the document being edited. </text></point>
<point>
<title>But isn't SGML a straitjacket, then?</title>
<text>No!
SGML is not like a straitjacket (or even like dictionaries and grammar books), because you can create your own document type definitions (DTDs), and, as you write your documents according to a DTD you have created, if you find it doesn't let you do what you want the way you want to do it, you can change the DTD to make it more suited to your purposes. A side-effect of this process is that, because every change you make must be explicitly declared in your DTD, you always have an up-to-date record of your conception of your document's structure, without having to make any extra effort to write extrinsic documentation. A well-written SGML document largely documents itself! </text></point>
<point>
<title>What's the difference between SGML, HTML, and XML?</title>
<text>HTML looks like SGML, but HTML tags are mostly style descriptors rather than structure descriptors. Example: the HTML <quote type="text.as.object">&lt;b></quote> tag says: make the following text appear in bold-face. The HTML <quote type="text.as.object">&lt;address></quote> element, on the other hand, is an example of a structure-describing tag: it doesn't prescribe what the text it contains is supposed to look like, but rather it articulates what that content means.</text>
<text>XML is a simplified, <quote type="idiom">dumbed-down</quote> variant of SGML. XML tags should generally be structure-describing, but SGML is functionally much richer than XML. Also, XML does not require an explicit document type (document structure) specification (DTD). This makes XML easier to code, but it also provides less motivation and assistance to think about the structure of one's document, since, in an XML document instance, you can make up new (or inconsistent) elements as you go along, e.g., calling a quotation here a: <quote type="text.as.object">&lt;quote></quote>, there a: <quote type="text.as.object">&lt;quotation></quote>, elsewhere a: <quote type="text.as.object">&lt;quot></quote>, etc.
</text></point>
<point>
<title>Conclusion</title>
<text>To paraphrase Martin Luther: <quote type="memory">A mighty fortress is our SGML!</quote> </text></point>
</realization>
<docinfo>
<url><server>www.cloud9.net</server><account>bradmcc</account> <page.id>WhatIsSGML.html</page.id></url>
<author><given.name>Brad</given.name><sur.name>McCormick</sur.name> <credential>Ed.D.</credential></author>
<auth.eaddr>bradmcc@cloud9.net</auth.eaddr>
<rev.date><rev.yr>1999</rev.yr><rev.mo>06</rev.mo> <rev.da>01</rev.da></rev.date>
</docinfo>
</heuristic>
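The document instance above claims that explicit tags make it easy for a program to find every reference to a ruler, and that a style sheet can turn the same tagged text into a formatted page. The following is only a minimal sketch of both ideas, under stated assumptions: it uses Python's standard XML parser, which works here only because the tagged sentence about Netscape and Henry the Navigator happens to be well-formed XML; a full SGML document (with its DTD, marked sections, and so on) would need a real SGML parser. The names `STYLE`, `find_rulers`, and `to_html`, and the choice of HTML tags, are hypothetical illustrations, not part of heuristic.ssh or any actual style sheet language.

```python
import xml.etree.ElementTree as ET
from html import escape

# The tagged sentence from the document instance above.
TAGGED = (
    '<quote type="fictive"><institution type="corp">Netscape</institution>'
    "'s <product>Navigator</product> is not related to "
    "<country>Portugal</country>'s "
    '<ruler type="prince">Henry the Navigator</ruler>.</quote>'
)

# Hypothetical "style sheet": structure tag -> HTML style tag.
# None means: discard the tag but keep the text it contains.
STYLE = {
    "quote": "q",
    "product": "i",
    "ruler": "b",
    "institution": None,
    "country": None,
}


def find_rulers(root):
    """Return (type, name) for every ruler element in the tree."""
    return [(el.get("type"), "".join(el.itertext())) for el in root.iter("ruler")]


def to_html(el):
    """Render an element as HTML according to STYLE; unlisted tags are dropped."""
    inner = escape(el.text or "", quote=False)
    for child in el:
        inner += to_html(child) + escape(child.tail or "", quote=False)
    tag = STYLE.get(el.tag)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


root = ET.fromstring(TAGGED)
rulers = find_rulers(root)          # structured query over the tagged text
html = to_html(root)                # one possible formatted rendering
plain = "".join(root.itertext())    # all tags discarded

print(rulers)
print(html)
print(plain)
```

Swapping in a different `STYLE` mapping gives a different rendering of the same source text, which is the point made above about writing different style sheets for different readers. Once the tags are discarded, as in `plain`, the ruler can no longer be found by a structured query: this is the information loss the document's `inote` mentions.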
If you are interested in eXtensible Markup Language (XML), the new avatar of SGML, and you are using the Microsoft Internet Explorer 5 (or newer) web browser, you can click here to examine some experimental XML pages I am developing.
Go to sample HTML formatted version of SGML on this page. Go to Panorama Viewer version of this page (requires plug-in!).
http://www.cloud9.net/~bradmcc/WhatIsSGML.html page generated by: heuristic.pl, ver: 17 January 2009 (v06.07) Copyright © 1998-2006 Brad McCormick, Ed.D. bradmcc@cloud9.net 01 June 1999 (ver: 1.52)