CSci 220 - Lecture 27
Web Page Design
© Morris Firebaugh
The Emergence of XML
A. Meta-languages & XML
What is the relationship between SGML, HTML, and XML?
- HTML = HyperText Markup Language = Tag-based WWW Page Description Language
- SGML = Standard Generalized Markup Language --> Meta-language
- XML = eXtensible Markup Language --> Meta-language
All right, then, what is a meta-language?
- A meta-language is a language for describing (or writing) other languages.
- So, XML is a language for writing other languages.
Visions inspiring XML:
- Better formatting of Web pages
- Better searching of the Web
- Better data exchange for electronic commerce
- Global information reuse and exchange across the Web
The grand vision of XML is the creation of a worldwide collection of data objects that are fully addressable and fully open to borrowing, reuse, and repackaging by anybody on the net--in short, everything that the copyright laws have strived for centuries to prevent.
XML is a standard for the creation of tagging languages. It sets out a collection of rules that govern how a parser is to behave. An XML parser that follows these rules can parse any document tagged in an XML compatible language. This means you can make up your own language and not have to write any code to parse it. You can concentrate on writing code that processes the information in useful ways. . . .
The advantage of XML is that it allows you to define your own data structures. When you use any particular language defined in XML, you are no longer enjoying the advantages of XML; you are enjoying the advantages of the particular markup language you have chosen.
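The claim that a made-up language needs no custom parsing code can be sketched in Python with the standard-library parser. The two vocabularies below (`recipe`/`ingredient` and `invoice`/`item`) and the helper `summarize` are hypothetical, invented only for this illustration.

```python
import xml.etree.ElementTree as ET

# Two invented tagging languages; the parser knows nothing about either in advance.
RECIPE = "<recipe><ingredient qty='2'>eggs</ingredient><ingredient qty='1'>milk</ingredient></recipe>"
INVOICE = "<invoice><item price='9.50'>book</item></invoice>"

def summarize(document: str) -> list[tuple[str, dict, str]]:
    """Parse any well-formed document and list (tag, attributes, text) for each child of the root."""
    root = ET.fromstring(document)
    return [(child.tag, dict(child.attrib), child.text or "") for child in root]

# The same generic code handles both languages.
for doc in (RECIPE, INVOICE):
    print(summarize(doc))
```

Only `summarize`, the part that does something useful with the information, had to be written; the parsing itself is shared by every XML-based language.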
What XML is:
- XML is a meta-language
- XML is a standard for defining tagging languages
- XML is an extremely simple technology
What XML is NOT:
- XML is not "HTML on steroids"
- XML is not "SGML light," but it should be a subset of SGML
A very good introduction to XML has been written by the W3 Consortium with commentary by Tim Bray
B. Design Goals of XML
XML was developed by an SGML Editorial Review Board formed under the auspices of the World Wide Web Consortium (W3C) in 1996 and chaired by Jon Bosak of Sun Microsystems, with the very active participation of an SGML Working Group also organized by the W3C. Dan Connolly served as the ERB's contact with the W3C.
"XML is primarily intended to meet the requirements of large-scale Web content providers for industry-specific markup, vendor-neutral data exchange, media-independent publishing, one-on-one marketing, workflow management in collaborative authoring environments, and the processing of Web documents by intelligent clients. It is also expected to find use in certain metadata applications. XML is fully internationalized for both European and Asian languages, with all conforming processors required to support the Unicode character set in both its UTF-8 and UTF-16 encodings. The language is designed for the quickest possible client-side processing consistent with its primary purpose as an electronic publishing and data interchange format." [971208 W3C press release]
"XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form the character data in the document, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure. A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application." [adapted from the Proposal]
The design goals for XML are:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
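Two points from the quoted passages, the requirement that processors accept both UTF-8 and UTF-16 and the split between the XML processor and the application, can be illustrated with a short Python sketch. The standard-library parser plays the role of the processor, and the hypothetical function `read_text` plays the application. The greeting text is made up for the example.

```python
import xml.etree.ElementTree as ET

# The same small document with non-ASCII text (hypothetical content).
BODY = "<greeting>Grüße, 你好</greeting>"

# Encode it once as UTF-8 and once as UTF-16
# (Python's "utf-16" codec prepends a byte-order mark).
as_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + BODY).encode("utf-8")
as_utf16 = ('<?xml version="1.0" encoding="UTF-16"?>' + BODY).encode("utf-16")

def read_text(raw: bytes) -> str:
    """Application side: ask the processor for the root element's character data."""
    return ET.fromstring(raw).text

# The byte streams differ, but the application sees the same characters.
print(as_utf8 != as_utf16)
print(read_text(as_utf8) == read_text(as_utf16))
```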
C. The Syntax of XML
XML supports two levels of syntax:
- Well-formed XML documents [lower level]
- Valid XML documents specified by Document Type Definitions (DTD) [higher level]
An XML document is a systematic set of containers called elements
- Containers can contain other containers as well as content
- Three structural components include:
  - prolog [optional in well-formed documents, required in valid documents]
  - root element
  - epilog [optional in both types]
General Syntax for well-formed documents
- Every start tag must have a corresponding end tag
- Empty-element tags (e.g., <IMG ... />) must terminate with />
- All attribute values must be enclosed in quotes, either (' ') or (" ")
- Tag names are case-sensitive
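The four rules above can be tried out with a small Python sketch that hands sample documents to the standard-library parser and reports whether each one is accepted. The helper `is_well_formed` and the sample documents are hypothetical, written only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False on a syntax error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# One accepted and one rejected sample for the rules listed above.
samples = {
    "matching tags": '<note><to>Ann</to></note>',
    "missing end tag": '<note><to>Ann</note>',
    "empty element with />": '<page><IMG SRC="a.gif"/></page>',
    "empty element without />": '<page><IMG SRC="a.gif"></page>',
    "single-quoted attribute": "<IMG SRC='a.gif'/>",
    "unquoted attribute": '<IMG SRC=a.gif/>',
    "case mismatch": '<Note>text</note>',
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Note that this checks well-formedness only; it says nothing about validity against a DTD.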
Purpose of Document Type Definitions (DTDs)
- Define tags to be used in XML document
- Describe allowed structural relationships between tags, e.g. <page> must appear inside <book>
- Specify sequence, if any, in which tags must appear, e.g., <preface> before <chapter>
- List properties (attributes) allowed or required for tags
- Specify everything else required for the markup language grammar
Advantages of Document Type Definitions (DTDs)
- A DTD allows an XML parser (interpreter) to validate your document
- A DTD helps a human reader to quickly learn the structure of a particular document
- DTDs allow definition of entities [equivalent to variable definition]
<!ENTITY me "Dmitry Kirsanov, St. Petersburg, Russia">
This document was created by &me; on April 21, 1999
- DTDs may also be "cascaded" as in CSS
<!DOCTYPE HTML SYSTEM "http://www.foo.com/myfiles/html3x.dtd" [
<!-- your DTD goes here -->
]>
- Local declarations placed between the brackets "win" over the more general ones defined in html3x.dtd
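Entity expansion can be observed with a brief Python sketch. The standard-library parser used here does not fetch external DTD files, so the "local wins" behaviour is only imitated: two declarations of the same entity are placed in one internal subset, relying on the XML rule that the first declaration encountered is the binding one. The element name `note`, the helper `root_text`, and the replacement strings in the second document are hypothetical.

```python
import xml.etree.ElementTree as ET

def root_text(document: str) -> str:
    """Parse the document and return the root element's text with entities expanded."""
    return ET.fromstring(document).text

# The entity from the notes, declared in an internal DTD subset.
WITH_ENTITY = """<!DOCTYPE note [
  <!ENTITY me "Dmitry Kirsanov, St. Petersburg, Russia">
]>
<note>This document was created by &me; on April 21, 1999</note>"""

# The same entity declared twice: the first declaration binds, later ones are ignored.
DECLARED_TWICE = """<!DOCTYPE note [
  <!ENTITY me "local definition">
  <!ENTITY me "more general definition">
]>
<note>&me;</note>"""

print(root_text(WITH_ENTITY))
print(root_text(DECLARED_TWICE))
```

Because the internal subset is read before the external DTD, this first-declaration rule is what lets local entity definitions take precedence.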
D. Examples of XML
Example 1: A Well-formed XML Document
<ARTICLE TYPE ="INDEFINITE">a</ARTICLE>
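As a rough illustration of what an XML processor hands to the application for Example 1, the Python sketch below parses that one-line document with the standard library and reads back its three pieces of information. The variable name `element` is arbitrary.

```python
import xml.etree.ElementTree as ET

# Example 1, exactly as written in the notes.
element = ET.fromstring('<ARTICLE TYPE ="INDEFINITE">a</ARTICLE>')

print(element.tag)      # the element name
print(element.attrib)   # attribute names mapped to their values
print(element.text)     # the character data inside the element

# Names are case-sensitive, so a lowercase lookup finds nothing.
print(element.get("type"))
```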
Example 2: A DTD Document
The Root Document Element Definition from play.dtd
<!ELEMENT play (title, fm, personae, scndescr, playsubt, induct?,prologue?,act+,epilogue?)>
Note: the "?" means "optional item", and the "+" means "one or more occurrences"
The play.dtd also shows the relationship between tags:
<!ELEMENT speech (speaker+, (line|stagedir|subhead)+)>
<!ELEMENT speaker (#PCDATA)>
<!ELEMENT line (#PCDATA | stagedir)*>
<!ELEMENT stagedir (#PCDATA)>
<!ELEMENT subhead (#PCDATA)>
This is a speech element using the DTD definitions above:
<speech>
<speaker>PROSPERO</speaker>
<line><stagedir>Aside</stagedir> The Duke of Milan </line>
<line>And his more braver daughter could control thee, </line>
<line>If now 'twere fit to do't. At the first sight</line>
<line>They have changed eyes. Delicate Ariel, </line>
<line>I'll set thee free for this. </line>
<stagedir> To FERDINAND </stagedir>
<line>A word, good sir;</line>
<line>I fear you have done yourself some wrong; a word. </line>
</speech>
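To show what the content model for a speech means in practice, here is a hand-rolled Python sketch. It is not a DTD validator (the standard-library parser does not validate); it only rewrites the model `(speaker+, (line|stagedir|subhead)+)` by hand as a regular expression over the child element names, using the lowercase names from the declarations. The function `matches_speech_model` and the three sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (speaker+, (line|stagedir|subhead)+) rewritten as a regular expression
# over child tag names, each followed by a single space.
SPEECH_MODEL = re.compile(r"(speaker )+((line|stagedir|subhead) )+")

def matches_speech_model(document: str) -> bool:
    """Check only the order and count of the direct children of a speech element."""
    root = ET.fromstring(document)
    if root.tag != "speech":
        return False
    children = "".join(child.tag + " " for child in root)
    return SPEECH_MODEL.fullmatch(children) is not None

GOOD = "<speech><speaker>PROSPERO</speaker><line>A word, good sir;</line></speech>"
NO_SPEAKER = "<speech><line>A word, good sir;</line></speech>"
WRONG_ORDER = "<speech><line>A word, good sir;</line><speaker>PROSPERO</speaker></speech>"

for doc in (GOOD, NO_SPEAKER, WRONG_ORDER):
    print(matches_speech_model(doc))
```

A real validating parser performs this kind of check for every element, driven directly by the DTD rather than by a hand-written pattern.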
Another excellent resource on XML Files is the XML Magazine
E. Linking Capabilities of XML
One of the greatest strengths of XML is that its linking capabilities are more general and abstract than those of HTML
- A link may be assigned parameters including:
  - Strings describing the role of the link, e.g. "display in status line"
  - Parameter telling whether to replace current document with linked document
  - Parameter for automatic or user-activated link
- A link can specify search strings or substrings for another XML document
- Syntax for extended locators, e.g., "everything from the third paragraph to the end of the chapter"
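The link parameters listed above correspond roughly to attributes later standardized in the W3C XLink 1.0 Recommendation. The Python sketch below assumes those XLink 1.0 names, which may differ from the draft these notes were written against: `xlink:title` for a human-readable description, `xlink:role` (a URI in the final specification), `xlink:show` for replace-or-not, and `xlink:actuate` for automatic versus user-activated traversal. The element name `citation`, the target `tempest.xml`, the role URI, and the helper `link_parameters` are hypothetical. Extended locators are not covered.

```python
import xml.etree.ElementTree as ET

# Namespace name defined by XLink 1.0.
XLINK = "http://www.w3.org/1999/xlink"

# A hypothetical simple link carrying the parameters described above.
DOCUMENT = f"""<citation xmlns:xlink="{XLINK}"
    xlink:type="simple"
    xlink:href="tempest.xml"
    xlink:title="display in status line"
    xlink:role="http://www.example.com/roles/source"
    xlink:show="replace"
    xlink:actuate="onRequest">The Tempest</citation>"""

def link_parameters(document: str) -> dict[str, str]:
    """Collect the root element's attributes that belong to the XLink namespace."""
    root = ET.fromstring(document)
    prefix = "{" + XLINK + "}"
    return {
        name[len(prefix):]: value
        for name, value in root.attrib.items()
        if name.startswith(prefix)
    }

for name, value in link_parameters(DOCUMENT).items():
    print(f"{name} = {value}")
```

Here `show="replace"` asks that the linked document replace the current one, and `actuate="onRequest"` makes the link user-activated; `actuate="onLoad"` would request automatic traversal.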
Prospects for XML
- New XML Tools and Browser Capabilities are emerging
- XML Provides more power through abstraction
- This is particularly important for the integration of on-line databases
- This, in turn, is particularly important for eCommerce
Problems of XML
- Still "bleeding edge" technology
- Many browsers still do not support XML (Microsoft IE 5.0 does)
- Multimedia support is minimal and requires customizing
- Tremendous amount of HTML in place
- Capabilities of "conventional" HTML continue to expand [cf. the imminent death of FORTRAN]