Introduction to XML DTD. XML Schema Languages

A Document Type Definition (DTD) declares the valid building blocks of an XML document. It specifies the document structure with a list of valid elements and attributes.

A DTD can be declared either inside the XML document itself or in an external file that the XML document refers to.

Internal DTD declaration

If the DTD is declared inside the XML file, it must be wrapped in a DOCTYPE declaration, which has the following syntax:

<!DOCTYPE root-element [element-declarations]>

Example XML document with internal DTD:

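<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget about me this weekend</body>
</note>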

The DTD in the example above is interpreted as follows:

!DOCTYPE note specifies that the root element of the document is note
!ELEMENT note specifies that the note element must contain four elements, in this order: to, from, heading, body
!ELEMENT to specifies that the to element is of type "#PCDATA" (parsed character data, that is, text)
!ELEMENT from specifies that the from element is of type "#PCDATA"
!ELEMENT heading specifies that the heading element is of type "#PCDATA"
!ELEMENT body specifies that the body element is of type "#PCDATA"
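Python's standard library has no validating XML parser. Its expat binding (xml.parsers.expat) does, however, read the internal DTD subset and report every element declaration to an ElementDeclHandler callback. The sketch below is an illustration, not a validator. It feeds the note document to expat and turns the reported content models back into DTD notation. The helper names model_to_text and element_declarations are our own and are not part of the library.

```python
import xml.parsers.expat as expat

M = expat.model  # content-model constants: XML_CTYPE_* and XML_CQUANT_*

NOTE = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget about me this weekend</body></note>
"""

QUANT = {M.XML_CQUANT_NONE: "", M.XML_CQUANT_OPT: "?",
         M.XML_CQUANT_REP: "*", M.XML_CQUANT_PLUS: "+"}


def model_to_text(content):
    """Turn an expat content-model tuple (type, quantifier, name, children) into DTD syntax."""
    ctype, quant, name, children = content
    if ctype == M.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == M.XML_CTYPE_ANY:
        return "ANY"
    if ctype == M.XML_CTYPE_NAME:
        return name + QUANT[quant]
    if ctype == M.XML_CTYPE_MIXED:
        parts = ["#PCDATA"] + [model_to_text(child) for child in children]
        return "(" + "|".join(parts) + ")" + QUANT[quant]
    separator = "," if ctype == M.XML_CTYPE_SEQ else "|"  # otherwise XML_CTYPE_CHOICE
    inner = separator.join(model_to_text(child) for child in children)
    return "(" + inner + ")" + QUANT[quant]


def element_declarations(xml_text):
    """Return {element name: content model} for each <!ELEMENT> that expat reports."""
    declarations = {}

    def on_element_decl(name, content):
        declarations[name] = model_to_text(content)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return declarations


for element, content_model in element_declarations(NOTE).items():
    print(f"<!ELEMENT {element} {content_model}>")
```

Because expat does not validate, it reads these declarations but does not enforce them. A note element with its children in the wrong order would still parse without error.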

External DTD Declaration

If the DTD is declared in an external file, the XML document refers to it with a DOCTYPE declaration of the following form:

<!DOCTYPE root-element SYSTEM "filename">

Below is the same XML document as before, but with an external DTD declaration:

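<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget about me this weekend</body>
</note>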

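And here is the content of the file "note.dtd", which contains the DTD:

<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>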
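By default, expat does not load an external DTD at all. The sketch below assumes the external subset should be read anyway. It turns on parameter-entity parsing and answers the request for note.dtd from an in-memory dictionary instead of the file system. The dictionary DTD_FILES and the function declared_elements are hypothetical. A real application would open the file or URL named by the system identifier.

```python
import xml.parsers.expat as expat

# Hypothetical stand-in for the file system: system identifier -> DTD text.
DTD_FILES = {
    "note.dtd": """<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
""",
}

DOC = """<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget about me this weekend</body></note>
"""


def declared_elements(xml_text, files):
    """Parse xml_text, reading its external DTD subset from `files`."""
    declared = []
    parser = expat.ParserCreate()
    # The external subset is skipped unless parameter-entity parsing is enabled.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_element_decl(name, content):
        declared.append(name)

    def on_external_ref(context, base, system_id, public_id):
        # For the external DTD subset, expat passes context=None.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(files[system_id], True)
        return 1  # non-zero: the reference was handled successfully

    parser.ElementDeclHandler = on_element_decl
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(xml_text, True)
    return declared


print(declared_elements(DOC, DTD_FILES))
```

The sub-parser inherits the handlers of the main parser. As a result, the declarations from note.dtd reach the same on_element_decl callback as internal ones would.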

What are DTDs used for?

With a DTD, each of your XML files can carry its own format description.

With a DTD, independent groups of people can agree on a common standard for the data they exchange.

With a DTD, you can verify that the data you receive from an external source is valid.

You can also use DTDs to perform validation checks on your own data.

A DTD is a set of syntax rules against which the structure of an XML document is checked. It explicitly defines the structure of an XML document, specifies the elements and their attributes, and provides other information that applies to every XML document that conforms to the DTD.

Note that a DTD is optional. If a DTD is present, the parser uses it to interpret the document, and a validating parser also checks the document against it. Without a DTD, the parser can only check that the document is well-formed, and the application has to interpret the document by its own rules. It is still a good idea to write a DTD for your XML documents, because it makes them easier to interpret and lets you check their structure.

The DTD can be included directly in the XML document, referenced by URL, or both. When the DTD is included directly, it appears right after the XML declaration at the start of the document:

<?xml version="1.0"?>
<!DOCTYPE root_element_name [
other declarations
]>

The root_element_name must match the name of the root element, the element whose tags enclose the entire XML document. The "other declarations" section contains the declarations of elements, attributes, and so on.

You might prefer to put the DTD in a separate file to keep the structure modular. A reference to an external DTD takes a single line in the XML document:

<!DOCTYPE root_element_name SYSTEM "some_dtd.dtd">

As with the internal DTD declaration, root_element_name must match the name of the root element of the document. The SYSTEM keyword introduces a system identifier: the URI of the DTD, written in quotes. Here some_dtd.dtd is a relative URI, resolved against the location of the XML document. The DTD can equally be referenced by an absolute URL on a local or remote server.

So how do you create a DTD for Listing 14.1? First, we add a reference to the external DTD to the XML document. As mentioned in the previous section, such a reference looks like this:

<!DOCTYPE cookbook SYSTEM "cookbook.dtd">

In Listing 14.1, cookbook is the name of the root element, and cookbook.dtd is the name of the DTD file. The contents of the DTD are shown in Listing 14.2, and each line is explained below.

Listing 14.2. DTD for Listing 14.1 (cookbook.dtd)

] >

What does this cryptic document mean? Despite its apparent complexity, it is actually quite simple. Let's go through Listing 14.2 line by line:

The first line is the XML prolog, already mentioned above.

The third line declares an XML element, in this case the root element cookbook. It is followed by the name recipe in parentheses, which means that the cookbook element contains recipe child elements. The + sign indicates that cookbook contains one or more recipe elements.

The fourth line declares the recipe element. It states that recipe has four child elements: title, description, ingredients, and process. The names are not followed by repetition indicators (see the next section), so a recipe element must contain exactly one of each, in the listed order.

Next comes the first declaration of an element that has no child elements. It contains #PCDATA (parsed character data), that is, plain text rather than nested markup.

The ingredients element contains one or more elements named ingredient; compare this with Listing 14.1.

Since the ingredient element corresponds to a single ingredient, it makes sense that the element contains simple character data.

The process element contains one or more instances of the step element.

The step element, like the ingredient element, is an individual item of a higher-level list, so it contains character data.

Notice that the recipe element in Listing 14.1 has an attribute. This attribute, category, specifies the general category the recipe belongs to; in the example, the category is "Italian". The ATTLIST declaration names both the element and the attribute. Assigning each recipe to a category simplifies classification, so the attribute is declared as required (#REQUIRED).

The last line closes the DTD definition. The definition must always be closed properly; otherwise the parser reports an error.
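A non-validating parser such as Python's expat reads the category declaration but does not enforce #REQUIRED, so that check is left to the application. The sketch below collects #REQUIRED declarations through AttlistDeclHandler and reports any element that omits them. To keep it short, it assumes a simplified cookbook in which recipe content is plain text; the recipe name Tacos is invented. missing_required is a hypothetical helper, not a library function.

```python
import xml.parsers.expat as expat

# Simplified cookbook: recipe content is plain text here.
DOC = """<?xml version="1.0"?>
<!DOCTYPE cookbook [
<!ELEMENT cookbook (recipe+)>
<!ELEMENT recipe (#PCDATA)>
<!ATTLIST recipe category CDATA #REQUIRED>
]>
<cookbook>
<recipe category="Italian">Spaghetti alla Carbonara</recipe>
<recipe>Tacos</recipe>
</cookbook>
"""


def missing_required(xml_text):
    """Return (element, attribute) pairs where a #REQUIRED attribute is absent."""
    required = {}   # element name -> set of #REQUIRED attribute names
    problems = []

    def on_attlist(element, attribute, att_type, default, is_required):
        # #REQUIRED: is_required is true and there is no default.
        # (#FIXED also sets is_required, but it always comes with a default value.)
        if is_required and default is None:
            required.setdefault(element, set()).add(attribute)

    def on_start(name, attrs):
        for attribute in sorted(required.get(name, ())):
            if attribute not in attrs:
                problems.append((name, attribute))

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return problems


print(missing_required(DOC))  # [('recipe', 'category')]
```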

To conclude this section, I will provide a summary of the main components of a typical DTD file:

element type declarations;
attribute declarations;
ID, IDREF and IDREFS;
entity declarations.

We've already seen some of these components in Listing 14.2. Each component will be described in more detail below.

Element declarations

All elements used in an XML document must be defined in the DTD that accompanies the document. We've already seen two common kinds of definitions: for an element containing other elements, and for an element containing character data. This definition indicates that the element contains only character data:

The following definition of the process element says that it contains exactly one nested element called step:

<!ELEMENT process (step)>

However, processes with a single step are rare; most likely there will be several steps. To indicate that an element contains one or more instances of the nested step element, add the repetition flag +:

<!ELEMENT process (step+)>

The number of nested elements can be specified in several ways. The full list of element operators is given in Table 14.1.

Table 14.1. Element Operators

If an element contains several child elements, list them, separated by commas, in the definition of the parent element:

<!ELEMENT recipe (title, description, ingredients, process)>

No repetition indicators are specified, so each child element must appear exactly once, in the order listed.

The element definition can be refined with logical operators. Suppose you work with recipes that always include pasta together with one or more kinds of cheese or meat. In this case, the ingredient element is defined as follows:

<!ELEMENT ingredient (pasta+, (cheese | meat)+)>

At least one pasta element must be present in the ingredient element, so it is marked with the repetition sign +. It is followed by cheese or meat elements. The alternatives are separated by a vertical bar and grouped in parentheses followed by a + sign, since the recipe always includes at least one of them.
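One way to check element-only content is to translate the content model into a regular expression over the sequence of child element names. In this translation, commas become concatenation, and |, parentheses, ?, * and + keep their regex meaning. The sketch below illustrates the idea under simplifying assumptions (no mixed content, EMPTY or ANY). It is not how any particular parser implements validation. model_to_regex and children_match are hypothetical names.

```python
import re

# Names, parentheses, separators and quantifiers of an element-content model.
TOKEN = re.compile(r"[A-Za-z_:][\w.:-]*|[(),|*+?]")


def model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex.

    Each child name becomes "(?:name,)", so the model is matched against
    the child names written as "a,b,c,". Mixed content is not handled.
    """
    if "#PCDATA" in model:
        raise ValueError("mixed content is not supported by this sketch")
    parts = []
    for token in TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue                     # a sequence is plain concatenation
        elif token in {")", "|", "*", "+", "?"}:
            parts.append(token)          # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + ",)")
    return re.compile("".join(parts))


def children_match(model, children):
    """True if the list of child element names satisfies the content model."""
    encoded = "".join(name + "," for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None


INGREDIENT = "(pasta+, (cheese | meat)+)"
print(children_match(INGREDIENT, ["pasta", "cheese", "meat"]))  # True
print(children_match(INGREDIENT, ["pasta"]))                    # False
```

The trailing comma after every encoded name keeps a name such as title from matching the beginning of a longer name like titles.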

There are other types of element definitions. We have considered only the simplest cases. However, the material provided is sufficient to understand the examples given in the rest of this chapter.

Attribute Declarations

Element attributes describe values associated with elements. XML elements, like HTML elements, can have zero, one, or more attributes. The general syntax for declaring attributes is as follows:

<!ATTLIST ElementName AttributeName type flag
...>

ElementName is the name of the element the attributes belong to. It is followed by the list of that element's attributes. Each attribute declaration consists of three main components: a name, a data type, and a flag that defines the attribute's behavior. Further attribute declarations can take the place of the ellipsis (...).

We already saw a simple attribute declaration in Listing 14.2:

<!ATTLIST recipe category CDATA #REQUIRED>

However, as the general syntax shows, several attributes can be declared at once. Suppose that, in addition to the category attribute, you want to associate a difficulty attribute with the recipe element. Both attributes are declared in the same list:

<!ATTLIST recipe category CDATA #REQUIRED
                 difficulty CDATA #REQUIRED>

You don't have to format declarations this way, but multi-line declarations are easier to read than single-line ones. Also, since both attributes are required, a recipe tag cannot carry just one of them; it must include both. For example, the following tag would be considered invalid:

Why? Because it is missing the category attribute. A valid tag must contain both attributes:

Special conditions for attribute processing are described by the three flags listed in Table 14.2.

Table 14.2. Attribute flags

Attribute Types

An element attribute can be declared with a specific type. The attribute types are described below.

CDATA attributes

Attributes very often contain general character data. Such attributes are called CDATA attributes. The following example was already encountered at the beginning of this section:

<!ATTLIST recipe category CDATA #REQUIRED>

ID, IDREF, and IDREFS Attributes

The idea of unambiguously identifying data (for example, information about a user or product stored in a database) by means of identifiers has come up several times in previous chapters of the book. Identifiers are also common in XML, because cross-referencing is used not only in ordinary data processing but also on the World Wide Web (hyperlinks).

Element identifiers are assigned with an attribute of type ID. Suppose you want to associate a unique identifier with each recipe. The corresponding DTD fragment might look like this:

<!ATTLIST recipe recipe-id ID #REQUIRED>

The recipe element in the document might then look like this:

<recipe recipe-id="ital003" category="Italian">
<title>Spaghetti alla Carbonara</title>
...
</recipe>

The recipe is uniquely identified by the identifier ital003. The recipe-id attribute is of type ID, so the value ital003 cannot be used for the ID attribute of any other element in the same document; otherwise the document is invalid. Now suppose you later want to refer to this recipe from elsewhere, say, from a user's list of favorite recipes. This is where cross-references and IDREF attributes come into play. An IDREF attribute holds the identifier of the element it refers to, much as a URL identifies the target page of a hyperlink. Note that an IDREF value must match an ID in the same document. Consider the following XML code snippet:

When the XML document is processed, the application can replace the referencing element with a more descriptive link to the recipe with that identifier (for example, the recipe's name). It will most likely be formatted as a hyperlink so that the reader can navigate to the recipe.
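When the parser does not validate, checking IDs and IDREFs is also up to the application. The sketch below assumes a single document that holds both the recipes and a favorites list. The favorites and favorite elements, the recipe-ref attribute and the identifier mex001 are invented for the example. IDREF values are checked only after parsing, because a reference may point to an element that appears later.

```python
import xml.parsers.expat as expat

# Hypothetical document: favorites, favorite, recipe-ref and mex001 are invented names.
DOC = """<?xml version="1.0"?>
<!DOCTYPE cookbook [
<!ELEMENT cookbook (recipe*, favorites)>
<!ELEMENT recipe (#PCDATA)>
<!ATTLIST recipe recipe-id ID #REQUIRED>
<!ELEMENT favorites (favorite*)>
<!ELEMENT favorite EMPTY>
<!ATTLIST favorite recipe-ref IDREF #REQUIRED>
]>
<cookbook>
<recipe recipe-id="ital003">Spaghetti alla Carbonara</recipe>
<favorites><favorite recipe-ref="ital003"/><favorite recipe-ref="mex001"/></favorites>
</cookbook>
"""


def check_ids(xml_text):
    """Return (duplicate ID values, IDREF values that match no ID)."""
    id_attrs, ref_attrs = {}, {}   # element name -> set of attribute names
    seen, duplicates, references = set(), [], []

    def on_attlist(element, attribute, att_type, default, is_required):
        if att_type == "ID":
            id_attrs.setdefault(element, set()).add(attribute)
        elif att_type in ("IDREF", "IDREFS"):
            ref_attrs.setdefault(element, set()).add(attribute)

    def on_start(name, attrs):
        for attribute in id_attrs.get(name, ()):
            value = attrs.get(attribute)
            if value is not None:
                if value in seen:
                    duplicates.append(value)
                seen.add(value)
        for attribute in ref_attrs.get(name, ()):
            if attribute in attrs:
                references.extend(attrs[attribute].split())  # IDREFS holds several IDs

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    # References are resolved only at the end, because an IDREF may point forward.
    dangling = [ref for ref in references if ref not in seen]
    return duplicates, dangling


print(check_ids(DOC))  # ([], ['mex001'])
```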

Enumerable attributes

When declaring an attribute, you can list all the valid values accepted by the attribute. In our example, this would be convenient because you can immediately define a list of valid categories. The above declaration is written as follows:

Note that an enumerated declaration does not include the CDATA keyword. The list of values itself serves as the attribute type, and each value in it must be a name token (NMTOKEN).

Enumerated attributes with default value

Sometimes it is convenient to declare a default value for an attribute; you have probably done something similar when building forms with drop-down lists. For example, if most of the recipes in your cookbook are Italian, the category attribute of most recipe elements will be Italian. In this case, Italian can be made the default category:

If the category attribute is not explicitly specified, it defaults to Italian.
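Default values are one part of the DTD that even a non-validating parser such as Python's expat applies. The sketch below assumes an enumerated category attribute whose values Mexican and Chinese, like the recipe names, are invented for the example. expat fills in category="Italian" for the recipe that omits it. It stops doing so if specified_attributes is set, in which case it reports only what the document actually contains.

```python
import xml.parsers.expat as expat

# Mexican, Chinese and the recipe names are invented for this sketch.
DOC = """<?xml version="1.0"?>
<!DOCTYPE cookbook [
<!ELEMENT cookbook (recipe+)>
<!ELEMENT recipe (#PCDATA)>
<!ATTLIST recipe category (Italian | Mexican | Chinese) "Italian">
]>
<cookbook>
<recipe>Lasagne</recipe>
<recipe category="Mexican">Tacos</recipe>
</cookbook>
"""


def recipe_categories(xml_text, specified_only=False):
    """Collect the category attribute of every recipe element."""
    categories = []

    def on_start(name, attrs):
        if name == "recipe":
            categories.append(attrs.get("category"))

    parser = expat.ParserCreate()
    # If true, expat reports only attributes written in the document,
    # not values filled in from the DTD's defaults.
    parser.specified_attributes = specified_only
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return categories


print(recipe_categories(DOC))                       # ['Italian', 'Mexican']
print(recipe_categories(DOC, specified_only=True))  # [None, 'Mexican']
```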

ENTITY and ENTITIES attributes

Data in XML documents is not always text; a document can also refer to binary information (for example, graphics). Such data can be referenced with an attribute of type ENTITY, whose value is the name of an unparsed entity declared in the DTD. For example, the declaration of the description element can include a recipePicture attribute that refers to a graphic image:

To let one attribute refer to several entities, use ENTITIES instead of ENTITY; the entity names in the value are separated by spaces.

NMTOKEN and NMTOKENS attributes

An NMTOKEN (name token) attribute value is a string made up only of characters allowed in XML names: letters, digits, and a few punctuation characters such as periods, hyphens, underscores, and colons. Declaring an attribute of type NMTOKEN means that its value must satisfy this restriction. Typically, an NMTOKEN value is a single word:

To allow several name tokens in one attribute value, use NMTOKENS instead of NMTOKEN; the tokens are separated by spaces.

Entity Declarations

An entity declaration is similar to define in some programming languages, including PHP. Entity references were briefly mentioned in the previous section, "Introducing XML Syntax." As a reminder, an entity reference stands in for another piece of content: when an XML document is processed, every occurrence of the entity reference is replaced by the content it represents. There are two types of entities: internal and external.

Internal Entities

Internal entities are like string variables: they associate a name with a piece of text. For example, to define a name that stands for copyright information, you could declare an entity like this:

<!ENTITY Copyright "Copyright 2000 YourCompanyName. All Rights Reserved.">

During document processing, every occurrence of &Copyright; is replaced with the text "Copyright 2000 YourCompanyName. All Rights Reserved." Any XML markup in the replacement text is treated as if it appeared in the original document.

Internal entities are useful when the entity is used in a relatively small number of XML documents. If you have many documents, it is better to use external entities.
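You can observe internal entity expansion with Python's expat parser. The sketch below wraps the Copyright entity in a hypothetical page element and collects the character data the parser reports. The reference &Copyright; arrives already replaced by its text.

```python
import xml.parsers.expat as expat

# "page" is a hypothetical root element used only to hold the entity reference.
DOC = """<?xml version="1.0"?>
<!DOCTYPE page [
<!ELEMENT page (#PCDATA)>
<!ENTITY Copyright "Copyright 2000 YourCompanyName. All Rights Reserved.">
]>
<page>Footer: &Copyright;</page>
"""


def text_content(xml_text):
    """Return all character data the parser reports, after entity expansion."""
    chunks = []  # expat may deliver text in several pieces
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)


print(text_content(DOC))  # Footer: Copyright 2000 YourCompanyName. All Rights Reserved.
```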

External Entities

External entities reference content located in another file. Such entities can contain text, but they can also refer to binary data (for example, graphics). Returning to the previous example, suppose you decide to keep your copyright information in a separate file so that it is easier to edit later. The declaration that refers to this file looks like this:

<!ENTITY Copyright SYSTEM "copyright.xml">

When the XML document is processed, every &Copyright; reference is replaced with the contents of the copyright.xml document. Any XML markup in the replacement text is treated as if it appeared in the original document.

External entities are also useful for referencing graphic images. For example, if you want to include a graphical logo in an XML document, create an external entity:

XML Resources

Although the material above is enough to understand the basic structure of XML documents, this description is not complete. Below are links to Internet resources with more detailed information:

The rest of the chapter shows you how to use PHP to process XML documents. At first glance, the task seems very difficult, since parsing documents of arbitrary types is never easy.

But once you get acquainted with the basic strategy for working with XML in PHP, everything turns out to be surprisingly simple.

Document schema description

A DTD describes the document schema for a particular markup language through a set of declarations (parameter entities, elements, and attributes). These declarations describe its class (or type) in terms of syntactic constraints. A DTD can also declare constructs that are not always required to define the structure of a document but can affect the interpretation of certain documents.

Declaring parameter entities

A parameter entity declaration defines a named macro that can be referenced and expanded elsewhere in the DTD. Parameter entities can be used only in the DTD, not in the document content. When a parameter entity is referenced by name in the DTD, the reference is expanded into the entity's replacement text.

<!ENTITY % fontstyle "TT | I | B | BIG | SMALL">

The fontstyle parameter entity contains a group of element names: TT | I | B | BIG | SMALL.

<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">

The inline parameter entity contains text data (#PCDATA) and four more parameter entities: fontstyle, phrase, special, and formctrl.
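Parameter entities such as fontstyle are expanded inside the DTD before the declarations that use them are interpreted. XML allows such references inside a declaration only in the external DTD subset. The sketch below therefore loads a tiny external DTD from a string, assuming an illustrative P element, the file name inline.dtd and the helper expanded_models. It shows that Python's expat parser reports the content model of P with %fontstyle; already replaced by the element names.

```python
import xml.parsers.expat as expat

# Assumed external DTD in the style of the HTML DTD; the P element is illustrative.
INLINE_DTD = """<!ENTITY % fontstyle "TT | I | B | BIG | SMALL">
<!ELEMENT P (#PCDATA | %fontstyle;)*>
"""

DOC = """<?xml version="1.0"?>
<!DOCTYPE P SYSTEM "inline.dtd">
<P>Plain text</P>
"""


def expanded_models(xml_text, dtd_text):
    """Map each declared element to the child names in its (expanded) content model."""
    models = {}
    parser = expat.ParserCreate()
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_element_decl(name, content):
        # content is (type, quantifier, name, children); each child is again such a tuple.
        models[name] = [child[2] for child in content[3]]

    def on_external_ref(context, base, system_id, public_id):
        # Every external reference is answered with dtd_text in this sketch.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(dtd_text, True)
        return 1

    parser.ElementDeclHandler = on_element_decl
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(xml_text, True)
    return models


print(expanded_models(DOC, INLINE_DTD))
```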

Declaration of elements

Element declarations list the element names allowed in a document and give the content model of each element. In SGML DTDs, such as the HTML 4 DTD, they also specify whether the start and end tags may be omitted; XML does not allow tag omission.

The following keywords and characters define the content of an element:

EMPTY - empty content

ANY - any content

, - indicates the order

| - separation of alternatives

() - grouping

* - any number of elements (zero or more)

+ - at least one element (one or more)

? - optional presence of an element (zero or one)

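If none of *, + or ? is given, the element must appear exactly once.

<!ELEMENT DL (DT|DD)+>

A DL element must contain one or more DT or DD elements in any order.

<!ELEMENT FORM (%block;|SCRIPT)+ -(FORM)>

A FORM element must contain one or more elements from the block parameter entity or SCRIPT elements, in any order, and cannot contain another FORM element. The -(FORM) exclusion is SGML syntax taken from the HTML 4 DTD. XML DTDs do not support exclusions, so an XML parser rejects this declaration.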
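You can check the difference between SGML and XML DTD syntax with Python's expat parser. In the sketch below, %block; is replaced by the single element name P, because parameter-entity references are not allowed inside declarations in an internal subset. Everything else is kept. expat rejects the declaration with the -(FORM) exclusion and accepts the same declaration without it. is_well_formed is a hypothetical helper.

```python
import xml.parsers.expat as expat


def is_well_formed(xml_text):
    """True if expat parses xml_text (including its internal DTD subset) without error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except expat.ExpatError:
        return False
    return True


# %block; is replaced by P: parameter-entity references may not appear
# inside declarations in an internal DTD subset.
SGML_STYLE = """<!DOCTYPE FORM [
<!ELEMENT FORM (P|SCRIPT)+ -(FORM)>
]>
<FORM><P/></FORM>"""

XML_STYLE = SGML_STYLE.replace(" -(FORM)", "")

print(is_well_formed(SGML_STYLE))  # False: the exclusion is not XML syntax
print(is_well_formed(XML_STYLE))   # True
```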

Defining Attributes

Each element in a DTD can have a list of attributes associated with it. This is done with the <!ATTLIST> declaration. It names the element the attributes belong to and, for each attribute, gives its name, type, and default behavior.

For example:

<!ATTLIST MAP name CDATA #REQUIRED>

This example declares a name attribute for the MAP element. The attribute is required.

The following attribute types exist:

CDATA (character data) - the attribute value can be any character data

ID - the attribute value must be a unique identifier of the element

IDREF - the attribute value is a reference to an element by its ID

IDREFS - the same as IDREF, but the value can hold several identifiers separated by spaces

NMTOKEN - the attribute value is a name token: a string made only of the characters allowed in XML names (hence the term "name token")

NMTOKENS - the attribute value is a list of name tokens separated by spaces

ENTITY - the value is the name of an unparsed external entity declared in the DTD

ENTITIES - the value is a list of such entity names, separated by spaces

NOTATION - the attribute value must be one of the notation names listed in the declaration, for example NOTATION (gif | jpeg), each declared with a <!NOTATION> declaration

Enumerations and NOTATION enumerations

Enumeration - the attribute value must be one of the alternatives listed in parentheses, for example (yes | no)

The following default declarations are available:

#IMPLIED - the attribute does not have to be specified;

#REQUIRED - the attribute must be specified;

#FIXED - the attribute value is given as a constant in the DTD and cannot be changed in the document;

a literal default value in quotes, used when the attribute is omitted from the document.

Association of a document with a specific DTD

To associate a document with a specific DTD, put a document type declaration (<!DOCTYPE ...>) at the beginning of the document, after the XML declaration.

Depending on the location of the DTD, the Document Type Declaration can be of two types:

Internal DTD subset

A set of DTD declarations is contained within the body of the document itself. For example:

]> ]>

External DTD subset

The DTD declarations are kept in a separate text file with the .dtd extension. In this case, the file can be referenced through a public identifier, a system identifier, or both. For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

Example

An example of a very simple XML DTD describing a list of people:

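<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>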

Starting from the first line:

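The people_list element contains any number of person elements. The * sign means that a people_list element may contain zero, one, or more person elements.

The person element contains the elements name, birthdate, gender, and socialsecuritynumber. The ? sign means that an element is optional. The name element has no ?, which means that a person element must contain a name element.

The name element contains character data.

The birthdate element contains character data.

The gender element contains character data.

The socialsecuritynumber element contains character data.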

An example XML document using this DTD:

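<people_list>
<person>
<name>Fred Bloggs</name>
<birthdate>27/11/2008</birthdate>
<gender>Male</gender>
<socialsecuritynumber>1234567890</socialsecuritynumber>
</person>
</people_list>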


Wikimedia Foundation. 2010.

Custom ("home-made") XML tags are described with schemas. Schemas are needed to:
describe which markup is allowed;

describe what that markup means.

The most well-known schema description languages are the following:
DTD (Document Type Definition) is a document type definition language that was originally used as a language for describing the structure of an SGML document.

XDR (XML-Data Reduced) is an XML schema dialect developed by Microsoft and supported in Internet Explorer versions 4 and 5.

XML Schema, or simply XSD (XML Schema Definition), has been a W3C recommendation since 2001.

Let's take a closer look at the first two of them. The third schema language is covered in Lab 11.

DTD schema
A DTD schema defines a template for the document's markup: which elements and attributes may appear in the XML document, in what order, and where.
In DTD terms, the content model of an XML document can be described as follows.
Each element of the document has one of the following content types:
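Content	Syntax	Comment
Data	<!ELEMENT name (#PCDATA)>	Contains only text data
Other elements	<!ELEMENT name (child1, child2)>	Contains only child elements
Mixed	<!ELEMENT name (#PCDATA | child)*>	Contains a combination of text data and child elements
EMPTY	<!ELEMENT name EMPTY>	Contains nothing
ANY	<!ELEMENT name ANY>	Can contain text data or child elements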
Attributes that appear inside document tags are declared separately, using the syntax:
<!ATTLIST element-name attribute-name type modality>
An attribute in a DTD can have one of three types:
String (CDATA)

Tokenized

Enumerated

In addition to the attribute type, you can also set its modality, that is, whether the attribute is required (#REQUIRED), optional (#IMPLIED), fixed (#FIXED), or has a default value.
As an example, consider string-type attributes for an element that describes a message:
If this element has enumerated attributes, they could be declared, for example, as follows:
Tokenized element attributes can be of four types:
Finally, the following sequence and occurrence indicators can be used in a DTD:
Symbol	Example	Description
,	(a, b, c)	Elements used in sequence, in the listed order
|	(a | b | c)	Exactly one of the listed elements is used
(none)	date	Exactly one occurrence
?	subject?	Optional (0 or 1 time)
+	paragraph+	One or more times
*	brother*	Zero or more times
As an example, here is a DTD schema that describes the structure of an electronic mailbox:

This is the next article in the "XML Basics" series. In it, we will look at the basics of describing the structure of XML data with a DTD. It is a rather old way of describing the structure of XML documents, but it is still in use, so it is worth covering.

A DTD is also a good way to show how XML checks a document's content, grammar, and so on. In the next article, we will look at a newer and more powerful way of describing the structure of XML documents, XML Schema. For now, let's move on to the XML DTD itself.

In this article, we will cover several important points: what an XML DTD is and why it is needed, what its disadvantages are, and how to write your own DTD for validating XML documents. As usual, everything is presented step by step, as briefly and clearly as possible, to save you time.

So let's begin.

What is a DTD in XML and why is it needed?

In short, a DTD in XML is used to check the grammar of a document and its conformance to a standard (one devised by another developer or by you). This lets the parser (processor) determine during processing whether the document meets our requirements; in other words, the XML document is validated.

Checking the grammar of XML documents is necessary because:

The XML document may not be intended for your system.

The XML document may contain incorrect data.

The XML document may contain structural errors.

So, we figured out what an XML DTD is and why it is needed. Now let's briefly look at the disadvantages of DTDs, and then move on to the process of creating DTD files for validating XML documents.

Disadvantages of XML DTD

Non-XML syntax. A DTD is written in its own syntax, which differs from XML. This causes a number of problems, such as encoding problems or difficulty tracking down errors.

No data type checking. DTDs cannot express data types such as numbers or dates; element content and attribute values are all treated as text.

No combining of DTDs. A document can be associated with only one DTD; you cannot assign two or more DTDs to it.

That was a short list of DTD shortcomings. XML Schema, which we will cover in the following articles, addresses them.

Declaring elements, attributes and entities in a DTD. Modifiers “*”, “?”, “+”

Special declarations and modifiers are used to declare elements, attributes, and entities in a DTD. To understand everything in detail, let's first look at the theoretical information, and then in the second part of the article we will move on to practical examples.

Defining an XML Element and a Sequence of XML Elements

<!ELEMENT book (title, author, price, description)>

The book element contains exactly one title, author, price, and description element each, in that order.

Element Alternatives

<!ELEMENT pricelist (title, price, (author | company | sample))>

The pricelist element contains the elements title and price, followed by one of three elements: author, company, or sample.

Empty elements

<!ELEMENT none EMPTY>

The none element must be empty.

Attribute Declaration

<!ATTLIST pricelist id CDATA #REQUIRED name CDATA #IMPLIED>

The pricelist element can have two attributes: id and name. The id attribute is required, since #REQUIRED is specified, and the name attribute is optional (#IMPLIED is specified). CDATA means that the attribute value can be any character data.

Defining Entities

<!ENTITY myname "Dmitry Denisov">

Whenever the entity reference &myname; is encountered, "Dmitry Denisov" is automatically substituted for it.

Modifiers (specify how many times an element may occur)

* - zero or more.
? - zero or one.
+ - one or more.

<!ELEMENT books (book+)>

The books element can contain one or more book elements.

Now let's look at what this all looks like with more practical examples.

Creating a DTD file for validating an XML document using the example of a book price list

Let's take the same book price list that we use as an example in almost every article about XML. The XML document itself will look something like this.

Book 1 &myname; Price 1 Description

Of course, this example is far from perfect, but it will do. As you can see, the root element pricelist contains nested book elements. Each book element contains title, author, and price elements, possibly followed by a description element, and all of them contain text data.

To validate this price list, we can use the following DTD document.

Now let's look at everything in more detail.

<!ELEMENT books (book+)>: we declare the root element books and indicate in parentheses what it can contain. In this case, it can contain one or more book elements (the plus sign means one or more, see above).

<!ELEMENT book (title, author+, price, description?)>: we define the book element. The book element can contain one title element, one or more author elements (plus sign), one price element, and zero or one description element (question mark).

<!ELEMENT title (#PCDATA)>: we define the title element. We specify #PCDATA (parsed character data) as its content. This means the element contains only text, which the parser parses; for example, it replaces entity references such as &myname;.

Similarly, we define the elements author, price, and description.

<!ENTITY myname "Dmitry Denisov">: we define the entity. First comes the entity name, and then, in quotes, the text that will appear in its place. XML predefines five entities: &lt; (<), &gt; (>), &amp; (&), &apos; ('), and &quot; ("). You can declare as many additional entities as you like in this way, and their values can be not only words but also whole sentences of considerable length.
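The difference between predefined and declared entities shows up directly in a parser. The sketch below uses Python's expat binding to show three cases. First, the five predefined entities parse without any DTD. Second, &myname; fails while it is undeclared. Third, it expands once the declaration is present. text_of is a hypothetical helper.

```python
import xml.parsers.expat as expat


def text_of(xml_text):
    """Return the character data of a document after entity expansion."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)


# The five predefined entities need no declaration.
print(text_of("<p>&lt; &gt; &amp; &apos; &quot;</p>"))  # < > & ' "

# An undeclared entity such as &myname; stops the parser with an error.
try:
    text_of("<author>&myname;</author>")
except expat.ExpatError as err:
    print("error:", expat.ErrorString(err.code))

# Once declared in the DTD, the entity is expanded.
print(text_of('<!DOCTYPE author [<!ENTITY myname "Dmitry Denisov">]><author>&myname;</author>'))
```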

Connecting DTD for validating XML documents

Declarative method

This method is rarely used, because it produces self-contained documents: the document contains both the DTD and the XML data. The following construct adds a DTD to an XML document:

<!DOCTYPE DOCUMENT [ DTD declarations ]>

where DOCUMENT is the name of the root element of the XML document.

For clarity, let's look at an example of a ready-made self-sufficient document with a declarative way to include a DTD.

]>
External DTD definition - connecting a DTD document

This method connects a DTD file to an XML document using the following construct:

<!DOCTYPE DOCUMENT SYSTEM "file.dtd">

where DOCUMENT is the root element of the XML document, and
file.dtd is the path or URL of the DTD file.

For clarity, consider the following example.

XML document

This concludes the article. We have covered the main points of working with XML DTDs, and I hope I was able to explain everything clearly.

That's all. Good luck and success in learning XML!
