Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 [ 66 ] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

<!DDCTYPE PcSpecs [

<!ELEMEHT PCS CPC*)>

<!ЕЬЕНЕЫТ PC (MODEL, PRICE, PROCESSOR, RAH, DISK+)>

<!ЕЬЕМЕЫТ MODEL (\#PCDATA)>

<!ELEMENT PRICE (\#PCDATA)>

<!ELEMEHT PROCESSOR (MANF, MODEL, SPEED)>

<!ELEMENT MANF (\#PCDATA)>

<!ELEMENT MODEL (\#PCDATA)>

<!ELEMENT SPEED <\#PCDATA)>

<!ELEMENT RAM C\#PCDATA)>

<!ELEMENT DISK (HARDDISK 1 CD DVD)>

<!ELEMENT HARDDISK (MAKE, MODEL, SIZE)

<!ELEMENT SIZE (\#PCDATA)>

<!ELEMENT CD (SPEED)>

<!ELEMENT DVD (SPEED)>

]>

Figure 5J4: A DTD for personal computers

The name of the DTD is PcSpecs. The first element, which is like the start sjTnbol of a CFG, is PCS (list of PC specifications). Its definition, PC*, says that a PCS is ?;ero or more PC entries.

We then see the definition of a PC element. It consists of the concatenation of five things. The first four are other elements, corresponding to the model, price, processor type, and RAM of the PC. Each of these must appear once, in that order, since the comma represents concatenation. The last constituent, DISK+, tells us that there will be one or more disk entries for a PC,

Many of the constituents are simply text; MODEL, PRICE, and RAM are of this type. However, PROCESSOR has more structure. We see from its definition that it consists of a manufacturer, model, and speed, in that order; each of those elements is simple text.

A DISK entry is the most complex. First, a disk is either a hard disk, CD, or DVD, as indicated by the rule for element DISK, which is the OR of three other elements. Hard disks, in turn, have a structure in which the manufacturer, model, and size are specified, wliUe CDs and DVDs are represented only by their speed.

Figure 5.15 is an example of an XML document that conforms to the DTD of Fig. 5.14. Notice that each element is represented in the document by a tag with the name of that element and a matching tag at the end, with an extra slash, just as in HTML, Thus, in Fig. 5.15 we see at the outermost level the tag <PCS>.. ,</PCS>. Inside these tags appears a list of entries, one for each PC sold by this manufacturer; we have only shown one such entry explicitly.

Within the illustrated <PC> entry, we can easily see that the model number



<PCS> <PC>

<MODEL>4560</MODEL>

<PRICE>$2295</PRICE>

<PROCESSOR>

<MANF>Intel</HANF>

<HODEL>Pentium</HDDEL>

<SPEED>800MHz</SPEED> </PROCESSDR> <RAH>256</RAM> <DISK><HARDDISK>

<MAWF>Haxtor</MANF>

<MODEL>Diamond</MODEL>

<SIZE>30.5Gb</SIZE> </HARDDISK></DISK> <DISK><CD>

<SPEED>32x</SPEED> </CD></DISK> </PC> <PC>

</PC> </PCS>

Figure 5.15: Part of a document obeying the structure of the DTD in Fig. 5.14

is 4560, the price is $2295, and it has an SOOMHz Intel P(;ntium processor. It has 256Mb of RAM, a 30.5Gb Maxtor Diamond hard disk, and a 32x CD-ROM reader. What is important is not tliat we can read these; facts, but that a program could read the document, and guided by the grammar in the DTD of Fig. 5.14 that it has also read, could interpret the numbers and names in Fig. 5.15 properly. □

You may have noticed that the rules for the elements in DTDs like Fig. 5.14 are not quite Uke productions of context-free grammars. Many of the rules are of the correct form. For instance,

<!ELEMENT PROCESSOR (MAKF, MODEL, SPEED)>

Is analogous to the production

Processor Manf Model Speed

However, the rule



<!ELEMENT DISK (HARDDISK I CD DVD)>

does not have a definition for DISK that is like a production body. In this case, the extension is simple: we may interpret this rule as three productions, with the vertical bar playing the same role as it does in our shorthand for productions having a common head. Thus, this rule is equivalent to the three productions

Disk HardDisk \ Cd \ Dvd

Tiie most difficult case is

<!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAH, DISK+)>

where the body has a closure operator within it. The solution is to replace DISK+ by a. new variable, say Disks, that generates, via a pair of productions, one or more instances of the тапаЫе Disk. The equivalent productions are thns:

Pc Model Price Processor Ram Disks Disks -> Disk \ Disk Disks

There is a general technique for converting a CFG with regular expressions as production bodies to an ordinary CFG. We shall give the idea informally; you may wish to formalize both the meaning of CFGs with regular-expression productions and a proof that the extension yields no new languages beyond the CFLs. We show, inductively, how to convert a production with a regular-expression body to a collection of equivalent ordinary productions. The induction is on the size of the expression in the body.

BASIS: If the body is the concatenation of elements, then the production is already in the legal form for CFGs, so we do nothing.

INDUCTION: Otherwise, there are five cases, depending on the final operator

used,

1. The production is of the form Л -+ , £2, where Ei and E2 are expressions pernntted in the DTD language. This is the concatenation case. Introduce two new variables, В and C, that appear nowhere else in the grammar. Replace A-¥ Ei ,E2 by the productions

ABC BEi С E2

The first production, A ВС, is legal for CFGs, The last two may or may not be legal. However, their bodies are shorter than the body of the original production, so we may inductively convert them to CFG form.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 [ 66 ] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175