Строительный блокнот Automata methods and madness <!DDCTYPE PcSpecs [ <!ELEMEHT PCS CPC*)> <!ЕЬЕНЕЫТ PC (MODEL, PRICE, PROCESSOR, RAH, DISK+)> <!ЕЬЕМЕЫТ MODEL (\#PCDATA)> <!ELEMENT PRICE (\#PCDATA)> <!ELEMEHT PROCESSOR (MANF, MODEL, SPEED)> <!ELEMENT MANF (\#PCDATA)> <!ELEMENT MODEL (\#PCDATA)> <!ELEMENT SPEED <\#PCDATA)> <!ELEMENT RAM C\#PCDATA)> <!ELEMENT DISK (HARDDISK 1 CD DVD)> <!ELEMENT HARDDISK (MAKE, MODEL, SIZE) <!ELEMENT SIZE (\#PCDATA)> <!ELEMENT CD (SPEED)> <!ELEMENT DVD (SPEED)> ]> Figure 5J4: A DTD for personal computers The name of the DTD is PcSpecs. The first element, which is like the start sjTnbol of a CFG, is PCS (list of PC specifications). Its definition, PC*, says that a PCS is ?;ero or more PC entries. We then see the definition of a PC element. It consists of the concatenation of five things. The first four are other elements, corresponding to the model, price, processor type, and RAM of the PC. Each of these must appear once, in that order, since the comma represents concatenation. The last constituent, DISK+, tells us that there will be one or more disk entries for a PC, Many of the constituents are simply text; MODEL, PRICE, and RAM are of this type. However, PROCESSOR has more structure. We see from its definition that it consists of a manufacturer, model, and speed, in that order; each of those elements is simple text. A DISK entry is the most complex. First, a disk is either a hard disk, CD, or DVD, as indicated by the rule for element DISK, which is the OR of three other elements. Hard disks, in turn, have a structure in which the manufacturer, model, and size are specified, wliUe CDs and DVDs are represented only by their speed. Figure 5.15 is an example of an XML document that conforms to the DTD of Fig. 5.14. Notice that each element is represented in the document by a tag with the name of that element and a matching tag at the end, with an extra slash, just as in HTML, Thus, in Fig. 5.15 we see at the outermost level the tag <PCS>.. ,</PCS>. Inside these tags appears a list of entries, one for each PC sold by this manufacturer; we have only shown one such entry explicitly. Within the illustrated <PC> entry, we can easily see that the model number <PCS> <PC> <MODEL>4560</MODEL> <PRICE>$2295</PRICE> <PROCESSOR> <MANF>Intel</HANF> <HODEL>Pentium</HDDEL> <SPEED>800MHz</SPEED> </PROCESSDR> <RAH>256</RAM> <DISK><HARDDISK> <MAWF>Haxtor</MANF> <MODEL>Diamond</MODEL> <SIZE>30.5Gb</SIZE> </HARDDISK></DISK> <DISK><CD> <SPEED>32x</SPEED> </CD></DISK> </PC> <PC> </PC> </PCS> Figure 5.15: Part of a document obeying the structure of the DTD in Fig. 5.14 is 4560, the price is $2295, and it has an SOOMHz Intel P(;ntium processor. It has 256Mb of RAM, a 30.5Gb Maxtor Diamond hard disk, and a 32x CD-ROM reader. What is important is not tliat we can read these; facts, but that a program could read the document, and guided by the grammar in the DTD of Fig. 5.14 that it has also read, could interpret the numbers and names in Fig. 5.15 properly. □ You may have noticed that the rules for the elements in DTDs like Fig. 5.14 are not quite Uke productions of context-free grammars. Many of the rules are of the correct form. For instance, <!ELEMENT PROCESSOR (MAKF, MODEL, SPEED)> Is analogous to the production Processor Manf Model Speed However, the rule <!ELEMENT DISK (HARDDISK I CD DVD)> does not have a definition for DISK that is like a production body. In this case, the extension is simple: we may interpret this rule as three productions, with the vertical bar playing the same role as it does in our shorthand for productions having a common head. Thus, this rule is equivalent to the three productions Disk HardDisk \ Cd \ Dvd Tiie most difficult case is <!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAH, DISK+)> where the body has a closure operator within it. The solution is to replace DISK+ by a. new variable, say Disks, that generates, via a pair of productions, one or more instances of the тапаЫе Disk. The equivalent productions are thns: Pc Model Price Processor Ram Disks Disks -> Disk \ Disk Disks There is a general technique for converting a CFG with regular expressions as production bodies to an ordinary CFG. We shall give the idea informally; you may wish to formalize both the meaning of CFGs with regular-expression productions and a proof that the extension yields no new languages beyond the CFLs. We show, inductively, how to convert a production with a regular-expression body to a collection of equivalent ordinary productions. The induction is on the size of the expression in the body. BASIS: If the body is the concatenation of elements, then the production is already in the legal form for CFGs, so we do nothing. INDUCTION: Otherwise, there are five cases, depending on the final operator used, 1. The production is of the form Л -+ , £2, where Ei and E2 are expressions pernntted in the DTD language. This is the concatenation case. Introduce two new variables, В and C, that appear nowhere else in the grammar. Replace A-¥ Ei ,E2 by the productions ABC BEi С E2 The first production, A ВС, is legal for CFGs, The last two may or may not be legal. However, their bodies are shorter than the body of the original production, so we may inductively convert them to CFG form.
|