Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 [ 63 ] 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

PROOF: The proof is an induction on the length of the derivation a w.

BASIS: If the derivation is one-step, then Л -f w must be a production. Since w consists of terminals only, the fact that w is in the language of A will be discovered in the basis part of the recursive inference procedure.

INDUCTION: Suppose the derivation takes n -i- 1 steps, and assume that for any derivation of n or fewer steps, the statement holds. Write the derivation as Л X1X2 Xfi w. Then, as discussed prior to the theorem, we can break w as w - wywo wk, where:

a) If Xi is a terminal, then тщ = Xi.

b) If Xi is a variable, then A.; w,. Since the first step of tho deriration Л w is surely not part of the derivation => we know that this derivation is of n or fewer steps. Thus, the inductive; hypothesis applies to it, and we know that wi is inferred to be in the language of X,:.

Now, we have a production A X1X2 Xk, with ш, either equal to Xf or known to be in the language of X In the next round of the recursive inference procedure, we shall discover that iuiwt is in the language of A. Since wiiv-2 -vik = u, we have shown that tv is inferred to be in the language of A. □

5,2.7 Exercises for Section 5.2

Exercise 5.2.1: For the grammar and each of the strings in Exercise 5.1.2, give parse trees.

! Exercise 5.2.2: Suppose that G is a CFG without any productions that have e as the right side. If w is in L(G), the length of w is n, and w has a derivation of m steps, show that w has a parse tree witli n -b m nodes.

! Exercise 5.2.3: Suppose all is as in Exercise 5.2.2, but G may have some productions with e as tlie right side. Show that a parse tree for a string ги other than e may have as many as n + 2m - 1 nodes, but no iiiort;.

! Exercise 5,2.4: In Section 5.2.6 we mentioned that if X1X2 X, a, then all the positions of n that come from expansion of .X, fire to the left of all the positions that come from expansion of Xj, if J < j. Prove this fact. Hint: Perform an induction on the number of steps in the derivation.

5.3 Applications of Context-Free Grammars

Context-free grammars were originally conceived by N. Chomsky as a way to describe natural languages. That promise has not been fulfilled. However, as uses for recursively defined concepts in Computer Science have multiplied, so has the need for CFGs as a way to describe instances of these conce])ts. We shall sketch two of these uses, one old and one new.



1. Grammars are used to describe programming languages. More importantly, there is a mechanical way of turning the language description as a CFG into a parser, the component of the compiler that discovers the structure of the source program and represents that structure by a parse tree. This application is one of the earhest uses of CFGs; in fact it is one of the first ways in which theoretical ideas in Computer Science found their way into practice.

2. The development of XML (Extensible Markup Language) is widely predicted to facilitate electronic commerce by allowing participants to share conventions regarding the format of orders, product descriptions, and many other kinds of documents. An essential part of XML is the Document Type Definition (DTD), which is essentially a context-free grammar that describes the allowable tags and the ways in which these tags may be nested. Tags are the fanuliar kcjwords with triangular brackets that you may know from HTML, e.g., <EH> and </EK> to .surround text that needs to be emphasized. However. XML tags deal not with the formatting of text, but with the meaning of text. For instance, one could surround a sequence of characters that was intended to be interpreted as a pliont number by <PHONE> and </PHONE>.

5.3.1 Pcirsers

Many aspects of a programming language have a structure that may be described by regular expressions. For instance, we discussed in Example 3.9 how identifiers could be represented by regular expressions. However, there are idso some very important aspects of typical programming languages that cannot be represented by regular expressions alone. The following are two examples.

Example 5.19 : Typical languages use parentheses and/or brackets in a nested and balanced fashion. That is, we must be able to match some left parenthesis against a right parenthesis that appears immediately to its right, remove both of them, and repeat. If we eventually eliminate all the parentheses, then the string was balanced, and if we cannot match parentheses in this way, then it is unbalanced. Examples of strings of balanced parentheses are (()), ()(), (()()), and e, while )( and (() are not.

A grammar Gbai ~ {{B], {{,)],P,B) generates all and only the strings of balanced parentheses, where P consists of the productions:

В BB \ (B) \ e

The first production, В -> BB, says that the concatenation of two strings of balanced parentheses is balanced. That assertion makes sense, because we can match the parentheses in the two strings independently. The second production, В - (B), says that if we place a pair of parentheses around a balanced string, then the result is balanced. Again, this rule makes sense, because if we match



tho parentheses in the inner string, then they are ail eliminated and we are then allowed to match the first and last parentheses, which have become adjacent. The third production, В e is the basis; it says that the empty string is balanced.

The above informal arguments should convince us that GbaS generates all strings of balanced parentheses. We need a proof of the converse - that every string of balanced parentheses is generated by this grammar. However, a proof by induction on the lengtii of the balanced string is not hard and is left as an exercise.

We mentioned that the set of strings of balanced parenthescis is not a regular language, and we shall now prove that fact. If L{Gbai) were regnlar, then there would be a constant n for this language from the pumping lemma for regular languages. Consider the balanced string w = ( ) , that is, n left parentheses followed by n matching right parentheses. If we break w = xyz according to the pumping lemma, then у consists of oidy left parenth(!ses, and therefoie xz has more right parentheses than left. This string is not balanced, contradicting the assumption that the language of balanced parentheses is regular. □

Programming languages consist of more than parentheses, of course, but parentheses are an essential part of arithmetic or conditional expressions. The grammar of Fig. 5,2 is more typical of the structure of arithmetic expressions, although we used only two operators, plus and times, and we included the detailed structure of identifiers, which would more likely be handled by the lexical-analyzer portion of the compiler, as we mentioned in Section 3,3.2. However, the language described in Pig, 5.2 is not regular either. For instance, according to this grnmar, ( ) is a legal expression. We can use the pumping lemma to show that if the language were regular, then a string with some of the left parentheses removed and the a and all right parentheses intact would also be a legal expression, which it is not.

There are numerous aspects of a typical progranmiing Umguage that behave like balanced parentheses. There will usually be parentheses tliemselves, in expressions of all types. Beginnings and endings of code blocks, such as begin and end in Pascal, or the curly braces {, ..} of C, aie examples. That is, whatever curly braces appear m a С program must form a balanced sequence, with { in place of the left parenthesis and } in place of the right parenthesis.

There is a related pattern that appears occasionally, where parentheses can be balanc:ed with the exception that there can be unbaUuu;ed left parentheses. An example is the treatment of if and else in C, An if-clause can appear unbalanced by any else-clause., or it may be balanced by a matching else-clause. A grammar that generates the possible sequences of if and else (represented by i and c, respectively) is;

S->€\SS\iS\ iSeS

For instance, ieie, He, and iei are possible sequences of if s and elses, and each of these strings is generated by the above grammar. Some examples of illegal sequences, not generated by the grammar, are ei and ieeii.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 [ 63 ] 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175