Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 [ 69 ] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

/ a 1 fe I /a 1 I /0 t /1

F I\{E)

T -+ F\T*F

E -+ T\ E + T

Figure 5.19: An unambiguous expression grammar

Example 5.27: Figure 5,19 shows an unambiguous grammar that generates the same language as the grammar of Fig, 5.2. Think of F, T, and E as the

2. A sequence of identical operators can group either from the left or from the right. For example, if the *s in Fig. 5.17 were replaced by -i-s, we would see two different parse trees for the string E + E + E. Since addition and multiplication are associative, it doesnt matter whether we group from the left or the right, but to eliminate ambiguity, we must pick one. The conventional approach is to insist on grouping from the left, so the structure of Fig. 5.17(b) is the only correct grouping of two +-signs.

The solution to the problem of enforcing precedence is to introduce several different variables, each of which represents those expressions that share a level of binding strength. Specifically:

1. A factor is an expression that cannot be broken apart by any adjacent operator, either a * or a +. The only factors in our expression language are:

(a) Identifiers. It is not possible to separate the letters of an identifier by attaching an operator,

(b) Any parenthesized expression, no matter what appears inside the parentheses. It is the purpose of parentheses to prevent what is inside from becoming the operand of any operator outside the parentheses.

2. .A term is an expression that cannot be broken by the + operator. In our example, where + and * are the only operators, a term is a product of one or more factors. For instance, the term a * 6 can be broken if we use left associativity and place al* to its left. That is, al * a + Ь is grouped (al * o) * 6, which breaks apart the a* b. However, placing an additive term, such as aH-, to its left or +al to its right cannot break a*b. The proper grouping of al + a * his al + {a * b), and the proper grouping of a* b +ai is {a*b) + al.

3. An expression will henceforth refer to any possible expression, including those that can be broken by either an adjacent * or an adjacent -Ь. Thus, an expression for our example is a sum of one or more terms.



Figure 5,20: The sole parse tree for

a + a*a

The fact that this grammar is unambiguous may be far from obvious. Here are the key observations that explain why no string in the language can have two different parse trees.

Any string derived from T, a term, must be a sequence of one or more factors, connected by *s, A factor, as we have defined it, and as follows from the productions for F in Fig, 5.19, is either a .single identifier or any parenthesized expression.

Because of the form of the two productions for T, the only parse tree for a sequence of factors is the one that breaks /i * /a * - * Д, for n > 1 into a term /] * /2 * * /n-i and a factor fn- The reason is that F cannot derive expressions like / -1 * fn without introducing parentheses around them. Thus, it is not possible that when using the production Г - T* F, the F derives anything but the last of the factors. That is, the parse tree for a term can only look like Fig. 5.21.

Likewise, an expression is a sequence of terms connected by +. When we use the production E -> E + T to derive ti + t2 + + t , the T must derive only tn, and the E in the body derives t\ + 2 + + n-i-The reason, again, is that T cannot derive the sum of two or more terms without putting parentheses around them.

variables whose languages are the factors, terms, and expressions, ад defined above. For instance, this grammar allows oidy one parse tree for the string a + u * a; it is shown in Fig, 5.20.




Figure 5.21: The form of all parse trees for a term

5.4.3 Leftmost Derivations as a Way to Express Ambiguity

While derivations are not necessarily unique, even if the grammar is unambiguous, it turns out that, in an unambiguous grammar, leftmost derivations will he unique, and rightmost derivations will be unique. We shall consider leftmost derivations only, and state the result for rightmost derivations.

Example 5.28: As an example, notice the two parse trees of Fig. 5.18 that each yield E + E * E. If we construct leftmost derivations from them we get the following leftmost derivations from trees (a) and (b), respectively:

a) E + E I + E=> a + E=i a + E*E a+I*E a + a*E

hn Im Im Im Im Im Im

a + a* I => a + a*a

h) E E* Е=> E + E*E=> I + E*E a + E*E a + I* E

Im Im Im tm Im Im

a + a * E a + a*I= a + a*a

tm Im

Note that these two leftmost derivations differ. This example does not prove the theorem, but demonstrates how the differences in the trees force different steps to be taken in the leftmost derivation. □

Theorem 5.29: For each grammar G = (V,T, P, S) and string w in T*,w has two distinct parse trees if and only if w has two distinct leftmost derivations from S.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 [ 69 ] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175