Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 [ 84 ] 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

3, We must eliminate unit productions, those of the form Л 5 for variables A and B.

7.1.1 Eliminating Useless Symbols

We say a symbol X is useful for a grammar G = (V,T,P,S) if there is some derivation of the form S aX0 => w, where w is in T*. Note that X may be in either V or T, and the sentential form aX0 might be the first or last in the derivation. If Ji is not useful, we say it is useless. Evidently, omitting useless symbols from a grammar will not change the language generated, so we may as well detect and eliminate all useless symbols.

Our approach to eliminating useless symbols begins by identifying the two things a symbol has to be able to do to be useful;

1. We say X is generating ii X => w for some terminal string w. Note that every terminal is generating, since w can be that terminal itself, which is derived by zero steps.

2. We say X is reachable if there is a derivation S aX for some a and

Surely a symbol that is useful will be both generating and reachable. If we eliminate the symbols that are not generating first, and then eliminate from the remaining grammar those symbols that are not reachable, we shall, as will be proved, have only the useful symbols left.

Example 7.1: Consider the grammai:

S AB \a Ab

All symbols but В are generating; a and b generate themselves; S generates a, and A generates b. If we eliminate B, we must eliminate the production S - AB, leaving the grammar:

Now, we find that only S and a are reachable from S. Eliminating A and b leaves only the production S a. That production by itself is a grammar whose language is {a}, just as is the language of the original grammar.

Note that if we start by checking for reachability first, we find that all symbols of the grammar

S AB \a Ab



axe reachable. If we then eliminate the symbol В because it is not generating, we are left with a grammar that still has useless symbols, in particular, A and b. □

Theorem 7.2: Let G = {V,T,P,S) be a CFG, and assume that L{G) ф %\ i.e., G generates at least one string. Let Gi = (l/i,ri,Pi,S) be the grammar we obtain by the following steps:

L First eliminate nongenerating symbols and all productions mvolvlng one or more of those symbols. Let Gi - (72,Г2,Р2,5) be this new grammar. Note that 5 must be generating, since we assume L{G) has at least one string, so 5 has not been eliminated.

2. Second, eliminate ail symbols that are not reachable in the grammar G-i. Then G\ has no useless symbols, and L{Gi) - L{G).

PROOF: Suppose X is a symbol that remains; i.e., X is in Vi U Ti. We know that X w for some w inT. Moreover, every symbol used in the derivation

of w from X is also generating. Thus, X w.

Since X was not eliminated in the second step, we also know that there are a and 0 such that S aX0. Further, every symbol used in this derivation is

reachable, so 5 аХ/З.

We know that every symbol in aXj3 is reachable, and we also know that all these sinbols are in V2 U Tb, so each of them is generating in G2. The derivation of some terminal string, say aX0 xwy, involves only symbols

that are reachable from S, because they are reached by symbols in aXfi. Thus, this derivation is also a derivation of Gi; that is,

S aXp xwy Gi Gi

We conclude that X is useful in Gi- Since X is an arbitrary symbol of Gi, we conclude that Gi has no useless symbols.

The last detail is that we must show L{Gi) = L{G). As usual, to show two sets the same, we show each is contained in the other.

L(Gi) С L(G): Since we have only eliminated symbols and productions from G to get Gi, it follows that L(Gi) С L{G).

L{G) С L(Gi): We must prove that if w is in L(G), then w is in L(Gi). If w is in L(G), then S w. Each symbol in this derivation is evidently both

reachable and generating, so it is also a derivation of Gi. That is, S w, and

thus w is in L(Gi). □



7.1.2 Computing the Generating and Reachable Symbols

Two points remain. How do we compute the set of generating symbols of a grammar, and how do we compute the set of reachable symbols of a grammar? For both problems, the algorithm we use tries its best to discover symbols of these types. We shall show that if the proper inductive constructions of these sets fails to discover a symbol to be generating or reachable, respectively, then the symbol is not of these types.

Let G - (V, T, P, S) be a grammar. To compute the generating symbols of G, we perform the following induction.

BASIS: Every symbol of T is obviously generating; it generates itself,

INDUCTION; Suppose there is a production Л -> q, and every symbol of a is already known to be generating. Then A is generating. Note that this rule includes the case where a = e; all variables that have e as a production body are surely generating.

Example 7.3: Consider the grammar of Example 7,1, By the basis, a and b are generating. For the induction, we can use the production A 6 to conclude that A is generating, and we can use the production 5 - a to conclude that 5 is generating. At that pomt, the induction is finished. We cannot use the production 5 -> AB, because В has not been established to be generating. Thus, the set of generating symbols is {a, b, A,S}. □

Theorem 7.4: The algorithm above finds all and only the generating symbols of G.

PROOF: For one direction, it is an easy induction on the order in which symbols are added to the set of generating symbols that each symbol added really is generating. We leave to the reader this part of the proof.

For the other direction, suppose X is a generating symbol, say X w.

We prove by induction on the length of this derivation that X is found to be generating.

BASIS: Zero steps. Then J is a terminal, and X is found in the basis.

INDUCTION: If the derivation takes n steps for n > 0, then AT is a variable. Let the derivation be X => a w; that is, the first production used is X a. Each symbol of a derives some terminal string that is a part of w, and that derivation must take fewer than n steps. By the inductive hypothesis, each symbol of a is found to be generating. The inductive part of the algorithm allows us to use production X a to infer that X is generating. □

Now, let us consider the inductive algorithm whereby we find the set of reachable symbols for the grammar G = {V,T,P,S). Again, we can show that by trying our best to discover reachable symbols, any symbol we do not add to the reachable set is really not reachable.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 [ 84 ] 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175