Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 [ 85 ] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

BASIS: S is surely reachable.

INDUCTION: Suppose we have discovered that some variable A is reachable. Then for all productions with A in the head, all the symbols of the bodies of those productions are also reachable.

Example 7.5: Again start with the grammar of Example 7.1. By the basis, S is reachable. Since 5 has production bodies AB and a, we conclude that A, B, and a are reachable. В has no productions, but A has A b. We therefore conclude that b is reachable. Now, no more symbols can be added to the reachable set, which is {S, A, B, a, b}. □

Theorem 7.6: The algorithm аЬоле finds all and only the reachable symbols of G.

PROOF: This proof is another pair of simple inductions akin to Theorem 7.4. We leave these arguments as an exercise. □

7.1.3 Eliminating e-Preductions

Now, we shall show that e-productions, while a convenience in many grammar-design problems, are not essential. Of course without a production that has an б body, it is impossible to generate the empty string as a member of the language. Thus, what we actually prove is that if language L has a CFG, then L - {e} has a CFG without e-productions. If e is not in L, then L itself is L - {б}, so L has a CFG with out e-productions.

Our strategy is to begin by discovering which variables are nullable. A variable A is nullable \{ A e. If A is unliable, then whenever A appears in a production body, say В -> CAD, A might (or might not) derive e. We make two versions of the production, one without A in the body {B -> CD), which corresponds to the case where A would have been used to derive e, and the other with A still present {B -> CAD). However, if we use the version with A present, then we cannot allow A to derive 6. That proves not to be a problem, since we shall srniply efiminate all productions with б bodies, thus preventing any variable from deriving e.

Let G = (У, Г, P, S) be a CFG. We can find all the nullable symbols of G by the following iterative algorithm. We shall then show that there are no nullable symbob except what the algorithm finds.

BASIS: If Л e is a production of G, then A is nullable.

INDUCTION: If there is a production В C\C2---Ck, where each d is nullable, then В is nullable. Note that each Gj must be a variable to be nullable, so we only have to consider productions with all-variable bodies.

Theorem 7.7: In any grammar G, the only nullable symbols are the variables found by the algorithm above.



PROOF: For the if direction of the implied A is nullable if and only if the algorithm identifies A as nullable, we simply observe that, by an easy induction on the order in which nullable symbols are discovered, that the each such symbol truly derives e. For the only-if part, we can perform an induction on the length of the shortest derivation A = с

BASIS: One step. Then Л e must be a production, and A is discovered in the basis part of the algorithm.

INDUCTION; Suppose A с by u steps, where n > 1. The first step must look like A => СС-- - Ck e, where each Ci derives e by a sequence of fewer than n steps. By the inductive hypothesis, each Ci is discovered by the algorithm to be nullable. Thus, by the inductive step. A, thanks to the production A Ci C2 Cfc, is found to be nullable. □

Now we give the construction of a grammar without c-productions. Let G = {V,T,P,S) be a CFG. Determine all the nullable symbols of G. We construct a new grammar G\ - (V,Г,P\,S), whose set of productions Pi is determined as follows.

For each production A XiX2 - Xk of P, where к > I, suppose that m of the к Xis are nullable symbols. The new grammar Gi will have 2 versions of this production, where the nullable Xs, in all possible combinations are present or absent. There is one exception: if m = k, i.e., all symbols are nullable, then we do not include the case where all Xis are absent. .41so, note that if a production of the form A -> e is in P, we do not place this production in Pi.

Example 7.8 : Consider the grammar

SAB A aAA I e В ЬВВ I €

First, let us find the nullable symbols. A and В are directly nullable because they have productions with e as the body. Then, we find that 5 is nullable, because the production S AB has a body consisting of nullable symbols only. Thus, all three variables are nullable.

Now, let us construct the productions of grammar G\. First consider S AB. All symbols of the body are nullable, so there are four ways we could choose present or absent for A and B, independently. However, we are not allowed to choose to make all symbols absent, so there are only three productions:

SAB\A\B

Next, consider production A -> aAA. The second and third positions hold nullable symbols, so again there are four choices of present/absent. In this case.



al] four choices are allowable, since the nonmillable symbol a will be present in any case. Our four choices yield productions:

A aAA I aA\ aA{a

Note that the two middle choices happen to yield the same production, since it doesnt matter which of the As we eliminate if we decide to eliminate one of them. Thus, the final grammar Gi will only have three productions for A. Similarly, the production В yields for Gi:

В bBB \bB\b

The two 6-productions of G yield nothing for Gi Thus, the following productions:

SAB\A]B A -¥ aAA I aA I a В bBB \bB\b

constitute Gi. □

We conclude our study of the elimination of e-productions by proving that the construction given above does not change the language, except that e is no longer present if it was in the language of G. Since the construction obviously eliminates e-productions, we shall have a complete proof of the claim that for every CFG G, there is a grammar Gi with no e-productions, such that

L(GO = b(G) - {e}

Theorem 7.9: If the grammar Gi is constructed from G by the above construction for eliminating e-productions, then L{Gi) = L{G) - {e}.

PROOF: We must show that \i w ф then w \s in L{G\) if and only if w is in L{G). As is often the case, we find it easier to prove a more general statement. In this case, we need to talk about the terminal strings that each variable generates, even though we only care what the start symbol S generates. Thus, we shall prove:

A w\i and only \i A w and w Ф t.

Gi G

In each case, the proof is an induction on the length of the derivation. (Only-if) Suppose that A U w. Then surely w ф because Gi has no e-

productions. We must show by induction on the length of the derivation that A w.

BASIS: One step. Then there is a production A w in Gi. The construction of G] tells us that there is some production A a of G, such that tt is with zero or more nullable variables interspersed- Then in G, A a where

the steps after the first, if any, derive e from whatever variables there are in a.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 [ 85 ] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175