Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 [ 87 ] 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

Pair

Productions

{E,E)

E E + T

iE,T)

ET*F

{E,F)

E{E)

{EJ)

£ ~y 0 1 & 1 /a 1 /6 1 /0 1 Л

{T,T)

T T*F

{T,F)

T{E)

(TJ)

T a\b\Ia\Ib\IO\n

{F,F)

F{E)

(FJ)

Fa\b\Ia\Ib\JO\n

/ -+ a 1 6 Uc 1 /6 UO 1 Л

Figure 7.1: Grammar constructed by step (2) of the unit-production-elimination algorithm

E E + T\ T *F \{E)\a\b\Ia T T*F\{E)\a\b\Ia\Ib\I(i F {E)\a\b\Ia\Ib\I{)\Il / a ! Ь I /a I /6 I /0 I Л

76 Л

/О I /1

has no unit productions, yet generates the same set of expressions as the grammar of Fig. 5.19. □

Theorem 7.13: ff grammar Gi is constructed from grammar G by the algorithm described above for ehminating unit productions, then L{G\) = L{G).

PROOF: We show that w is in L[G) if and only if u; is m L(Gi).

(If) Suppose S w. Since every production of Gj is equivalent to a sequence

of zero or more unit productions of G followed by a nonunit production of G,

we know that a [i implies jH. That is, every step of a derivation in G\ Gi G

can be replaced by one or more derivation steps in G. If wo put these sequences of steps together, we conclude that S w.

(Only-if) Suppose now that w is in L(G). Then by the equivalences in Section 5.2, we know that w has a leftmost derivation, i.e., S => w. Whenever a

unit production is used in a leftmost derivation, the variable of the body becomes the leftmost variable, and so is immediately replaced. Thus, the leftmost derivation in grammar G can be broken into a sequence of stops in whicli zero or more unit productions are followed by a nonunit production. Note that any nonunit production that is not preceded by a unit production is a step by itself Each of these steps can be performed by one production of Gi, because the construction of Gi created exactly the productions that reflect zero or more unit productions followed by a nonunit production. Thus, S => w. □



We can now summarize the various simpUfications described so far. We want to convert any CFG G into an equivalent CFG that has no useless symbols, e-productions, or unit productions. Some care must be taken in the order of application of the constructions. A safe order is:

1. EUminate e-productions,

2. Ehminate unit productions.

3. Eliminate useless symbols.

You should notice that, just as in Section 7.1,1, where we had to order the two steps properly or the result might have useless symbols, we must order the three steps above as shown, or the result might still have some of the features we thought we were eliminating.

Theorem 7.14; If G is a CFG generating a language that contains at least one string other than e, then there is another CFG Gi such that L{Gi) = L{G)-{e}, aird Gi has no e-productiohs, unit productions, or useless symbols.

PROOF: Start by eliminating the e-productions by the method of Section 7.1.3. If we then eliminate unit productions by the method of Section 7.1.4, we do not introduce any e-productions, since the bodies of the new productions are each identical to some body of an old production. Finally, we eliminate useless symbols by the method of Section 7,1,1, As this transformation only eliminates productions and symbols, never introducing a new production, the resulting grammar will still be devoid of e-productions and unit productions. □

7Л.5 Chomsky Normal Form

We complete our study of grammatical simplifications by showing that every nonempty CFL without e has a grammar G in which all productions are in one of two simple forms, either:

1. A -¥ ВС, where A, B, and C, are each variables, or

2. A a, where A is a variable and a is a terminal.

Further, G has no useless symbols. Such a grammar is said to be in Chomsky Normal Form, or CNF,

To put a grammar in CNF, start with one that satisfies the restrictions of Theorem 7.14; that is, the grammar has no e-productions, unit productions, or useless symbols. Every production of such a grammai is either of the form A a, which is already in a form allowed by CNF, or it has a body of length 2 or more. Our tasks are to:

N. Chomsky is the linguist who first proposed context-free grammars as a way to describe natural languages, and who proved that every cfg could be converted to this form. Interestingly, CNF does not appear to have important uses in natural linguistics, although we shall see it has several other uses, such as an efficient test for membership of a string In a context-free language (Section 7.4.4).



I TMF I LER \ a\ b\ JA \ IB \ JZ \ 10 I LER \ a\ b\ IA \ IB \ IZ \ 10 a\b \ IA \ IB \ IZ \ 10

IA\IB\IZ\ 10

LER j

a 1 6 1

->

->

Figure 7.2: Making all bodies either a single terminal or several variables

a) Arrange that all bodies of length 2 or more consist only of variables.

b) Break bodies of length 3 or more into a cascade of productions, each with a body consisting of two variables.

The construction for (a) is as follows. For every terminal a that appears in a body of length 2 or more, create a new variable, say A. This variable has only one production, A -i a. Now, we use A in place of о everywhere a appears in a body of length 2 or more. At this point, every production has a body that is either a single terminal or at least two variables and no terminals.

For step (b), we must break those productions A ~+ BiB Bk, for к > into a group of productions with two variables in each body. We introduce к - 2 new variables, Ci, C2,.. .,Ck-2- The original production is replaced by the fc - 1 productions

A-BiCi, Ci-> B2C2,..., C7fc 3-> Bfc 3Cft-2, Ca; 2-> Ffc-iBfc

ExEunple 7.15; Let us convert the grammar of Example 7.12 to CNF. For part (a), notice that there are eight terminals, a, 6, 0, 1, +, *, (, and ), each of which appears in a body that is not a single terminal. Thus, we must introduce eight new variables, corresponding to these terminals, and eight productions in which the new variable is replaced by its terminal. Using the obvious initials as the new variables, we introduce:

A-a B-b ZQ 01 F->+ M-y* L{ R-&)

If we introduce these productions, and replace every terminal in a body that is other than a single terminal by the corresponding variable, we get the grammar shown in Fig. 7.2.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 [ 87 ] 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175