Строительный блокнот Automata methods and madness
Figure 7.1: Grammar constructed by step (2) of the unit-production-elimination algorithm E E + T\ T *F \{E)\a\b\Ia T T*F\{E)\a\b\Ia\Ib\I(i F {E)\a\b\Ia\Ib\I{)\Il / a ! Ь I /a I /6 I /0 I Л 76 Л /О I /1 has no unit productions, yet generates the same set of expressions as the grammar of Fig. 5.19. □ Theorem 7.13: ff grammar Gi is constructed from grammar G by the algorithm described above for ehminating unit productions, then L{G\) = L{G). PROOF: We show that w is in L[G) if and only if u; is m L(Gi). (If) Suppose S w. Since every production of Gj is equivalent to a sequence of zero or more unit productions of G followed by a nonunit production of G, we know that a [i implies jH. That is, every step of a derivation in G\ Gi G can be replaced by one or more derivation steps in G. If wo put these sequences of steps together, we conclude that S w. (Only-if) Suppose now that w is in L(G). Then by the equivalences in Section 5.2, we know that w has a leftmost derivation, i.e., S => w. Whenever a unit production is used in a leftmost derivation, the variable of the body becomes the leftmost variable, and so is immediately replaced. Thus, the leftmost derivation in grammar G can be broken into a sequence of stops in whicli zero or more unit productions are followed by a nonunit production. Note that any nonunit production that is not preceded by a unit production is a step by itself Each of these steps can be performed by one production of Gi, because the construction of Gi created exactly the productions that reflect zero or more unit productions followed by a nonunit production. Thus, S => w. □ We can now summarize the various simpUfications described so far. We want to convert any CFG G into an equivalent CFG that has no useless symbols, e-productions, or unit productions. Some care must be taken in the order of application of the constructions. A safe order is: 1. EUminate e-productions, 2. Ehminate unit productions. 3. Eliminate useless symbols. You should notice that, just as in Section 7.1,1, where we had to order the two steps properly or the result might have useless symbols, we must order the three steps above as shown, or the result might still have some of the features we thought we were eliminating. Theorem 7.14; If G is a CFG generating a language that contains at least one string other than e, then there is another CFG Gi such that L{Gi) = L{G)-{e}, aird Gi has no e-productiohs, unit productions, or useless symbols. PROOF: Start by eliminating the e-productions by the method of Section 7.1.3. If we then eliminate unit productions by the method of Section 7.1.4, we do not introduce any e-productions, since the bodies of the new productions are each identical to some body of an old production. Finally, we eliminate useless symbols by the method of Section 7,1,1, As this transformation only eliminates productions and symbols, never introducing a new production, the resulting grammar will still be devoid of e-productions and unit productions. □ 7Л.5 Chomsky Normal Form We complete our study of grammatical simplifications by showing that every nonempty CFL without e has a grammar G in which all productions are in one of two simple forms, either: 1. A -¥ ВС, where A, B, and C, are each variables, or 2. A a, where A is a variable and a is a terminal. Further, G has no useless symbols. Such a grammar is said to be in Chomsky Normal Form, or CNF, To put a grammar in CNF, start with one that satisfies the restrictions of Theorem 7.14; that is, the grammar has no e-productions, unit productions, or useless symbols. Every production of such a grammai is either of the form A a, which is already in a form allowed by CNF, or it has a body of length 2 or more. Our tasks are to: N. Chomsky is the linguist who first proposed context-free grammars as a way to describe natural languages, and who proved that every cfg could be converted to this form. Interestingly, CNF does not appear to have important uses in natural linguistics, although we shall see it has several other uses, such as an efficient test for membership of a string In a context-free language (Section 7.4.4). I TMF I LER \ a\ b\ JA \ IB \ JZ \ 10 I LER \ a\ b\ IA \ IB \ IZ \ 10 a\b \ IA \ IB \ IZ \ 10 IA\IB\IZ\ 10
Figure 7.2: Making all bodies either a single terminal or several variables a) Arrange that all bodies of length 2 or more consist only of variables. b) Break bodies of length 3 or more into a cascade of productions, each with a body consisting of two variables. The construction for (a) is as follows. For every terminal a that appears in a body of length 2 or more, create a new variable, say A. This variable has only one production, A -i a. Now, we use A in place of о everywhere a appears in a body of length 2 or more. At this point, every production has a body that is either a single terminal or at least two variables and no terminals. For step (b), we must break those productions A ~+ BiB Bk, for к > into a group of productions with two variables in each body. We introduce к - 2 new variables, Ci, C2,.. .,Ck-2- The original production is replaced by the fc - 1 productions A-BiCi, Ci-> B2C2,..., C7fc 3-> Bfc 3Cft-2, Ca; 2-> Ffc-iBfc ExEunple 7.15; Let us convert the grammar of Example 7.12 to CNF. For part (a), notice that there are eight terminals, a, 6, 0, 1, +, *, (, and ), each of which appears in a body that is not a single terminal. Thus, we must introduce eight new variables, corresponding to these terminals, and eight productions in which the new variable is replaced by its terminal. Using the obvious initials as the new variables, we introduce: A-a B-b ZQ 01 F->+ M-y* L{ R-&) If we introduce these productions, and replace every terminal in a body that is other than a single terminal by the corresponding variable, we get the grammar shown in Fig. 7.2.
|