Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 [ 99 ] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

Example 7,34: The following are the productions of a CNF grammar G:

S -> AB\BC A BA\a

В -+ СС\Ь С -+ АВ\а

We shall test for membership in L{G) the string baaba. Figure 7.14 shows the table filled iu for this string.

{S,A,Q

{S,A,Q

{S,A}

{S.Q

iS.A}

{A,Q

{B} {A,Q

b a

Figure 7.14: The table for string baaba constructed by the CYK algorithm

To construct the first (lowest) row, we use the basis rule. We have only to consider which variables have a production body a (those variables aic A and C) and which variables have body 6 (only В does). Thus, above those positions holding a we see the entry {A,C}, and above the positions holding b we see [В]. That is,Xii =X4j ={B}, and X22 = X33 = X55 = {A,C].

In the second row we see the values of Aia, Л23, A34, and X. For instance, let us see how X\2 is computed. There is only one way to break the string from positions 1 to 2, which is ba, into two nonempty substrings. The first must be position 1 and the second must be position 2. In order for a variable to generate ba, it must have a body whose first variable is in Хц = {В} (i.e., it generates the h) and whose second variable is in X22 - {A, C} (i.e., it generates the a). This body can only be В A or ВС. If we inspect the grammar, we find that the productions A В A jmd S ВС are the only ones with these bodies. Thus, the two heads, A and S, constitute X12.

For a more complex example, consider the computation of We can

break the string aab that occupies positions 2 through 4 by ending the first string after position 2 or position 3, That is, we may choose A; = 2 or fc = 3 in the definition of X24. Thus, we must consider all bodies in AViXsj U Л2зХ44, This set of strings is {Л,С}{5,С} U = {AS,AC,CS,CGBB}. Of the



five strings in this set, only С С is a, body, and its head is B. Thus, Xii = {B}. □

7.4.5 Preview of Undecidable CFL Problems

In the next chapters we shall develop a remarkable theory that lets us prove formally that there are problems we cannot solve by any algorithm that can run on a computer. We shall use it to show that a number of simple-to-state questions about grammars and CFLs have no algorithm; they are called undecidable problems. For now, we shall have to content ourselves with a list of the most significant undecidable questions about context-free grammars and languages. The following are undecidable:

1. Is a given CFG G ambiguous?

2. Is a given CFL inherently ambiguous?

3. Is the intersection of two CFLs empty?

4. Are two CFLs the same?

5. Is a given CFL equal to S*, where S is the alphabet of this language?

Notice that the flavor of question (1), about ambiguity, is somewhat different from the others, in that it is a question about a grammar, not a language. All the other questions assume that the language is represented by a grammar or PDA, but the question is about the language (s) defined by the grammar or PDA. For instance, in contrast to question (1), the second question asks, given a grammar G (or a PDA, for that matter), does there exist some equivalent grammar G that is unambiguous. If G is itself unambiguous, then the answer is surely yes, but if G is ambiguous, there could still be some other grammar G for the same language that is unambiguous, as we learned about expression grammars in Example 5.27.

7.4.6 Exercises for Section 7.4

Exercise 7.4.1: Give algorithms to decide the following:

* a) Is L{G) finite, for a given CFG G? Hint: Use the pumping lemma.

! b) Does L(G) contain at least 100 strings, for a given CFG G?

!! c) Given a CFG G and one of its variables A, is there any sentential form in which A is the first symbol. Note: Remember that it is possible for A to appear first in the middle of some sentential form but then for all the symbols to its left to derive e.

Exercise 7.4.2: Use the technique described in Section 7,4,3 to develop linear-time algorithms for the following questions about CFGs:



7.5. SUMMARY OF CHAPTER 7 303

a) Which symbols appear in some sentential form?

b) Which symbots are nullable (derive e)?

Exercise 7.4.3: Using the grammar G of Example 7.34, use the CYK algorithm to determine whether each of the following strings is in L{G):

* a) ababa.

b) baaab.

c) aabab.

* Exercise 7.4.4: Show that in any CNF grammar, all parse trees for strings of length n have 2n - 1 interior nodes (i.e., 2n - 1 nodes with variables for labels).

! Exercise 7.4.5: Modify the CYK algorithm so can report the number of distinct parse trees for the given input, rather than just reporting membership in the language.

7.5 Summary of Chapter 7

-f Eliminating Useless Symbols: A variable can be elimmated from a CFG unless it derives some string of terminals and also appears in at least one string derived from the start symbol. To correctly eliminate such useless symbols, we must first test whether a variable derives a terminal string, and eliminate those that do not, along with all their productions. Only then do we eliminate variables that are not derivable from the start symbol.

Eliminating e- and Unit-productions: Given a CFG, we can find another CFG that generates the same language, except for string e, yet has no e-productions (those with body e) or unit productions (those with a single rariable as the body).

¥ Chomsky Normal Form: Given a CFG that derives at least one nonempty string, we can find another CFG that generates the same language, except for e, and is in Chomsky Normal Form: there are no useless symbols, and every production body consists of either two variables or one terminal.

♦ The Pumping Lemma: In any CFL, it is possible to find, in any sufficiently long string of the language, a short substring such that the two ends of that substring can be pumped in tandem; i.e., each can be repeated any desired number of times. The strings being pumped are not both e. This lemma, and a more powerful version called Ogdens lemma mentioned in Exercise 7.2.3, allow us to prove many languages not to be context-free.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 [ 99 ] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175