Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 [ 133 ] 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

We can have our program do any particular thing we want, e.g., halt or print hello, world, when and if it finds a solution. Otherwise, the program will never perform that particular action. Thus, it is undecidable whether a program prints hello, world, whether it halts, whether it calls a particular function, rings the console bell, or makes any other nontrivial action. In fact, there is an analog of Rices Theorem for programs: any nontrivial property that involves what the program does (rather than a lexical or syntactic property of the program itself) must be undecidable.

9.5.2 Undecidabiiity of Ambiguity for CFGs

Programs are sufficiently like Turing machines that the observations of Section 9.5.1 are unsurprising. Now, we shall see how to reduce PCP to a problem that looks nothing like a question about computers: the question of whether a given context-free grammar is ambiguous.

The key idea is to consider strings that represent a list of indexes (integers), in reverse, and the corresponding strings according to one of the lists of a PCP instance. These strings can be generated by a grammar. The similar set of strings for the other list in the PCP instance can also be generated by a grammar. If we take the union of these grannnars in the obvious way, then there is a string generated through the productions of each original grammar if and only if there is a solution to this PCP instance. Thus, there is a solution if and only if there is ambiguity in the grammar for the union.

Let us now make these ideas more precise, Let the PCP instance consist of lists A = wi,W2,...,Wk and В - xi, хг ,., at For list A we shall construct a CFG with A as the only variable. The terminals are all the symbols of the alphabet £ used for this PCP instance, plus a distinct set of index symbols ai,a2, - jdh that represent the choices of pairs of strings in a solution to the PCP instance. That is, the index symbol Oj represents the choice of Wi from the A list or Х{ from the В list. The productions for the CFG for the A hst are:

A -i- wi Aai I W2Aa2 WkAak \

Vhttx I W2a2 ]] WkOk

We shall call this grammar Ga and its language La- In the future, we shall refer to a language like La as the language for the list A.

Notice that the terminal strings derived by Ga are all those of the form uJijU;J2 WiOi -Oija,! for some m > 1 and list of Integers i], гг,. . ,im; each integer is in the range 1 to k. The sentential forms of Ga all have a single A between the strings (the ws) and the index symbols (the as), until we use one of the last group of к productions, none of which have an A in the body. Thus, parse trees look Hke the one suggested in Fig, 9,16.

Observe also that any terminal string derivable from A in Ga has a unique derivation. The index symbols at the end of the string determine uniquely which production must be used at each step. That is, only two production bodies end with a given index symbol a: Л ги,Ла,- and A и){Щ. We must



w



Figure 9.16; The form of parse trees in the grammar Ga

use the first of these if the derivation step is not the last, and we must use the second production if it is the last step.

Now, let us consider the other part of the given PCP instance, the hst В = xi,X2f.-,Xk. For this hst we develop another grammar Gb-

В xiBai \x2Ba2\---\xkBak\

X\ai I 202 I Ifcttft

The language of this grammar will be referred to as Lb- The same observations that we made for Ga apply also to Gg. In particular, a terminal string in Lb has a unique derivation, which can be determined by the index symbols in the tail of the string.

Finally, we combine the languages and grammars of the two lists to form a grammar Gab for the entire PCP instance. Gab consists of:

1, Variables A, B, and 5; the latter is the start symbol.

2. Productions S A\ B.

3. All the productions of Ga-

4, All the productions of Gb ,

We claim that Gab is ambiguous if and only if the instance (A, B) of PCP has a solution; that argument is the core of the next theorem.

Theorem 9.20: It is undecidable whether a CFG is ambiguous.



PROOF: We have already given most of the reduction of PCP to the question of whether a CFG is ambiguous; that reduction proves the problem of CFG ambiguity to be undecidable, since PCP is undecidable. We have only to show that the above construction is correct; that is:

Gab is ambiguous if and only if instance {A, B) of PCP has a solution.

(If) Suppose 11,12,... ,im is a solution to this instance of PCP. Consider the two derivations in Gab-

S A=> WiAaii => Wi-WiAaiai.

WiWi Wi , Aai, - uiui, Wi, wi-- Wi а.-а

S В Xi, Boij Xi, Xi Buiat

XiXi Xi ,Bai, , - --aiai, a;,-, Xf - Xiai --щщ,

Since ii,i2,. - - -Jm is a solution, we know that WiiWi --Wi - XiiXi -Xi. Thus, these two derivations are derivations of the same terminal string. Since the derivations themselves are clearly two distinct, leftmost derivations of the same terminal string, we conclude that Gab is ambiguous.

(Only-if) We already observed that a given terminal string cannot have more than one derivation in Ga and not more than one in Gb- So the only way that a terminal string could have two leftmost derivations in Gab is if one of them begins S A and continues with a derivation in Ga, while the other begins S => В and continues with a derivation of the same string in Gb-

The string with two derivations has a tail of indexes dj ftj , for some m > 1. This tail must be a solution to the PCP uistance, because what precedes the tail in the string with two deritions is both Wi,Wi - Wi and

9.5.3 The Complement of a List Language

Having context-free languages like La for the list A lets us show a number of problems about CFLs to be undecidable. More undecidabiiity facts for CFLs can be obtained by considering the complement language La- Notice that the language La consists of all strings over the alphabet £ U {ai,a2,... ,ak} that are not in La, where S is the alphabet of some instance of PCP, and the Ofs are distinct symbols representing the indexes of pairs in that PCP instance.

The interesting members of Z7 are those strings consisting of a prefix in £* that is the concatenation of some strings from the A list, followed by a suffix of index symbols that does not match the strings from A. However, there are also many strings in La that are simply of the wrong form: they are not in the language of regular expression E*(ai -I- Q2 -I-----h a)*.

We claim that La is a CFL. Unlike La, it is not very easy to design a grammar for La, but we can design a PDA, in fact a deterministic PDA, for La The construction is in the next theorem.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 [ 133 ] 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175