Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 [ 59 ] 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

The Form of Proofs About Grammars

Theorem 5.7 is typical of proofs that show a grammar defines a particular, informally defined language. We iirst develop an inductive hypothesis that states what properties the strings derived from each variable have. In this example, there was only one variable, P, so we had only to claim that its strings were palindromes.

We prove the if part: that if a string w satisfies the informal statement about the strings of one of the variables A, then A w. In our example, since P is the start sjmbol, we stated P by saying that u; is in th(; language of the grammar. Typically, we prove the if part by induction on the length of uj. If there are к variables, then the inductive statement to be proved has к parts, which must be proved as a mutual induction.

We nmst also prove the only-if part, that if .4 w, tlien w satisfies the informal statement about the strings derived from variable A. Again, in our example, since we had to deal only with tlie start symbol P, we assumed that w was in the language of Gpat as an equivalent to P w. The proof of this part is typically by induction on the number of steps in the derivation. If the grammar has productions that allow two or more variables to appear in derived strings, then we shall have to break a derivation of n steps into several parts, one derivation from each of the variables. These derivations may have fewer than n steps, so we have to perform an induction assuming the statement for all values n or less, as discussed in Section 1.4.2.

E => E*E=> I*E a*E

Im fm tm

Additionally, the derivation

£=> E*E Е*{Е) E*{E-\-E)

rm rm rm

shows that E * {E + E) Ы & right-sentential form. □ 5.1.7 Exercises for Section 5.1

Exercise 5.1.1: Design context-free grammars for the following languages:

* a) The set {0 1 n > 1}, that is, the set of all strings of one or more Os followed by an equal number of Is.

b) The set {огИс \ i j oi: j к}, that is, the set of strings of as followed by &s followed by cs, such that there are either a diflferent number of as and bs or a different number of bs and cs, or both.



! V.) The .set of all strings of as and 6s that are not of the form ivw. that is, not equal to any string repeated.

!! d) The set of all strings with twice as many Os as 1 s.

Exercise 5.1.2: The following grammai- generates the language of regular expression 0*1(0 + 1)*:

S -> A\B

A 0Л I e

В OB I IS I e

Give leftmost and rightmost derivations of the following strings:

* a) 00101.

b) 1001.

c) 00011.

! Exercise 5.1.3; Show that every regular language is a context-free language. Hint: Construct a CFG by in<hiction on the number of operators in the regular expression.

! Exercise 5.1.4: A CFG is said to be right-linear if each production body has at most one variable, and that rariable is at the right end. That is, all productions of a right-linear grammar are of the form A wB ov A w, where A and В are variables and w some string of zero or more terminals.

a) Show that every right-linear grammar generates a regular language. Hint: Construct an e-NFA that simulates leftmost derivations, using its state to represent the lone variable in the current left-sentential form.

b) Show that every regidar language has a right-linear grammar. Hint: Start with a DFA and let the variables of the grammar repri-sent states.

*! Exercise 5.1.5: Let T {0,1, (,),-H, *, 0,fi}. W may think of T as the set of symbols used by regular expressions over alphabet {0,1}; the only difference is that we use с for symbol e, to avoid potential confusion in what follows. Your task is to design a CFG with set of terminals T that generates exactly the regular expressions with alphabet {0,1}.

Exercise 5Л.6: We defined the relation with a basis a a and an induction that says a => /3 and 3 7 imply a 7. There are several other ways to define => that also have the effect of saying that => is zero or more steps. Prove that the following are true:

a) Q 0 i£ and only if there is a sequence of one or more strings

7ь72, ,7n

such that ft = 7i, ,3 = 7 , and for г = 1,2,..., n - 1 we have 7j 7j+i



b) If a = /i, and 7, then a 7. Hint: use induction on the number of steps in the derivation 13 y.

I Exercise 5.1.7; Consider the CFG G defined by productions;

SaS\Sb\a\b

a) Prove by induction on the string length that no string in L{G) has ba as a substring.

b) Describe L{G) informally. Justify your answer using part (a). !1 Exercise 5.1.8: Consider the CFG G defined by productions:

5 uSbS I bSaS I e Prove that L{G) is the set of all strings with an equal rmmber of as and bs.

5.2 Parse Trees

There is a tree representation for derivations that has proved extremely useful. This tree shows us clearly how the symbols of a terminal string are grouped into substrings, each of wiiicli belongs to the language of one of the variables of the grammar. But perhaps more importantly, the tree, known as a parse tree when used in a compiler, is the data structure of choice to represent the source program. In a compiler, the tree structure of the source program facilitates the translation of the source program into executable code by allowing natural, recursive fmictions to perform this translation process.

In this section, we introduce the parse tree and show that the existence of parse trees is tied closely to the existence of derivations and recursive inferences. We shall later study the matter of ambiguity in grammars and languages, which is an important application of parse trees. Certain grammars allow a terminal string to have more than one parse tree. That situation makes the grammar unsuitable for a programming language, since the compiler could not tell the structure of certain source programs, and therefore could not with certainty deduce what the proper executable code for the program was.

5.2.1 Constructing Parse Trees

Let us fix on a grammar G - {K,T, P, S). The parse trees for G are trees with the following conditions;

1. Each interior node is labeled by a variable in V.

2. Each leaf is labeled by either a variable, a terminal, or e. However, if the leaf is labeled e, then it must be the only child of its parent.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 [ 59 ] 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175