Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 [ 39 ] 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

(L*)* = L*. This law says that closing an expression that is already closed does not change the language. The language of (L*) is all strings created by concatenating strings in the language of L*. But those strings are themselves composed of strings from L. Thus, the string in (L*)* is also a concatenation of strings from L and is therefore in the language of i*.

0* = 6. The closure of 0 contains only the string e, as we discussed in Example 3.6.

It is easy to check that the only string that can be formed by concatenating any number of copies of the empty string is the empty string itself.

= LL = L*L. Recall that L+ is deiined to he L + LL A LLL + . Also, L* = e + L + LL + LLL + Thus,

LL* = Lf. + LL + LLL + LLLL +

When we remember that Le = L, we see that the infinite expansions for LL* and for L+ are the same. That proves L = LL . The proof that L = L*L is similar.*

L =: L + e. The proof is easy, since the expansion of L~ includes every term in the expansion of L* except e. Note that if the language L contains the string 6, then the additional H-e term is not needed; that is, = L* in this special case.

L = € + L. This rule is really the definition of the ? operator.

3.4.6 Discovering Laws for Regular Expressions

Each of the law-s above was proved, formally or informally. However, there is an infinite variety of laws about regular expressions that might be proposed. Is there a general methodology that will make our proofs of the correct laws

Notice that, ай a consequence, any language L commutes (under concatenation) witli its own closure; LL = L L. That rule does not contradict the fact iliat, in general, concatena tion is not commutative.

L + L = L. This law, the idempotence law for union, states that if we take the union of two identical expressions, we can replace them by one copy of the expression.

3.4.5 Laws Involving Closures

There arc a number of laws involving the closure operators and its UNIX-style variants and ?, We shall list them here, and give some explanation for why they are true.



For simplicity, we shall identify the regular expressions and their iaiigiiagcs, and avoid saying ttin language of in front, of pvery regular expression.

easy? It turns out that the truth of a law reduces to a question of the equality of two specific languages. Interestingly, the technique is closely tied to the regular-expression operators, and cannot be extended to expressions involving some other operators, such as intersection.

To see how this test works, let us consider a proposed law, such as

(L + M)* = {L*M*y

This law says that if we have any two languages L and M, and we close their union, we get the same language as if we take the language L*AI, that is, all strings composed of zero or more choices from L followed by zero or more choices from M, and close that language.

To prove this law, suppose first that string w is in the language of (L+M) . Then we can write w = wjw for some k. where each Wi is in either L or M. It follows that each ш,; is in the language of LM . To see why, if w is in L, pick one string, from L; this string is also in L*. Pick no strings from M; that is, pick e from M . If Wi is in jW, the argument is similar. Once every VJi is seen to be in LM , it follows that w is in the closure of this language.

To complete the proof, we also have to prove the converse: that strings in (LM )* are also in {L + Л/)*. We omit this part of the proof, since our objective is not to prove the law, but to notice the following important property of regular expressions.

Any regular expression with variables can be thought of as a concrete regular expression, one that has no variables, by thinking of each variable as if it were a distinct symbol. For example, the expression (Lh-M)* can have variables L and M replaced by symbols a and fc, respectivelv, giving us the regular expression (a-bb).

The language of the concrete expression guides us regarding the form of strings in any langvmge that is formed from the original expression when we replace the variabl(!s by languages. Thus, in our analysis of (t -I- A/) , we observed that any string w composed of a sequence of choices from either L or M, would be in the language of [L Л- M). We can arrive at that conclusion by looking at the language of the concrete expression, L((a + b)*), which is evidently the set of all strings of os and bs. We could substitute any struig in L for any occurrence of й in one of those strings, and we could substitute any string in M for any occurrence of 6, with possibly different choices of strings for different occurrences of a or h. Those substitutions, applied to all the strings in (a -t- b), gives us all strings formed by concatenating strings from L and/or Af, in any order.

The above statement may seem obvious, but as is pointed out in the box on Extensions of the Test Beyond Regular Expressions May Fail, it is not even true when some other operators are added to the three regular-expression Operators. We prove the general principle for regular expressions in the next theorem.



Theorem 3.13 : Let E be a regular expression with variables Li,ij2,..., L ,. Form concrete regular expression С by replacing each occurrence of L,- by the symbol ui, for г = 1,2,...,т. Then for any languages Lj, L,..., , every string Ю in L{E) can be written w - wiW2- --Wk, where each Wi is in one of the languages, say Lj., and the string ujuj --(h is in the language L{C). Less formally, we can construct L{E) by starting with each string in L{C), say ajjAjj -ajt 1 and substituting for each of the a/s any string from the corresponding language Lj..

PROOF: The proof is a structural induction on the expression E.

BASIS: The basis cases are where E is e, 0. or a variable L. In the first two cases, there is nothing to prove, since the concrete expression С is the same as E. If is a variable L, then L(E) - L. The concrete expression С is just a. where a is the symbol corresponding to L. Thus, L{C) = {a}. If we substitute any string in L for the symbol a in this one string, we get the language L, which is also L{E).

INDUCTION: There are three ca.4es, depending on the final operator of E. First, suppose that E = F + G\ i,e a union is the final operator. Let С and D be the concrete expressions formed from F and G, resjiectively, by substituting concrete symbols for the language-variables in these expressions. Note that the same symbol must be substituted for all occurrences of the same variable, in both F and G. Then the concrete expression that we get from EisC + D, and L{C + D) = L{C) + L(D).

Suppose that ш is a string in L{E), w-hen the language variables of E are replaced by specific languages. Then w is in either L(F) or L{G). By the inductive hypothesis, w is obtained by starting with a concrete string in L(C) or L{D), respectively, and substituting for the symbols strings in the corresponding languages. Thus, in either case, the string w can be constructed by starting with a concrete string in L(C + D), and malting the same substitutions of strings for symbols.

We must also consider the cases where E is FG or F*. How-елег, the arguments are similar to the union case above, and we leave them for you to complete. □

3.4.7 The Test for a Regular-Expression Algebraic Law

Now, we can state and prove the test for whether or not a law of regular expressions is true. The test for whether E = F is true, where E and F are two regular expressions with the same set of variables, is:

1. Convert E and F to concrete regular expressions С and D, respectively, by replacing each variable by a concrete symbol,

2. Test whether L(C) = LiD). If so, then £; = F is a true law, and if not, then the law is false. Note that we shall not see the test for whether two



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 [ 39 ] 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175