Строительный блокнот Automata methods and madness (L*)* = L*. This law says that closing an expression that is already closed does not change the language. The language of (L*) is all strings created by concatenating strings in the language of L*. But those strings are themselves composed of strings from L. Thus, the string in (L*)* is also a concatenation of strings from L and is therefore in the language of i*. 0* = 6. The closure of 0 contains only the string e, as we discussed in Example 3.6. It is easy to check that the only string that can be formed by concatenating any number of copies of the empty string is the empty string itself. = LL = L*L. Recall that L+ is deiined to he L + LL A LLL + . Also, L* = e + L + LL + LLL + Thus, LL* = Lf. + LL + LLL + LLLL + When we remember that Le = L, we see that the infinite expansions for LL* and for L+ are the same. That proves L = LL . The proof that L = L*L is similar.* L =: L + e. The proof is easy, since the expansion of L~ includes every term in the expansion of L* except e. Note that if the language L contains the string 6, then the additional H-e term is not needed; that is, = L* in this special case. L = € + L. This rule is really the definition of the ? operator. 3.4.6 Discovering Laws for Regular Expressions Each of the law-s above was proved, formally or informally. However, there is an infinite variety of laws about regular expressions that might be proposed. Is there a general methodology that will make our proofs of the correct laws Notice that, ай a consequence, any language L commutes (under concatenation) witli its own closure; LL = L L. That rule does not contradict the fact iliat, in general, concatena tion is not commutative. L + L = L. This law, the idempotence law for union, states that if we take the union of two identical expressions, we can replace them by one copy of the expression. 3.4.5 Laws Involving Closures There arc a number of laws involving the closure operators and its UNIX-style variants and ?, We shall list them here, and give some explanation for why they are true. For simplicity, we shall identify the regular expressions and their iaiigiiagcs, and avoid saying ttin language of in front, of pvery regular expression. easy? It turns out that the truth of a law reduces to a question of the equality of two specific languages. Interestingly, the technique is closely tied to the regular-expression operators, and cannot be extended to expressions involving some other operators, such as intersection. To see how this test works, let us consider a proposed law, such as (L + M)* = {L*M*y This law says that if we have any two languages L and M, and we close their union, we get the same language as if we take the language L*AI, that is, all strings composed of zero or more choices from L followed by zero or more choices from M, and close that language. To prove this law, suppose first that string w is in the language of (L+M) . Then we can write w = wjw for some k. where each Wi is in either L or M. It follows that each ш,; is in the language of LM . To see why, if w is in L, pick one string, from L; this string is also in L*. Pick no strings from M; that is, pick e from M . If Wi is in jW, the argument is similar. Once every VJi is seen to be in LM , it follows that w is in the closure of this language. To complete the proof, we also have to prove the converse: that strings in (LM )* are also in {L + Л/)*. We omit this part of the proof, since our objective is not to prove the law, but to notice the following important property of regular expressions. Any regular expression with variables can be thought of as a concrete regular expression, one that has no variables, by thinking of each variable as if it were a distinct symbol. For example, the expression (Lh-M)* can have variables L and M replaced by symbols a and fc, respectivelv, giving us the regular expression (a-bb). The language of the concrete expression guides us regarding the form of strings in any langvmge that is formed from the original expression when we replace the variabl(!s by languages. Thus, in our analysis of (t -I- A/) , we observed that any string w composed of a sequence of choices from either L or M, would be in the language of [L Л- M). We can arrive at that conclusion by looking at the language of the concrete expression, L((a + b)*), which is evidently the set of all strings of os and bs. We could substitute any struig in L for any occurrence of й in one of those strings, and we could substitute any string in M for any occurrence of 6, with possibly different choices of strings for different occurrences of a or h. Those substitutions, applied to all the strings in (a -t- b), gives us all strings formed by concatenating strings from L and/or Af, in any order. The above statement may seem obvious, but as is pointed out in the box on Extensions of the Test Beyond Regular Expressions May Fail, it is not even true when some other operators are added to the three regular-expression Operators. We prove the general principle for regular expressions in the next theorem. Theorem 3.13 : Let E be a regular expression with variables Li,ij2,..., L ,. Form concrete regular expression С by replacing each occurrence of L,- by the symbol ui, for г = 1,2,...,т. Then for any languages Lj, L,..., , every string Ю in L{E) can be written w - wiW2- --Wk, where each Wi is in one of the languages, say Lj., and the string ujuj --(h is in the language L{C). Less formally, we can construct L{E) by starting with each string in L{C), say ajjAjj -ajt 1 and substituting for each of the a/s any string from the corresponding language Lj.. PROOF: The proof is a structural induction on the expression E. BASIS: The basis cases are where E is e, 0. or a variable L. In the first two cases, there is nothing to prove, since the concrete expression С is the same as E. If is a variable L, then L(E) - L. The concrete expression С is just a. where a is the symbol corresponding to L. Thus, L{C) = {a}. If we substitute any string in L for the symbol a in this one string, we get the language L, which is also L{E). INDUCTION: There are three ca.4es, depending on the final operator of E. First, suppose that E = F + G\ i,e a union is the final operator. Let С and D be the concrete expressions formed from F and G, resjiectively, by substituting concrete symbols for the language-variables in these expressions. Note that the same symbol must be substituted for all occurrences of the same variable, in both F and G. Then the concrete expression that we get from EisC + D, and L{C + D) = L{C) + L(D). Suppose that ш is a string in L{E), w-hen the language variables of E are replaced by specific languages. Then w is in either L(F) or L{G). By the inductive hypothesis, w is obtained by starting with a concrete string in L(C) or L{D), respectively, and substituting for the symbols strings in the corresponding languages. Thus, in either case, the string w can be constructed by starting with a concrete string in L(C + D), and malting the same substitutions of strings for symbols. We must also consider the cases where E is FG or F*. How-елег, the arguments are similar to the union case above, and we leave them for you to complete. □ 3.4.7 The Test for a Regular-Expression Algebraic Law Now, we can state and prove the test for whether or not a law of regular expressions is true. The test for whether E = F is true, where E and F are two regular expressions with the same set of variables, is: 1. Convert E and F to concrete regular expressions С and D, respectively, by replacing each variable by a concrete symbol, 2. Test whether L(C) = LiD). If so, then £; = F is a true law, and if not, then the law is false. Note that we shall not see the test for whether two
|