Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 [ 29 ] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

Expressions and Their Languages

Strictly speaiiing, a regular expression E is just an expression, not a language. We should use L{E) when we want to refer to the language that E denotes. However, it is common usage to refer to say when we really mean We shall use this convention as long as it is clear we are

talking about a language and not about a regular expression.

Note that the dot can optionally be used to denote the concatenation operator, either as an operation on languages or as the operator in a regular expression. For instance, 0.1 is a regular expression meaning the same as 01 and representing the language {01}. However, we shcdl avoid the dot as concatenation in regular expressions.

3. If £ is a regular expression, then E* is a regular expression, denoting the closure of L{E). That is, L{E) - {L{E))\

4. If £ is a regular expression, then (E), a parenthesized E, is also a regular expression, denoting the same language as E. Formally; L[{E)) = L{E).

Example 3.2: Let us write a regular expression for the set of strings that consist of alternating Os and Is. First, let us develop a regular expression for the language consisting of the single string 01. We can then use the star operator to get an expression for all strings of the form 0101 - 01.

The basis rule for regular expressions tells us that 0 and 1 are expressions denoting the languages {0} and {1}, respectively. If we concatenate the two expressions, we get a regular expression for the language {01}; this expression is 01. As a general rule, if we want a regular expression for the language consisting of only the string w, we use w itself as the regular expression. Note; that in the regular expression, the symbols of w will normally be written in boldface, but the change of font is only to help you distinguish expressions from strings and should not be taken as significant.

Now, to get all strings consisting of zero or more occurrences of 01, we use the regular expression (01)*. Note that we first put parentheses around 01, to avoid confusing with the expression 01*, whose language is all strings consisting of a 0 and any number of Ts, The reason for this interpretation is explained in Section 3,1,3, but briefly, star takes precedence over dot, and therefore the argument of the star is selected before p(nforining any concatenations.

However, L((01)*) is not exactly the language that we want. It hicludes only those strings of alternating Os and Is that begin with 0 and end with 1. We also need to consider the possibility that there is a 1 at the beginning and/or

In fact UNIX regular expressions use the dot. for an entirely different pnipose: representing any .SCII cliaracter.



a 0 at the end. One approach is to construct three more regular expressions that handle the other three possibilities. That is, (10) represents those alternating strings that begin with 1 and end with 0, while 0(10)* can be used for strings that both begin and end with 0 and 1(01)* serves for strings that begin and end dth 1, The entire regular expression is

(01)*+ (10)*+0(10)*+ 1(01)

Notice that we use the + operator to take the union of the four languages that together give us all the strings with alternating Os and Is.

However, there is another approach that yields a regular expression that looks rather different and is also somewhat more succinct. Start again with the expression (01)*. We can add an optional 1 at the beginning if we concatenate on the left with the expression e + 1. Likewise, we add an optional 0 at the end with the expression e + 0. For instance, using the definition of the + operator:

L{c + 1) - L{f.) и L(l) = {e} и {1} = {e, 1}

If we concatenate this language with any other language L, the e choice gives us all the strings in L, while the 1 choice gives us Iw for every string w in L. Thus, another expression for the set of strings that alternate Os and Is is:

(e + l)(01)*(e + 0)

Note that we need parentheses around each of the added expressions, to make sure the operators group properly. □

3.1.3 Precedence of Regular-Expression Operators

Like other algebras, the regular-expression operators have an assumed order of precedence, which means that operators are associated with their operands in a particular order. We are familiar with the notion of precedence from ordinary arithmetic expressions. For instance, we know that xy-\-z groups the product xy before the sum, so it is equi\lent to the parenthesized expression {xy) + z and not to the expression x(y + z). Similarly, we group two of the same operators from the left in arithmetic, so x-y - z is equivalent to (x -y) - z, and not to x-{y-z). For regular expressions, the following is the order of precedence for the operators:

1. The star operator is of highest precedence. That is, it applies only to the smallest sequence of symbols to its left that is a well-formed regular expression.

2. Next in precedence comes the concatenation or dot operator. After grouping all stars to their operands, we group concatenation operators to their operands. That is, all expressions that are juxtaposed (adjacent, with no intervening operator) are grouped together. Since concatenation



is an associative operator it does not matter in what order we group consecutive concatenations, although if there is a choice to be made, you should group them from the left. For instance, 012 is grouped (01)2.

3. Finally, all unions (+ operators) are grouped with their operands. Since union is also associative, it again matters little in which order consecutive unions are grouped, but we shall assume grouping from the left.

Of course, sometimes we do not want the grouping in a regular expression to be as required by the precedence of the operators. If so, we are free to use parentheses to group operands exactly as we choose. In addition, there is never anything wrong with putting parentheses around operands that you want to group, even if the desired grouping is implied by the rules of precedence.

Example 3.3: The expression 01* + 1 is grouped (0(1*)) + 1. The star operator is grouped first. Since the symbol 1 immediately to its left is a legal regular expression, that alone is the operand of the star. Next, we group the concatenation between 0 and (1*), giving us the expression (0(1*)). Finally, the union operator connects the latter expression and the expression to its right, whicli is 1.

Notice that the language of the given expression, grouped according to the precedence rules, is the string 1 plus all strings consisting of a 0 followed by any number of Is (including none). Had we chosen to group the dot before the star, we could have used parentheses, as (01)* + 1. The language of this expression is the string 1 and all strings that repeat 01, zero or more times. Had we wished to group the uinon first, we could have added parentheses around the union to make the expression 0(1* + 1). That expressions language is the set of strings that begin with 0 and have any tmmber of Is following. □

3.1.4 Exercises for Section 3.1

Exercise 3.1.1: Write regular expressions for the following languages:

* a) The set of strings over alphabet {a, b, (:} containuig at least one a and at

least one b.

b) The set of strings of Os and Is whose tenth symbol from the right end is 1.

c) The set of strings of Os and Is with at most one pair of consecutive Is. ! Exercise 3.1.2: Write regular expressions for the following languages:

* a) The set of all strings of Os and Ts such that every pair of adjacent Os

appears before any pair of adjacent Vs.

b) The set of strings of Os and Is whose number of Os is divisible by five.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 [ 29 ] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175