Строительный блокнот Automata methods and madness 4.3.2 Testing Emptiness of Regular Languages At first glance the answer to the ciuestion is regular language; L empty? is obvious: 0 is empty, and all other regular languages are not. However, as we discussed at the beginning of Section 4.3, the problem is not stated with an explicit list of the strings in L. Rather, we are given some representation for L and need to decide whether that repr{;seiitation denotes the language 0, Parsing methods capable of doitig this task in 0{n) time are discussed in A. V. Alio, R. Sethi, and .1. D. Ulhrian, Compiler Design: Principlafj, Tooh, and Techniquf.f!, .Addi.soti-Wwlev, Ш6. Automaton-to-Regular-Expression Conversion If we examine the construction of Section 3,2.1 we observe tliat at each of ti rounds (where n is the number of states of the DFA) we can quadruple the size of the regular expressions constructed, since each is built from four expressions of the previous round. Thus, simply writing down the expressions can take time 0(n4 ). The improved construction of Section 3.2.2 reduces the constant factor, but does not affect the worst-case exponentiality of the problem. The same construction works in the same running time if the input is an NFA, or even an e-NFA, although we did not prove those facts. It is important to use those constructions for NFAs, however. If we first convert an NFA to a DFA and then convert the DFA to a regular expression, it could take time 0(n4 ), which is doubly exponential. Regular-Expression-to-Automaton Conversion Conversion of a regular {;xprc!Ssion to an e-NFA takes linear time, Wc need to parse the expression efficiently, using a technique that takes only 0{n) time on a regular expression of length n, The result is an expression tree with one node for each symbol of the regular expression (although parentheses do not have to appear in the tree; they just guide the parsing of the expression). Once we have an expression tree for the regular expression, we can work up the tree, building the e-NFA for each node. The construction rules for the conversion of a regular expression that we saw in Section 3.2.3 never add move than two states and four arcs for any node of the expression tree. Thus, the numbers of states and arcs of the resulting e-NFA are both 0{n). Moreover, the work at each node of the parse tree in creating these elements is constant, provided the function that processes each subtree returns pointers to the start and accepting states of its automaton. We conclude that construction of an e-NFA from a regular expression takes time that is hnear in the size of the expression. We can eliminate c-transitions from an n-state e-NFA, to make an ordinary NFA, in O(n) time, without increasing the number of states. However, proceeding to a DFA can take exponential time. If our representation is any kind of finite automaton, the emptiness question is whether there is any path whatsoever from the start state to some accepting state. If so, the language is nonempty, while if the accepting states are all separated from the start state, then the language is empty. Deciding whether we can reach an accepting state from the start state is a simple instance of graph-reachability, similar in spirit to the calculation of the e-closure that we discussed in Section 2.5.3. The algorithm can be summarized by this recursive process. BASIS: The start state is surely reachable from the start state, INDUCTION: If State g is reachable from the start state, and there is an arc from q top dth any label (an input symbol, or e if the automaton is an e-NFA), then p is reachable. In that manner we can compute the set of reachable states. H any accepting state is among them, w4 answer no (the language of the automaton is not empty), and otherwi.4e wc answer yes, Note that the reachability calculation takes no more time that 0{ti) if the automaton has n states, and in fact it is no worse than proportional to the number of arcs in the automatons transition cHagram, which could be less than n- and cannot be more than 0(n-). If wc are given a regular expression representing the language L, rather than an automaton, we could convert the expression to an e-NFA and proceed as above. Since the automaton that results from a regular expression of length n has at most 0{n) states and transitions, the algorithm takes 0{n) time. However, we can also inspect the regular expression to decide whether it is empty. Notice first that if the expression has no occurrence of 0, then its language is surely not empty. If there are 0s, the language may or may not be empty. The following recursive rules tell whether a regular expression denotes the empty language, BASIS: 0 denotes the empty language; e and a for any input symbol a do not, INDUCTION: Suppose is a regular expression. There are four cases to consider, corresponding to the ways that R could be constructed, 1. R = Ri+ R-i. Then L[R) is empty if and only if both L{Ri) and LiR) are empty. 2. R R-iR2. Then L{R) is empty i£ and only if either L{Ri) or Ь(Лз) is empty. 3. Л - Д. Then L{R) is not empty; it always includes at least e. A. R - {R\). Then L{R) is empty if and only if L{R\) is empty, since they are the same language. 4.3.3 Testing Membership in a Regular Language The next question of importance is, given a string w and a regular language L, is vj in L. While w is represented explicitly, L is represented by an automaton or regular expression. If L is represented by a DFA, the algorithm is simple. Simulate the DFA processing the string of input symbols w, beginning in the start state. If the DFA ends in an accepting state, the answer is yes ; otherwise the answer is no. This algorithm is extremely fast. If w = n, and the DFA is represented by a suitable data structure, such as a two-dimensional array that is the transition table, then each transition requires constant time, and the entire test takes 0{n) time. If L has any other representation besides a DFA, we could convert to a DFA and run the test аЬол-е, That approach could take time that is exponential in the size of the riipresentation, although it is linear in \w\. However, if the representation is an NFA or e.NFA, it is simpler and more efficient to simulate the NFA directly. That is, wc; proce.4s symbols of w one at a time, maintaining the set of staters the NFA can be in after following any path labeled with that prefix of -w. The idea was presented in Fig. 2.10. If il is of length n, and the NF.A. has s states, then the nmning time of this algorithm is 0(r5). Each input symbol can be processed by taking the previous set of states, which numbers at most s states, and looking at the successors of each of these states. W e take the miiou of at most s sets of at most states each, which requires О(д) time. I£ the NFA has e-transitions, then we must compute the e-closure before starting the sinmlatiori. Then the processing of each input symbol a has two stages, each of which reqinres О(й) time. First, wc take the previous set of states and find their successors on input symbol a. Next, we compute the e-closure of this set of states. The initial set of states for the simulation is the e-closure of the initial state of the NFA. Lastly, if the representation of L is a regular expression of size s, we can convert to an e-NFA with at most 2s states, in 0(s) time. We then perform the simulation above, taking 0{ns) time on an input ш of length тг. 4.3.4 Exercises for Section 4.3 * Exercise 4.3.1: Give an algorithm to tell whether a regular language L is infinite. Hint: Use the pumping lemma to show that if the language contains any string whose length is аЬоле a certain lower limit, then the language must be infinite. Exercise 4.3.2 ; Give an algorithm to tell whether a regular language L contains at least 100 strings. Exercise 4.3.3: Suppose i is a regular language with alphabet S. Give an algorithm to teU whether L = S*, i.e., all strings over its alphabet.
|