Строительный блокнот Automata methods and madness Notation for CFG Derivations There are a number of conventions in common use that help us remember the role of the symbols we use when discussing CPGs. Here are the conventions we shall use: 1. Lower-case letters near the beginning of the alphabet, o, b, and so on, are terminal symbols. We shall also assmne that digits and other characters such as + or parentheses are terminals. 2. Upper-ease letters near the beginning of the ali)habet, A, D, and so on, are variables. 3. Lower-case letters near the end of the alphabet, such as lu or z, arc strings of toruunals. This con\ention reminds us that the terminals are analogous to the input symbols of an automaton. 4. Upper-case letters neai- the end of the alphabet, such as Л or i, are either terminals or variables. 5. Lower-case Greek letters, such as q and .3, are strings consisting of terminals and/or variables. There is no special notation for strings that consist of variables only, since this concept plays no important role. However, a string named a or another Greek letter might happen to have only variables. the symbols and to indicate one or many rightmost derivation steps, rm rni respectively. Again, the name of the grammar may appear below these symbols if it is not clcai- which grammar is being used. Example 5.6 : The derivation of Example 5.5 was actually a leftmost derivation. Thus, we can describe the same derivaticui by: E E*E I* E a*E (m hn ill! Im a*{E) а*{Е-\-Е) a*{I + E) a*{a + E) Im im i im a*(a-\-1) a* [a+ 10) a* {a + /00) = a * (я + 600) hn Im hn Wc can also summarize th<; leftmost derivation by saying E a* (a-(-600), or express several steps of the derivation by expressions such as E * £ a* (E). There is a rightmost derivation that uses the same replacements for each variable, although it makes the replacements in different order. This rightmost derivation is: E*E E*{E) E*(E + E)=> rm rm rm rm E*[E + I) E*(E + IQ) E*(E + m) E*{E + b()0) rm rm rm rm E * (/ + 600) E*ia + 600) 7 * (a + 600) a*{a + 600) rm Tin rm This derivation allows us to conclude E U a* {a + 600), □ rm Any derivation has an equivalent leftmost and an equivalent rightmost derivation. That is, if w is a terminal string, and A a variable, then A = w if and only if ,4 w, and A w if and only if Л w. We shall also prove these 1ш rm claims ui Section 5,2, 5.1.5 The Language of a Grammar If G(y, T, P, S) is a CFG, the language of G, denoted L{G), is the set of termuial strings that have derivations from the start symbol. That is, L(G) = {w in T*\S w) If a language; L is the language of some context-free grammar, then L is said to be a context-frte language, or CFL. For instance, we asserted that the grammai-of Fig, 5.1 defined the language of palindromes over alphabet {0,1}. Thus, the set of palindromes is a context-free language. We can prove that statement, as follows. Theorem 5.7: L{Gpai), where Gpat is the grammar of Example 5.1, is the set of palindromes over {U, 1}. PROOF: Wc shall prove that a string w; in (0,1}* is in L{Gpai) if and only if it is a palindrome; i.e., w = ги. (If) Suppose !/; is a palindrome. We show by induction on that w is in L{Gpal}- BASIS: We use lengths 0 and 1 as the basis. If \v)\ = 0 or \ir\ - 1, then w is t, 0, or 1. Since there are productions P -> f, Г 0, and P -> 1, we conclude that P w in any of these basis cases. INDUCTION: Suppose > 2. Since w = w must begin and end with the same symbol That is, го = 0x0 or ги - 1:1. Moreover, x must be a palindrome; that is, X ~ x. Note that we; need the fact that \w\ > 2 to infer that there are two distinct Os or Гя, at either end of w. 5.1.6 Sentential Forms Derirations from the start symbol produce strings that have a special role. We call these s(;ntential forms. That is, if G = {V, T, P, S) is a CFG, then any string Q ill (V и Г)* such that S => a is a sentential form. If 5 a, then a is a left-sentential form, and if 5 => a, then a is a right-sentential form. Note that the language L{G) is those sentential forms that are in T; i.e., they consist solely of terminals. Example 5.8: Consider the grammar for expressions from Fig. 5.2. For example, E* (I + E) isd sentential form, since there is a derivation E=>E*EE*iE)E*{E + E)E*{I + E) However this deriration is neither leftmost nor rightmost, since at the last step, the middle E is replaced. As an example of a loft-sentential form, consider a* E, with the leftmost derivation If v! = Ox\), then we invoke the indnctive hypothesis to claim that P x. Then tliore is a derivation of w from P, namely P => QPO 0x0 = w. If w = 111, the argument is the same, but we use the production P IFl at tlie first step. In either ca.se, we conclude that w is in LiGpai) and complete the proof. (Only-if) Now, we assume that w is in L{Gpai); that is, P w. We must conclude that is a palindrome. The proof is an induction on the number of steps in a derivation of w from P. BASIS: If the derivation is one step, then it must use one of the three productions that do not have P in the body. That is, the d<>rivation is P e, P => 0, or P => 1. Since 6, 0, and 1 are all palindrom<!s, the biisis is proven. INDUCTION: Now, suppose that the derivation takes 7i-\-l steps, where n > 1, and the statement is true for all derirations of n steps. That is, if P i in n steps, then x is a palindrome. Consider an {n H- l)-step derivation of w. which must be of the form POPO 0x0 = iv or P => IPl 1x1 = w, since n + 1 steps is at least two steps, and the producticms P -> OPO and P IPl are the only productions whose use allows additional steps of a derivation. Note that in either case, P x mn steps. By the inductive hypothesis, we know that x is a palindrome; that is, ar = x. But if so, then 0x0 and 1x1 are also palindromes. For instance, (0x0) = 0x*0 = 0x0, We conclude that w is a palindrome, which completes the proof. □
|