Строительный блокнот Automata methods and madness if (Condition) -( if (Condition) Statement; else Statement; if (Condition) Statement; else Statement; Figure 5.10: An if-else structure; the two elses match their previous ifs, and the first if is unmatched 5.3.2 The YACC Parser-Generator The generation of a parser (function tliat creates parse trees from source programs) has been institutionalized in the YACC command that appears in all UNIX systems. The input to YACC is a CFG, in a notation that differs only in details from the one we have used here. Associated with each production is an action, which is a fragment of С code that is performed whenever a node of the parse tree that (with its children) corresponds to this production is created. Typically, the action is code to construct that node, although in some YACC A simple test (whose correctness we leave as an exercise), for whether a sequence of is and es is generated by the grammar is to consider each c, in turn from the left. Look for the first г to the left of the e being considered. If there is none, the string fails the test and is not in the language. If there is such an (, delete this i and the e being considered. Then, if there are no more rs the string passes the test and is in the language. If there are more es, proceed to consider the next one. Example 5.20: Consider the string iee. The first e is matched with the г to its left. They are removed, leaving the string e. Since there are more es we consider the next. However, there is no г to its left, so the test fails; iee is not in the language. Note that this conclusion is valid, since you cannot have more elses than if s in a С program. For another example, consider iieie. Matching the first e with the i to its left leaves He. Matching the remaining e with the i to its left leaves i. Now there are no more es, so the test succeeds. This conclusion also makes sense, because the sequence iieie corresponds to a С program whose structure is like that of Fig. 5.10. In fact, the matching algorithm also tells us (and the С compiler) which if matches any given else. That knowledge is essential if tin; compiler is to create the control-flow logic intended by the programmer. □ Figure 5.11; An example of a grammar in the YACC notation Notice the following correspondences between the YACC notation for grammars and ours: Tho colon is used as the production symbol, our All the productions with a given head are grouped together, and their bodies are separated by the vertical bar. We also allow this convention, as an option. The list of bodies for a given head ends with a semicolon. We have not used a terminating symbol. Terminals are quoted with single quotes. Several characters can appear within a single pair of quotes. Although we have not shown it, YACC allows its user to define symbolic terminals as well. The оссштепсе of thcse terminals in the source program arc detected by the lexical analyzer and signaled, through the return-value of the lexical analyzer, to the parser. Unquoted strings of letters and digits aie variable names. We have taken advantage of this capability to give our two variables more descriptive names - Exp and Id - although E and I could have been used. □ applications tlie tree is not actually constructed, and the action does something else, such as emit a piece of object code. Example 5.21: In Fig. 5.11 is a sample of a CFG in the YACC notation. The grammar is the same as that of Fig. 5.2. We have elided the actions, just showing their (required) curly braces and their position in the YACC input. Exp : Id -t...} I Exp Exp <...} I Exp * Exp -(...} I Exp <...} Id : a {...} I *b {...} I Id a -t...} I Id b {...} I Id 0 {...} I Id 1 {...} 196 CHAPTER 5. CONTEXT-FREE GRAMMARS AND LANGUAGES 5.3.3 Markup Languages We shall next consider a family uf languages called markup languages. The strings in these languages are documents with certain marks (called tags) in them. Tags tell us somctlung about the semantics of various strings within the document. The markup language with which you are probably most famihai- is HTML (HyperText Markup Language). This language has two major functions: creating links between documents and describing the format ( look ) of the a document. Wfe shall offer only a simplified view of the structure of HTML, but the following examples should suggest both its structure and how a CFG could be used both to describe the legal HTML documents and to guide the processing (i.e., the display on a monitor or printer) of a document. Example 5.22; Figure 5.12(a) shows a piece of text, comprising a fist of items, and Fig, 5.12(b) shows its expression in HTML. Notice from Fig. 5.12(b) that HTML consists of ordinary text interspersed with tags. Matching tags are of the form <,r> and </x> for some string x. For instance, we see the matching tags <EH> and </EH>, which indicate that the text between them should be em])liasized, that is, put in italics or another appropriate font. We also see the matching tags <0L> and </0L>, indicating an ordered list, i.e., an enumeration of list items. The things I hate: 1. Moldy bread. 2. People who drive too slow in the fast lane. (a) The text as viewed <P>The things I <EM>hate</EM>: <0L> <LI>Moldy bread. <LI>People who drive too slow in the fast lane. </QL> (b) The HTML source Figure 5.12: An HTML document and its printed version We also see two examples of lunnatched tags: <P> and <LI>, which uitroduce paragraphs and list items, respectively. HTML aUows, indeed encourages, that Sometimes the uitrotluciiig lag <x> iias more mlormationm W \,Ъап JUSC Vfit *tia, me !t: 1от Uie tag. HowevRr. w(? shfill not consider that possibility in examples.
|