Строительный блокнот  Automata methods and madness 

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 [ 64 ] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175

if (Condition) -(

if (Condition) Statement; else Statement;

if (Condition) Statement; else Statement;

Figure 5.10: An if-else structure; the two elses match their previous ifs, and the first if is unmatched

5.3.2 The YACC Parser-Generator

The generation of a parser (function tliat creates parse trees from source programs) has been institutionalized in the YACC command that appears in all UNIX systems. The input to YACC is a CFG, in a notation that differs only in details from the one we have used here. Associated with each production is an action, which is a fragment of С code that is performed whenever a node of the parse tree that (with its children) corresponds to this production is created. Typically, the action is code to construct that node, although in some YACC

A simple test (whose correctness we leave as an exercise), for whether a sequence of is and es is generated by the grammar is to consider each c, in turn from the left. Look for the first г to the left of the e being considered. If there is none, the string fails the test and is not in the language. If there is such an (, delete this i and the e being considered. Then, if there are no more rs the string passes the test and is in the language. If there are more es, proceed to consider the next one.

Example 5.20: Consider the string iee. The first e is matched with the г to its left. They are removed, leaving the string e. Since there are more es we consider the next. However, there is no г to its left, so the test fails; iee is not in the language. Note that this conclusion is valid, since you cannot have more elses than if s in a С program.

For another example, consider iieie. Matching the first e with the i to its left leaves He. Matching the remaining e with the i to its left leaves i. Now there are no more es, so the test succeeds. This conclusion also makes sense, because the sequence iieie corresponds to a С program whose structure is like that of Fig. 5.10. In fact, the matching algorithm also tells us (and the С compiler) which if matches any given else. That knowledge is essential if tin; compiler is to create the control-flow logic intended by the programmer. □



Figure 5.11; An example of a grammar in the YACC notation

Notice the following correspondences between the YACC notation for grammars and ours:

Tho colon is used as the production symbol, our

All the productions with a given head are grouped together, and their bodies are separated by the vertical bar. We also allow this convention, as an option.

The list of bodies for a given head ends with a semicolon. We have not used a terminating symbol.

Terminals are quoted with single quotes. Several characters can appear within a single pair of quotes. Although we have not shown it, YACC allows its user to define symbolic terminals as well. The оссштепсе of thcse terminals in the source program arc detected by the lexical analyzer and signaled, through the return-value of the lexical analyzer, to the parser.

Unquoted strings of letters and digits aie variable names. We have taken advantage of this capability to give our two variables more descriptive names - Exp and Id - although E and I could have been used.

applications tlie tree is not actually constructed, and the action does something else, such as emit a piece of object code.

Example 5.21: In Fig. 5.11 is a sample of a CFG in the YACC notation. The grammar is the same as that of Fig. 5.2. We have elided the actions, just showing their (required) curly braces and their position in the YACC input.

Exp : Id -t...}

I Exp Exp <...}

I Exp * Exp -(...}

I Exp <...}

Id : a {...}

I *b {...}

I Id a -t...}

I Id b {...}

I Id 0 {...}

I Id 1 {...}



196 CHAPTER 5. CONTEXT-FREE GRAMMARS AND LANGUAGES 5.3.3 Markup Languages

We shall next consider a family uf languages called markup languages. The strings in these languages are documents with certain marks (called tags) in them. Tags tell us somctlung about the semantics of various strings within the document.

The markup language with which you are probably most famihai- is HTML (HyperText Markup Language). This language has two major functions: creating links between documents and describing the format ( look ) of the a document. Wfe shall offer only a simplified view of the structure of HTML, but the following examples should suggest both its structure and how a CFG could be used both to describe the legal HTML documents and to guide the processing (i.e., the display on a monitor or printer) of a document.

Example 5.22; Figure 5.12(a) shows a piece of text, comprising a fist of items, and Fig, 5.12(b) shows its expression in HTML. Notice from Fig. 5.12(b) that HTML consists of ordinary text interspersed with tags. Matching tags are of the form <,r> and </x> for some string x. For instance, we see the matching tags <EH> and </EH>, which indicate that the text between them should be em])liasized, that is, put in italics or another appropriate font. We also see the matching tags <0L> and </0L>, indicating an ordered list, i.e., an enumeration of list items.

The things I hate:

1. Moldy bread.

2. People who drive too slow in the fast lane.

(a) The text as viewed

<P>The things I <EM>hate</EM>:

<0L>

<LI>Moldy bread.

<LI>People who drive too slow

in the fast lane.

</QL>

(b) The HTML source

Figure 5.12: An HTML document and its printed version

We also see two examples of lunnatched tags: <P> and <LI>, which uitroduce paragraphs and list items, respectively. HTML aUows, indeed encourages, that

Sometimes the uitrotluciiig lag <x> iias more mlormationm W \,Ъап JUSC Vfit *tia, me !t: 1от Uie tag. HowevRr. w(? shfill not consider that possibility in examples.



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 [ 64 ] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175