Before studying compiler principles, we should first ask why we need to study them.

Why do universities offer a course on compiler principles? The course focuses on how compilers are built and the technical problems involved, which can seem unrelated to the fundamentals of computing. However, compiler principles has long been a required course for undergraduates, and it is also required content for the graduate entrance examination, so I think it is worth understanding. I summarized this article myself with the goal of passing the exam. If you find any mistakes, please leave a comment so I can correct them. Thank you~

Likely exam topics: grammars, regular expressions, finite automata, and syntax derivation trees.

1. What is compilation (understand)

1.1 Computer programming language and compilation

 Compilation Principle: From Getting Started to Giving Up


1.2 The compiler's position in the language processing system


1.3 Compiling System Structure


1.4 Examples of manual English Chinese translation


1.5 Compiler structure


2. Grammar (master)

  • Understanding terminals and nonterminals
  • Types of grammar
  • Determining whether a string is a sentential form of a grammar

2.1 Understanding terminals and nonterminals

Terminal: a symbol that cannot be expanded any further, so it never appears alone on the left side of a production (commonly written in lowercase letters).

Nonterminal: a symbol that can be expanded further, so it appears on the left side of at least one production (commonly written in capital letters).

Example 1: Which of the following symbols are terminals and which are nonterminals?

 Grammar G2[S] is: S -> Ap, S -> Bq, A -> a, A -> cA, B -> b, B -> dB

Answer: S is the start symbol; S, A, and B are nonterminals; p, q, a, b, c, and d are terminals.
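The answer to Example 1 can be checked mechanically. The sketch below is only an illustration under two assumptions of mine: every symbol is a single character, and a symbol counts as a nonterminal exactly when it appears as the left side of some production. The name split_symbols is made up for this example.

```python
# Productions of G2[S] from Example 1, written as (left side, right side) pairs.
productions = [
    ("S", "Ap"), ("S", "Bq"),
    ("A", "a"), ("A", "cA"),
    ("B", "b"), ("B", "dB"),
]

def split_symbols(productions):
    """Return (nonterminals, terminals) for a context-free grammar.

    Assumes one character per symbol, and that a symbol is a nonterminal
    exactly when it is the left side of some production.
    """
    nonterminals = {left for left, _ in productions}
    symbols = {ch for _, right in productions for ch in right}
    terminals = symbols - nonterminals
    return nonterminals, terminals

nonterminals, terminals = split_symbols(productions)
print(sorted(nonterminals))  # ['A', 'B', 'S']
print(sorted(terminals))     # ['a', 'b', 'c', 'd', 'p', 'q']
```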

2.2 Types of grammar

2.2.1 Understanding of Several Grammars

  • Type 0 grammar

Let G = (Vn, Vt, P, S), where Vn is the set of nonterminals, Vt is the set of terminals, P is the set of productions, and S is the start symbol. If every production α -> β has α ∈ (Vn ∪ Vt)* with at least one nonterminal in α, and β ∈ (Vn ∪ Vt)*, then G is a type 0 grammar. A type 0 grammar is also called a phrase structure grammar. It is the least restricted of the four types.

For example, Aa -> b and A -> aBa are type 0 productions, while ab -> A and a -> A are not, because their left sides contain no nonterminal.

  • Type 1 grammar

A type 1 grammar is also called a context-sensitive grammar. It is a type 0 grammar in which every production α -> β satisfies |β| >= |α|, where |β| denotes the length of β.

For example, B -> aB has |β| = 2 and |α| = 1, so it is a type 1 production, while aB -> B is not, because |β| = 1 is less than |α| = 2.

  • Type 2 grammar

A type 2 grammar is also called a context-free grammar, and it corresponds to the pushdown automaton. It is a type 1 grammar in which the left side α of every production α -> β is a single nonterminal.

For example, A -> Ba is a type 2 production, while Ab -> Bab is not, although it satisfies the type 1 condition.

  • Type 3 grammar

A type 3 grammar is also called a regular grammar, and it corresponds to the finite state automaton. It is a type 2 grammar in which every production has the form A -> a | aB (right linear), or every production has the form A -> a | Ba (left linear).

For example, A -> a, A -> aB, B -> a, B -> cB is a type 3 grammar. However, A -> ab, A -> aB, B -> a, B -> cB is not, because ab contains two terminals, and A -> a, A -> Ba, B -> a, B -> cB is not, because it mixes left-linear and right-linear productions.

Example 2: What type of grammar is grammar G?

 A -> ε | aB, B -> Ab | a

Answer:

1. First write the productions separately: A -> ε, A -> aB, B -> Ab, B -> a.
2. Type 0: the left side of every production must contain a nonterminal, which holds for all four.
3. Type 1: the right side must be at least as long as the left side. This holds for every production except A -> ε, and the ε-production is treated as an allowed exception here.
4. Type 2: the left side of every production must be a single nonterminal, which also holds.
5. Type 3: all productions must be right linear, or all must be left linear. Here A -> aB is right linear while B -> Ab is left linear, so the grammar is not type 3.

The final answer is that this grammar is a type 2 grammar.
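The step-by-step check in Example 2 can be sketched as a small classifier. This is a rough illustration, not a complete treatment: it assumes one character per symbol, uses the empty string for ε, and (as in Example 2) tolerates ε-productions at types 1 to 3. The function name grammar_type is hypothetical.

```python
EPS = ""  # the empty string stands for ε

def grammar_type(productions, nonterminals):
    """Return the most restrictive type (3, 2, 1 or 0) that fits.

    productions: list of (left, right) strings, one character per symbol.
    Assumption: ε-productions are tolerated at types 1 to 3, as in Example 2.
    Returns None if some left side contains no nonterminal.
    """
    def is_nt(ch):
        return ch in nonterminals

    # Type 0: every left side contains at least one nonterminal.
    if not all(any(is_nt(c) for c in left) for left, _ in productions):
        return None
    # Type 1: the right side is never shorter than the left side.
    if not all(len(right) >= len(left) or right == EPS
               for left, right in productions):
        return 0
    # Type 2: every left side is a single nonterminal.
    if not all(len(left) == 1 and is_nt(left) for left, _ in productions):
        return 1

    def right_linear(right):  # a or aB
        return (len(right) == 1 and not is_nt(right)) or \
               (len(right) == 2 and not is_nt(right[0]) and is_nt(right[1]))

    def left_linear(right):   # a or Ba
        return (len(right) == 1 and not is_nt(right)) or \
               (len(right) == 2 and is_nt(right[0]) and not is_nt(right[1]))

    # Type 3: all right linear, or all left linear (never a mixture).
    rights = [r for _, r in productions if r != EPS]
    if all(right_linear(r) for r in rights) or all(left_linear(r) for r in rights):
        return 3
    return 2

example2 = [("A", EPS), ("A", "aB"), ("B", "Ab"), ("B", "a")]
print(grammar_type(example2, {"A", "B"}))  # 2
```

Running it on the type 3 example from section 2.2.1 (A -> a, A -> aB, B -> a, B -> cB) returns 3, while replacing A -> a with A -> ab drops the result to 2.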

2.2.2 Relationship of several grammars

Each type adds a restriction to the one before it, so the four types are nested: type 3 ⊂ type 2 ⊂ type 1 ⊂ type 0.

3. Regular expressions (master)

A regular form is also called a regular expression. Each regular expression corresponds to a regular grammar.

3.1 Conversion between regular expressions and regular grammars

 | | Grammar production | Regular expression |
 | ----- | ---------- | ------ |
 | Rule 1 | A -> xB, B -> y | A = xy |
 | Rule 2 | A -> xA \| y | A = x*y |
 | Rule 3 | A -> x, A -> y | A = x \| y |

Example 3:

 Grammar G[S]: S -> xSx | y describes the language _____ (n >= 0)
 A. (xyx)^n   B. xyx^n   C. xy^nx   D. x^nyx^n

Solution idea: each application of S -> xSx adds one x on each side of S, and S -> y ends the derivation. After n applications the sentence is x^n y x^n, with the same n on both sides, so the correct answer is D. Splitting the production into S -> xS | y and S -> Sx | y would give x*yx*, which wrongly lets the two counts differ.

Example 4:

 The regular expression of the language L = {a^m b^n | m >= 0, n >= 1} is _____.
 A. a*bb*   B. aa*bb*   C. aa*b*   D. a*b*

Solution idea: you can substitute sample strings to filter the options, or reason directly. Since n >= 1 there is at least one b, which gives bb*, and since m >= 0 there may be no a at all, which gives a*. So the correct answer is A.
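The "substitute values" approach for Example 4 can be automated with Python's re module. The sketch below compares each option with a direct membership test on every string over {a, b} up to length 5. The length bound and the helper name in_language are my own choices for illustration.

```python
import re
from itertools import product

def in_language(s):
    """Membership in L = {a^m b^n | m >= 0, n >= 1}, checked by counting."""
    m = len(s) - len(s.lstrip("a"))   # number of leading a's
    rest = s[m:]
    return len(rest) >= 1 and set(rest) == {"b"}

options = {"A": r"a*bb*", "B": r"aa*bb*", "C": r"aa*b*", "D": r"a*b*"}

# All strings over {a, b} of length 0 to 5.
strings = ["".join(p) for n in range(6) for p in product("ab", repeat=n)]

# Keep the options that agree with in_language on every test string.
matching = [
    name for name, pattern in options.items()
    if all(bool(re.fullmatch(pattern, s)) == in_language(s) for s in strings)
]
print(matching)  # ['A']
```

Option B fails on "b" (it demands at least one a), option C accepts "a" (no b), and option D accepts the empty string.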

4. Finite automata (master)

A finite automaton is also called a finite-state automaton.

  • Definition of NFA and DFA
  • NFA converted to DFA
  • Transformation between regular expressions and finite automata

4.1 Definition of NFA and DFA

4.1.1 Definition of deterministic finite automata (DFA)

A deterministic finite automaton M is a five-tuple:

M=(S,∑,f,S0,Z)

S is a finite set of states.

∑ is an alphabet, and each of its elements is called an input character.

f is a single-valued partial mapping from S × ∑ to S. f(s, a) = s' means that when the current state is s and the input character is a, the automaton moves to the next state s'. We call s' a successor state of s.

S0 ∈ S is the unique initial state.

Z ⊆ S is the set of final states.

Example 5: Draw the following DFA state transition diagram:

 DFA = ({S, A, B, C, f}, {1, 0}, F, S, {f}). To avoid confusion with the state f, the mapping F is written as K below, where: K(S, 0) = B, K(S, 1) = A, K(A, 0) = f, K(A, 1) = C, K(B, 0) = C, K(B, 1) = f, K(C, 1) = f.

Solution idea: {S, A, B, C, f} is the state set, {1, 0} is the input alphabet, F is the transition mapping, S is the initial state, and {f} is the set of final states. K(S, 0) = B means that state S moves to state B on input 0, and the other mappings read the same way.

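So the state transition diagram has one labeled edge per mapping: S -0-> B, S -1-> A, A -0-> f, A -1-> C, B -0-> C, B -1-> f, and C -1-> f, with S as the initial state and f as the final state.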
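As a rough check of Example 5, the mapping K can be written as a lookup table and run on a few input strings. This is only a sketch: the dictionary layout and the function name accepts are my own choices, not part of the original problem.

```python
# Transition mapping K of Example 5 as a dictionary: (state, input) -> state.
K = {
    ("S", "0"): "B", ("S", "1"): "A",
    ("A", "0"): "f", ("A", "1"): "C",
    ("B", "0"): "C", ("B", "1"): "f",
    ("C", "1"): "f",
}
START, FINAL = "S", {"f"}

def accepts(word):
    """Run the DFA on word; a missing transition means the word is rejected."""
    state = START
    for ch in word:
        if (state, ch) not in K:
            return False  # K is a partial mapping
        state = K[(state, ch)]
    return state in FINAL

for w in ["10", "01", "111", "001", "0", "011"]:
    print(w, accepts(w))
```

The first four strings are accepted and the last two are rejected: "0" stops in the non-final state B, and "011" reaches f but has no transition for the trailing 1.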

4.1.2 Definition of nondeterministic finite automata (NFA)

A nondeterministic finite automaton M is a five-tuple:

M=(S,∑,f,S0,Z)

S is a finite set of states.

∑ is an alphabet, and each of its elements is called an input character.

f is a mapping from S × ∑ to subsets of S. f(s, a) = {s1, ..., sk} means that when the current state is s and the input character is a, the automaton may move to any one of the states s1, ..., sk, each of which is a successor state of s. Unlike in a DFA, the successor need not be unique.

S0 ⊆ S is a non-empty set of initial states.

Z ⊆ S is a final state set.

4.2 Conversion of NFA to DFA

Example 6: A nondeterministic finite automaton (NFA) is shown in the figure. Converting it to a DFA with the subset construction proceeds as follows:

 | I | I0 | I1 |
 | ----- | ----- | ----- |
 | {S,1,2,3} | {1,3,4,5,Z} | {2,3} |
 | {1,3,4,5,Z} | T1 | T3 |
 | {2,3} | {4,5,Z} | {2,3} |
 | T2 | {6} | T3 |
 | T1 | {1,3,4,5,6,Z} | {5,Z} |
 | {6} | T3 | {5,Z} |
 | {5,Z} | {6} | T3 |

The state set T1 does not include the state numbered (1); the members of state set T2 are (2); state set T3 equals (3).

 (1) A. 2   B. 4   C. 3   D. 5
 (2) A. 1,3,4,5,Z   B. 2,3   C. 6   D. 4,5,Z
 (3) A. {Z}   B. {6}   C. {4,5,Z}   D. {}

Solution idea: in the table, I0 is the state set reached from the state set I on input 0, and I1 is the state set reached from I on input 1. Tracing the NFA gives the answers:

(1) A; (2) D; (3) D

Finally, draw the DFA from the state sets.
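The subset method used in Example 6 can be sketched in a few lines. Because the NFA of Example 6 is only given as a figure, the code below runs on a small hypothetical NFA of my own (it accepts strings over {0, 1} that end in 01). The function names and the dictionary encoding of the transitions are likewise assumptions made for illustration.

```python
from collections import deque

EPS = "ε"

def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, symbol, delta):
    """States reachable from `states` by one edge labelled `symbol`."""
    return {t for s in states for t in delta.get((s, symbol), set())}

def subset_construction(start, alphabet, delta):
    """Return (start_set, table), where table[I][a] is the successor of I on a."""
    start_set = epsilon_closure({start}, delta)
    table, queue = {}, deque([start_set])
    while queue:
        current = queue.popleft()
        if current in table:
            continue  # this state set already has a row
        table[current] = {}
        for a in alphabet:
            target = epsilon_closure(move(current, a, delta), delta)
            table[current][a] = target
            if target and target not in table:
                queue.append(target)  # the empty set gets no row of its own
    return start_set, table

# Hypothetical NFA (not the one in Example 6): accepts strings ending in "01".
delta = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
start_set, table = subset_construction("q0", "01", delta)
for I, row in table.items():
    print(sorted(I), {a: sorted(t) for a, t in row.items()})
```

Each printed row plays the role of one line of the I / I0 / I1 table above: the first column is a state set, and the other columns are the sets reached on input 0 and input 1.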

4.3 Conversion between regular expressions and finite automata

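Example 7: Convert the following regular expression to a finite automaton

 (_|a)(_|a|d)^*

Solution: here _ is the underscore, a stands for the set of letters, and d stands for the set of digits.

One finite automaton for this expression has two states: from the initial state, an underscore or a letter leads to the final state, and the final state loops back to itself on an underscore, a letter, or a digit.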

Example 8: The state transition diagram of a nondeterministic finite automaton (NFA) is shown in the figure, and the regular expression equivalent to the NFA is ____.

 A. 0*|(0|1)0   B. (0|10)*   C. 0*((0|1)0)*   D. 0*(10)*

Solution idea: q0 is both the initial state and the final state (a final state is drawn as a double circle), so the NFA accepts the empty string. Every option can also produce the empty string through a closure, so this alone rules nothing out. The accepted strings include runs of 0s such as 000000, strings such as 101010..., and mixtures such as 000100; a 0 may appear anywhere, but every 1 must be followed immediately by a 0. By elimination, the answer is B.
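The elimination in Example 8 can be double-checked by brute force. The sketch below takes the property stated in the solution (every 1 is immediately followed by a 0) as the definition of the accepted language, which is my reading of the solution text rather than something derived from the figure, and compares it with each option on all binary strings up to length 6.

```python
import re
from itertools import product

def every_one_followed_by_zero(s):
    """Property from the solution: each 1 is immediately followed by a 0."""
    return all(s[i + 1:i + 2] == "0" for i, ch in enumerate(s) if ch == "1")

options = {
    "A": r"0*|(0|1)0",
    "B": r"(0|10)*",
    "C": r"0*((0|1)0)*",
    "D": r"0*(10)*",
}

# All strings over {0, 1} of length 0 to 6.
strings = ["".join(p) for n in range(7) for p in product("01", repeat=n)]

# Keep the options that agree with the property on every test string.
matching = [
    name for name, pattern in options.items()
    if all(bool(re.fullmatch(pattern, s)) == every_one_followed_by_zero(s)
           for s in strings)
]
print(matching)  # ['B']
```

For instance, option A rejects 1010, option C rejects 0100, and option D rejects 100, although all three strings satisfy the property.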

5. Syntax derivation tree (master)

5.1 Syntax tree

A syntax tree should have the following characteristics:

  1. Each node has a mark, which is a symbol of V;
  2. The root is marked with S;
  3. If a node n has at least one descendant except itself, and is marked with A, then A must be in Vn;
  4. If the direct descendants of node n, from left to right, are the nodes n1, n2, ..., nk, and their marks are A1, A2, ..., Ak, then A -> A1 A2 ... Ak must be a production in P.

Example 9:

 Let grammar G = ({a, b}, {S, A}, S, P), where P is: S -> aAS | a; A -> SbA | SS | ba. Construct the derivation tree of the sentential form aabAa.

Solution idea: first split the productions so they are easier to inspect: S -> aAS, S -> a, A -> SbA, A -> SS, A -> ba. One derivation is S => aAS => aSbAS => aabAS => aabAa. In the derivation tree, the root S has the children a, A, S; that A has the children S, b, A, where its child S derives a; and the rightmost S derives a.
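To confirm that aabAa really is a sentential form of the grammar in Example 9, a derivation can be searched for mechanically. The sketch below is a plain breadth-first search over sentential forms. The function name find_derivation and the pruning rule are my own choices, and the pruning relies on the fact that no production of this grammar shortens a form.

```python
from collections import deque

# Productions of Example 9; uppercase letters are nonterminals.
P = {"S": ["aAS", "a"], "A": ["SbA", "SS", "ba"]}

def find_derivation(start, target):
    """Breadth-first search for a derivation start =>* target.

    Forms longer than the target are pruned, which is safe here because
    no production in P shortens a sentential form.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            steps = []
            while form is not None:   # walk back to the start symbol
                steps.append(form)
                form = parent[form]
            return steps[::-1]
        for i, sym in enumerate(form):
            for rhs in P.get(sym, []):   # terminals have no productions
                nxt = form[:i] + rhs + form[i + 1:]
                if len(nxt) <= len(target) and nxt not in parent:
                    parent[nxt] = form
                    queue.append(nxt)
    return None

print(" => ".join(find_derivation("S", "aabAa")))
```

The search returns one shortest derivation of four steps. Several orders of expansion lead to the same tree, so the printed sequence may expand the nonterminals in a different order than a hand-written derivation.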

5.2 Phrases, simple phrases, handles

Let G be a grammar with start symbol S, and let abc be a sentential form of G.

If S =>* aAc and A =>+ b, then b is called a phrase of the sentential form abc relative to the nonterminal A. In particular, if A => b in a single step, then b is called a direct phrase (also called a simple phrase) of abc relative to the rule A -> b. The leftmost direct phrase of a sentential form is called its handle.

Example 10: The derivation tree of the sentence abbaa, generated by a context-free grammar, is shown in the figure. Find the phrases, direct phrases, and handle for this derivation tree.

A simple way to understand phrases, direct phrases, and handles:

Phrase: for any subtree, the leaf sequence that its root derives in one or more steps is a phrase relative to that root;

Direct phrase: a phrase whose leaves are derived from the subtree root in exactly one step rather than several;

Handle: the leftmost direct phrase, that is, the direct phrase of the leftmost subtree among the subtrees that have direct phrases.

Answer:

Phrases: a1 ɛ, b1, b2, a2, a3

Direct phrases: because the leaf nodes of this tree are derived through several steps rather than one step, there is no direct phrase.

Handle: the handle is the leftmost of the direct phrases. The handle for this question is a1.

6. LL(1) grammar (master)

  • Determining whether a grammar is LL(1)
  • Finding the FIRST, FOLLOW, and SELECT sets of the productions

6.1 What is an LL(1) grammar

The first L stands for scanning the input string from left to right, the second L stands for producing a leftmost derivation, and the 1 stands for looking ahead one input symbol (the input symbol currently being processed) at each derivation step during parsing.

A necessary and sufficient condition for deterministic top-down parsing of the sentences of grammar G is that any two productions of G with the same left side, A -> α | β, meet the following conditions:

(1) If neither α nor β can derive ε, then FIRST(α) ∩ FIRST(β) = ∅.

(2) At most one of α and β can derive ε.

(3) If β =>* ε, then FIRST(α) ∩ FOLLOW(A) = ∅.

A grammar that meets these conditions is called an LL(1) grammar.

6.2 Find FIRST set

Solution rules: to compute FIRST(X) for each grammar symbol X, apply the following rules repeatedly until no new terminal or ε can be added to any FIRST set.

 1. If X is a terminal, FIRST(X) = {X}.
 2. If X is a nonterminal and X -> Y1 Y2 ... Yk is a production with k ≥ 1, then if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1), add a to FIRST(X). In other words, Y1 ... Yi-1 =>* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, add ε to FIRST(X). For example, every symbol in FIRST(Y1) other than ε is in FIRST(X). If Y1 cannot derive ε, nothing more is added to FIRST(X), but if Y1 =>* ε, we also add FIRST(Y2), and so on.
 3. If X -> ε is a production, add ε to FIRST(X).

Example 11: Find FIRST set

 E → TE'
 E' → +TE' | ε
 T → FT'
 T' → *FT' | ε
 F → (E) | id

Analysis:

FIRST(E): from E → TE', FIRST(E) = FIRST(T);
FIRST(T): from T → FT', FIRST(T) = FIRST(F);
FIRST(F): from F → (E) and F → id, FIRST(F) = { (, id };
FIRST(T'): from T' → *FT' and T' → ε, FIRST(T') = { *, ε };
FIRST(E'): from E' → +TE' and E' → ε, FIRST(E') = { +, ε };

Answer:

FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }
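The FIRST rules amount to a fixed-point iteration, which the following sketch applies to the grammar of Example 11. It is only an illustration: the dictionary encoding of the grammar and the name first_sets are my own, and any symbol that is not a key of the dictionary is treated as a terminal.

```python
EPS = "ε"

# Grammar of Example 11; each right side is a list of symbols.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    """Apply the FIRST rules repeatedly until no set changes."""
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Rule 1: FIRST of a terminal (or of ε) is the symbol itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                before = len(first[nt])
                for symbol in body:
                    f = first_of(symbol)
                    first[nt] |= f - {EPS}
                    if EPS not in f:
                        break           # this symbol cannot vanish; stop here
                else:
                    first[nt].add(EPS)  # every symbol of the body can derive ε
                changed |= len(first[nt]) != before
    return first

for nt, symbols in first_sets(GRAMMAR).items():
    print(f"FIRST({nt}) = {sorted(symbols)}")
```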

6.3 Finding the FOLLOW set

Solution rules: to compute FOLLOW(A) for every nonterminal A, apply the following rules repeatedly until no new terminal can be added to any FOLLOW set.

 1. Put $ into FOLLOW(S), where S is the start symbol and $ is the end marker at the right end of the input.
 2. If there is a production A → αBβ, then every symbol in FIRST(β) except ε is in FOLLOW(B).
 3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then every symbol in FOLLOW(A) is in FOLLOW(B).

Example 12: Find the FOLLOW set

 E → TE'
 E' → +TE' | ε
 T → FT'
 T' → *FT' | ε
 F → (E) | id

Analysis:

FOLLOW(E): E is the start symbol, and from F → (E), FOLLOW(E) = { ), $ }
FOLLOW(E'): from E → TE' and E' → +TE', FOLLOW(E') = FOLLOW(E)
FOLLOW(T): from E → TE', E' → +TE' and E' → ε, FOLLOW(T) = (FIRST(E') - {ε}) ∪ FOLLOW(E) ∪ FOLLOW(E')
FOLLOW(T'): from T → FT' and T' → *FT', FOLLOW(T') = FOLLOW(T)
FOLLOW(F): from T → FT', T' → *FT' and T' → ε, FOLLOW(F) = (FIRST(T') - {ε}) ∪ FOLLOW(T) ∪ FOLLOW(T')

Answer:

FOLLOW(E) = { ), $ }
FOLLOW(E') = FOLLOW(E) = { ), $ }
FOLLOW(T) = (FIRST(E') - {ε}) ∪ FOLLOW(E) ∪ FOLLOW(E') = { +, ), $ }
FOLLOW(T') = FOLLOW(T) = { +, ), $ }
FOLLOW(F) = (FIRST(T') - {ε}) ∪ FOLLOW(T) ∪ FOLLOW(T') = { *, +, ), $ }
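The three FOLLOW rules can be iterated in the same way. To keep the sketch short, the FIRST sets are copied from the answer to Example 11 instead of being recomputed. The encoding of the grammar and the names follow_sets and first_of_string are assumptions of mine.

```python
EPS, END = "ε", "$"

# Grammar of Example 12; each right side is a list of symbols.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets of the nonterminals, copied from the answer to Example 11.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}

def first_of_string(symbols):
    """FIRST of a symbol string β; FIRST of the empty string is {ε}."""
    result = set()
    for symbol in symbols:
        f = FIRST.get(symbol, {symbol})   # a terminal's FIRST is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}

def follow_sets(grammar, start):
    """Apply the FOLLOW rules repeatedly until no set changes."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                              # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue                        # only nonterminals have FOLLOW
                    before = len(follow[symbol])
                    beta = first_of_string(body[i + 1:])
                    follow[symbol] |= beta - {EPS}      # rule 2
                    if EPS in beta:
                        follow[symbol] |= follow[head]  # rule 3
                    changed |= len(follow[symbol]) != before
    return follow

for nt, symbols in follow_sets(GRAMMAR, "E").items():
    print(f"FOLLOW({nt}) = {sorted(symbols)}")
```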

6.4 Finding the SELECT set

Solution rules: apply the following two rules

 1. If ε ∉ FIRST(α), then SELECT(A → α) = FIRST(α).
 2. If ε ∈ FIRST(α), then SELECT(A → α) = (FIRST(α) - {ε}) ∪ FOLLOW(A).

Example 13: Find the SELECT set

 (1) E → T E'
 (2) E' → + T E'
 (3) E' → ε
 (4) T → F T'
 (5) T' → * F T'
 (6) T' → ε
 (7) F → ( E )
 (8) F → id

Analysis:

 | X | FIRST(X) | FOLLOW(X) |
 | ----- | ----- | ----- |
 | E | ( id | $ ) |
 | E' | + ε | $ ) |
 | T | ( id | + ) $ |
 | T' | * ε | + ) $ |
 | F | ( id | * + ) $ |

Answer:

SELECT(1) = { (, id }
SELECT(2) = { + }
SELECT(3) = { $, ) }
SELECT(4) = { (, id }
SELECT(5) = { * }
SELECT(6) = { +, ), $ }
SELECT(7) = { ( }
SELECT(8) = { id }
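The two SELECT rules translate directly into code. The sketch below copies FIRST and FOLLOW from the table in Example 13 and then computes SELECT for productions (1) to (8). It also checks that the SELECT sets of productions with the same left side do not overlap, which, as far as I know, is a common way of testing the LL(1) conditions from section 6.1. The names select and is_ll1 are hypothetical.

```python
EPS = "ε"

# FIRST and FOLLOW of the nonterminals, copied from the table in Example 13.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"*", "+", ")", "$"},
}
# Productions (1) to (8), in order.
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", [EPS]),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", [EPS]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]

def first_of_string(symbols):
    """FIRST of a right side; a terminal's FIRST is the terminal itself."""
    result = set()
    for symbol in symbols:
        f = FIRST.get(symbol, {symbol})
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}

def select(head, body):
    f = first_of_string(body)
    if EPS not in f:
        return f                        # rule 1
    return (f - {EPS}) | FOLLOW[head]   # rule 2

def is_ll1():
    """True if SELECT sets of productions with the same head are disjoint."""
    seen = {}
    for head, body in PRODUCTIONS:
        s = select(head, body)
        if seen.setdefault(head, set()) & s:
            return False
        seen[head] |= s
    return True

for number, (head, body) in enumerate(PRODUCTIONS, start=1):
    print(f"SELECT({number}) = {sorted(select(head, body))}")
print("LL(1):", is_ll1())
```

For this grammar the check succeeds, since for example SELECT(2) = { + } and SELECT(3) = { $, ) } share no symbol.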

<To be updated...>


References:

Compiler Principles video course by Chen Yin, Harbin Institute of Technology, on China University MOOC: https://www.icourse163.org/course/HIT-1002123007

CSDN blog post "The original compilation principle can be learned like this"