Principles of Compilers

Date: 2021-10-08

Overview

This course mainly covered what a compiler does and how the compilation process works.

A compiler is like a translator: it translates source code into the assembly language supported by the target machine.

The compilation process has six phases:

graph TD
    start(Source Code) --> op1(Lexical Analysis)
    op1 --> op2(Syntax Analysis)
    op2 --> op3(Semantic Analysis)
    op3 --> op4(Intermediate Code Generator)
    op4 --> op5(Machine Independent Code Optimiser)
    op5 --> op6(Code Generator)
    op6 --> e(Target Code)

In this process, we focused on lexical analysis and syntax analysis.

I. Lexical Analysis

This phase scans the source code as a stream of characters and recognizes different types of tokens, such as identifiers, keywords, and operators.

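For example:

int val = 10;

Here int is a keyword, val is an identifier, = is an operator, 10 is a numeric literal, and ; is a separator.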

To recognize tokens, we need a formal way to describe them, such as regular expressions and finite automata.
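To make the idea concrete, here is a minimal sketch of a regex-based lexer in Python. The token classes and the small keyword list are assumptions chosen for illustration; they are not the ones used in the course.

```python
import re

# Hypothetical token classes for illustration; a real language defines many more.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SEPARATOR",  r"[;(){}]"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
# One master pattern; each alternative is a named group so we know which class matched.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_type, lexeme) pairs for the given source text."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":          # whitespace carries no token
            continue
        if kind == "MISMATCH":      # any character no other class accepts
            raise ValueError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


for token in tokenize("int val = 10;"):
    print(token)
```

The keyword pattern is listed before the identifier pattern so that int is classified as a keyword, while a longer word such as integer still falls through to the identifier class.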

1. Regular Expression

A regular expression describes a set of strings.

We use regular expressions to match the strings we need.

For example:

(a | b)*

This regex matches every string that consists only of 'a' and 'b', including the empty string.
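As a quick check, the same pattern can be tried with Python's re module. This is only a sketch: the helper name matches is made up for the example, and the spaces of the textbook notation are dropped because Python would treat them as literal characters.

```python
import re

# (a|b)* : zero or more repetitions of 'a' or 'b'
AB_STAR = re.compile(r"(a|b)*")


def matches(text):
    """True if the whole string belongs to the language of (a|b)*."""
    # fullmatch requires the entire string to match, not just a prefix.
    return AB_STAR.fullmatch(text) is not None


for sample in ["", "a", "abba", "abc"]:
    print(repr(sample), matches(sample))
```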

2. Finite Automata

A finite automaton (FA) is an abstract machine that recognizes the set of strings described by a regular expression.

It has a set of states and rules for moving from one state to another.

There are two types of FA:

  • DFA: Deterministic Finite Automaton
  • NFA: Nondeterministic Finite Automaton

1) DFA

"Deterministic" means that, for each state, each input symbol leads to exactly one target state.
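A DFA can be simulated with a plain transition table. The sketch below uses a hypothetical three-state DFA (not taken from the course) that accepts the strings over {a, b} ending in "ab".

```python
# Hypothetical DFA: state 0 = start, state 1 = "just read a", state 2 = "just read ab".
# Every (state, symbol) pair has exactly one target, which is what makes it deterministic.
DFA_TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}


def dfa_accepts(text, transitions=DFA_TRANSITIONS, start=0, accepting=frozenset({2})):
    """Run the DFA over text and report whether it stops in an accepting state."""
    state = start
    for symbol in text:
        if (state, symbol) not in transitions:
            return False            # symbol outside the alphabet
        state = transitions[(state, symbol)]
    return state in accepting


for sample in ["ab", "aab", "abb", ""]:
    print(repr(sample), dfa_accepts(sample))
```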

2) NFA

"Nondeterministic" means that, for each state, an input symbol may lead to zero, one, or several target states. An NFA may also change state without reading any input (an ε-transition).

Every NFA can be converted to an equivalent DFA, for example with the subset construction.
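The conversion can be sketched with the subset construction, where each DFA state is a set of NFA states. The NFA below is a hypothetical one for (a|b)*ab, and the empty string is used as an assumed marker for ε-transitions; neither comes from the course material.

```python
from collections import deque

EPSILON = ""  # assumed marker for epsilon transitions in the NFA table


def epsilon_closure(states, nfa):
    """All NFA states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in nfa.get((state, EPSILON), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: returns (dfa_transitions, dfa_start, dfa_accepting)."""
    start_set = epsilon_closure({start}, nfa)
    dfa = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved |= set(nfa.get((state, symbol), ()))
            target = epsilon_closure(moved, nfa)
            if not target:
                continue            # no move: the dead state is left implicit
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {subset for subset in seen if subset & accepting}
    return dfa, start_set, dfa_accepting


def run_dfa(dfa, start, accepting, text):
    """Simulate the constructed DFA on text."""
    state = start
    for symbol in text:
        state = dfa.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# Hypothetical NFA for (a|b)*ab: from state 0, 'a' may stay in 0 or move to 1.
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa, dfa_start, dfa_accepting = nfa_to_dfa(NFA, start=0, accepting={2}, alphabet="ab")
print(len({source for source, _ in dfa}), "DFA states")
print(run_dfa(dfa, dfa_start, dfa_accepting, "aab"))
```

In the worst case the resulting DFA can have exponentially more states than the NFA, although for small lexer patterns like this one it stays small.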

II. Syntax Analysis

This phase checks whether the token stream produced by lexical analysis conforms to the grammar of the source language.

We use a context-free grammar (CFG) to describe that grammar.

1. CFG

A CFG has four components:

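  • Non-terminals (V): syntactic variables, each denoting a set of strings.

  • Terminal symbols (Σ): the basic symbols from which strings are formed, that is, the set of tokens.

  • Productions (P): the rewriting rules. In a CFG, the left side of every production is a single non-terminal.

  • Start symbol (S): the non-terminal from which every derivation begins.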

Example: productions for arithmetic expressions:

\[
\begin{aligned}
E &\rightarrow \text{identifier} \\
E &\rightarrow E + E \\
E &\rightarrow E * E
\end{aligned}
\]

2. Syntax Analyzer

The syntax analyzer checks the input against the CFG. The output of this phase is a parse tree.

For example:

input: num1 + num2 * num1

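Derivation process (a leftmost derivation, writing id for identifier; num1 and num2 are both id tokens):

\[
\begin{aligned}
E &\Rightarrow E * E \\
  &\Rightarrow E + E * E \\
  &\Rightarrow \text{id} + E * E \\
  &\Rightarrow \text{id} + \text{id} * E \\
  &\Rightarrow \text{id} + \text{id} * \text{id}
\end{aligned}
\]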
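The derivation can be replayed mechanically: at each step the leftmost non-terminal is replaced by the body of a chosen production. The sketch below does this for the example grammar; the production labels "id", "add" and "mul" are names invented for the example.

```python
# The example grammar, with hypothetical labels for its three productions:
#   id : E -> id      add : E -> E + E      mul : E -> E * E
PRODUCTIONS = {
    "id":  ["id"],
    "add": ["E", "+", "E"],
    "mul": ["E", "*", "E"],
}


def leftmost_derivation(choices, start="E"):
    """Apply each chosen production to the leftmost E and record every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        index = form.index("E")                 # position of the leftmost non-terminal
        form[index:index + 1] = PRODUCTIONS[choice]
        steps.append(" ".join(form))
    return steps


# Same order of productions as in the derivation for num1 + num2 * num1.
for step in leftmost_derivation(["mul", "add", "id", "id", "id"]):
    print(step)
```

Because the grammar is ambiguous, a different sequence of choices (starting with "add" instead of "mul") derives the same token string with a different parse tree.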

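The Tree (reconstructed from the derivation above):

E
├── E
│   ├── E ── id (num1)
│   ├── +
│   └── E ── id (num2)
├── *
└── E ── id (num1)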

3. Two Types of Parsing

1) Top-Down

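A top-down parser builds the parse tree from the start symbol down to the input tokens. The typical example is the LL parser.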
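A minimal sketch of top-down parsing is a predictive recursive-descent parser. The example grammar E → E + E | E * E | id is left-recursive and ambiguous, so it cannot be parsed this way directly; the sketch assumes the usual rewritten grammar shown in the docstring, which also gives * higher precedence than +. The function names and the tuple-based tree are choices made for the example.

```python
def parse_expression(tokens):
    """Predictive (LL(1)-style) parser for the assumed rewritten grammar:
        E  -> T E'      E' -> + T E' | epsilon
        T  -> F T'      T' -> * F T' | epsilon
        F  -> id
    Returns a nested tuple such as ('+', 'id', ('*', 'id', 'id')).
    """
    position = 0

    def peek():
        # "$" stands for end of input.
        return tokens[position] if position < len(tokens) else "$"

    def expect(symbol):
        nonlocal position
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        position += 1

    def parse_f():
        expect("id")
        return "id"

    def parse_t():
        node = parse_f()
        while peek() == "*":        # T' -> * F T', written as a loop
            expect("*")
            node = ("*", node, parse_f())
        return node

    def parse_e():
        node = parse_t()
        while peek() == "+":        # E' -> + T E', written as a loop
            expect("+")
            node = ("+", node, parse_t())
        return node

    tree = parse_e()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_expression(["id", "+", "id", "*", "id"]))
```

Each decision is made by looking at one token of lookahead (peek), which is the defining property of an LL(1) parser.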

2) Bottom-Up

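A bottom-up parser starts from the input tokens and reduces them step by step to the start symbol. Examples are the SLR parser, the canonical LR parser and the LALR parser.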
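All three bottom-up parsers share the same table-driven shift-reduce loop and differ only in how the table is built. The sketch below runs that loop with an SLR(1) table written out by hand for a deliberately tiny, hypothetical grammar (E → E + id | id); it is not one of the course grammars, and a real project would generate the table from the LR(0) item sets and FOLLOW sets.

```python
# Hypothetical grammar, augmented with S' -> E:
#   (1) E -> E + id        (2) E -> id
# production number -> (head, length of body)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}

# Hand-built SLR(1) table; reduce entries appear on FOLLOW(E) = {+, $}.
ACTION = {
    (0, "id"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "id"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def slr_parse(tokens):
    """Shift-reduce driver; returns the production numbers in the order they were reduced."""
    stack = [0]                         # stack of parser states
    stream = list(tokens) + ["$"]       # "$" marks end of input
    index = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], stream[index]))
        if action is None:
            raise SyntaxError(f"unexpected token {stream[index]!r}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            index += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]         # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(arg)
        else:                                       # accept
            return reductions


print(slr_parse(["id", "+", "id"]))
```

The returned list of reductions, read in reverse, is a rightmost derivation of the input, which is why LR parsers are said to trace a rightmost derivation backwards.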

III. Course Project

The professor gave us three context-free grammars and asked us to choose one.

We then had to write a lexical analyzer and a syntax analyzer based on SLR(1) parsing in C++.

Writing a semantic analyzer was optional.

I chose the most difficult grammar and signed up to write a semantic analyzer, but unfortunately I did not manage to finish it.

Original article: https://www.cnblogs.com/danielwong2021/p/15380761.html