[Compiler] 11-1. Bottom-up parsing

학교/컴파일러

[Compiler] 11-1. Bottom-up parsing

daykim 2022. 5. 17. 15:25

Bottom-up parsing (상향식 구문 분석)

연산을 반전시켜, 문자열을 Start symbol로 Reduce 한다.

Rightmost derivation

Two actions in bottom-up parser

Shift (이동)

Input 처음 문자열을 parsing stack으로 이동

Reduce (감축)

a가 있을 때, P : A -> a에 매칭되어 a -> A로 reduce 한다.

Bottom-up parser example

G = ({S, S'}, {(, )}, P, S')
P :
S' -> S
S -> ( S ) S | ε

	Parsing stack	Input	Action
1	$	( ) $	shift
2	$ (	) $	reduce S -> ε
3	$ ( S	) $	shift
4	$ ( S )	$	reduce S -> ε
5	$ ( S ) S	$	reduce S -> ( S ) s
6	$ S	$	reduce S' -> S
7	$ S'	$	accept

Considerations on bottom-up parser

무슨 기준으로 rule을 선택하는 걸까? 어떻게 결정되는 걸까?

=> Stack lookahead, Input lookahead 통해 결정한다.

Conflicts

특정 상태에서, 두 개 이상의 작업으로 인해 유효한 구문 분석으로 이어질 수 있는 경우

Shift-reduce conflicts

특정 상태에서, Reduce할지 Shift 할지 모호할 때

Reduce-reduce conflicts

특정 상태에서, 서로 다른 2개의 productions이 있을 때

Stack lookahead

어떤 action을 수행할지 stack을 보고 결정

	Parsing stack	Input	Action
1	$	( ) $	shift
2	$ (	) $	reduce S -> ε
3	$ ( S	) $	shift
4	$ ( S )	$	reduce S -> ε
5	$ ( S ) S	$	reduce S -> ( S ) s
6	$ S	$	reduce S' -> S
7	$ S'	$	accept

Input lookahead

어떤 action을 수행할지 Input 보고 결정

ex)

	Parsing stack	Input	Action
1	$	n + n $	shift
2	$ n	+ n $	reduce E -> n
3	$ E	+ n $	shift
4	$ E +	n $	shift
5	$ E + n	$	reduce E -> E + n
6	$ E	$	reduce E' -> E
7	$ E'	$	accept

Augmented grammar

어떤 context-free grammar G = (VN, VT, P, S)가 주어졌을 때 이 문법을 context-free grammar G' = (V'N, VT, P', S')로 바꿀 수 있다.

L(G') = L(G)
V'N = VN ∪ S'
P' = P ∪ {S' -> S}

ex)

G = ({S}, {(, )}, P, S)
P : S -> ( S ) S | ε

G = ({S, S'}, {(, )}, P', S')
P :
S' -> S
S -> ( S ) S | ε

Right sentential form and viable prefix (올바른 문장형식 및 사용 가능한 접두사)

E' => E => E + n => n + n

- Right sentential form

derivation 과정에 나오는 모든 terminal과 non-terminal들의 string
parsing stack과 input의 구분
ex) E + n
E || + n
E + || n
E + n ||

- Viable prefix

parsing stack에서 symbol의 순서
Right sentential form E + n 의 viable prefix는 E, E +, E + n

Handle

문자를 reduce할 때 사용되는 문자열과 production을 handle이라고 한다.

shift-reduce parser는 다음 올바른 문장 형식을 얻기 위해 reduction을 수행할 수 있을 때까지 input에서 stack으로 terminal을 이동시킨다.

즉, reduce할게 없으면 shift

문법이 모호한 경우, 올바른 문장 형태로 두 개 이상의 Handle이 있을 수 있다.
문법이 모호하지 않으면, handle이 고유하다.
다음 핸들을 결정하는 것은 shift-reduce parser의 주요 작업니다.

LR Parsing

parsing이란 context-free grammar G = (VN, VT, P, S)가 주어졌을 때, 임의의 terminal 문자열 w ∈ ∑*에 대해, G에 의해 생성된 언어 L(G)에 속하는지 알아내는 것

만약 yes인 경우, w의 rightmost derivation을 구성한다.
LR parsing은 모든 context-free grammar에 대해 가능하지 않다.
모든 것에 적용 가능한 것은 CYK 알고리즘
특정 유형의 문법 (LR(k) 문법)에 대해, 특정 유형의 DDPDA는 선형 시간에 분석하는 문법으로 구성될 수 있다.

LR(k) parsing

의미
L : left-to-right으로 input을 scan
R : rightmost derivation
k : k개의 token을 lookahead 한다.

LR(0) item (item)

context-free grammar의 LR(0) item

k가 0이므로 lookahead를 하지 않는 것

right-hand side에 distinguished position을 가진 production
마침표(.)에 의한 위치 구분 (마침표는 V에 없는 기호 -> not text)
만약 A -> α가 production이고, β와 δ가 βδ = α인 어떤 두개의 문자열(ε을 포함한)을 아래와 같이 표현할 수 있다.
LR(0) item : A -> β . δ

Intuition behind the LR(0) item and initial and complete item

LR(0) item은 grammar rule의 right-hand side의 recognition하는 중간 단계를 기록한다.

즉, Parsing이 어디까지 되었는지 중간 상태를 알 수 있게된다.

문자열 ( int )를 고려한다.
(E || )는 shift-reduce parse의 상태다.
- ( E 는 T -> (E)의 right-hand side의 접두사다.
- next shift 후에 reduce 될 것이다.
T -> ( E . )
- 현재 parsing stack에 ( E 가 있고, 다음번에 ) 를 input으로 받으면 T로 reduce 가능하다.
- rule의 중간상태를 저장한 것이다.

Initial item and complete item

Item A -> . α : 앞의 parsing stack에는 아무것도 없고, a를 읽으려는 상황
- stack이 비어있는게 아니라 parsing 상태를 나타내는 것이다.
- A -> α를 이용해 A를 인식하려는 상황
Item A -> α . : 다 읽었고, 더이상 읽을게 없는 상황 Complete Item => A로 reduce 가능

ex)

P :
S' -> S
S -> ( S ) S | ε

LR(0) item

S' -> . S
S' -> S .
S -> . ( S ) S
S -> ( . S ) S
S -> ( S . ) S
S -> ( S ) . S
S -> ( S ) S
S -> .

Finite automata of LR(0) items (Knuth's LR(0)-characteristic automaton)

LR(0) items은 finite automaton의 상태로 사용할 수 있다.

parsing stack 및 Shift-Reduce parser의 진행 상태에 대한 정보를 유지한다.

비결정적 finite automaton으로 시작
하위 집합 구성을 사용하여 DFA 구성

Finite automata of LR(0) items

- 모든 terminal a에 대해, α, β ∈ V*인 경우 A -> α . a β 이면,

상태 A -> α . a β로 부터 A -> α a . β로 dot을 이동해 얻는 input a의 transition이 있다.
LR(0) Item 각각이 state가 된다.

- 모든 non-terminal B에 대해, 만약 α, β ∈ v*인 A -> α . B β 이면,

B가 non-terminal 일 때, B로 시작하는 LR(0) Item으로 transition 가능하다.
ε transition으로 이동

- start state : S' -> .S

- A state가 A -> β . 일 경우에만 final state이다.

점(dot)은 rightmost에 위치한다.

- FA는 문자열을 인식하지 않도록 parser의 state를 추적한다.

문자 인식을 위한 것이 아니다.

ex)

P :
S' -> S
S -> ( S ) S | ε

LR(0) item

S' -> . S
S' -> S .
S -> . ( S ) S
S -> ( . S ) S
S -> ( S . ) S
S -> ( S ) . S
S -> ( S ) S
S -> .

위의 그림을 ε-transition이 없도록 만들어 합쳐서 아래와 같이 만든다.

0 state에서 ε으로 갈 수 있는거 묶기 -> 이런식으로 반복

LR(0) Parsing algorithm

1. marker $와 start state 0을 stack에 push한다.

Parsing stack	Input
$ 0	Input string $

2. 다음 step은 token n을 stack에 shift하고, state 2로 간다.

Parsing stack	Input
$ 0 n 2	Rest of Input string $

LR(0) parsing algorithm은 현재 DFA state를 기반으로 action을 선택한다.

LR(0) 이므로 lookahead를 하는 것이 아니다.
state만 보고 shift를 할지 reduce를 할지 알 수 있다.
현재 DFA state를 보고 action을 결정한다.

S가 현재 state라고 생각해보자.

A -> α . B β 는 complete item이 아니다.
따라서 reduce를 하기 위해선 input에 있는 것을 parsing stack으로 shift 한다.
a -> α . 와 같은 complete item이 있다면 reduce를 수행한다.
- reduce 수행 방법
- input이 empty고, reduction rule이 S' -> S 와 같이 start symbol이면 accept이다.
- 그렇지 않으면 parsing stack의 값을 가져와 reduce하고 후속작업 처리 -> 뒤의 예시로 이해하자.

ex)

0 ( 3
0 : 현재 state
( : input
3 : action 후 이동한 state

input 보지 않고, 현재 state만 보고 action 수행

	Parsing stack	Input	Action
1	$ 0	( ( a ) ) $	shift
2	$ 0 ( 3	( a ) ) $	shift
3	$ 0 ( 3 ( 3	a ) ) $	shift
4	$ 0 ( 3 ( 3 a 2	) ) $	Reduce A -> a
5	$ 0 ( 3 ( 3 A 4	) ) $	shift
6	$ 0 ( 3 ( 3 A 4 ) 5	) $	Reduce A -> ( A )
7	$ 0 ( 3 A 4	) $	shift
8	$ 0 ( 3 A 4 ) 5	$	Reduce A -> ( A )
9	$ 0 A 1	$	accept S' -> A로 reduce되면 start symbol을 얻게 되기 때문에 parsing 성공

LR(0) conflicts

- shift / reduce conflict

state가 reduce item과 shift item을 가지고 있는 경우

- reduce / reduce conflict

state가 두 개의 reduce item을 가지고 있는 경우

한 state 안에 complete item과 아닌 것이 같이 있다면 무엇을 선택해야할까? => lookahead를 통해 결정한다.

LR(0) Parsing Table

빈칸은 Parsing error
Input : terminal symbol
Goto : non-terminal symbol

State	Action	Rule	Input			Goto
State	Action	Rule	(	a	)	A
0	Shift		3	2		1
1	reduce	S' -> A
2	reduce	A -> a
3	shift		3	2		4
4	shift				5
5	reduce	A -> ( A )

Not LR(0) Grammar

shift / reduce coflict가 포함되어있는 경우는 LR(0)로 처리할 수 없다.

=> 그래서 나온 것이 SLR(1) -> 1은 lookahead를 한다는 것

저작자표시 (새창열림)

'학교 > 컴파일러' 카테고리의 다른 글

[Compiler] 11-2. Bottom-up parsing (0)	2022.06.09
[Compiler] 12. Yacc (0)	2022.05.30
[Compiler] 10. Top-down parsing (0)	2022.05.16
[Compiler] 9. Pushdown automata (0)	2022.04.18
[Compiler] 8. Normal forms (정규 표현) (0)	2022.04.18

현재글[Compiler] 11-1. Bottom-up parsing

김올빵

방명록

일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

JJING_LOG