[Compiler] 12. Yacc

학교/컴파일러

[Compiler] 12. Yacc

daykim 2022. 5. 30. 17:09

Yacc (Yet Another Compiler-Compiler)

parser(syntax analyzer) generator하는 프로그램

Input : BNF, Context-free grammar를 BNF로 변환 가능
Output : Parser
LALR(1) Parser

Compiler-Compiler (Compiler generator)

compiler를 generator하는 프로그램

Input : grammar (BNF)
Output : parser의 source code

An (basic) overview of Yacc

General format of Yacc source

{declarations} (선언) // optional
%%
{rules} (변환 규칙)
%%
{programs} (사용자 프로그램) // optional

ex)

%{
#include <stdio.h>
%}

%%

command : exp           {printf("%d\n", $1);}
exp     : exp '+' term	{$$ = $1 + $3;}
        | exp '-' term	{$$ = $1 - $3;}
        | term          {$$ = $1;}
        ;
term	: term '*' factor {$$ = $1 * $3;}
        | factor
        ;
factor  : NUMBER        {$$ = $1;}
        | '(' exp ')'	{$$ = $2;}
        ;

%%

main() {return yyparse();}

int yyerror(char *s) {
    printf(stderr, "%s\n", s);
    return 0;
}

Notation (표기법)

Terminal symbol (token)

Lexical analyzer로부터

rule의 왼쪽에 배치할 수 없다.
parser할 때 쓰이는 rule은 Context-free grammar
Context-free grammar에서 a -> b를 할 때 a는 VN
보통 모두 대문자를 사용한다.
exp '+' term과 같이 '' 내부는 VT로 간주

Non-Terminal symbol

보통 소문자 사용한다.

Shift/Reduce parsing

LR-parser이기 때문이다.

Declarations

- 변수를 정의하거나 모든 작업에 access할 수 있는 header 포함

- Token(Terminal symbol)을 나타내는 이름 선언

ex)

%{
#include <stdio.h>
%}

%token name1
%token NUMBER, DIGIT

- 우선순위 및 연관성 설정

ambiguous grammar를 해결하기 위해서

- start symbol

parser는 start symbol을 인식하기위해 디자인 된다. => why? : bottom-up parser이기 때문이다.
문법 규칙에 의해 설명되는 가장 크고 가장 일반적인 구조
grammar rule 가장 상위의 가장 왼쪽(default) 또는 %start keyword 사용해 표시 ex) %start symbol

Rules

- grammar rules를 표현 ( +action )

하나 이상의 문법 규칙으로 구성된다.
수정된 BNF form + C 코드 acion
연관된 문법 규칙이 인식될 때 마다, action이 실행된다.

- Basic form of rule

A : BODY;
A => Context-free grammar라 하나만 올 수 있다.
ex) date : month_name dya ',' year ;

- 일부 grammar rule이 같은 left side를 가지면, vertical bar '|'를 사용할 수 있다.

ex)

A : B C D;
A : E F;
A : G;

A : B C D
| E F
| G
;

장점
- left side를 여러번 쓸 필요 없다.
- Readability
- rule을 바꾸기 쉽다.

Action

- Action

기본적으로 C 코드를 작성
rule이 인식되었을 때, 어떤 action을 수행해라
subprogram 부르기, 변수명 변경 등 다양하게 할 수 있다.

ex)

A : '(' B ')' {hello(1, "abc");}
grammar rule	action

- action과 parser 사이의 Communication이 필요하다. -> 어떤 값을 인식했으니 어떠한 식으로 수행해라

$ : dollar sign (pseudovariable)

$$ : non-terminal symbol의 value
$1, $2, ..., $n : grammar rule의 오른쪽에서 연속되는 각 symbol의 values

ex) A : B C

$$ : A
$1 : B
$2 : C

- Output은 작업에 의해 직접 수행되지 않고, parse tree와 같은 데이터 구조다.

expr : expr '+' expr
{ $$ = node('+', $1, $3); }

- Yacc은 rule의 중간뿐 아니라 끝에도 action을 기록할 수 있도록 허용한다. (신속한 action) 즉 rule 사이에도 action을 기술할 수 있다.

ex)

decl -> type var-list

decl : type {current_type = $1;}
var-list
;

Example of Rules

BNF	Yacc rule
exp -> exp addop term \| term addop -> + \| - term -> term mulop factor \| factor mulop -> * factor -> ( exp ) \| number	exp : exp '+' term {$$ = $1 + $3;} \| exp '-' term {$$ = $1 - $3;} \| term {$$ = $1;} ; term : term '' factor {$$ = $1 $3;} \| factor ; factor : NUMBER {$$ = $1;} \| '(' exp ')' {$$ = $2;} ;

Ambiguous grammar

- 아래 예제대로 작성하면 여러개의 parse tree가 나온다.

exp : exp '+' exp
| exp '-' exp
| exp '*' exp
| NUMBER

ex1) 2+3*4

(2 + 3) * 4
2 + (3 * 4)

ex2) 2-3-4-5

2-(3-(4-5))
(2-3)-(4-5)
- : left associative
연산자마다 left / right associative가 다르다

- Two type of conflicts

Shift / Reduce conflict
Reduce / Reduce conflict

Ambifuous grammar Solution

- Default

Shift / Reduce conflict -> Shift
Reduce / Reduce conflict -> 더 상위의 grammar rule로 Reduce

1. 같은 input에 대해 conflicts가 발생하지 않도록 Grammar rule 재작성

exp : exp '+' exp
| exp '-' exp
| exp '*' exp
| NUMBER

exp : exp '+' term {$$ = $1 + $3;}
| exp '-' term {$$ = $1 - $3;}
| term {$$ = $1;}
;
term : term '*' factor {$$ = $1 * $3;}
| factor
;
factor : NUMBER {$$ = $1;}
| '(' exp ')' {$$ = $2;}
;

but 힘들다

2. disambiguating rules을 추가해준다.

우선순위 기술
associativity를 기술

ex)

%left '+' '-'
%left '*' '/'
%right '='
위로 갈수록 우선순위 낮아진다.
%nonassoc : 연산자가 한 level에서 반복될 수 없다.

%{
#include <stdio.h>
%}

%token INTEGER VARIABLE
%left '+' '-'
%left '*' '/'

%%
program :
        program statement '/n'
        ;
statement :
        expr
        | VARIABLE '=' expr
        ;

exp     : 
        INTEGER
        | VARIABLE             {$$ = sym[$1];}
        | expr '+' expr        {$$ = $1 + $3;}
        | expr '-' expr        {$$ = $1 - $3;}
        | expr '*' expr        {$$ = $1 * $3;}
        | expr '/' expr        {$$ = $1 / $3;}
        | '(' expr ')'         {$$ = $2;}
        ;

%%
main() {return yyparse();}

int yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
    return 0;
}

Lex & Yacc

Building a parser with Lex / Yacc

Example

bison [-option] [sourceFile]

SourceFile
grammar rules
Output
y.tab.c
bison -y sourceFile => generate 'y.tab.c'

ex) 간단한 사칙연산 계산기

yylval를 통해 값을 parser에 전달

bison -d -y ex.y           // create y.tab.h, y.tab.c
lex ex.l                   // create lex.yy.c
gcc lex.yy.c y.tab.c -o ex // compile / link
./ex

ex.l

%{
#include <stdlib.h>
#include "y.tab.h"
%}

digit            [0-9]+

%%

[a-z]            {
                    yylval = *yytext - 'a';
                    return VARIABLE;
                 }

[A-Z]            {
                    yylval = *yytext - 'A';
                    return VARIABLE;
                 }

("+"|"-")?{digit} {
                    yylval = atoi(yytext);
                    return INTEGER;
                 }

[-+()=\/\*%\n]   { return *yytext; }

[ \t]            ;

.                yyerror("invalid character");

%%

int yywrap(void) {
    return 1;
}

ex.y

%{
#include <stdio.h>
int sym[26];
%}

%token  INTEGER VARIABLE
%left   '+' '-'
%left   '*' '/' '%'

%%
program :
            program statement '\n'
            |
            ;

statement :
            expr                  {printf("%d\n", $1);}
            | VARIABLE '=' expr   {sym[$1] = $3;}
            ;

expr :
            INTEGER
            | VARIABLE            {$$ = sym[$1];}
            | expr '+' expr       {$$ = $1 + $3;}
            | expr '-' expr       {$$ = $1 - $3;}
            | expr '*' expr       {$$ = $1 * $3;}
            | expr '/' expr       {$$ = $1 / $3;}
            | expr '%' expr       {$$ = $1 % $3;}
            | '(' expr ')'        {$$ = $2;}
            ;

%%
int main() { return yyparse(); }

int yyerror (char *s) {
    fprintf(stderr, "%s\n", s);
    return 0;
}

저작자표시

'학교 > 컴파일러' 카테고리의 다른 글

[Compiler] 13. Semantic analyzer (0)	2022.06.10
[Compiler] 11-2. Bottom-up parsing (0)	2022.06.09
[Compiler] 11-1. Bottom-up parsing (0)	2022.05.17
[Compiler] 10. Top-down parsing (0)	2022.05.16
[Compiler] 9. Pushdown automata (0)	2022.04.18

현재글[Compiler] 12. Yacc

김올빵

방명록

일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

JJING_LOG

[Compiler] 12. Yacc

Yacc (Yet Another Compiler-Compiler)

Compiler-Compiler (Compiler generator)

An (basic) overview of Yacc

General format of Yacc source

Notation (표기법)

Declarations

Rules

Action

Example of Rules

Ambiguous grammar

Ambifuous grammar Solution

Lex & Yacc

Building a parser with Lex / Yacc

Example

'학교 > 컴파일러' 카테고리의 다른 글

'학교/컴파일러'의 다른글

티스토리툴바

[Compiler] 12. Yacc

Yacc (Yet Another Compiler-Compiler)

Compiler-Compiler (Compiler generator)

An (basic) overview of Yacc

General format of Yacc source

Notation (표기법)

Declarations

Rules

Action

Example of Rules

Ambiguous grammar

Ambifuous grammar Solution

Lex & Yacc

Building a parser with Lex / Yacc

Example

'학교 > 컴파일러' 카테고리의 다른 글

'학교/컴파일러'의 다른글

관련글

티스토리툴바