Nano雞排: lex & yacc

Tuesday, August 18, 2009

lex & yacc - lex introduction

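lex is a lexical analyzer generator (in Chinese it might be called 語彙分析器, which sounds a bit odd). It generates a program that recognizes words: you define patterns with regular expressions, and whenever the input matches a pattern, the corresponding action is executed. Simply put, it splits the input into tokens. A lex file has three sections, separated by %% lines:
1. definition section (declarations): initialization for C and lex, such as variable declarations; C code here goes between %{ and %}.
2. rule section (rules): defines each pattern and its corresponding action.
3. user subroutine section (programs): ordinary C code, copied into the generated scanner.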

%{
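/* comment: this is demo code
 * file name: 01.l
 * The "hello |" rule below has the action "|", which means it
 * shares the action of the next rule ("world").
 */
%}
%%
[\t ]+      /* ignore spaces and tabs */ ;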
hello |
world { printf("I can recognize the word \"%s\"\n", yytext); }
%%
int main()
{
    yylex();
    return 0;
}

brook@debian:~/src/lex$ flex 01.l -o 01.yy.c
brook@debian:~/src/lex$ flex 01.l
brook@debian:~/src/lex$ gcc lex.yy.c -ll -Wall
lex.yy.c:1085: warning: 'yyunput' defined but not used
lex.yy.c:1128: warning: 'input' defined but not used
brook@debian:~/src/lex$ ./a.out
hello world
hello world! brook
I can recognize the word "hello"
I can recognize the word "world"
!brook

regular expression

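.	Matches any single character except newline (\n).
*	Matches the preceding item zero or more times.
[]	Matches any one of the characters inside the brackets.
[^]	Negated []: matches any character not inside the brackets.
$	Matches the end of a line.
{n,m}	Matches the preceding item at least n and at most m times.
+	Matches the preceding item one or more times.
?	Matches the preceding item zero or one time (it is optional).
|	Alternation (or): matches either the expression before it or the one after it.
( )	Groups a subexpression.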

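For example:
.*: any characters, zero or more of them, up to but not including a newline.
[0-9]+: one or more digits, such as 0921.
-?[0-9]+: an optional minus sign followed by digits, i.e. a positive or negative integer.
([0-9]+)|([0-9]+\.[0-9]+): the parentheses form two subexpressions, and | succeeds if either one matches, so this matches an integer or a decimal number. Do not put spaces around |, because in a lex rule whitespace ends the pattern.
[0-9]{1,3}: one to three digits.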
