
In-Depth Clang (7): Clang Lexer Code Reading Notes: Lexer

2016-08-10 17:10
Author: Shi Ningning (snsn1984)
Source location: clang/lib/Lex/Lexer.cpp
Source online: http://clang.llvm.org/doxygen/Lexer_8cpp_source.html

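Lexer.cpp is the main file of the lexical analyzer in the Clang front end. It contains the implementation of the Lexer class. The header comment of the file describes it as "This file implements the Lexer and Token interfaces.", but Token accounts for only two simple functions; everything else is the implementation of Lexer. To understand how Clang's lexical analyzer works, you therefore need a thorough understanding of this file.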

We start with the Lexer initialization function:
void Lexer::InitLexer(const char *BufStart, const char *BufPtr,
                      const char *BufEnd) {
  BufferStart = BufStart;
  BufferPtr = BufPtr;
  BufferEnd = BufEnd;

  assert(BufEnd[0] == 0 &&
         "We assume that the input buffer has a null character at the end"
         " to simplify lexing!");

  // Check whether we have a BOM in the beginning of the buffer. If yes - act
  // accordingly. Right now we support only UTF-8 with and without BOM, so, just
  // skip the UTF-8 BOM if it's present.
  if (BufferStart == BufferPtr) {
    // Determine the size of the BOM.
    StringRef Buf(BufferStart, BufferEnd - BufferStart);
    size_t BOMLength = llvm::StringSwitch<size_t>(Buf)
      .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM
      .Default(0);

    // Skip the BOM.
    BufferPtr += BOMLength;
  }

  Is_PragmaLexer = false;
  CurrentConflictMarkerState = CMK_None;

  // Start of the file is a start of line.
  IsAtStartOfLine = true;
  IsAtPhysicalStartOfLine = true;

  HasLeadingSpace = false;
  HasLeadingEmptyMacro = false;

  // We are not after parsing a #.
  ParsingPreprocessorDirective = false;

  // We are not after parsing #include.
  ParsingFilename = false;

  // We are not in raw mode. Raw mode disables diagnostics and interpretation
  // of tokens (e.g. identifiers, thus disabling macro expansion). It is used
  // to quickly lex the tokens of the buffer, e.g. when handling a "#if 0" block
  // or otherwise skipping over tokens.
  LexingRawMode = false;

  // Default to not keeping comments.
  ExtendedTokenMode = 0;
}
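
The BOM handling in InitLexer can be tried out in isolation. The following is a minimal sketch using only the C++ standard library, not Clang's actual code: the helper names utf8BomLength and initialLexPosition are hypothetical and exist only for illustration, and std::memcmp stands in for llvm::StringSwitch. It shows the two conditions that matter: the buffer must end in a null character, and the BOM is skipped only when lexing starts at the very beginning of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of Clang): returns the length in bytes of a
// UTF-8 byte order mark at the start of [Start, End), or 0 if there is none.
inline std::size_t utf8BomLength(const char *Start, const char *End) {
  static const char BOM[] = "\xEF\xBB\xBF";
  const std::size_t BOMLen = sizeof(BOM) - 1; // exclude the trailing '\0'
  const std::size_t Size = static_cast<std::size_t>(End - Start);
  return (Size >= BOMLen && std::memcmp(Start, BOM, BOMLen) == 0) ? BOMLen : 0;
}

// Hypothetical stand-in for the pointer setup in InitLexer: checks the null
// sentinel and returns the position where lexing should begin.
inline const char *initialLexPosition(const char *BufStart, const char *BufPtr,
                                      const char *BufEnd) {
  assert(BufEnd[0] == 0 && "input buffer must end with a null character");

  // Skip the BOM only when lexing starts at the beginning of the buffer.
  if (BufStart == BufPtr)
    BufPtr += utf8BomLength(BufStart, BufEnd);
  return BufPtr;
}
```

Passing a start pointer that is already inside the buffer leaves it untouched, which matches the BufferStart == BufferPtr guard in the original function.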
This initialization function is called from both constructors of the Lexer class. The code is as follows:

/// Lexer constructor - Create a new lexer object for the specified buffer
/// with the specified preprocessor managing the lexing process. This lexer
/// assumes that the associated file buffer and Preprocessor objects will
/// outlive it, so it doesn't take ownership of either of them.
Lexer::Lexer(FileID FID, const llvm::MemoryBuffer *InputFile, Preprocessor &PP)
  : PreprocessorLexer(&PP, FID),
    FileLoc(PP.getSourceManager().getLocForStartOfFile(FID)),
    LangOpts(PP.getLangOpts()) {

  InitLexer(InputFile->getBufferStart(), InputFile->getBufferStart(),
            InputFile->getBufferEnd());

  resetExtendedTokenMode();
}

void Lexer::resetExtendedTokenMode() {
  assert(PP && "Cannot reset token mode without a preprocessor");
  if (LangOpts.TraditionalCPP)
    SetKeepWhitespaceMode(true);
  else
    SetCommentRetentionState(PP->getCommentRetentionState());
}

/// Lexer constructor - Create a new raw lexer object. This object is only
/// suitable for calls to 'LexFromRawLexer'. This lexer assumes that the text
/// range will outlive it, so it doesn't take ownership of it.
Lexer::Lexer(SourceLocation fileloc, const LangOptions &langOpts,
             const char *BufStart, const char *BufPtr, const char *BufEnd)
  : FileLoc(fileloc), LangOpts(langOpts) {

  InitLexer(BufStart, BufPtr, BufEnd);

  // We *are* in raw mode.
  LexingRawMode = true;
}
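The two constructors differ, as their parameter lists already show. The first is tied to a Preprocessor and always starts at the beginning of the file buffer; the second creates a raw lexer that can start at any position in a text range. What they have in common is that they only refer to the objects passed in and do not take ownership of them.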
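
The structure shared by the two constructors can be sketched with a toy class. This is not Clang's Lexer: MiniLexer and its members are hypothetical names, and the Preprocessor, FileID and LangOptions parameters are left out. The sketch only illustrates the pattern described above: both constructors delegate to one init function, the raw variant switches the mode afterwards, and the object stores plain pointers into a buffer that it does not own.

```cpp
#include <cassert>

// Toy class for illustration only; it mirrors the constructor structure of
// clang::Lexer but none of its real functionality.
class MiniLexer {
public:
  // Analogous to the Preprocessor-driven constructor: always starts at the
  // beginning of the buffer and leaves raw mode off.
  MiniLexer(const char *BufStart, const char *BufEnd) {
    init(BufStart, BufStart, BufEnd);
  }

  // Analogous to the raw-lexer constructor: may start anywhere inside the
  // buffer and turns raw mode on after the common initialization.
  MiniLexer(const char *BufStart, const char *BufPtr, const char *BufEnd) {
    init(BufStart, BufPtr, BufEnd);
    RawMode = true;
  }

  bool isRaw() const { return RawMode; }
  const char *position() const { return Ptr; }

private:
  // Common setup, in the spirit of InitLexer.
  void init(const char *BufStart, const char *BufPtr, const char *BufEnd) {
    assert(BufEnd[0] == 0 && "buffer must end with a null character");
    Start = BufStart;
    Ptr = BufPtr;
    End = BufEnd;
    RawMode = false;
  }

  // Non-owning pointers: the caller must keep the buffer alive for as long
  // as the MiniLexer is in use. There is no destructor that frees anything.
  const char *Start;
  const char *Ptr;
  const char *End;
  bool RawMode;
};
```

Because nothing is owned, constructing such an object is cheap, which fits the use of short-lived lexers inside helper functions.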

Inside its own class, the Lexer constructors are called by the following functions:
Create_PragmaLexer: Lexer constructor - Create a new lexer object for _Pragma expansion.
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#ac7f3b1ce4f2eeaec8d787d22bf197cd0
getSpelling: This method is used to get the spelling of a token into a preallocated buffer, instead of as an std::string.
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#a94f2c5710332ae19d7955c609ac37adb
getRawToken
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#adac8b8cf001621ec3b109d82a7074f05
getBeginningOfFileToken
http://clang.llvm.org/doxygen/Lexer_8cpp.html#a4845396d18432c436e605303b057dbb4
findLocationAfterToken
http://clang.llvm.org/doxygen/classclang_1_1Lexer.html#a099b99b2d19ef5cdd8fcb80d8cf4064e