
Study notes on the GCC-3.4.6 source (1)

2010-02-21 15:06
About four years ago I joined GDNT, Nortel's joint venture in China, and worked on the radio access network of 3G UMTS; that was where I first used GCC. At that time Nortel used GCC widely as the official compiler for its giant UMTS and CDMA projects, its self-developed tools and so on (later GCC was also used in the 4G projects, WiMAX and LTE). Every week I had to build several project loads to verify my bug fixes; each load build compiled thousands of files, and the final object file was larger than 100 megabytes. GCC worked with surprising stability. In all these years I encountered only two crashes of GCC. One of them came when I tried to compile templates separately, which GCC still doesn't support (as far as I can tell, so far only the EDG front-end can do that; before the C++ standard [ISO/IEC 14882:1998] became stable, the EDG front-end was the practical standard, and http://www.edg.com/ has more about EDG), and fed GCC some weird code. That produced a wrong intermediate tree, which triggered an assertion inside GCC (GCC could not detect the semantic error itself). Though GCC exited immediately, it gave a detailed diagnostic dump: an elegant way to give up.
This fantastic tool attracts me deeply. Although I had studied compiler principles before, facing GCC I found I knew little about it. Thanks to its being open source, I can peep into its body, which is mysterious at least to me. After these years of digging into its source code I have learned something about this compiler, but its veil is still not fully lifted. I am glad to share my study notes on the GCC source with all of you (the GCC concerned is v3.4.6, C++ front-end, host: x86/Linux, target: x86/Linux); the notes are far from complete and still growing.
Reference
[1] Programming Language Pragmatics, 2nd edition
[2] gccint, version 3.4.6
[3] ISO/IEC 14882:2003
[4] The C Preprocessor, April 2001, for GCC V3
[5] cppinternals
[6] Using the GNU Compiler Collection
[7] Inside the C++ Object Model, by Stanley B. Lippman
[8] GCC Complete Reference
[9] The Design and Evolution of C++, by Bjarne Stroustrup
[10] Linkers & Loaders, by John R. Levine
[11] Efficient Instruction Scheduling Using Finite State Automata, by Vasanth Bala, Norman Rubin
[12] Compilers: Principles, Techniques, and Tools, 2nd edition
[2] and [5] can be found in YOUR-GCC-SOURCE-DIR/gcc/doc. [1], [7], [9], [10] and [12] give useful background knowledge.
Preparation before going ahead
Some non-trivial parts of GCC's code are generated by GCC's own tools. Before really diving into the source code, we first need to compile the GCC project so that these parts get generated. I skip the steps to download the source (http://gcc.gnu.org/mirrors.html is the official site for download), configure it and compile it; there are plenty of pages on the net that describe them. (Note: if GCC is already installed, g++ -### shows the configure command it was built with.)
Software architecture of GCC
GCC as a whole can be divided into two parts: the front-end and the back-end. The preprocessor (if there is one), the lexer (for lexical analysis) and the parser (for syntax and semantic analysis) are implemented in the front-end, whose purpose is to transform the source language into an intermediate form that is independent of the source language. In theory, introducing a new language only requires implementing a preprocessor, a lexer and a parser; in practice, we still need to write code to set up the environment they need.
In GCC 3.4.6 the common intermediate language is RTL (register transfer language). RTL is a simple language that can easily be lowered into assembly language, so it is not appropriate to reduce the source code into it directly. In fact, the front-end first reduces the code into an intermediate tree and does a lot of manipulation on it before turning it into RTL form, which is fed to the back-end.
The aim of the back-end is to generate assembly code. As a widely used compiler, GCC can be built to support various target platforms. To achieve this, GCC uses files called machine descriptions, which describe not only the instruction set but also the characteristics of the pipeline and the optimization opportunities the architecture offers. For a specified target, these files are processed by several tools to generate related files that are used when compiling GCC. To introduce a new machine, the major work is to provide a correct machine description file for it (usually some helper functions must also be defined to implement the necessary processing logic).
Information about the newest version of GCC can be found at http://www.ibm.com/developerworks/linux/library/l-gcc4/index.html?S_TACT=105AGX52&S_CMP=content.
The front end
When we run "gcc -o xxx xxx.c", we only call a shell (the driver program). This shell parses the command line, makes some preparations on the host platform according to the options it recognizes (pay attention to the difference between host machine and target machine), then calls the appropriate compiler according to the suffix of the file and passes the options it does not recognize on to that compiler. We do not look at this shell here; our focus is the real compiler.
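As a rough illustration of the dispatch step just described, the sketch below picks a compiler from the file suffix. It is not the driver's real code: compiler_for is a hypothetical helper, and the suffix list is deliberately tiny. The only facts assumed are that C sources are handed to cc1 and C++ sources to cc1plus.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: map a file suffix to the
   compiler proper that the shell would launch for it.  */
const char *compiler_for(const char *filename)
{
    const char *dot;

    assert(filename != NULL);
    dot = strrchr(filename, '.');
    if (dot == NULL)
        return NULL;                    /* no suffix at all */

    if (strcmp(dot, ".c") == 0)
        return "cc1";                   /* C compiler proper */

    if (strcmp(dot, ".cc") == 0 || strcmp(dot, ".cpp") == 0
        || strcmp(dot, ".cxx") == 0 || strcmp(dot, ".C") == 0)
        return "cc1plus";               /* C++ compiler proper */

    return NULL;                        /* suffix this sketch does not know */
}
```

Calling compiler_for("xxx.c") returns "cc1"; a name such as "lib.o" returns NULL, standing in for the files that the real shell would handle in some other way.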

1. Overview

The front-end reads in and parses a program written in a specified language and transforms it into a language-independent form. In theory every front-end could generate its own distinctive form; however, to reuse as much code as possible, in GCC the intermediate trees built by the different front-ends all use the same set of tree nodes. Of course, the nodes must be diverse enough to cover all the languages. This tree form is so important in handling C/C++ programs that it deserves our focus and effort first.

1.1. Tree representation in front-ends

To accommodate the existing front-ends and those still to be added, there are dozens of kinds of nodes (not all of them can be leaves). All nodes begin with the common part below, which occupies the beginning of every node structure.

129  struct tree_common GTY(())                                    in tree.h
130  {
131    tree chain;
132    tree type;
133
134    ENUM_BITFIELD(tree_code) code : 8;
135
136    unsigned side_effects_flag : 1;
137    unsigned constant_flag : 1;
138    unsigned addressable_flag : 1;
139    unsigned volatile_flag : 1;
140    unsigned readonly_flag : 1;
141    unsigned unsigned_flag : 1;
142    unsigned asm_written_flag: 1;
143    unsigned unused_0 : 1;
144
145    unsigned used_flag : 1;
146    unsigned nothrow_flag : 1;
147    unsigned static_flag : 1;
148    unsigned public_flag : 1;
149    unsigned private_flag : 1;
150    unsigned protected_flag : 1;
151    unsigned deprecated_flag : 1;
152    unsigned unused_1 : 1;
153
154    unsigned lang_flag_0 : 1;
155    unsigned lang_flag_1 : 1;
156    unsigned lang_flag_2 : 1;
157    unsigned lang_flag_3 : 1;
158    unsigned lang_flag_4 : 1;
159    unsigned lang_flag_5 : 1;
160    unsigned lang_flag_6 : 1;
161    unsigned unused_2 : 1;
162  };

At line 134 above, in version 3.4.6 ENUM_BITFIELD(tree_code) expands into __extension__ enum tree_code (when GCC is compiled by GCC itself), and chain at line 131 links the node into the tree if needed.
The meanings of some of the flags defined in the structure, and the macros defined to access them, are listed below.
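To make the "common part first" layout concrete, here is a minimal sketch with toy types. Everything prefixed toy_ or TOY_ is invented for illustration and is not GCC code; the code field is a plain unsigned bit-field rather than ENUM_BITFIELD. Because every node starts with the same header, generic code can read the node kind and follow chain without knowing which kind of node it holds.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node kinds; the real enum tree_code has far more entries.  */
enum toy_code { TOY_INT_CST, TOY_IDENTIFIER };

union toy_node;
typedef union toy_node *toy_tree;

/* Cut-down stand-in for struct tree_common.  */
struct toy_common {
    toy_tree chain;        /* links nodes together, like line 131 */
    toy_tree type;
    unsigned code : 8;     /* plain unsigned here, not ENUM_BITFIELD */
};

/* Every concrete node starts with the common header.  */
struct toy_int_cst    { struct toy_common common; long value; };
struct toy_identifier { struct toy_common common; const char *name; };

union toy_node {
    struct toy_common     common;
    struct toy_int_cst    int_cst;
    struct toy_identifier identifier;
};

#define TOY_CODE(N)  ((enum toy_code) (N)->common.code)
#define TOY_CHAIN(N) ((N)->common.chain)

/* Walk a chain without caring what kind each node is.  */
int toy_chain_length(toy_tree t)
{
    int n = 0;
    for (; t != NULL; t = TOY_CHAIN(t))
        n++;
    return n;
}

/* Check the code in the header before touching kind-specific fields.  */
long toy_int_value(toy_tree t)
{
    assert(TOY_CODE(t) == TOY_INT_CST);
    return t->int_cst.value;
}
```

The design choice this shows is that a single pointer type can stand for any node: the header answers "what is this?", and only then are the fields behind it interpreted.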
- TREE_TYPE ((NODE)->common.type)
    * In all nodes that are expressions, this is the data type of the expression.
    * In POINTER_TYPE nodes, this is the type that the pointer points to.
    * In ARRAY_TYPE nodes, this is the type of the elements.
    * In VECTOR_TYPE nodes, this is the type of the elements.
- TREE_ADDRESSABLE ((NODE)->common.addressable_flag)
    * In VAR_DECL nodes, nonzero means address of this is needed. So it cannot be in a register.
    * In a FUNCTION_DECL, nonzero means its address is needed. So it must be compiled even if it is an inline function.
    * In a FIELD_DECL node, it means that the programmer is permitted to construct the address of this field. This is used for aliasing purposes: see record_component_aliases.
    * In CONSTRUCTOR nodes, it means object constructed must be in memory.
    * In LABEL_DECL nodes, it means a goto for this label has been seen from a place outside all binding contours that restore stack levels.
    * In *_TYPE nodes, it means that objects of this type must be fully addressable. This means that pieces of this object cannot go into register parameters, for example.
    * In IDENTIFIER_NODEs, this means that some extern decl for this name had its address taken. That matters for inline functions.
- TREE_STATIC ((NODE)->common.static_flag)
    * In a VAR_DECL, nonzero means allocate static storage.
    * In a FUNCTION_DECL, nonzero if function has been defined.
    * In a CONSTRUCTOR, nonzero means allocate static storage.
- TREE_VIA_VIRTUAL ((NODE)->common.static_flag)
    * Nonzero for a TREE_LIST or TREE_VEC node means that the derivation chain is via a `virtual' declaration.
- TREE_CONSTANT_OVERFLOW ((NODE)->common.static_flag)
    * In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST this means there was an overflow in folding. This is distinct from TREE_OVERFLOW because ANSI C requires a diagnostic when overflows occur in constant expressions.
- TREE_SYMBOL_REFERENCED (IDENTIFIER_NODE_CHECK (NODE)->common.static_flag)
    * In an IDENTIFIER_NODE, this means that assemble_name was called with this string as an argument.
- CLEANUP_EH_ONLY ((NODE)->common.static_flag)
    * In a TARGET_EXPR, WITH_CLEANUP_EXPR, CLEANUP_STMT, or element of a block's cleanup list, means that the pertinent cleanup should only be executed if an exception is thrown, not on normal exit of its scope.
- TREE_OVERFLOW ((NODE)->common.public_flag)
    * In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means there was an overflow in folding, and no warning has been issued for this subexpression. TREE_OVERFLOW implies TREE_CONSTANT_OVERFLOW, but not vice versa.
- TREE_PUBLIC ((NODE)->common.public_flag)
    * In a VAR_DECL or FUNCTION_DECL, nonzero means name is to be accessible from outside this module. In an IDENTIFIER_NODE, nonzero means an external declaration accessible from outside this module was previously seen for this name in an inner scope.
- TREE_PRIVATE ((NODE)->common.private_flag)
    * Used in classes in C++.
- CALL_EXPR_HAS_RETURN_SLOT_ADDR ((NODE)->common.private_flag)
    * In a CALL_EXPR, means that the address of the return slot is part of the argument list.
- TREE_PROTECTED ((NODE)->common.protected_flag)
    * Used in classes in C++. In a BLOCK node, this is BLOCK_HANDLER_BLOCK.
- CALL_FROM_THUNK_P ((NODE)->common.protected_flag)
    * In a CALL_EXPR, means that the call is the jump from a thunk to the thunked-to function.
- TREE_SIDE_EFFECTS ((NODE)->common.side_effects_flag)
    * In any expression, nonzero means it has side effects or reevaluation of the whole expression could produce a different value. This is set if any subexpression is a function call, a side effect or a reference to a volatile variable.
    * In a *_DECL, this is set only if the declaration said `volatile'.
- TREE_THIS_VOLATILE ((NODE)->common.volatile_flag)
    * Nonzero means this expression is volatile in the C sense: its address should be of type `volatile WHATEVER *'. In other words, the declared item is volatile qualified. This is used in *_DECL nodes and *_REF nodes.
    * In a *_TYPE node, means this type is volatile-qualified. But use TYPE_VOLATILE instead of this macro when the node is a type, because eventually we may make that a different bit. If this bit is set in an expression, so is TREE_SIDE_EFFECTS.
- TYPE_VOLATILE (TYPE_CHECK (NODE)->common.volatile_flag)
    * Nonzero in a type considered volatile as a whole.
- TREE_READONLY ((NODE)->common.readonly_flag)
    * In a VAR_DECL, PARM_DECL or FIELD_DECL, or any kind of *_REF node, nonzero means it may not be the lhs of an assignment.
    * In a *_TYPE node, means this type is const-qualified (but the macro TYPE_READONLY should be used instead of this macro when the node is a type).
- TYPE_READONLY (TYPE_CHECK (NODE)->common.readonly_flag)
    * Means this type is const-qualified.
- TREE_CONSTANT ((NODE)->common.constant_flag)
    * Value of expression is constant. Always appears in all *_CST nodes. May also appear in an arithmetic expression, an ADDR_EXPR or a CONSTRUCTOR if the value is constant.
- TREE_UNSIGNED ((NODE)->common.unsigned_flag)
    * In INTEGER_TYPE or ENUMERAL_TYPE nodes, means an unsigned type. In FIELD_DECL nodes, means an unsigned bit field.
- TREE_ASM_WRITTEN ((NODE)->common.asm_written_flag)
    * Nonzero in a VAR_DECL means assembler code has been written.
    * Nonzero in a FUNCTION_DECL means that the function has been compiled. This is interesting in an inline function, since it might not need to be compiled separately.
    * Nonzero in a RECORD_TYPE, UNION_TYPE, QUAL_UNION_TYPE or ENUMERAL_TYPE if the sdb debugging info for the type has been written.
    * In a BLOCK node, nonzero if reorder_blocks has already seen this block.
- TREE_USED ((NODE)->common.used_flag)
    * Nonzero in a *_DECL if the name is used in its scope.
    * Nonzero in an expr node means inhibit warning if value is unused.
    * In IDENTIFIER_NODEs, this means that some extern decl for this name was used.
- TREE_NOTHROW ((NODE)->common.nothrow_flag)
    * In a FUNCTION_DECL, nonzero means a call to the function cannot throw an exception. In a CALL_EXPR, nonzero means the call cannot throw.
- TYPE_ALIGN_OK (TYPE_CHECK (NODE)->common.nothrow_flag)
    * In a type, nonzero means that all objects of the type are guaranteed by the language or front-end to be properly aligned, so we can indicate that a MEM of this type is aligned at least to the alignment of the type, even if it doesn't appear that it is. We see this, for example, in object-oriented languages where a tag field may show this is an object of a more-aligned variant of the more generic type.
- TREE_DEPRECATED ((NODE)->common.deprecated_flag)
    * Nonzero in an IDENTIFIER_NODE if the use of the name is defined as a deprecated feature by __attribute__((deprecated)).
List 1: flags in tree_common
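One thing the list makes visible is that several macros name the same bit: TREE_STATIC, TREE_VIA_VIRTUAL, TREE_CONSTANT_OVERFLOW, TREE_SYMBOL_REFERENCED and CLEANUP_EH_ONLY all read static_flag, and which meaning applies depends on the kind of node. The sketch below imitates that with two toy macros; all toy_ and TOY_ names are invented for illustration, and the asserts only mimic the idea of checking the node kind before trusting a flag.

```c
#include <assert.h>

/* Two toy node kinds that give the same bit different meanings.  */
enum toy_code { TOY_VAR_DECL, TOY_TREE_LIST };

struct toy_common {
    unsigned code : 8;
    unsigned static_flag : 1;
};

/* Two names, one bit: the node kind decides which reading is valid.  */
#define TOY_STATIC(N)      ((N)->static_flag)
#define TOY_VIA_VIRTUAL(N) ((N)->static_flag)

/* For a variable declaration the bit means "allocate static storage".  */
int toy_var_has_static_storage(const struct toy_common *n)
{
    assert(n->code == TOY_VAR_DECL);
    return TOY_STATIC(n);
}

/* For a list node the bit means "derivation is via a virtual base".  */
int toy_list_is_via_virtual(const struct toy_common *n)
{
    assert(n->code == TOY_TREE_LIST);
    return TOY_VIA_VIRTUAL(n);
}
```

Sharing bits this way keeps the header small, at the price that a flag is only meaningful once the node kind is known.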

1.1.1. Tree node definition

The definition of the tree node is given below.

45   typedef union tree_node *tree;                                in coretypes.h

1772 union tree_node GTY ((ptr_alias (union lang_tree_node),       in tree.h
1773                       desc ("tree_node_structure (&%h)")))
1774 {
1775   struct tree_common GTY ((tag ("TS_COMMON"))) common;
1776   struct tree_int_cst GTY ((tag ("TS_INT_CST"))) int_cst;
1777   struct tree_real_cst GTY ((tag ("TS_REAL_CST"))) real_cst;
1778   struct tree_vector GTY ((tag ("TS_VECTOR"))) vector;
1779   struct tree_string GTY ((tag ("TS_STRING"))) string;
1780   struct tree_complex GTY ((tag ("TS_COMPLEX"))) complex;
1781   struct tree_identifier GTY ((tag ("TS_IDENTIFIER"))) identifier;
1782   struct tree_decl GTY ((tag ("TS_DECL"))) decl;
1783   struct tree_type GTY ((tag ("TS_TYPE"))) type;
1784   struct tree_list GTY ((tag ("TS_LIST"))) list;
1785   struct tree_vec GTY ((tag ("TS_VEC"))) vec;
1786   struct tree_exp GTY ((tag ("TS_EXP"))) exp;
1787   struct tree_block GTY ((tag ("TS_BLOCK"))) block;
1788 };

As expected, the node is defined as a union. Notice ptr_alias at line 1772: it tells GTY (the garbage collection service in GCC, which we ignore for the moment) that a pointer to tree_node in fact points to a lang_tree_node, which is specific to the front-end and is built on top of tree_node with extra fields for the characteristics of the language (so it can be pointed to by a pointer to tree_node). In the C++ front-end, lang_tree_node has the following definition.

472  union lang_tree_node GTY((desc ("cp_tree_node_structure (&%h)"),        in cp-tree.h
473         chain_next ("(union lang_tree_node *)TREE_CHAIN (&%h.generic)")))
474  {
475    union tree_node GTY ((tag ("TS_CP_GENERIC"),
476                          desc ("tree_node_structure (&%h)"))) generic;
477    struct template_parm_index_s GTY ((tag ("TS_CP_TPI"))) tpi;
478    struct ptrmem_cst GTY ((tag ("TS_CP_PTRMEM"))) ptrmem;
479    struct tree_overload GTY ((tag ("TS_CP_OVERLOAD"))) overload;
480    struct tree_baselink GTY ((tag ("TS_CP_BASELINK"))) baselink;
481    struct tree_wrapper GTY ((tag ("TS_CP_WRAPPER"))) wrapper;
482    struct tree_default_arg GTY ((tag ("TS_CP_DEFAULT_ARG"))) default_arg;
483    struct lang_identifier GTY ((tag ("TS_CP_IDENTIFIER"))) identifier;
484  };
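The relationship between the two unions can be sketched with toy types. The sketch assumes nothing about GCC beyond what the listings show: the language-specific union has the generic union as a member, the language-specific node kinds also start with the common header, and a function (here the hypothetical toy_lang_structure, playing the role that the desc expression gives to cp_tree_node_structure) tells from the header which member is live. All toy_ and TOY_ names are invented.

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_INT_CST, TOY_OVERLOAD };

/* Language-independent part.  */
struct toy_common  { unsigned code : 8; };
struct toy_int_cst { struct toy_common common; long value; };

union toy_node {
    struct toy_common  common;
    struct toy_int_cst int_cst;
};

/* Front-end-specific node kind; it also starts with the common header.  */
struct toy_overload {
    struct toy_common common;
    union toy_node   *function;
};

/* Counterpart of lang_tree_node: the generic union plus the extras.  */
union toy_lang_node {
    union toy_node      generic;
    struct toy_overload overload;
};

enum toy_structure { TS_TOY_GENERIC, TS_TOY_OVERLOAD };

/* From the code in the header, tell which union member is live.  */
enum toy_structure toy_lang_structure(const union toy_lang_node *n)
{
    return n->generic.common.code == TOY_OVERLOAD
           ? TS_TOY_OVERLOAD : TS_TOY_GENERIC;
}

/* Kind-specific access, guarded by the tag check.  */
union toy_node *toy_overload_function(union toy_lang_node *n)
{
    assert(toy_lang_structure(n) == TS_TOY_OVERLOAD);
    return n->overload.function;
}
```

Since every member of the union starts at offset zero with the common header, a pointer to the generic member and a pointer to the whole language-specific node are the same address, which is what lets language-independent code carry front-end nodes around as plain tree pointers.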