non-virtual thunk for Virtual Function in multiple inheritance
2014-06-30 11:29
Reposted from:
http://thomas-sanchez.net/computer-sciences/2011/08/15/what-every-c-programmer-should-know-the-hard-part/
What every C++ programmer should know, The hard part
Previously, I explained how C++ handles classes and the inheritance between them. But I did not cover how virtual is handled.
It adds a lot of complexity. C++ is compiled, and when a binary is linked against a library, the two have to speak the same language: they have to share the same ABI. The C++ creators had to find a way to carry metadata about the classes being manipulated throughout the program's lifetime.
They chose Virtual Tables.
The Virtual Table
When a C++ program is compiled, the binary embeds some information about the classes the program manipulates. When a class inherits from an interface, the actual implementation of each method must always be reachable. The Virtual Tables (VTables) are generated during compilation; they can be seen as arrays of method pointers.
Let's take an example:
#include <iostream>

struct Interface
{
    Interface() : i(0x424242) {}
    virtual void test_method() = 0;
    virtual ~Interface() {}
    int i;
};

struct Daughter : public Interface
{
    void test_method()
    {
        std::cout << "This is a call to the method" << std::endl;
        std::cout << "This: " << this << std::endl;
    }
};

int main()
{
    Daughter* d = new Daughter;
    Interface* i = d;

    i->test_method();

    // Size of the object, then its first two pointer-sized words.
    std::cout << sizeof(Daughter) << std::endl;
    std::cout << *((void**)i) << std::endl;
    std::cout << ((void**)i)[1] << std::endl;

    delete d;
}
As a reminder, all the tests were run on 64-bit Linux.
The size of a Daughter instance is not 4 bytes (the size of its only field, an int) as we might expect, but 16 bytes. The memory dump shows that the first field of the class is not the value of i but a strange value, and our field comes right after it. This 'strange' value is actually a pointer, and in fact it points inside our binary:
nm -C test | grep 400d
0000000000400de0 V vtable for Daughter
I will explain later why the two addresses differ by a few bytes. So this pointer gives the location of the Daughter VTable.
We can now check its content.
As I said, a VTable is a kind of array of method pointers.
Getting a pointer to it is simple:
size_t** vtable = *(size_t***)i;
std::cout <<vtable[0] <<std::endl;
And if we look up the address printed on the output, we can see that it is indeed the pointer to our method.
nm -C test | grep -E 400c6a
0000000000400c6a W Daughter::test_method()
We can play a little bit more to dig deeper:
typedef void (*VtablePtr)(Daughter*);
VtablePtr ptr = (VtablePtr)vtable[0];
ptr(d);
The VTables are determined at compile time. When the compiler sees a virtual method in a class, it starts to build a VTable associated with that class. When another class inherits from it, the derived class automatically gets its own copy of the VTable and a pointer to it. Each entry of the VTable is filled in when the actual definition of the method is encountered; the last override along the inheritance chain is always the one that is kept.
The index of a method in the VTable follows its order of declaration in the source file. That is why it is very important that every part of a project is compiled against consistent headers. It is always embarrassing when the wrong method gets called in a project and nobody knows why…
Here is the complete code:
#include <cstddef>
#include <iostream>

struct Interface
{
    Interface() : i(0x424242) {}
    virtual void test_method() = 0;
    virtual ~Interface() {}
    int i;
};

struct Daughter : public Interface
{
    void test_method()
    {
        std::cout << "This is a call to the method" << std::endl;
        std::cout << "This: " << this << std::endl;
    }
};

int main()
{
    Daughter* d = new Daughter;
    Interface* i = d;

    i->test_method();

    // Size of the object, then its first two pointer-sized words.
    std::cout << sizeof(Daughter) << std::endl;
    std::cout << *((void**)i) << std::endl;
    std::cout << ((void**)i)[1] << std::endl;

    // The first word of the object is the VTable pointer.
    size_t** vtable = *(size_t***)i;
    std::cout << vtable[0] << std::endl;

    // Call the first entry by hand (relies on the compiler's ABI, not portable).
    typedef void (*VtablePtr)(Daughter*);
    VtablePtr ptr = (VtablePtr)vtable[0];
    ptr(d);

    delete d;
}
In conclusion, when virtual appears, an instance should be seen like this:
VPTR | Base1 | Daughter
And the instance is heavier by sizeof(void*) * nb_of_vptr bytes.
Virtual in multiple inheritance
As usual, let's start with some trivial code:
#include <iostream>

struct Mother
{
    virtual void mother() = 0;
    virtual ~Mother() {}
    int i;
};

struct Father
{
    virtual void father() = 0;
    virtual ~Father() {}
    int j;
};

struct Daughter : public Mother, public Father
{
    void mother()
    { std::cout << "Mother: " << this << std::endl; }

    void father()
    { std::cout << "Father: " << this << std::endl; }

    int k;
};

int main()
{
    Daughter* d = new Daughter;
    Mother* m = d;
    Father* f = d;

    std::cout << "Daughter: " << (void*)d << std::endl;
    std::cout << "Father  : " << (void*)f << std::endl;
    std::cout << sizeof(*d) << std::endl;

    // First word of each subobject: the two VTable pointers.
    std::cout << *((void**)d) << std::endl;
    std::cout << *((void**)f) << std::endl;

    delete d;
}
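As you can see, the two tables used are different. When types are manipulated, it is not always (never?) the concrete type that is used, but the abstract one. With multiple inheritance it can be a Mother or a Father instance, so when a Father is used and the actual implementation is in Daughter, the method must still be reachable. That is why there is another VTable pointer.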
However, when an instance of type Daughter is used through a Father pointer, the Daughter method cannot be called directly. Indeed, the instance pointer needs to be adjusted to match a Daughter instance.
To solve this problem, there are thunk functions.
If we print the first entry of the VTable reached through the Father pointer and disassemble the code at that location, we get this:
0000000000400cf4 <non-virtual thunk to Daughter::father()>:
  400cf4:
  400cf8:
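These two instructions adjust the this pointer by subtracting the size of the Mother class, so that it matches the Daughter instance, and then jump to the actual method. Therefore, with multiple inheritance, a virtual method call can easily go through an extra level of indirection: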
Get the VTable;
Move to the wanted method (apply an offset on the VTable pointer, for example 8 to get the second method);
Call the method;
Adjust the this pointer;
Jump to the actual method definition.
Method Pointer
Yes, method pointers have a cost. Unlike C, where function pointers have no overhead, C++ has to deal with two questions:
From which instance is the method accessed?
Is the method virtual?
The first point requires a pointer adjustment. The second point, well, a lot of things.
First, the size of a method pointer is 16 bytes (against 8 for a function pointer in C). A method pointer has three parts:
Offset
Address/index
virtual?
The first part takes 8 bytes, and so does the second. The third part is a single bit, not a separate field: it is merged into the second part as its lowest bit. If that bit is set, the second part should be read as an index (it locates the method's entry in the VTable) plus 1; otherwise it is the address of the method.
Therefore, calling a method pointer requires about 20 asm instructions (in the worst case):
Get the offset to apply on the instance pointer;
Apply it;
Check if we call a virtual member function;
If yes, subtract 1;
Get the VTable;
Get the method address;
Call the method.
Conclusion
In a future article I will cover the VTable prefix and virtual inheritance, but they are less common in C++ code. In these two articles I tried to shed some light on C++'s internal mechanisms. C++ is a fast language, but it can become much less efficient because of complex class relationships. I am not saying "don't use virtual and method pointers"; I think programmers should be aware of these costs.
I think readability is more important than performance. Yes, you can have a lot of overhead in C++, but it will still be more efficient than a lot of languages. But sometimes you can avoid virtual calls altogether. For example, the common way for a beginner (and sometimes for less-than-beginner C++ programmers) to build an abstraction is to define an interface and, for each implementation, define a new class which inherits from this interface.
Sometimes that is the right thing to do, sometimes not. If you are asked to write a filesystem abstraction for Linux and Windows and you follow the approach described above, you will write an iFS interface, a WindowsFS and a LinuxFS.
It will work well, but you can do even better: you can write a WindowsFS and a LinuxFS and define a new type FS according to the platform where the code is compiled. On Linux we could imagine something like this:
typedef LinuxFS FS;
With code like this, you avoid some of the overhead due to the interface. It works well for abstracting platform-specific features, but it does not work for data abstraction, where you will still need an interface.
Here are some resources:
CRTP
Wikipedia