
Want Speed? Pass by Value.

2013-12-24 14:06


PRE-C++11



This entry is part of a series, RValue References: Moving Forward. Entries in this series:

Want Speed? Pass by Value.

Making Your Next Move

Your Next Assignment...

Exceptionally Moving!

Onward, Forward!

Be honest: how does the following code make you feel?

std::vector<std::string> get_names();
…
std::vector<std::string> const names = get_names();

Frankly, even though I should know better, it makes me nervous. In principle, when get_names() returns, we have to copy a vector of strings. Then, we need to copy it again when we initialize names, and we need to destroy the first copy. If there are N strings in the vector, each copy could require as many as N+1 memory allocations and a whole slew of cache-unfriendly data accesses as the string contents are copied.

Rather than confront that sort of anxiety, I’ve often fallen back on pass-by-reference to avoid needless copies:

void get_names(std::vector<std::string>& out_param );
…
std::vector<std::string> names;
get_names( names );

Unfortunately, this approach is far from ideal.

The code grew by 150%

We’ve had to drop const-ness because we’re mutating names.

As functional programmers like to remind us, mutation makes code more complex to reason about by undermining referential transparency and equational reasoning.

We no longer have strict value semantics [1] for names.

But is it really necessary to mess up our code in this way to gain efficiency? Fortunately, the answer turns out to be no (and especially not if you are using C++0x). This article is the first in a series that explores rvalues and their implications for efficient value semantics in C++.

RValues

Rvalues are expressions that create anonymous temporary objects. The name rvalue refers to the fact that an rvalue expression of builtin type can only appear on the right-hand side of an assignment. Unlike lvalues, which, when non-const, can always be used on the left-hand side of an assignment, rvalue expressions yield objects without any persistent identity to assign into. [2]

The important thing about anonymous temporaries for our purposes, though, is that they can only be used once in an expression. How could you possibly refer to such an object a second time? It doesn’t have a name (thus, “anonymous”); and after the full expression is evaluated, the object is destroyed (thus, “temporary”)!

Once you know you are copying from an rvalue, then, it should be possible to “steal” the expensive-to-copy resources from the source object and use them in the target object without anyone noticing. In this case that would mean transferring ownership of the source vector’s dynamically-allocated array of strings to the target vector. If we could somehow get the compiler to execute that “move” operation for us, it would be cheap, almost free, to initialize names from a vector returned by-value.

That would take care of the second expensive copy, but what about the first? When get_names returns, in principle, it has to copy the function’s return value from the inside of the function to the outside. Well, it turns out that return values have the same property as anonymous temporaries: they are about to be destroyed, and won’t be used again. So, we could eliminate the first expensive copy in the same way, transferring the resources from the return value on the inside of the function to the anonymous temporary seen by the caller.
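
To make the idea of “stealing” concrete, here is a minimal sketch. It assumes a C++11 compiler and borrows std::move, which later installments of this series cover properly, purely as a way to mark the source as expendable. The helper name buffer_was_stolen is hypothetical. Instead of timing anything, it checks that the target vector ends up owning the very array the source used to own.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns true if building `target` from an expendable
// `source` took over the source's array of strings instead of copying it.
bool buffer_was_stolen()
{
    std::vector<std::string> source(3, "some name");
    assert(!source.empty());
    const std::string* array_before = source.data();

    // std::move marks source as "about to be destroyed"; the vector's move
    // constructor then adopts its dynamically-allocated array.
    std::vector<std::string> target(std::move(source));

    return target.data() == array_before && target.size() == 3;
}
```

No string is copied along the way: only the vector's internal pointers change hands.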

Copy Elision and the RVO

The reason I kept writing above that copies were made “in principle” is that the compiler is actually allowed to perform some optimizations based on the same principles we’ve just discussed. This class of optimizations is known formally as copy elision. For example, in the Return Value Optimization (RVO), the calling function allocates space for the return value on its stack, and passes the address of that memory to the callee. The callee can then construct a return value directly into that space, which eliminates the need to copy from inside to outside. The copy is simply elided, or “edited out,” by the compiler. So in code like the following, no copies are required:

std::vector<std::string> names = get_names();
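
One way to see the RVO at work is to instrument a type so that it counts its own copies. The sketch below is only an illustration: RvoProbe, make_probe and copies_to_initialize_from_call are hypothetical names, and the count you observe depends on your compiler and its settings.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts copy constructions.
struct RvoProbe
{
    static int copies;
    RvoProbe() {}
    RvoProbe(const RvoProbe&) { ++copies; }
};
int RvoProbe::copies = 0;

// Returns an unnamed temporary by value, the textbook RVO candidate.
RvoProbe make_probe()
{
    return RvoProbe();
}

// Number of copy-constructor calls used to initialize a local variable
// from make_probe(). "In principle" that is 2: one copy out of the
// function and one into the local. Copy elision can remove both.
int copies_to_initialize_from_call()
{
    RvoProbe::copies = 0;
    RvoProbe p = make_probe();
    (void)p; // silence unused-variable warnings
    assert(RvoProbe::copies <= 2);
    return RvoProbe::copies;
}
```

With the elision described above the function returns 0; with elision disabled it can return as much as 2. From C++17 on, this particular case is guaranteed to be elided.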

Also, although the compiler is normally required to make a copy when a function parameter is passed by value (so modifications to the parameter inside the function can’t affect the caller), it is allowed to elide the copy, and simply use the source object itself, when the source is an rvalue.


std::vector<std::string>
sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    return names;
}

// names is an lvalue; a copy is required so we don't modify names
std::vector<std::string> sorted_names1 = sorted( names );

// get_names() is an rvalue expression; we can omit the copy!
std::vector<std::string> sorted_names2 = sorted( get_names() );

This is pretty remarkable. In principle, in the last line above, the compiler can eliminate all the worrisome copies, making sorted_names2 the same object as the one created in get_names().
In practice, though, the principle won’t take us quite that far, as I’ll explain later.
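
The difference between lvalue and rvalue arguments can be observed with the same kind of counting type. This is a sketch under stated assumptions: ArgProbe and consume are hypothetical, and consume stands in for any function that, like sorted, takes its parameter by value.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts copy constructions.
struct ArgProbe
{
    static int copies;
    ArgProbe() {}
    ArgProbe(const ArgProbe&) { ++copies; }
};
int ArgProbe::copies = 0;

// Takes its parameter by value, like sorted().
void consume(ArgProbe) {}

int copies_for_lvalue_argument()
{
    ArgProbe named;
    ArgProbe::copies = 0;
    consume(named);      // lvalue: the parameter must be a real copy
    assert(ArgProbe::copies == 1);
    return ArgProbe::copies;
}

int copies_for_rvalue_argument()
{
    ArgProbe::copies = 0;
    consume(ArgProbe()); // rvalue: the copy into the parameter may be elided
    return ArgProbe::copies;
}
```

The lvalue call always costs exactly one copy. The rvalue call costs at most one, and none at all when the compiler elides it.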

Implications

Although copy elision is never required by the standard, recent versions of every compiler I’ve tested do perform these optimizations today. But even if you don’t feel comfortable returning heavyweight objects by value, copy elision should still change the way you write code.

Consider this cousin of our original sorted(…) function, which takes names by const reference and makes an explicit copy:

std::vector<std::string>
sorted2(std::vector<std::string> const& names) // names passed by reference
{
    std::vector<std::string> r(names);        // and explicitly copied
    std::sort(r.begin(), r.end());
    return r;
}

Although sorted and sorted2 seem at first to be identical, there could be a huge performance difference if a compiler does copy elision. Even if the actual argument to sorted2 is an rvalue, the source of the copy, names, is an lvalue, [3] so the copy can’t be optimized away. In a sense, copy elision is a victim of the separate compilation model: inside the body of sorted2, there’s no information about whether the actual argument to the function is an rvalue; outside, at the call site, there’s no indication that a copy of the argument will eventually be made.

That realization leads us directly to this guideline:

Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying.

At worst, if your compiler doesn’t elide copies, performance will be no worse. At best, you’ll see an enormous performance boost.
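
The gap between the two styles can be measured by counting element copies rather than timing. The sketch below assumes C++11 (so that vectors can be moved and std::sort can move elements). The names Item, make_items, sorted_by_value and sorted_by_reference are hypothetical stand-ins for the string vectors and the sorted/sorted2 pair above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Item
{
    static int copies;
    int key;
    Item() : key(0) {}
    Item(const Item& other) : key(other.key) { ++copies; }
    Item(Item&&) = default;
    Item& operator=(const Item& other) { key = other.key; ++copies; return *this; }
    Item& operator=(Item&&) = default;
};
int Item::copies = 0;

bool operator<(const Item& a, const Item& b) { return a.key < b.key; }

// Builds three items in place; no element copies are needed here.
std::vector<Item> make_items()
{
    std::vector<Item> v(3);
    v[0].key = 3; v[1].key = 1; v[2].key = 2;
    return v;
}

// Pass by value: the caller's rvalue can be used directly.
std::vector<Item> sorted_by_value(std::vector<Item> items)
{
    std::sort(items.begin(), items.end());
    return items;
}

// Pass by const reference and copy by hand: the copy always happens.
std::vector<Item> sorted_by_reference(const std::vector<Item>& items)
{
    std::vector<Item> r(items);
    std::sort(r.begin(), r.end());
    return r;
}

// Element copies made while sorting the rvalue returned by make_items().
int element_copies_by_value()
{
    Item::copies = 0;
    std::vector<Item> out = sorted_by_value(make_items());
    assert(out.size() == 3 && out[0].key == 1);
    return Item::copies;
}

int element_copies_by_reference()
{
    Item::copies = 0;
    std::vector<Item> out = sorted_by_reference(make_items());
    assert(out.size() == 3 && out[0].key == 1);
    return Item::copies;
}
```

The by-reference version pays one copy per element (three here) on top of whatever the by-value version costs, even though its argument was a temporary nobody else could see.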
One place you can apply this guideline immediately is in assignment operators. The canonical, easy-to-write, always-correct, strong-guarantee, copy-and-swap assignment operator is often seen written this way:

T& T::operator=(T const& x) // x is a reference to the source
{
    T tmp(x);          // copy construction of tmp does the hard work
    swap(*this, tmp);  // trade our resources for tmp's
    return *this;      // our (old) resources get destroyed with tmp
}

but in light of copy elision, that formulation is glaringly inefficient! It’s now “obvious” that the correct way to write a copy-and-swap assignment is:

T& operator=(T x)    // x is a copy of the source; hard work already done
{
    swap(*this, x);  // trade our resources for x's
    return *this;    // our (old) resources get destroyed with x
}
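
For readers who want to see the by-value form on a complete class, here is a minimal sketch. IntBuffer is a hypothetical resource-owning type invented for this illustration; it only gives T a concrete shape, and it is not a recommendation for every class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class owning a heap array of ints.
class IntBuffer
{
    std::size_t size_;
    int* data_;
public:
    explicit IntBuffer(std::size_t n, int fill = 0)
        : size_(n), data_(n ? new int[n] : nullptr)
    {
        std::fill(data_, data_ + size_, fill);
    }
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(size_ ? new int[size_] : nullptr)
    {
        std::copy(other.data_, other.data_ + size_, data_);
    }
    ~IntBuffer() { delete[] data_; }

    friend void swap(IntBuffer& a, IntBuffer& b)
    {
        std::swap(a.size_, b.size_);
        std::swap(a.data_, b.data_);
    }

    // x is already a copy of the source (or an elided temporary),
    // so all that is left to do is trade resources with it.
    IntBuffer& operator=(IntBuffer x)
    {
        swap(*this, x);
        return *this; // our old array is released when x is destroyed
    }

    std::size_t size() const { return size_; }
    int operator[](std::size_t i) const { assert(i < size_); return data_[i]; }
};
```

Assigning from an lvalue copies once, at the call site. Assigning from a temporary such as IntBuffer(2, 5) gives the compiler the chance to skip the copy entirely.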

Reality Bites

Of course, lunch is never really free, so I have a couple of caveats.

First, when you pass parameters by reference and copy in the function body, the copy constructor is called from one central location. However, when you pass parameters by value, the compiler generates calls to the copy constructor at the site of each call where lvalue arguments are passed. If the function will be called from many places and code size or locality are serious considerations for your application, it could have a real effect.

On the other hand, it’s easy to build a wrapper function that localizes the copy:

std::vector<std::string>
sorted3(std::vector<std::string> const& names)
{
    // copy is generated once, at the site of this call
    return sorted(names);
}

Since the converse doesn’t hold (you can’t get back a lost opportunity for copy elision by wrapping), I recommend you start by following the guideline, and make changes only as you find them to be necessary.

Second, I’ve yet to find a compiler that will elide the copy when a function parameter is returned, as in our implementation of sorted. When you think about how these elisions are done, it makes sense: without some form of inter-procedural optimization, the caller of sorted can’t know that the argument (and not some other object) will eventually be returned, so the compiler must allocate separate space on the stack for the argument and the return value.

If you need to return a function parameter, you can still get near-optimal performance by swapping into a default-constructed return value (provided default construction and swap are cheap, as they should be):

std::vector<std::string>
sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    std::vector<std::string> ret;  // default construction is cheap
    swap(ret, names);              // trade buffers instead of copying
    return ret;
}
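
Why the swap is cheap can be checked directly. In this sketch, swap_hands_over_array is a hypothetical helper that verifies the empty return value simply takes over the parameter's array, with no strings copied.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns true if swapping into an empty vector hands over the original
// array of strings rather than copying its elements.
bool swap_hands_over_array()
{
    std::vector<std::string> names(2, "name");
    const std::string* array_before = names.data();

    std::vector<std::string> ret;  // empty, so cheap to construct
    using std::swap;
    swap(ret, names);              // exchanges the vectors' internals

    assert(names.empty());         // names now holds ret's old, empty state
    return ret.data() == array_before && ret.size() == 2;
}
```

Because ret is a local variable rather than a parameter, returning it is once again a candidate for copy elision.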

More To Come

Hopefully you now have the ammunition you need to stave off anxiety about passing and returning nontrivial objects by value. But we’re not done yet: now that we’ve covered rvalues, copy elision, and the RVO, we have all the background we need to attack move semantics, rvalue references, perfect forwarding, and more as we continue this article series. See you soon!

Follow this link to the next installment.

Acknowledgements

Howard Hinnant is responsible for key insights that make this article series possible. Andrei Alexandrescu was posting on comp.lang.c++.moderated about how to leverage copy elision years before I took it seriously.
Most of all, though, thanks in general to all readers and reviewers!

[1] Googling for a good definition of value semantics turned up nothing for me. Unless someone else can point to one (and maybe even if they can), we’ll be running an article on that topic (in which I promise you a definition) soon.

[2] For a detailed treatment of rvalues and lvalues, please see this excellent article by Dan Saks.

[3] Except for enums and non-type template parameters, every value with a name is an lvalue.

===========================

Valuable Comments from Howard Hinnant

I think this article should be updated for C++11. There are two things wrong with it:

It leaves the impression that one should always write your assignment operator like so:

T& operator=(T x)    // x is a copy of the source; hard work already done
{
    swap(*this, x);  // trade our resources for x's
    return *this;    // our (old) resources get destroyed with x
}


But in some important cases, this is a large performance penalty. Vector-like classes where heap memory can be reused during the copy assignment is a classic example. I’ve just written a short example showing as high as a 7X performance penalty.

In C++11 the correct way to write sorted is:

std::vector<std::string>
sorted(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    return names;
}


Implicit return-by-move from by-value parameters is now required.
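
As an illustration of that rule, the sketch below counts copies and moves separately. It assumes a C++11 compiler; ReturnProbe and pass_through are hypothetical names, and pass_through stands in for sorted.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts copies and moves separately.
struct ReturnProbe
{
    static int copies;
    static int moves;
    ReturnProbe() {}
    ReturnProbe(const ReturnProbe&) { ++copies; }
    ReturnProbe(ReturnProbe&&) { ++moves; }
};
int ReturnProbe::copies = 0;
int ReturnProbe::moves = 0;

// Returns its by-value parameter, like sorted().
ReturnProbe pass_through(ReturnProbe p)
{
    return p; // p is treated as an rvalue here, so this is a move
}

// Calls pass_through with an lvalue and reports the copies made.
int copies_when_returning_parameter()
{
    ReturnProbe source;
    ReturnProbe::copies = 0;
    ReturnProbe::moves = 0;
    ReturnProbe result = pass_through(source);
    (void)result; // silence unused-variable warnings
    assert(ReturnProbe::moves >= 1); // the return itself moved
    return ReturnProbe::copies;
}
```

The only copy is the one that initializes the parameter from the lvalue; getting the value back out of the function costs a move, so the swap trick from the article is no longer needed.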

The basic point of the article is sound: Passing by value is an important tool in the tool box. But I’ve seen too many references to this article that mistakenly throw design and testing out the window on this issue, and translate this article into “always pass by value”.

#include <cstddef>
#include <new>
#include <utility>

template <class T>
class MyVector
{
    T* begin_;
    T* end_;
    T* capacity_;

public:
    MyVector()
        : begin_(nullptr),
          end_(nullptr),
          capacity_(nullptr)
        {}

    ~MyVector()
    {
        clear();
        ::operator delete(begin_);
    }

    MyVector(std::size_t N, const T& t)
        : MyVector()
    {
        if (N > 0)
        {
            begin_ = end_ = static_cast<T*>(::operator new(N*sizeof(T)));
            capacity_ = begin_ + N;
            for (; N > 0; --N, ++end_)
                ::new(end_) T(t);
        }
    }

    MyVector(const MyVector& v)
        : MyVector()
    {
        std::size_t N = v.size();
        if (N > 0)
        {
            begin_ = end_ = static_cast<T*>(::operator new(N*sizeof(T)));
            capacity_ = begin_ + N;
            for (std::size_t i = 0; i < N; ++i, ++end_)
                ::new(end_) T(v[i]);
        }
    }

    MyVector(MyVector&& v)
        : begin_(v.begin_),
          end_(v.end_),
          capacity_(v.capacity_)
    {
        v.begin_ = nullptr;
        v.end_ = nullptr;
        v.capacity_ = nullptr;
    }

#ifndef USE_SWAP_ASSIGNMENT

    MyVector& operator=(const MyVector& v)
    {
        if (this != &v)
        {
            std::size_t N = v.size();
            if (capacity() < N)
            {
                clear();
                ::operator delete(begin_);
                begin_ = end_ = static_cast<T*>(::operator new(N*sizeof(T)));
                capacity_ = begin_ + N;
            }
            T* p = begin_;
            const T* q = v.begin_;
            for (; p < end_ && q < v.end_; ++p, ++q)
                *p = *q;
            if (q < v.end_)
            {
                for (; q < v.end_; ++q, ++end_)
                    ::new(end_) T(*q);
            }
            else
            {
                while (end_ > p)
                {
                    --end_;
                    end_->~T();
                }
            }
        }
        return *this;
    }

    MyVector& operator=(MyVector&& v)
    {
        clear();
        swap(v);
        return *this;
    }

#else

    MyVector& operator=(MyVector v)
    {
        swap(v);
        return *this;
    }

#endif

    void clear()
    {
        while (end_ > begin_)
        {
            --end_;
            end_->~T();
        }
    }

    std::size_t size() const
        {return static_cast<std::size_t>(end_ - begin_);}
    std::size_t capacity() const
        {return static_cast<std::size_t>(capacity_ - begin_);}
    const T& operator[](std::size_t i) const
        {return begin_[i];}
    T& operator[](std::size_t i)
        {return begin_[i];}

    void swap(MyVector& v)
    {
        std::swap(begin_, v.begin_);
        std::swap(end_, v.end_);
        std::swap(capacity_, v.capacity_);
    }
};

template <class T>
inline
void
swap(MyVector<T>& x, MyVector<T>& y)
{
    x.swap(y);
}

#include <iostream>
#include <string>
#include <chrono>

int main()
{
    MyVector<std::string> v1(1000, "1234567890123456789012345678901234567890");
    MyVector<std::string> v2(1000, "1234567890123456789012345678901234567890123456789");
    typedef std::chrono::high_resolution_clock Clock;
    typedef std::chrono::duration<double, std::micro> US;
    auto t0 = Clock::now();
    v2 = v1;
    auto t1 = Clock::now();
    std::cout << US(t1-t0).count() << " microseconds\n";
}

$ clang++ -stdlib=libc++ -std=c++11 -O3 -DUSE_SWAP_ASSIGNMENT test.cpp
$ a.out
174.516 microseconds
$ a.out
180.83 microseconds
$ a.out
175.848 microseconds

$ clang++ -stdlib=libc++ -std=c++11 -O3 test.cpp
$ a.out
26.339 microseconds
$ a.out
24.179 microseconds
$ a.out
24.103 microseconds

From http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/