
zz Overlooked Essentials For Optimizing Code

2010-10-09 17:49 | 363 views

1 Use a profiler to find out where the bottleneck is.

2 Read the assembly code to find out why the bottleneck arises.

http://www.drdobbs.com/blog/archives/2010/09/overlooked_esse.html

Overlooked Essentials For Optimizing Code

Sep 10, 2010

I've been programming for 35 years now, and I've done a lot of work optimizing programs for speed (an example), and watching others optimize. Two essential techniques are consistently ignored.

Nope, it isn't avoiding premature optimization. It isn't replacing
bubble sort with quicksort (i.e. algorithmic improvements). It's not
what language is used, nor is it how good the compiler is. It isn't writing
i<<2 instead of i*4.

It is:

Using a profiler

Looking at the assembly code being executed

The people who do those are successful in writing fast code, the ones who do not are not. Let me explain.

Using A Profiler

The old programming saw is that a program spends 90% of its time in
10% of the code. I've found that to not be true. Over and over, I've
found that a program spends 99% of its time in 1% of the code. But which
1%? A profiler will tell you. Spending 100 hours of dev time on that 1%
will yield real benefits, while 100 hours on the other 99% will not
produce much of anything worthwhile. (Comment: the 80-20 principle, only
more extreme.)

What's the problem? Don't people use profilers? Nope. One place I
worked at had a fancy expensive profiler that was still in its shrink
wrap 3 years after purchase. Why don't people use profilers? I don't
really know. I once got into a heated exchange with a colleague who
insisted he knew where the bottlenecks were; after all, he was an
experienced professional. I finally ran the profiler myself on his
project, and of course the bottleneck was in a completely unexpected
place.

Consider auto racing. The team that wins has sensors and logging on
just about everything they can stick a sensor on. You can race using
seat-of-the-pants tuning and have a jolly good time on the track, but
you won't win and you won't even be competitive. You won't know if your
poor speeds are caused by the engine, the exhaust, the aerodynamics, the
tire pressure, or the driver. Why should programming be any different?
You can't improve what you can't measure.

There are lots of profilers available. You can get ones that look at
the hierarchy of function calls, function times, times broken down for
each statement, and even at the instruction level. I've seen too many
programmers eschew profilers, preferring instead to whittle away their
time with useless and misdirected "optimizations" and getting trounced
by their competitors.
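
Even without a dedicated profiler, crude measurement beats guessing. The following is a minimal sketch, not a replacement for the profilers described above: the helper seconds_for and the sample workload busy_work are hypothetical names invented for illustration, and the only library call used is the standard C clock().

```c
#include <time.h>

/* Keeps the compiler from discarding the sample workload's result. */
static volatile unsigned long sink;

/* Hypothetical sample workload standing in for a function under suspicion. */
void busy_work(void) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 100000UL; i++)
        sum += i * i;   /* unsigned arithmetic wraps, which is fine here */
    sink = sum;
}

/* Hypothetical helper: CPU seconds consumed by `reps` calls to `fn`. */
double seconds_for(void (*fn)(void), int reps) {
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for(busy_work, 1000) for each candidate function gives a rough ranking of where the time goes. A real profiler does this for every function at once, without the candidates having to be guessed in advance.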

Looking At The Assembly Code

Years ago, I had a colleague, Mary Bailey, who taught remedial
algebra at the University of Washington. She told me once that when she
wrote on the board:

x + 3 = 5

and asked her students to "solve for x", they couldn't answer. But, if she wrote:

__ + 3 = 5

and asked the students to "fill in the blank" all of them could do
it. It seems that the magic word "x" caused them to reflexively
think "x means algebra, I don't understand algebra, I can't do this."

Assembler is the algebra of the programming world. If someone asks me
"was my function inlined by the compiler" or "if I write i*4, will the
compiler optimize it to a left shift" I'll suggest they look at the asm
output of the compiler. The reaction is: how rude and unhelpful could I
be? The person will follow up by saying he doesn't know assembler. Even
C++ experts will say this. (Comment: Assembly code is simple; don't be afraid of it. You think it is hard only because it is called assembler, just as the students couldn't solve for x because they thought "x means algebra".)

Assembler is the simplest language (especially compared with C++!). For example,

ADD ESI,x

is (expressed in C style):

ESI += x;

and:

CALL foo

is:

foo();

Details vary among CPUs, but that's how it works. It's not even
really necessary to know that. Just looking at the assembler output and
comparing it to the source code will tell a LOT.
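
The i*4 question can be settled the same way. The sketch below assumes a hypothetical file times4.c holding two functions whose names are made up for illustration. With GCC or Clang, the -S option writes the assembler listing instead of an object file, so the two function bodies can be compared side by side.

```c
/* times4.c (hypothetical file name).
 * Compile to an assembler listing with, for example:
 *     cc -O2 -S times4.c
 * then compare the bodies of the two functions in times4.s. */

/* Multiplication written the obvious way. */
unsigned times4_mul(unsigned i) {
    return i * 4;
}

/* The hand "optimized" version using a left shift. */
unsigned times4_shift(unsigned i) {
    return i << 2;
}
```

Whether the two bodies come out identical depends on the compiler and its flags, and the listing answers that directly. No knowledge of the instruction set is needed to see whether two short instruction sequences match.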

How does this help optimization? For example, I knew a programmer
years ago who thought he'd discovered a new, faster algorithm to do X.
I'm being deliberately vague to protect him. He had the benchmarks to
prove it, and wrote a nice article about it. But then someone looked at
the assembler output of the regular way, and his new fast way. It turns
out that the way he'd written his improved version had allowed the
compiler to replace two DIV instructions with one. This had really
nothing to do with his algorithm. But DIV is an expensive instruction,
and this was in the inner loop, and so his algorithm appeared to be
faster. The regular implementation could also be recoded slightly to use
only one DIV, too, and it would perform just as fast as the new
algorithm. He had discovered nothing.
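
The programmer's actual code is not given, so the following is only a hypothetical illustration of how the way something is written can change the number of divide operations in the source. Both functions (invented names) split n into a quotient and a remainder; how many DIV instructions a compiler emits for each is something only the assembler output can show.

```c
/* Hypothetical illustration only; not the code from the anecdote. */

/* Written with two division-class operators: / and %. */
void split_two_ops(unsigned n, unsigned d, unsigned *q, unsigned *r) {
    *q = n / d;
    *r = n % d;
}

/* Recoded so the source contains a single division; the remainder
 * is recovered with a multiply and a subtract. */
void split_one_div(unsigned n, unsigned d, unsigned *q, unsigned *r) {
    unsigned quotient = n / d;
    *q = quotient;
    *r = n - quotient * d;
}
```

If a benchmark shows one form beating the other inside an inner loop, the difference may come from the divide count rather than from anything algorithmic, which is the trap described above.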

For my next example, a D user posted a benchmark showing that dmd
(Digital Mars D compiler) was lousy at integer arithmetic, while ldc
(LLVM D compiler) was much better. Being very concerned about such a
result, I promptly looked at the assembler output. It was pretty much
equivalent; nothing stood out as being accountable for a 2:1 difference.
But there was a long divide in there, done with a call to a runtime
library function. That function call completely dominated the timing
results; all the adds and subtracts in the benchmark had no significant
impact on the speed. Unexpectedly, the benchmark wasn't about arithmetic
code generation at all; it was about long division only. It turns out
that dmd's runtime library function had a crummy implementation of long
division in it. Fixing that brought the speed up to par. It wasn't the
code generation at fault at all, but this was not discoverable without
looking at the assembler.
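
The original benchmark is not reproduced here. As a rough C analogue under that assumption, the hypothetical function below mixes cheap adds and subtracts with one 64-bit divide per iteration. On some 32-bit targets a compiler may implement that divide as a call into its runtime library, and the assembler listing would then show a CALL in the middle of the loop.

```c
#include <stdint.h>

/* Hypothetical "integer arithmetic" benchmark kernel. The adds and
 * subtracts are cheap; the 64-bit divide by a run-time divisor is the
 * operation to look for in the assembler output. */
uint64_t mix(uint64_t seed, uint64_t divisor, uint32_t rounds) {
    uint64_t acc = seed;
    for (uint32_t i = 1; i <= rounds; i++) {
        acc += i;                  /* cheap add */
        acc -= i >> 1;             /* cheap subtract */
        acc = acc / divisor + i;   /* long divide: may become a library call */
    }
    return acc;
}
```

A timing result for code like this says little about how well the adds and subtracts were compiled if the divide dominates each iteration.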

Looking at the assembler often gives unexpected insight into why a
program performs as it does. Unexpected function calls, unanticipated
bloat, things that shouldn't be there, etc., all are exposed when
looking at it. It isn't necessary to be an assembler crackerjack to be
able to pick that up.

Conclusion

If you feel the need for speed, the way to get it is to use a
profiler and be willing to examine the assembler for the bottlenecks.
Only then is it time to think about better algorithms, faster languages,
etc.

Conventional wisdom has it that choosing the best algorithm trumps
any micro-optimizations. Though that is undeniably true, there are two
caveats that don't get taught in schools. First and most importantly,
choosing the best algorithm for a part of the program that does not
contribute to the performance profile has a negative effect on
optimization because it wastes your time that could be better invested
in making actual progress, and diverts attention from the parts that
matter. Second, algorithms' performance always varies with the
statistics of the data they operate on. Even bubble sort, the butt of
all jokes, is still the best on almost-sorted data that has only a few
unordered items. So worrying about using good algorithms without
measuring where they matter is a waste of time - yours and the computer's. (Comment: Only after you find where and why the performance suffers can you take the proper steps to fix it: modifying the code, changing the algorithm, using a faster language, etc.)
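
The bubble sort remark can be made concrete. The sketch below is a textbook bubble sort with an early exit, written here for illustration; the function name and the pass counter are additions for the demonstration, not anything from the article. It returns how many passes over the data were needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sorts a[0..n-1] in ascending order and returns the number of passes
 * made. The early exit stops as soon as a pass makes no swaps, which is
 * what makes bubble sort cheap on almost-sorted input. */
size_t bubble_sort(int *a, size_t n) {
    size_t passes = 0;
    bool swapped = true;
    while (swapped && n > 1) {
        swapped = false;
        for (size_t i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {
                int tmp = a[i - 1];
                a[i - 1] = a[i];
                a[i] = tmp;
                swapped = true;
            }
        }
        passes++;
        n--;   /* the largest remaining element is now in its final place */
    }
    return passes;
}
```

On {1, 2, 4, 3, 5} this finishes in 2 passes, while the reversed input {5, 4, 3, 2, 1} takes 4. The same code looks fast or slow depending on the statistics of the data it is given.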

Just like ordering speed parts from an auto racing catalog isn't
going to put you anywhere near the winner's circle (even if you get them
installed right), without a profiler, you won't know where the problems
are. Without looking at the assembler, you may know
where the problem is, but often won't know why.

Thanks to Bartosz Milewski, David Held, and Andrei Alexandrescu for their helpful comments on a draft of this.

