[Repost] Overlooked Essentials For Optimizing Code
2010-10-09 17:49
1 Use a profiler to find out where the bottleneck is.
2 Read the assembly code to find out why the bottleneck arises.
Overlooked Essentials For Optimizing Code
Sep 10, 2010
I've been programming for 35 years now, and I've done a lot of work optimizing programs for speed (an example), and watching others optimize. Two essential techniques are consistently ignored.
Nope, it isn't avoiding premature optimization. It isn't replacing
bubble sort with quicksort (i.e. algorithmic improvements). It's not
what language is used, nor is it how good the compiler is. It isn't writing
i<<2 instead of i*4.
It is:
Using a profiler
Looking at the assembly code being executed
The people who do those are successful in writing fast code; the ones who do not are not. Let me explain.
Using A Profiler
The old programming saw is that a program spends 90% of its time in
10% of the code. I've found that to not be true. Over and over, I've
found that programs spend 99% of their time in 1% of the code. But which
1%? A profiler will tell you. Spending 100 hours of dev time on that 1%
will yield real benefits, while 100 hours on the other 99% will not
produce much of anything worthwhile. (Comment: this is the 80-20 principle, only more extreme.)
What's the problem? Don't people use profilers? Nope. One place I
worked at had a fancy expensive profiler that was still in its shrink
wrap 3 years after purchase. Why don't people use profilers? I don't
really know. I once got into a heated exchange with a colleague who
insisted he knew where the bottlenecks were; after all, he was an
experienced professional. I finally ran the profiler myself on his
project, and of course the bottleneck was in a completely unexpected
place.
Consider auto racing. The team that wins has sensors and logging on
just about everything they can stick a sensor on. You can race using
seat-of-the-pants tuning and have a jolly good time on the track, but
you won't win and you won't even be competitive. You won't know if your
poor speeds are caused by the engine, the exhaust, the aerodynamics, the
tire pressure, or the driver. Why should programming be any different?
You can't improve what you can't measure.
There are lots of profilers available. You can get ones that look at
the hierarchy of function calls, function times, times broken down for
each statement, and even at the instruction level. I've seen too many
programmers eschew profilers, preferring instead to whittle away their
time with useless and misdirected "optimizations" and getting trounced
by their competitors.
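To make this concrete, here is a minimal C sketch of the kind of program a profiler sorts out quickly. The function names (`count_duplicate_pairs`, `sum_values`, `run_workload`) are hypothetical and exist only for illustration. One helper is quadratic and the other is linear, so nearly all the run time should land in the quadratic one, but the point is to let the profiler confirm that rather than to guess.

```c
#include <stddef.h>

/* Hypothetical hot spot: compares every pair of elements, O(n^2). */
static unsigned count_duplicate_pairs(const int *data, size_t n)
{
    unsigned pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (data[i] == data[j])
                pairs++;
    return pairs;
}

/* Hypothetical cold code: a single linear pass, O(n). */
static long sum_values(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Fills the caller's buffer with repeating values 0..99, then runs both
 * helpers and returns a checksum so the work cannot be optimized away. */
long run_workload(int *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] = (int)(i % 100);
    return sum_values(data, n) + (long)count_duplicate_pairs(data, n);
}
```

One way to try it, assuming a GCC toolchain with gprof installed: call `run_workload` on a large buffer from a `main`, build with `gcc -pg`, run the program, then run `gprof` on the executable and the `gmon.out` file it writes. An optimizing build may inline the helpers into `run_workload`, so an unoptimized build gives the clearest per-function breakdown.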
Looking At The Assembly Code
Years ago, I had a colleague, Mary Bailey, who taught remedial algebra at the University of Washington. She told me once that when she
wrote on the board:
x + 3 = 5
and asked her students to "solve for x", they couldn't answer. But, if she wrote:
__ + 3 = 5
and asked the students to "fill in the blank", all of them could do
it. It seems that the magic letter "x" caused them to reflexively
think "x means algebra, I don't understand algebra, I can't do this."
Assembler is the algebra of the programming world. If someone asks me
"was my function inlined by the compiler" or "if I write i*4, will the
compiler optimize it to a left shift" I'll suggest they look at the asm
output of the compiler. The reaction is how rude and unhelpful could I
be? The person will follow up by saying he doesn't know assembler. Even
C++ experts will say this. (Comment: Assembly code is actually simple, so don't be afraid of it. It only seems hard because it is called assembler, just as the students couldn't solve for x because they thought "x means algebra".)
Assembler is the simplest language (especially compared with C++!). For example,
ADD ESI,x
is (expressed in C style):
ESI += x;
and:
CALL foo
is:
foo();
Details vary among CPUs, but that's how it works. It's not even
really necessary to know that. Just looking at the assembler output and
comparing it to the source code will tell a LOT.
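The two questions above ("was my function inlined?" and "will i*4 become a shift?") can be put to the compiler directly. The following is a small sketch with hypothetical function names; it assumes a GCC- or Clang-style driver, where the `-S` flag writes the generated assembler to a file instead of producing an object file. What actually comes out depends on the compiler, target and optimization flags, which is exactly why it is worth looking.

```c
/* Hypothetical helper: will the compiler inline it into call_helper()? */
static int add_offset(int value)
{
    return value + 3;
}

/* In the assembler output, look for a CALL to add_offset. If there is
 * none, the helper was inlined. */
int call_helper(int value)
{
    return add_offset(value) * 2;
}

/* Multiply by four, written the obvious way. */
unsigned times_four(unsigned i)
{
    return i * 4;
}

/* The hand-"optimized" spelling. Compare its instructions with those of
 * times_four() to see whether the rewrite bought anything. */
unsigned shift_left_two(unsigned i)
{
    return i << 2;
}
```

Compiling this with something like `cc -O2 -S` and reading the few lines generated for each function answers both questions without needing to know much assembler at all.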
How does this help optimization? For example, I knew a programmer
years ago who thought he'd discovered a new, faster algorithm to do X.
I'm being deliberately vague to protect him. He had the benchmarks to
prove it, and wrote a nice article about it. But then someone looked at
the assembler output of the regular way, and his new fast way. It turns
out that the way he'd written his improved version had allowed the
compiler to replace two DIV instructions with one. This had really
nothing to do with his algorithm. But DIV is an expensive instruction,
and this was in the inner loop, and so his algorithm appeared to be
faster. The regular implementation could also be recoded slightly to use
only one DIV, too, and it would perform just as fast as the new
algorithm. He had discovered nothing.
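The algorithm in that story is deliberately left unnamed, so the following C sketch is only an invented illustration of the general effect, not the code in question. Both hypothetical functions compute the same digit sum. The first uses two division operators per iteration, the second uses one and recovers the remainder with a multiply and a subtract. Whether the compiler emits one divide instruction or two for each version is something only the assembler output can tell you.

```c
/* Digit sum written with two division operators (% and /) per iteration.
 * The caller must pass base >= 2. */
unsigned digit_sum_two_ops(unsigned v, unsigned base)
{
    unsigned sum = 0;
    while (v != 0) {
        sum += v % base;   /* remainder */
        v = v / base;      /* quotient */
    }
    return sum;
}

/* Same result with a single division operator per iteration: the
 * remainder is recovered as v - q * base.
 * The caller must pass base >= 2. */
unsigned digit_sum_one_op(unsigned v, unsigned base)
{
    unsigned sum = 0;
    while (v != 0) {
        unsigned q = v / base;
        sum += v - q * base;
        v = q;
    }
    return sum;
}
```

If a benchmark shows one of these beating the other, the count of divide instructions in the inner loop is the first thing to compare, before crediting any algorithmic difference.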
For my next example, a D user posted a benchmark showing that dmd
(Digital Mars D compiler) was lousy at integer arithmetic, while ldc
(LLVM D compiler) was much better. Being very concerned about such a
result, I promptly looked at the assembler output. It was pretty much
equivalent; nothing stood out as being accountable for a 2:1 difference.
But there was a long divide in there, done with a call to a runtime
library function. That function call completely dominated the timing
results; all the adds and subtracts in the benchmark had no significant
impact on the speed. Unexpectedly, the benchmark wasn't about arithmetic
code generation at all; it was about long division only. It turns out
that dmd's runtime library function had a crummy implementation of long
division in it. Fixing that brought the speed up to par. It wasn't the
code generation at fault at all, but this was not discoverable without
looking at the assembler.
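The benchmark itself was D code and is not reproduced here. As a rough analogue in C, consider the hypothetical function below: a couple of cheap 64-bit adds and subtracts followed by one 64-bit divide. The assumption to check on your own toolchain is that, on a 32-bit target, the divide is not a single instruction but a call into the compiler's runtime library (with GCC on 32-bit x86 this is typically a helper such as `__udivdi3`).

```c
#include <stdint.h>

/* Hypothetical benchmark kernel. The caller must pass d != 0. */
uint64_t mix(uint64_t a, uint64_t b, uint64_t d)
{
    uint64_t t = a + b;   /* cheap: a handful of inline instructions */
    t -= b >> 1;          /* cheap: shift and subtract */
    return t / d;         /* on many 32-bit targets: a CALL to a runtime helper */
}
```

Generating assembler for a 32-bit target (for example with `gcc -m32 -S`, if that target is installed) and searching the output for a CALL is enough to see whether a timing loop around `mix` measures the compiler's code generation or the library's division routine.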
Looking at the assembler often gives unexpected insight into why a
program performs as it does. Unexpected function calls, unanticipated
bloat, things that shouldn't be there, etc., all are exposed when
looking at it. It isn't necessary to be an assembler crackerjack to be
able to pick that up.
Conclusion
If you feel the need for speed, the way to get it is to use a profiler and be willing to examine the assembler for the bottlenecks.
Only then is it time to think about better algorithms, faster languages,
etc.
Conventional wisdom has it that choosing the best algorithm trumps
any micro-optimizations. Though that is undeniably true, there are two
caveats that don't get taught in schools. First and most importantly,
choosing the best algorithm for a part of the program that contributes
nothing to the performance profile has a negative effect on
optimization because it wastes time that could be better invested
in making actual progress, and diverts attention from the parts that
matter. Second, algorithms' performance always varies with the
statistics of the data they operate on. Even bubble sort, the butt of
all jokes, is still the best on almost-sorted data that has only a few
unordered items. So worrying about using good algorithms without
measuring where they matter is a waste of time - yours and the computer's. (Comment: Only after you find where and why the performance suffers can you choose the proper fix: modifying the code, changing the algorithm, switching to a faster language, etc.)
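The remark about bubble sort on almost-sorted data is easy to check. Below is a minimal C sketch of bubble sort with the usual early exit, where the function name `bubble_sort_passes` and the pass counter are additions for illustration. It returns how many passes over the array were needed, which is a direct measure of how much work the input's ordering caused.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sorts a[0..n-1] in ascending order, in place, and returns the number of
 * passes made. Stops as soon as a pass performs no swap. */
size_t bubble_sort_passes(int *a, size_t n)
{
    size_t passes = 0;
    bool swapped = true;

    while (swapped && n > 1) {
        swapped = false;
        for (size_t i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {
                int tmp = a[i - 1];
                a[i - 1] = a[i];
                a[i] = tmp;
                swapped = true;
            }
        }
        passes++;
        n--;   /* the largest remaining element is now in its final place */
    }
    return passes;
}
```

An array with one adjacent pair out of order is sorted in two passes (one to fix it, one to confirm), while a fully reversed array of six elements needs five. The same code is cheap or expensive depending entirely on the data, which is why the measurement has to come first.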
Just like ordering speed parts from an auto racing catalog isn't
going to put you anywhere near the winner's circle (even if you get them
installed right), without profiling you won't know where the problems
are. Without looking at the assembler, you may know
where the problem is, but often won't know why.
Thanks to Bartosz Milewski, David Held, and Andrei Alexandrescu for their helpful comments on a draft of this.