
Notes on High Performance Python (Python is a fine language, and full-stack programmers might as well just use it!)

2014-10-07 22:48

High Performance Python

Contents

1. Understanding Performant Python
2. Profiling
3. Lists and Tuples
4. Dictionaries and Sets
5. Iterators and Generators
6. Matrix and Vector Computation
7. Compiling to C
8. Concurrency
9. multiprocessing
10. Clusters and Job Queues
11. Using Less RAM
12. Lessons from the Field

Understanding Performant Python

Profiling

Lists and Tuples

Are both implemented internally as arrays?

Dictionaries and Sets

Dictionary keys: __hash__ + __eq__/__cmp__
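For instance, a user-defined class becomes a usable dict/set key once __hash__ and __eq__ agree (the Point class is a hypothetical example, not from the book):

```python
class Point(object):
    """2-D point usable as a dict/set key."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __hash__(self):
        # Objects that compare equal must hash equal.
        return hash((self.x, self.y))

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

counts = {Point(1, 2): "a"}
print(counts[Point(1, 2)])  # "a": a distinct but equal instance finds the key
```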
entropy (how well-spread the hash values are)
locals() globals() __builtin__
List comprehensions vs. generator expressions (one uses [], the other ()):
[<value> for <item> in <sequence> if <condition>] vs (<value> for <item> in <sequence> if <condition>)
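A quick contrast (the list is built eagerly; the generator yields lazily):

```python
nums = range(10)
squares_list = [n * n for n in nums if n % 2 == 0]   # built eagerly, all in RAM
squares_gen = (n * n for n in nums if n % 2 == 0)    # lazy, one value at a time

print(squares_list)      # [0, 4, 16, 36, 64]
print(sum(squares_gen))  # 120, without materializing a list
```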

itertools:

imap, ireduce, ifilter, izip, islice, chain, takewhile, cycle
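Note that imap/izip/ifilter (and the often-cited ireduce) are Python 2 idioms; in Python 3, map/zip/filter are already lazy and reduce lives in functools. A quick tour of the still-current helpers:

```python
from itertools import islice, chain, takewhile, cycle, count

# count() is an infinite iterator; islice takes a lazy slice of it
first5 = list(islice(count(10), 5))                        # [10, 11, 12, 13, 14]
# takewhile stops at the first element failing the predicate
small = list(takewhile(lambda x: x < 4, [1, 2, 3, 7, 2]))  # [1, 2, 3]
# chain glues iterables together without copying into one list
both = list(chain([1, 2], (3, 4)))                         # [1, 2, 3, 4]
# cycle repeats forever; pair it with islice to bound it
abab = ''.join(islice(cycle('ab'), 4))                     # 'abab'
```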

p95 Knuth's online mean algorithm?
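Presumably this refers to the running-mean update mean += (x - mean) / n; a sketch under that assumption:

```python
def online_mean(values):
    """Knuth-style running mean: update with mean += (x - mean) / n.

    Avoids storing the whole sequence or accumulating a huge total.
    """
    mean = 0.0
    for n, x in enumerate(values, 1):
        mean += (x - mean) / n
    return mean

print(online_mean([1, 2, 3, 4]))  # 2.5
```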

Iterators and Generators

Matrix and Vector Computation

The book keeps using 'loop-invariant code motion' examples; isn't that something the compiler should have optimized away?
$ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,\
cache-references,cache-misses,branches,branch-misses,task-clock,faults,\
minor-faults,cs,migrations -r 3 python diffusion_python_memory.py
numpy

np.roll([[1,2,3],[4,5,6]], 1, axis=1)
?Can Cython optimize data structures, or only code?
In-place operations, such as +=, *=

=> numexpr

from numexpr import evaluate
evaluate("next_grid*D*dt+grid", out=next_grid)

?Creating our own roll function
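A pure-Python sketch of what a hand-rolled roll could look like (roll_rows is my name, mirroring np.roll(grid, shift, axis=1); not the book's implementation):

```python
def roll_rows(grid, shift):
    """Roll each row of a list-of-lists right by `shift` positions,
    like np.roll(grid, shift, axis=1), without going through numpy."""
    out = []
    for row in grid:
        k = shift % len(row)
        out.append(row[-k:] + row[:-k] if k else row[:])
    return out

print(roll_rows([[1, 2, 3], [4, 5, 6]], 1))  # [[3, 1, 2], [6, 4, 5]]
```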

scipy

from scipy.ndimage.filters import laplace
laplace(grid, out, mode='wrap')
page-faults suggest scipy allocates a lot of memory? instructions suggest the scipy function is too general-purpose?

Compiling to C

Options for compiling to C:

Cython

zmq uses it too?
setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("calculate", ["cythonfn.pyx"])]
)

$ python setup.py build_ext --inplace
Cython annotations: the yellower a line, the "more calls into the Python virtual machine"
Add type annotations:

cdef unsigned int i, n

Disable bounds checking: #cython: boundscheck=False (can also be applied per function)
Buffer protocol?

def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs): ...

OpenMP

prange
-fopenmp(对GCC?)
schedule="guided"

Shed Skin: for non-numpy code

shedskin --extmod test.py
The extra 0.05s: spent copying data from the Python environment

Pythran

#pythran export evolve(float64[][], float)

Numba (LLVM-based): specialized for numpy

Use Continuum's Anaconda distribution
from numba import jit

@jit()

Experimental GPU support is also available?

VM & JIT:PyPy

GC behavior: whereas CPython uses reference counting, PyPy uses a modified mark-and-sweep (so objects may be reclaimed less promptly)
Note that PyPy 2.3 runs as Python 2.7.3.
STM: an attempt to remove the GIL

Other tools: Theano, Parakeet, PyViennaCL, Nuitka, Pyston (from Dropbox), PyCUDA (low-level code is not portable?)
ctypes, cffi (from PyPy), f2py, CPython modules

$ f2py -c -m diffusion --fcompiler=gfortran --opt='-O3' diffusion.f90

JIT Versus AOT

Concurrency

Concurrency: avoid wasting time on I/O wait
In Python, coroutines are implemented as generators.
For Python 2.7 implementations of future-based concurrency, ... ?

gevent (suited to mainly CPU-based problems that sometimes involve heavy I/O)

gevent monkey-patches the standard I/O functions to be asynchronous
Greenlet

wait
The futures are created with gevent.spawn
Cap the number of concurrently open resources: from gevent.coros import Semaphore

requests = [gevent.spawn(download, u, semaphore) for u in urls]

import grequests?
A 69x speedup? Does that imply a corresponding amount of unnecessary I/O wait?
The event loop may be either underutilizing or overutilizing

tornado (by Facebook, suited to mostly I/O-bound asynchronous applications)

from tornado import ioloop, gen
from functools import partial
AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=100)
@gen.coroutine

... responses = yield [http_client.fetch(url) for url in urls]  # yields Future objects?
response_sum = sum(len(r.body) for r in responses)
raise gen.Return(value=response_sum)

_ioloop = ioloop.IOLoop.instance()
run_func = partial(run_experiment, base_url, num_iter)
result = _ioloop.run_sync(run_func)
Drawback: tracebacks can no longer hold valuable information

In Python 3.4, new machinery introduced to easily create coroutines and have them still return values

asyncio

yield from: no longer need to raise an exception to return a result from a coroutine
very low-level => import aiohttp
@asyncio.coroutine
def http_get(url):
    nonlocal semaphore
    with (yield from semaphore):
        response = yield from aiohttp.request('GET', url)
        body = yield from response.content.read()
        yield from response.wait_for_close()
    return body
return http_get

tasks = [http_client(url) for url in urls]
for future in asyncio.as_completed(tasks):
    data = yield from future
loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_experiment(base_url, num_iter))
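The snippets above use the Python 3.4-era @asyncio.coroutine / yield from style; on Python 3.5+ the same shape reads more naturally with async/await. A self-contained sketch with the HTTP fetch simulated by asyncio.sleep (fetch/run_all are illustrative names, not aiohttp's API):

```python
import asyncio

async def fetch(url, sem):
    # Simulated I/O: a real client would await an HTTP request here.
    async with sem:
        await asyncio.sleep(0.01)
        return len(url)

async def run_all(urls):
    sem = asyncio.Semaphore(2)   # cap concurrently open "requests"
    return await asyncio.gather(*[fetch(u, sem) for u in urls])

sizes = asyncio.run(run_all(["http://a", "http://bb"]))
print(sizes)  # [8, 9]
```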


allows us to unify modules like tornado and gevent by having them run in the same event loop

multiprocessing

Process Pool Queue Pipe Manager ctypes (for IPC?)
In Python 3.2, the concurrent.futures module was introduced (via PEP 3148)
PyPy fully supports multiprocessing and runs faster
from multiprocessing.dummy import Pool (the thread-based version?)
hyperthreading can give up to a 30% perf gain, given enough compute resources
It is worth noting that the negative of threads on CPU-bound problems is reasonably solved in Python 3.2+
Use an external queue implementation: Gearman, 0MQ, Celery (with RabbitMQ as the message broker), PyRes, SQS, or HotQueue
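A minimal concurrent.futures sketch (the Python 3.2+ module mentioned above; the pool size and worker function are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

# submit() returns futures immediately; as_completed yields them as they finish
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square, n) for n in range(5)]
    results = sorted(f.result() for f in as_completed(futures))

print(results)  # [0, 1, 4, 9, 16]
```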
manager = multiprocessing.Manager()

value = manager.Value(b'c', FLAG_CLEAR)
rds = redis.StrictRedis()

rds[FLAG_NAME] = FLAG_SET
value = multiprocessing.RawValue(b'c', FLAG_CLEAR)  # no synchronization mechanism?
sh_mem = mmap.mmap(-1, 1) # memory map 1 byte as a flag

sh_mem.seek(0)

flag = sh_mem.read_byte()
Using mmap as a Flag Redux (? I didn't quite follow this part; skipped)
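A runnable Python 3 sketch of the one-byte mmap flag above (FLAG values are my choice; write_byte/read_byte take and return ints in Python 3; processes forked after the mmap would share the byte):

```python
import mmap

FLAG_CLEAR, FLAG_SET = b'0', b'1'

sh_mem = mmap.mmap(-1, 1)            # 1 anonymous byte, shared across fork
sh_mem.seek(0)
sh_mem.write_byte(ord(FLAG_SET))     # one process sets the flag...

sh_mem.seek(0)
flag = sh_mem.read_byte()            # ...another process polls it
print(flag == ord(FLAG_SET))  # True
```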
$ ps -A -o pid,size,vsize,cmd | grep np_shared
lock = lockfile.FileLock(filename)
lock.acquire() / lock.release()
lock = multiprocessing.Lock()
value = multiprocessing.Value('i', 0)
lock.acquire()
value.value += 1
lock.release()
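The Lock + Value pattern above, wrapped in a runnable sketch (function names are mine; without the lock the += update would race):

```python
import multiprocessing

def add_many(value, lock, n):
    for _ in range(n):
        with lock:               # without the lock, value.value += 1 races
            value.value += 1

def demo(workers=4, n=500):
    lock = multiprocessing.Lock()
    value = multiprocessing.Value('i', 0)
    procs = [multiprocessing.Process(target=add_many, args=(value, lock, n))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return value.value           # workers * n when properly locked

if __name__ == '__main__':
    print(demo())  # 2000
```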

Clusters and Job Queues

$462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy

Inconsistency caused by a version upgrade? But the API should have been versioned...

Skype's 24-Hour Global Outage

some versions of the Windows client didn’t properly handle the delayed responses and crashed.

To reliably start the cluster's components when the machine boots, we tend to use either a cron job, Circus or supervisord, or sometimes Upstart (which is being replaced by systemd)
you might want to introduce a random-killer tool like Netflix's ChaosMonkey
Make sure it is cheap in time and money to deploy updates to the system
Make sure you use a deployment system like Fabric, Salt, Chef, or Puppet
Early warning: Pingdom and ServerDensity
Status monitoring: Ganglia
3 Clustering Solutions

Parallel Python

ppservers = ("*",) # set IP list to be autodiscovered
job_server = pp.Server(ppservers=ppservers, ncpus=NBR_LOCAL_CPUS)
... job = job_server.submit(calculate_pi, (input_args,), (), ("random",))

IPython Parallel

via ipcluster
Schedulers hide the synchronous nature of the engines and provide an asynchronous interface

NSQ (a distributed message system, written in Go)

Pub/sub: Topics -> Channels -> Consumers
writer = nsq.Writer(['127.0.0.1:4150', ])
handler = partial(calculate_prime, writer=writer)
reader = nsq.Reader(message_handler=handler, nsqd_tcp_addresses=['127.0.0.1:4150'], topic='numbers', channel='worker_group_a')
nsq.run()

Other cluster tools

Using Less RAM

IPython %memit
the array module
DAWG/DAFSA
Marisa trie (static trie)
Datrie (requires an alphabet that covers all keys?)
HAT trie
HTTP microservice (using Flask): https://github.com/j4mie/postcodeserver/
Probabilistic Data Structures

The HyperLogLog++ structure?
Very Approximate Counting with a 1-byte Morris Counter

Store only an exponent, estimating the count as 2^exponent; update probabilistically: increment when random(0,1) <= 2^-exponent
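A sketch of the 1-byte Morris counter under the update rule above (class name is illustrative; the estimate is only a rough power of two):

```python
import random

class MorrisCounter(object):
    """1-byte approximate counter: store only an exponent and
    estimate the true count as 2**exponent."""
    def __init__(self):
        self.exponent = 0

    def add(self):
        # Increment the exponent with probability 2**-exponent.
        if random.random() <= 2.0 ** -self.exponent:
            self.exponent += 1

    def __len__(self):
        return 2 ** self.exponent

random.seed(0)
c = MorrisCounter()
for _ in range(1000):
    c.add()
print(len(c))  # a rough power-of-two estimate of 1000
```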

K-Minimum Values / KMV (keep the k smallest hash values; assumes hashes are uniformly distributed)
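A KMV sketch under the uniform-hash assumption, using md5 to map items into [0, 1) (function name and k are illustrative; the standard estimator is (k - 1) divided by the k-th smallest hash):

```python
import hashlib

def kmv_estimate(items, k=64):
    """K-Minimum Values: keep the k smallest hashes (assumed uniform
    in [0, 1)); the k-th smallest value x suggests roughly (k - 1) / x
    distinct items."""
    hashes = set()
    for item in items:
        digest = hashlib.md5(str(item).encode()).hexdigest()
        hashes.add(int(digest, 16) / 16.0 ** 32)   # map hash into [0, 1)
    mins = sorted(hashes)[:k]
    if len(mins) < k:
        return len(mins)        # fewer than k distinct hashes: exact count
    return (k - 1) / mins[-1]

print(kmv_estimate(range(10000)))  # a rough estimate of the 10000 distinct items
```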
Bloom Filters

This method gives us no false negatives and a controllable rate of false positives (an absent item may be misreported as present)
?Simulate an arbitrary number of hash functions with 2 independent hashes
very sensitive to initial capacity
scalable Bloom filters: by chaining together multiple bloom filters ...
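The two-hash trick can be sketched as follows (a Kirsch-Mitzenmacher-style derivation of k probe positions from two hash values; sizes and names are illustrative):

```python
import hashlib

class BloomFilter(object):
    """Bloom filter whose k probe positions are derived from just two
    hash values: probe_i = (h1 + i * h2) % num_bits."""
    def __init__(self, num_bits=8192, num_probes=5):
        self.num_bits = num_bits
        self.num_probes = num_probes
        self.bits = bytearray(num_bits // 8)

    def _probes(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], 'big')
        h2 = int.from_bytes(digest[8:16], 'big') | 1   # odd for better spread
        for i in range(self.num_probes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for bit in self._probes(item):
            self.bits[bit // 8] |= 1 << (bit % 8)

    def __contains__(self, item):
        return all(self.bits[b // 8] & (1 << (b % 8)) for b in self._probes(item))

bf = BloomFilter()
bf.add("spam")
bf.add("eggs")
print("spam" in bf)   # always True: no false negatives
```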

LogLog Counter
bit_index = trailing_zeros(item_hash)
if bit_index > self.counter:
    self.counter = bit_index

Variants: SuperLogLog, HyperLogLog

Lessons from the Field

Sentry is used to log and diagnose Python stack traces
Aho-Corasick trie?
We use Graphite with collectd and statsd to allow us to draw pretty graphs of what's going on
Gunicorn was used as a WSGI and its IO loop was executed by Tornado