Benchmark of Python WSGI Servers
2014-05-01 22:48
Nicholas Piël | March 15, 2010
It has been a while since the Socket Benchmark of Asynchronous Servers. That benchmark looked specifically at the raw socket performance of various frameworks, which was measured by firing a regular HTTP request at the TCP server. The server itself was dumb and did not actually understand the headers being sent to it. In this benchmark I will look at how different WSGI servers perform at exactly that task: the handling of a full HTTP request.
I should immediately start with a word of caution. I tried my best to present an objective benchmark of the different WSGI servers, and I truly believe that a benchmark is one of the best methods to present an unbiased comparison. However, a benchmark measures performance on a very specific domain, and it could very well be that this domain is slanted towards certain frameworks. But if we keep that in mind, we can actually put some measurements behind all those 'faster than' or 'lighter than' claims you will find everywhere. In my opinion, such comparison claims without any detailed description of how they were measured are worse than a biased but detailed benchmark. The specific domain of this benchmark is, yet again, the PingPong benchmark as used earlier in my Async Socket Benchmark. However, there are some differences:
- We will fire multiple requests over a single connection, when possible, by using an HTTP 1.1 keep-alive connection
- It is a distributed benchmark with multiple clients
- We will use an identical WSGI application for all servers instead of specially crafted code to return the reply
- We expect the server to understand our HTTP request and reply with the correct error codes
This benchmark is conceptually simple, and you could claim that it is not representative of most common web applications, which rely heavily on blocking database connections. I agree with that to some extent, as this is mostly the case. However, the push towards HTML5's websockets and highly interactive web applications will require servers that are capable of serving lots of concurrent connections with low latency.
The benchmark
We will run the following WSGI application 'pong.py' on all servers:

```python
def application(environ, start_response):
    status = '200 OK'
    output = 'Pong!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
```
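Before pointing httperf at anything, an application this small can be sanity-checked without a server at all: WSGI is just a callable taking an environ dict and a start_response function, so a stub start_response is enough. A quick harness (not part of the benchmark itself):

```python
# Toy harness: call the pong application directly, no server or socket involved.
def application(environ, start_response):
    status = '200 OK'
    output = 'Pong!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]

captured = {}

def fake_start_response(status, headers):
    # Record what the application hands back, as a real server would.
    captured['status'] = status
    captured['headers'] = dict(headers)

body = ''.join(application({'REQUEST_METHOD': 'GET'}, fake_start_response))
print(body, captured['status'], captured['headers']['Content-Length'])  # Pong! 200 OK 5
```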
The kernel and system limits on the benchmark machines were tuned to handle lots of concurrent connections:

```shell
echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240
```
The server is a virtual machine with only one assigned processor. I have explicitly limited the number of available processors to make sure that it is a fair comparison. Whether or not the server scales over multiple processors is an interesting and useful feature, but it is not something I will measure in this benchmark. The reason for this is that it isn't that difficult to scale your application over multiple processors by using a reverse proxy and multiple server processes (this can even be managed for you by special applications such as Spawning or Grainbows).
The server and clients run Debian Lenny with Python 2.6.4 on the amd64 architecture. I made sure that all WSGI servers have a backlog of at least 500 and that (connection/error) logging is disabled; when this was not directly possible from the callable, I modified the library. The server and the clients have 1 GB of RAM.
I benchmarked the HTTP/1.0 request rate of all servers, and the HTTP/1.1 request rate on the subset of servers that support pipelining multiple requests over a single connection. While the lack of HTTP 1.1 keep-alive support is most likely a non-issue in current deployment situations, I expect it to become an important feature in applications that depend heavily on low-latency connections; think of comet-style web applications or applications that use HTML5 websockets.
I categorize a server as HTTP/1.1 capable by its behaviour, not by its specs. For example, the Paster server says that it has some support for HTTP 1.1 keep-alives, but I was unable to pipeline multiple requests. This reported bug might be relevant to this situation and might apply to some of the other "HTTP 1.0 servers".
The benchmark will be performed by running a recompiled httperf (which bypasses the statically compiled file-descriptor limit in the Debian package) on 3 specially set up client machines. To initialize the different request rates and aggregate the results, I will use a tool called autobench. Note: this is not ApacheBench (ab).
The command to benchmark HTTP/1.0 WSGI servers is:
```shell
httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=1
```
And the command for HTTP/1.1 WSGI servers is:
```shell
httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=10
```
The Contestants
Python is really rich with WSGI servers; I have made a selection of different servers, which are listed below.

| Name | Version | HTTP 1.1 | Flavour | Repo. | Blog | Community |
|---|---|---|---|---|---|---|
| Gunicorn | 0.6.4 | No | processor/thread | GIT | ? | #gunicorn |
| uWSGI | Trunk (253) | Yes | processor/thread | repo | ? | Mailing List |
| FAPWS3 | 0.3.1 | No | processor/thread | GIT | William Os4y | Google Groups |
| Aspen | 0.8 | No | processor/thread | SVN | Chad Whitacre | Google Groups |
| Mod_WSGI | 3.1 | Yes | processor/thread | SVN | Graham Dumpleton | Google Groups |
| wsgiref | Py 2.6.4 | No | processor/thread | SVN | None | Mailing List |
| CherryPy | 3.1.2 | Yes | processor/thread | SVN | Planet CherryPy | Planet, IRC |
| MagnumPy | 0.2 | No | processor/thread | SVN | Matt Gattis | Google Groups |
| Twisted | 10.0.0 | Yes | processor/thread | SVN | Planet Twisted | Community |
| Cogen | 0.2.1 | Yes | callback/generator | SVN | Maries Ionel | Google Groups |
| GEvent | 0.12.2 | Yes | lightweight threads | Mercurial | Denis Bilenko | Google Groups |
| Tornado | 0.2 | Yes | callback/generator | GIT | | Google Groups |
| Eventlet | 0.9.6 | Yes | lightweight threads | Mercurial | Eventlet | Mailing list |
| Concurrence | tip | Yes | lightweight threads | GIT | None | Google Groups |
The 'Flavour' column notes the concurrency model the server uses, and I identify 3 different flavours:
Processor / Thread model
The p/t model is the most common flavour. Every request runs in its own cleanly separated thread. A blocking request (such as a synchronous database call or a function call in a C extension) will not influence other requests. This is convenient, as you do not need to worry about how everything is implemented, but it does come at a price. The maximum number of concurrent connections is limited by your number of workers or threads, and this is known to scale badly when you need lots of concurrent users.
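This ceiling is easy to see with a small standard-library sketch (purely illustrative, not any benchmarked server's code): a fixed pool of 4 worker threads serving 20 blocking "requests" never runs more than 4 of them at once, however many are queued.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

peak = 0      # highest number of requests observed running at once
active = 0
lock = threading.Lock()

def handle(_):
    """Simulate one request whose handler blocks (e.g. on a database)."""
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)              # the blocking call
    with lock:
        active -= 1

# 20 queued "requests", but only 4 workers: concurrency is capped at 4.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(handle, range(20)))

print(peak)
```

With a slow enough handler and enough queued work, `peak` sits at the worker count, which is exactly the limit the text describes.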
Callback / Generator model
The callback/generator model handles multiple concurrent connections in a single thread, thereby removing the thread barrier. However, a single blocking call will block the whole event loop and has to be prevented. The servers of this flavour usually provide a threadpool to integrate blocking calls into their async framework, or provide alternative non-blocking database connectors. In order to provide flow control, this flavour uses callbacks or generators. Some think this is a beautiful way to create a form of event-driven programming; others think it is a snake pit that quickly changes your clean code into an entangled mess of callbacks or yield statements.
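The generator side of this flavour can be sketched in a few lines (a toy illustration, not the code of Cogen or Tornado): each handler yields wherever it would otherwise block on I/O, and a round-robin loop resumes the generators in turn, so many "connections" share a single thread.

```python
from collections import deque

def handler(name, replies):
    # Each yield marks a point where a real server would wait on a socket.
    for i in range(3):
        yield                      # pretend to wait for I/O
        replies.append('%s:pong%d' % (name, i))

def run(generators):
    # Round-robin event loop: resume each generator until all have finished.
    queue = deque(generators)
    while queue:
        gen = queue.popleft()
        try:
            next(gen)
            queue.append(gen)      # not done yet, schedule it again
        except StopIteration:
            pass                   # handler finished, drop it

replies = []
run([handler('a', replies), handler('b', replies)])
print(replies)  # replies from both handlers interleave: a, b, a, b, ...
```

A blocking call inside a handler would stall every other generator in the queue, which is exactly why these servers offer threadpools and non-blocking connectors.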
Lightweight Threads
The lightweight flavour uses greenlets to provide concurrency. This also works by providing concurrency from a single thread, but in a less obtrusive way than with the callback or generator approach. Of course, one has to be careful with blocking connections, as these will stop the event loop. To prevent this from happening, Eventlet and Gevent can monkey-patch the socket module to stop it from blocking, so when you are using a pure Python database connector it should never block the loop. Concurrence provides an asynchronous database adapter.
Implementation specifics for each WSGI server
Aspen
Ruby might be full of all kinds of rockstar programmers (whatever that might mean), but if I had to nominate just one Python programmer for some sort of 'rockstar award' I would definitely nominate Chad Whitacre. It's not only the great tools he created (Testosterone, Aspen, Stephane), but mostly how he promotes them, with the most awesome screencasts I have ever seen.
Anyway, Aspen is a neat little web server which is also able to serve WSGI applications. It can be easily installed with 'pip install aspen' and uses a special directory structure for configuration; if you want more information, I am going to point you to his screencasts.
CherryPy
![](http://nichol.as/wp-content/uploads/2010/03/cherrypy.png)
CherryPy is actually an object-oriented Python framework, but it features an excellent WSGI server. Installation can be done with a simple 'pip install cherrypy'. I ran the following script to test out the performance of the WSGI server:
```python
from cherrypy import wsgiserver
from pong import application

# Here we set our application to the script_name '/'
wsgi_apps = [('/', application)]
server = wsgiserver.CherryPyWSGIServer(('0.0.0.0', 8070), wsgi_apps,
                                       request_queue_size=500,
                                       server_name='localhost')

if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()
```
Cogen
The code to have Cogen run a WSGI application is as follows:

```python
from cogen.web import wsgi
from cogen.common import *
from pong import application

m = Scheduler(default_priority=priority.LAST, default_timeout=15)
server = wsgi.WSGIServer(
    ('0.0.0.0', 8070),
    application,
    m,
    server_name='pongserver')
m.add(server.serve)
try:
    m.run()
except (KeyboardInterrupt, SystemExit):
    pass
```
Concurrence
Concurrence is an asynchronous framework under development by Hyves (you might call it the Dutch Facebook), built upon Libevent (I used the latest stable version, 1.4.13). I fired up the pong application as follows:
```python
from concurrence import dispatch
from concurrence.http import WSGIServer
from pong import application

server = WSGIServer(application)
# Concurrence has a default backlog of 512
dispatch(server.serve(('0.0.0.0', 8080)))
```
Eventlet
Eventlet is a full-featured asynchronous framework which also provides WSGI server functionality. It is in development by Linden Labs (makers of Second Life). To run the application I used the following code:
```python
import eventlet
from eventlet import wsgi
from pong import application

wsgi.server(eventlet.listen(('', 8090), backlog=500), application, max_size=8000)
```
FAPWS3
FAPWS3 is a WSGI server built around the libev library (I used version 3.43-1.1). Once libev has been installed, FAPWS3 can be easily installed with pip. The philosophy behind FAPWS3 is to stay the simplest and fastest webserver. The script I used to start up the WSGI application is as follows:
```python
import fapws._evwsgi as evwsgi
from fapws import base
from pong import application

def start():
    evwsgi.start("0.0.0.0", 8080)
    evwsgi.set_base_module(base)
    evwsgi.wsgi_cb(("/", application))
    evwsgi.set_debug(0)
    evwsgi.run()

if __name__ == "__main__":
    start()
```
Gevent
Gevent was one of the best performing async frameworks in my previous socket benchmark. Gevent extends Libevent and uses its HTTP server functionality extensively. To install Gevent you need Libevent installed, after which you can pull in Gevent with pip.
```python
from gevent import wsgi
from pong import application

wsgi.WSGIServer(('', 8088), application, spawn=None).serve_forever()
```
Gunicorn
![](http://nichol.as/wp-content/uploads/2010/03/gunicorn1.png)
Gunicorn stands for 'Green Unicorn'; everybody knows that a unicorn is a mix of the awesome narwhal and the magnificent pony. The 'green' does not, however, have anything to do with the great greenlets, as Gunicorn really has a threaded flavour. Installation is easy and can be done with a simple 'pip install gunicorn'. Gunicorn provides you with a simple command to run WSGI applications; all I had to do was:
```shell
gunicorn -b :8000 -w 1 pong:application
```
Update: I had some suggestions in the comment section that using a single worker and having a client connect to the naked server is not the correct way to work with Gunicorn. So I took their suggestions, moved Gunicorn behind NGINX, and increased the worker count to the suggested number of workers, 2*N+1 where N is 1, which makes 3. The result of this is depicted in the graphs as gunicorn-3w. Running Gunicorn with more workers can be done as follows:

```shell
gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application
```
MagnumPy
![](http://nichol.as/wp-content/uploads/2010/03/magnum_pi_tom_selleck-241x300.jpg)
MagnumPy has to be the server with the most awesome name. This is still a very young project, but its homepage makes some strong statements about its performance, so it is worth testing out. It does not feel as polished as the other contestants: installing is basically pushing the 'magnum' directory onto your PYTHONPATH and editing './magnum/config.py', after which you can start the server by running './magnum/serve.py start'.
```python
# config.py
import magnum
import magnum.http
import magnum.http.wsgi
from pong import application

WORKER_PROCESSES = 1
WORKER_THREADS_PER_PROCESS = 1000
HOST
HANDLER_CLASS = magnum.http.wsgi.WSGIWrapper(application)
DEBUG = False
PID_FILE = '/tmp/magnum.pid'
```
Mod_WSGI
Mod_WSGI is the successor of Mod_Python; it allows you to easily integrate Python code with the Apache server. My first Python web app experience was with mod_python and PSP templates; WSGI and cool frameworks such as Pylons have really made life a lot easier. Mod_WSGI is a great way to get your application deployed quickly. Installing Mod_WSGI is really easy with most Linux distributions. For example:
```shell
aptitude install libapache2-mod-wsgi
```

is all you need to do on a pristine Debian distro to get a working Apache (MPM-Worker) server with Mod_WSGI enabled. To point Apache to your WSGI app, just add a single line to '/etc/apache2/httpd.conf':

```
WSGIScriptAlias / /home/nicholas/benchmark/wsgibench/pong.py
```
The problem is that most people already have Apache installed and that they are using it for *shudder* serving PHP. PHP is not thread safe, meaning that you are forced to use a pre-forking Apache server. In this benchmark I am using the threaded Apache version and use mod_wsgi in embedded mode (as it gave me the best performance). I disabled all unnecessary modules, configured Apache to provide me with a single worker and lots of threads, and disabled logging (note: I tried various settings):
```
<IfModule mpm_worker_module>
    ServerLimit          1
    ThreadLimit          1000
    StartServers         1
    MaxClients           1000
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      1000
    MaxRequestsPerChild  0
</IfModule>
CustomLog /dev/null combined
ErrorLog /dev/null
```
Paster
The Paster webserver is the webserver provided with Python Paste; it is Pylons' default webserver. You can run a WSGI application as follows:

```python
from pong import application
from paste import httpserver

httpserver.serve(application, '0.0.0.0', request_queue_size=500)
```
Tornado
![](http://nichol.as/wp-content/uploads/2010/03/tornado.png)
Tornado is the non-blocking webserver that powers FriendFeed. It provides some WSGI server functionality, which can be used as described below. In the previous benchmark I showed that it provides excellent raw-socket performance.
```python
import sys
sys.path.append('/home/nicholas/benchmark/wsgibench/')

import tornado.httpserver
import tornado.ioloop
import tornado.wsgi
from pong import application

def main():
    container = tornado.wsgi.WSGIContainer(application)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(8000)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()
```
Twisted
![](http://nichol.as/wp-content/uploads/2010/03/TwistedLogo.png)
After installing Twisted with pip you get a tool, 'twistd', which allows you to easily serve WSGI applications, e.g.:

```shell
twistd --pidfile=/tmp/twisted.pid -no web --wsgi=pong.application --logfile=/dev/null
```
But you can also run a WSGI application as follows:
```python
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from twisted.internet import reactor
from pong import application

resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8000, Site(resource))
reactor.run()
```
uWSGI
![](http://nichol.as/wp-content/uploads/2010/03/uwsgi.png)
uWSGI is a server written in C; it is not meant to run stand-alone but has to be placed behind a webserver. It provides modules for Apache, NGINX, Cherokee and Lighttpd. I placed it behind NGINX, which I configured as follows:
```
worker_processes 1;
events {
    worker_connections 30000;
}
http {
    include mime.types;
    default_type application/octet-stream;
    keepalive_timeout 65;
    upstream pingpong {
        ip_hash;
        server unix:/var/nginx/uwsgi.sock;
    }
    server {
        listen 9090;
        server_name localhost;
        location / {
            uwsgi_pass pingpong;
            include uwsgi_params;
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
```
uWSGI itself was started as follows:

```shell
./uwsgi -s /var/nginx/uwsgi.sock -i -H /home/nicholas/benchmark/wsgibench/ -M -p 1 -w pong -z 30 -l 500 -L
```
WsgiRef
WsgiRef is the default WSGI server included with Python since version 2.5. To have this server run my application, I used the following code, which disables logging and increases the backlog:

```python
from pong import application
from wsgiref import simple_server

class PimpedWSGIServer(simple_server.WSGIServer):
    # To increase the backlog
    request_queue_size = 500

class PimpedHandler(simple_server.WSGIRequestHandler):
    # To disable logging
    def log_message(self, *args):
        pass

httpd = PimpedWSGIServer(('', 8000), PimpedHandler)
httpd.set_app(application)
httpd.serve_forever()
```
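Because wsgiref ships with Python, a setup like this can be exercised end-to-end in a single process: bind the pimped server to an ephemeral port, serve from a daemon thread, and fetch one reply. A self-contained sketch with the pong application inlined (using a bytes body, as Python 3's WSGI expects):

```python
import threading
import urllib.request
from wsgiref import simple_server

def application(environ, start_response):
    output = b'Pong!'
    start_response('200 OK', [('Content-type', 'text/plain'),
                              ('Content-Length', str(len(output)))])
    return [output]

class PimpedWSGIServer(simple_server.WSGIServer):
    # To increase the backlog
    request_queue_size = 500

class PimpedHandler(simple_server.WSGIRequestHandler):
    # To disable logging
    def log_message(self, *args):
        pass

# Port 0 asks the OS for any free port, so the check never collides.
httpd = PimpedWSGIServer(('127.0.0.1', 0), PimpedHandler)
httpd.set_app(application)
threading.Thread(target=httpd.serve_forever, daemon=True).start()

reply = urllib.request.urlopen('http://127.0.0.1:%d/' % httpd.server_port).read()
print(reply)  # b'Pong!'
httpd.shutdown()
```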
Results
Below you will find the results as plotted with Highcharts; the line thickens when hovered over, and you can easily enable or disable plotted results by clicking on the legend.
HTTP 1.0 Server results
[Figure: Reply Rate on an increasing amount of requests (more is better); series: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w]
Disqualified servers
From the above graph it should be clear that some of the web servers are missing. The reason is that I was unable to benchmark them completely, as they stopped replying when the request rate passed a certain critical value. The servers that are missing are:

- MagnumPy: I was able to obtain a reply rate of 500 RPS, but when the request rate passed the 700 RPS mark, MagnumPy crashed.
- Concurrence: I was able to obtain a successful reply rate of 700 RPS, but it stopped replying when we fired more than 800 requests a second at the server. However, since Concurrence does support HTTP/1.1 keep-alive connections and behaves correctly when benchmarked under a lower connection rate but higher request rate, you can find its results in the HTTP/1.1 benchmark.
- Cogen: was able to obtain a reply rate of 800 per second but stopped replying when the request rate was above 1500 per second. It does have a complete benchmark under the HTTP/1.1 test, though.
- WSGIRef: I obtained a reply rate of 352, but it stopped reacting when we passed the 1900 RPS mark.
- Paster: obtained a reply rate of 500 but failed when we passed the 2000 RPS mark.
Interpretation
From the servers that passed the benchmark we can see that they all perform admirably. At the bottom we have Twisted and Gunicorn; the performance of Twisted is somewhat expected, as it isn't really tuned for WSGI performance. I find the performance of Gunicorn somewhat disappointing, also because, for example, Aspen, a pure Python server from a few years back, shows significantly better performance. We can see, however, that increasing the worker count does in fact improve the performance, as Gunicorn is then able to obtain a reply rate competitive with Aspen.
The other pure Python servers, CherryPy and Tornado, seem to perform on par with ModWSGI. It looks like CherryPy has a slight performance edge over Tornado. So, if you are thinking of changing from ModWSGI or CherryPy to Tornado for increased performance, you should think again. Not only does this benchmark show that there isn't much to gain, but you will also abandon the process/thread model, meaning that you should be cautious about code blocking your interpreter.
The top performers are clearly FAPWS3, uWSGI and Gevent. FAPWS3 has been designed to be fast and lives up to the expectations; this has been noted by others as well, as it looks like it is being used in production at Ebay. uWSGI is used successfully in production at (and in development by) the Italian ISP Unbit. Gevent is a relatively young project but already very successful. Not only did it perform great in the previous async server benchmark, but its reliance on the Libevent HTTP server gives it performance beyond the other asynchronous frameworks.
I should note that the difference between these top 3 is too small to declare a clear winner of the 'reply rate contest'. However, I want to stress that with almost all servers I had to be careful to keep the number of concurrent connections low, since threaded servers aren't that fond of lots of concurrent connections. The async servers (Gevent, Eventlet, and Tornado) were happy to work on whatever was thrown at them. This really gives a great feeling of stability, as you do not have to worry about settings such as pool size, worker count, etc.
[Figure: Response Time (ms) on an increasing amount of requests (less is better); series: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w]
Most of the servers have an acceptable response time. Twisted and Eventlet are somewhat on the slow side, but Gunicorn unfortunately shows a dramatic increase in latency when the request rate passes the 1000 RPS mark. Increasing the Gunicorn worker count lowers this latency by a lot, but it is still on the high side compared with, for example, Aspen or CherryPy.
[Figure: Error Rate on an increasing amount of requests (less is better); series: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w]
The low error rates for CherryPy, ModWSGI, Tornado and uWSGI should give everybody confidence in their suitability for a production environment.
HTTP 1.1 Server results
In the HTTP/1.1 benchmark we have a different list of contestants, as not all servers were able to pipeline multiple requests over a single connection. In this test the connection rate is relatively low: for example, a request rate of 8000 per second is about 800 connections per second with 10 requests per connection. This means that some servers that were not able to complete the HTTP/1.0 benchmark (with connection rates up to 5000 per second) are able to complete the HTTP/1.1 benchmark (Cogen and Concurrence, for example).
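To make the arithmetic explicit (using the httperf --num-calls=10 setting from above), the connection rate is simply the request rate divided by the number of pipelined calls per connection:

```python
# Each HTTP/1.1 connection carries 10 pipelined calls (httperf --num-calls=10),
# so the connection rate is a tenth of the request rate.
requests_per_second = 8000
calls_per_connection = 10
connections_per_second = requests_per_second // calls_per_connection
print(connections_per_second)  # 800
```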
[Figure: Successful Reply Rate on an increasing amount of requests (more is better); series: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence]
This graph shows the achieved request rate of the servers, and we can clearly see that it is higher than in the HTTP/1.0 test. We could increase the total request rate even more by increasing the number of pipelined requests, but this would then lower the connection rate. I think that 10 pipelined requests is an okay generalization of a web browser opening an average page.
The graph shows a huge gap in performance: with the fastest server, Gevent, we are able to obtain about 9000 replies per second, while with Twisted, Concurrence and Cogen we get about 1000. In the middle we have CherryPy and ModWSGI, with which we are able to obtain a reply rate around 4000. It is interesting that Tornado, while close to CherryPy and ModWSGI, seems to have an edge in this benchmark, compared to the edge CherryPy had in the HTTP/1.0 benchmark. This is along the lines of our expectations, as pipelined requests in Tornado are cheaper (since it is async) than in ModWSGI or CherryPy. We expect this gap to widen if we increase the number of pipelined requests. However, it remains to be seen how much of a performance boost this would provide in a deployment setup, as Tornado and CherryPy will then probably be sitting behind a reverse proxy, for example NGINX. In such a setting the connection type between the upstream and the proxy is usually limited to HTTP/1.0; NGINX, for example, does not even support HTTP/1.1 keep-alive connections to its upstreams.
The best performers are clearly uWSGI and Gevent. I benchmarked Gevent with the 'spawn=None' option to prevent Gevent from spawning a greenlet, which seems fair in a benchmark like this. However, when you want to do something interesting with lots of concurrent connections, you want each request to have its own greenlet, as this allows you to have thread-like flow control. Thus I also benchmarked that version, which can be seen in the graph under the name 'Gevent-Spawn'; from its results we can see that the performance penalty is small.
[Figure: Response Time (ms) on an increasing amount of requests (less is better); series: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence]
Cogen's latency rises sharply after about 2000 requests per second; Eventlet and Twisted show increased latency fairly early as well.
[Figure: Error Rate on an increasing amount of requests (less is better); series: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence]
The error rate shows that Twisted, Concurrence and Cogen have some trouble keeping up; I think all other error rates are acceptable.
Memory Usage
I also monitored the memory usage of the different frameworks during the benchmark. The figures below show the peak memory usage of all accumulated processes. As this benchmark does not really benefit from additional processes (there is only one available processor), I limited the number of workers when possible.
[Figure: Accumulated Peak Memory Usage per WSGI server (Megabytes); series: Aspen, CherryPy, Cogen, Concurrence, Eventlet, FAPWS3, gevent, Gunicorn, Gunicorn-3w, MagnumPy, ModWSGI, Paster, Tornado, Twisted, uWSGI, WsgiRef]
From these results one thing really stands out, and that is the absolutely low memory usage of uWSGI, Gevent and FAPWS3, especially if we take their performance into account. It looks like Cogen is leaking memory, but I haven't really looked into that. Gunicorn-3w shows a relatively high memory usage compared with Gunicorn, but it should be noted that this is mainly caused by the switch from the naked deployment to the deployment behind NGINX, as we now also have to add the memory usage of NGINX. A single Gunicorn worker only takes about 7.5 MB of memory.
Let’s Kick it up a notch
![](http://nichol.as/wp-content/uploads/2010/03/610x1-300x201.jpg)
The first part of this post focused purely on the RPS performance of the different frameworks under a high load. When the WSGI server was working hard enough, it could simply answer all requests from a certain user and move on to the next user. This keeps the number of concurrent connections relatively low, making such a benchmark suitable for threaded web servers.
However, if we are going to increase the number of concurrent connections, we will quickly run into system limits, as explained in the introduction. This is commonly known as the C10K problem. Asynchronous servers use a single thread to handle multiple connections and, when efficiently implemented with, for example, epoll or kqueue, are perfectly able to handle a large number of concurrent connections.
So that is what we are going to do: we are going to take the top 3 performing WSGI servers, namely Tornado, Gevent and uWSGI (FAPWS3's lack of HTTP/1.1 support made it unsuitable for this benchmark), and give them 5 minutes of ping-pong mayhem.

You see, ping-pong is a simple game, and it isn't really the complexity that makes it interesting; it is the speed and the reaction of the players. Now, what is 5 minutes of ping-pong mayhem? Imagine that for 5 minutes, every second, an Airbus loaded with ping-pong players lands (500 clients), and each of those players is going to slam you exactly 12 balls (with a 5 second interval). This would mean that after 5 seconds you would already have to return the volleys of 2000 different players at once.
Tsung Benchmark Setup
To perform this benchmark I am going to use Tsung, which is a multi-protocol distributed load-testing tool written in Erlang. I will have 3 different machines simulating the ping-pong rampage. I used the following Tsung script:
```xml
<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>
<tsung loglevel="warning">
  <clients>
    <client host="tsung2" use_controller_vm="false" maxusers="800"/>
    <client host="tsung3" use_controller_vm="false" maxusers="800"/>
    <client host="bastet" use_controller_vm="false" maxusers="800"/>
  </clients>
  <servers>
    <server host="tsung1" port="8000" type="tcp"/>
  </servers>
  <monitoring>
    <monitor host="tsung1" type="erlang"/>
  </monitoring>
  <load>
    <arrivalphase phase="1" duration="5" unit="minute">
      <users interarrival="0.002" unit="second"/>
    </arrivalphase>
  </load>
  <sessions>
    <session name='wsgitest' probability='100' type='ts_http'>
      <for from="0" to="12" incr="1" var="counter">
        <request>
          <http url='http://tsung1:8000/' version='1.1' method='GET'/>
        </request>
        <thinktime random='false' value='5'/>
      </for>
    </session>
  </sessions>
</tsung>
```
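The load figures in the script can be cross-checked with a little arithmetic: an interarrival of 0.002 s means 500 new clients per second, and with 12 requests per client (the 12-balls-per-player figure from the text) over a 5 minute arrival phase:

```python
# Cross-check of the Tsung load: interarrival 0.002 s -> 500 users/s,
# 5 minute arrival phase, 12 GET requests per user (per the article's text).
users_per_second = 1 / 0.002          # 500 clients arriving per second
phase_seconds = 5 * 60                # duration="5" unit="minute"
requests_per_user = 12
total_requests = users_per_second * phase_seconds * requests_per_user
print(int(total_requests))  # 1800000, i.e. roughly 2 million per server
```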
Tsung Benchmark Results
[Figure: Concurrent Connections measured over time (in seconds); series: tornado, uwsgi, gevent]

[Figure: System Load, CPU Usage and Free Memory measured over time during the benchmark]
Let me first state that all three frameworks are perfectly capable of handling this kind of load; none of the frameworks dropped connections or ignored requests. This, I must say, is already quite an achievement, considering that they each had to handle about 2 million requests.
Below the concurrent connection graph we can see the system load, the CPU usage and the free memory on the system during the benchmark. We can clearly see that Gevent put less strain on the system, as the CPU and load graphs indicate. In the memory graph we can see that all frameworks used a consistent amount of memory.
Readers who are still paying close attention to this article should note that the memory graph displays 4 lines instead of 3. The fourth line is Gevent compiled against Libevent 2.0.4a; the new release of Libevent has been said to show considerable performance improvements in its HTTP server. But it is still an alpha version, and the memory graph shows that this version is leaking memory: not something you want on your production site.
[Figure: Server Latency (ms) measured over time (in seconds); series: tornado, uwsgi, gevent]
The final graph shows the latency of the 3 frameworks; we can see a clear difference between Tornado and its competitors, as Tornado's response time hovers around 100 ms, uWSGI's around 5 ms and Gevent's around 3 ms. This is quite a difference, and I am really amazed by the low latency of both Gevent and uWSGI during this onslaught.
Summary and Remarks
The above results show that as Python web developers we have lots of different methods to deploy our applications. Some of these seem to perform better than others, but by focusing only on server performance I would not do justice to most of the tested servers, as they differ greatly in functionality. Also, if you are going to take some stock web framework and won't do any optimizations or caching, the performance of your webserver is not going to matter, as it will not be the bottleneck. If there is one thing this benchmark made clear, it is that most Python web servers offer great performance, and if you feel things are slow, the first thing to look at is really your own application.
When you are just interested in quickly hosting your threaded application, you really can't go wrong with Apache ModWSGI. Even though Apache ModWSGI might put a little more strain on your memory requirements, there is a lot to be said for it in terms of functionality. For example, protecting part of your website by using an LDAP server is as easy as enabling a module. Standalone CherryPy also shows great performance and functionality; it is a really viable (fully Python) alternative, which can lower memory requirements.
When you are a little more adventurous, you can look at uWSGI and FAPWS3. They are relatively new compared to CherryPy and ModWSGI, but they show a significant performance increase and have lower memory requirements.
Concerning Tornado and performance: I do not think Tornado is an alternative to CherryPy or even ModWSGI. Not only does it hardly show any increase in performance, but it also requires you to rethink your code. Tornado can be a great option, though, if you do not have any code using blocking connections or just want to look at something new.
And then there is Gevent. It really showed amazing performance at a low memory footprint. It might need some adjustments to your legacy code, but then again, the monkey-patching of the socket module could help, and I really love the cleanness of greenlets. There have already been some reports of deploying Gevent successfully, even with SQLAlchemy.
And if you want to dive into high-performance websockets with lots of concurrent connections, you really have to go with an asynchronous framework. Gevent seems like the perfect companion for that; at least, that is what we are going to use.
From: http://nichol.as/benchmark-of-python-web-servers