

Benchmark of Python WSGI Servers

Nicholas Piël | March 15, 2010

It has been a while since the Socket Benchmark of Asynchronous Servers. That benchmark looked specifically at the raw socket performance of various frameworks, measured by doing a regular HTTP request against the TCP server. The server itself was dumb and did not actually understand the headers being sent to it. In this benchmark I will be looking at how different WSGI servers perform at exactly that task: the handling of a full HTTP request.

I should immediately start with a word of caution. I tried my best to present an objective benchmark of the different WSGI servers, and I truly believe that a benchmark is one of the best methods to present an unbiased comparison. However, a benchmark measures performance in a very specific domain, and it could very well be that this domain is slanted towards certain frameworks. But if we keep that in mind, we can actually put some measurements behind all those 'faster than' or 'lighter than' claims you will find everywhere. It is my opinion that such comparison claims, without any detailed description of how they are measured, are worse than a biased but detailed benchmark. The specific domain of this benchmark is, yet again, the ping-pong benchmark as used earlier in my async socket benchmark. However, there are some differences:

We will fire multiple requests over a single connection, when possible, by using an HTTP 1.1 keep-alive connection
It is a distributed benchmark with multiple clients
We will use an identical WSGI application for all servers instead of specially crafted code to return the reply
We expect the server to understand our HTTP request and reply with the correct error codes

This benchmark is conceptually simple and you could claim that it is not representative of most common web applications, which rely heavily on blocking database connections. I agree with that to some extent, as this is mostly the case. However, the push towards HTML5 websockets and highly interactive web applications will require servers that are capable of serving lots of concurrent connections with low latency.


The benchmark

We will run the following WSGI application ‘pong.py’ on all servers.

def application(environ, start_response):
    status = '200 OK'
    output = 'Pong!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
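Before pointing a load generator at any of the servers below, it can be handy to sanity-check the application itself. Here is a minimal sketch (my own addition, not part of the original benchmark) that calls the WSGI callable directly using the standard library's wsgiref test helpers:

from wsgiref.util import setup_testing_defaults
from pong import application

environ = {}
setup_testing_defaults(environ)   # fill in a minimal, spec-compliant WSGI environ

collected = {}
def start_response(status, headers):
    # capture what the application reports so we can assert on it
    collected['status'] = status
    collected['headers'] = dict(headers)

body = ''.join(application(environ, start_response))
assert collected['status'] == '200 OK'
assert body == 'Pong!'
assert collected['headers']['Content-Length'] == str(len(body))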
We will also tune both client and server by running the following commands. This basically enables the server to open LOTS of concurrent connections.

echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range

sysctl -w fs.file-max=128000

sysctl -w net.ipv4.tcp_keepalive_time=300

sysctl -w net.core.somaxconn=250000

sysctl -w net.ipv4.tcp_max_syn_backlog=2500

sysctl -w net.core.netdev_max_backlog=2500

ulimit -n 10240

The server is a virtual machine with only one assigned processor. I have explicitly limited the number of available processors to make sure that it is a fair comparison. Whether or not the server scales over multiple processors is an interesting and useful feature, but it is not something I will measure in this benchmark. The reason for this is that it isn't that difficult to scale up your application to multiple processors by using a reverse proxy and multiple server processes (this can even be managed for you by special applications such as Spawning or Grainbows).

The server and clients run Debian Lenny with Python 2.6.4 on the amd64 architecture. I made sure that all WSGI servers have a backlog of at least 500 and that (connection/error) logging is disabled; when this was not directly possible from the callable, I modified the library. The server and the clients have 1 GB of RAM.
I benchmarked the HTTP/1.0 request rate of all servers, and the HTTP/1.1 request rate on the subset of servers that support pipelining multiple requests over a single connection. While the lack of HTTP 1.1 keep-alive support is most likely a non-issue in current deployment situations, I expect it to become an important feature in applications that depend heavily on low-latency connections. Think of comet-style web applications or applications that use HTML5 websockets.

I categorize a server as HTTP/1.1 capable by its behaviour, not by its specs. For example, the Paster server says that it has some support for HTTP 1.1 keep-alives, but I was unable to pipeline multiple requests. This reported bug might be relevant to this situation and might apply to some of the other 'HTTP 1.0 servers'.
The benchmark will be performed by running a recompiled httperf (which bypasses the statically compiled file limit in the Debian package) on 3 different, specially set up client machines. To initiate the different request rates and aggregate the results I will use a tool called autobench. Note: this is not ApacheBench (ab).
The command to benchmark HTTP/1.0 WSGI servers is:

httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=1

And the command for HTTP/1.1 WSGI servers is:

httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=10


The Contestants

Python is really rich with WSGI servers; I have made a selection of different servers, which are listed below.
Name         Version       HTTP 1.1   Flavour              Repo.      Blog                 Community
Gunicorn     0.6.4         No         processor/thread     GIT        ?                    #gunicorn
uWSGI        Trunk (253)   Yes        processor/thread     repo       ?                    Mailing list
FAPWS3       0.3.1         No         processor/thread     GIT        William Os4y         Google Groups
Aspen        0.8           No         processor/thread     SVN        Chad Whitacre        Google Groups
Mod_WSGI     3.1           Yes        processor/thread     SVN        Graham Dumpleton     Google Groups
wsgiref      Py 2.6.4      No         processor/thread     SVN        None                 Mailing list
CherryPy     3.1.2         Yes        processor/thread     SVN        Planet CherryPy      Planet, IRC
MagnumPy     0.2           No         processor/thread     SVN        Matt Gattis          Google Groups
Twisted      10.0.0        Yes        processor/thread     SVN        Planet Twisted       Community
Cogen        0.2.1         Yes        callback/generator   SVN        Maries Ionel         Google Groups
Gevent       0.12.2        Yes        lightweight threads  Mercurial  Denis Bilenko        Google Groups
Tornado      0.2           Yes        callback/generator   GIT        Facebook             Google Groups
Eventlet     0.9.6         Yes        lightweight threads  Mercurial  Eventlet             Mailing list
Concurrence  tip           Yes        lightweight threads  GIT        None                 Google Groups
Most of the information in this table should be rather straightforward: I specify the version benchmarked and whether or not the server has been found capable of HTTP 1.1. The flavour of the server specifies the concurrency model the server uses, and I identify 3 different flavours:

Processor / Thread model

The p/t model is the most common flavour. Every request runs in its own cleanly separated thread. A blocking request (such as a synchronous database call or a function call in a C extension) will not influence other requests. This is convenient, as you do not need to worry about how everything is implemented, but it does come at a price: the maximum number of concurrent connections is limited by your number of workers or threads, and this is known to scale badly when you need lots of concurrent users.
Callback / Generator model

The callback/generator model handles multiple concurrent connections in a single thread, thereby removing the thread barrier. A single blocking call will block the whole event loop, however, and has to be prevented. The servers that have this flavour usually provide a threadpool to integrate blocking calls into their async framework, or provide alternative non-blocking database connectors. To provide flow control, this flavour uses callbacks or generators. Some think that this is a beautiful way to create a form of event-driven programming; others think that it is a snake pit that quickly turns your clean code into an entangled mess of callbacks or yield statements.
Lightweight Threads

The lightweight flavour uses greenlets to provide concurrency. This also works by providing concurrency from a single thread, but in a less obtrusive way than the callback or generator approach. Of course one still has to be careful with blocking connections, as these will stop the event loop. To prevent this from happening, Eventlet and Gevent can monkey-patch the socket module to stop it from blocking, so when you are using a pure-Python database connector this should never block the loop. Concurrence provides an asynchronous database adapter.
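To make the lightweight-thread idea concrete, here is a small sketch (my own illustration, not part of the original article) of the monkey-patching approach. It assumes Gevent of roughly the era benchmarked here, where gevent.monkey, gevent.spawn and gevent.joinall were already available; the URL is just a placeholder:

from gevent import monkey
monkey.patch_socket()   # replace the blocking socket module with a cooperative one

import urllib2          # a pure-Python client; its socket calls now yield to the event loop
import gevent

def fetch(url):
    # looks like ordinary blocking code, but runs cooperatively in a greenlet
    return urllib2.urlopen(url).read()

# many of these can be in flight at once inside a single OS thread
jobs = [gevent.spawn(fetch, 'http://localhost:8000/') for _ in range(10)]
gevent.joinall(jobs)
print([len(job.value) for job in jobs])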


Implementation specifics for each WSGI server


Aspen

Ruby might be full of all kinds of rockstar programmers (whatever that might mean), but if I have to nominate just one Python programmer for some sort of 'rockstar award' I would definitely nominate Chad Whitacre. It's not only the great tools he created (Testosterone, Aspen, Stephane), but mostly how he promotes them with the most awesome screencasts I have ever seen.

Anyway, Aspen is a neat little web server which is also able to serve WSGI applications. It can be easily installed with 'pip install aspen' and uses a special directory structure for configuration; if you want more information I am going to point you to his screencasts.


CherryPy



CherryPy is actually an object-oriented Python framework, but it features an excellent WSGI server. Installation can be done with a simple 'pip install cherrypy'. I ran the following script to test the performance of the WSGI server:

from cherrypy import wsgiserver
from pong import application

# Here we set our application to the script_name '/'
wsgi_apps = [('/', application)]

server = wsgiserver.CherryPyWSGIServer(('0.0.0.0', 8070), wsgi_apps,
                                       request_queue_size=500,
                                       server_name='localhost')

if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()


Cogen

The code to have Cogen run a WSGI application is as follows:

from cogen.web import wsgi
from cogen.common import *
from pong import application

m = Scheduler(default_priority=priority.LAST, default_timeout=15)
server = wsgi.WSGIServer(('0.0.0.0', 8070), application, m,
                         server_name='pongserver')
m.add(server.serve)
try:
    m.run()
except (KeyboardInterrupt, SystemExit):
    pass


Concurrence

Concurrence is an asynchronous framework under development by Hyves (you might call it the Dutch Facebook), built upon Libevent (I used the latest stable version, 1.4.13). I fired up the pong application as follows:

from concurrence import dispatch
from concurrence.http import WSGIServer
from pong import application

server = WSGIServer(application)
# Concurrence has a default backlog of 512
dispatch(server.serve(('0.0.0.0', 8080)))


Eventlet

Eventlet is a full-featured asynchronous framework which also provides WSGI server functionality. It is in development by Linden Lab (makers of Second Life). To run the application I used the following code:

import eventlet
from eventlet import wsgi
from pong import application

wsgi.server(eventlet.listen(('', 8090), backlog=500), application, max_size=8000)


FAPWS3

FAPWS3 is a WSGI server built around the libev library (I used version 3.43-1.1). Once libev has been installed, FAPWS3 can be easily installed with pip. The philosophy behind FAPWS3 is to stay the simplest and fastest web server. The script I used to start up the WSGI application is as follows:

import fapws._evwsgi as evwsgi
from fapws import base
from pong import application

def start():
    evwsgi.start("0.0.0.0", 8080)
    evwsgi.set_base_module(base)
    evwsgi.wsgi_cb(("/", application))
    evwsgi.set_debug(0)
    evwsgi.run()

if __name__ == "__main__":
    start()


Gevent

Gevent was one of the best-performing async frameworks in my previous socket benchmark. Gevent extends Libevent and uses its HTTP server functionality extensively. To install Gevent you need Libevent installed, after which you can pull in Gevent with pip.

from gevent import wsgi
from pong import application

wsgi.WSGIServer(('', 8088), application, spawn=None).serve_forever()
The above code will run the pong application without spawning a greenlet on every request. If you leave out the argument 'spawn=None', Gevent will spawn a greenlet for every new request.


Gunicorn



Gunicorn stands for 'Green Unicorn'; everybody knows that a unicorn is a mix of the awesome narwhal and the magnificent pony. The 'green', however, has nothing to do with the great greenlets, as Gunicorn really has a threaded flavour. Installation is easy and can be done with a simple 'pip install gunicorn'. Gunicorn provides you with a simple command to run WSGI applications; all I had to do was:

gunicorn -b :8000 -w 1 pong:application

Update: I had some suggestions in the comment section that using a single worker and having a client connect to the naked server is not the correct way to work with Gunicorn. So I took their suggestions, moved Gunicorn behind NGINX and increased the worker count to the suggested number of workers, 2*N+1 where N is 1, which makes 3. The result of this is depicted in the graphs as gunicorn-3w. Running Gunicorn with more workers can be done as follows:

gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application


MagnumPy



MagnumPy has to be the server with the most awesome name. This is still a very young project, but its homepage makes some strong statements about its performance, so it is worth testing out. It does not feel as polished as the other contestants; installing is basically putting the 'magnum' directory on your PYTHONPATH and editing './magnum/config.py', after which you can start the server by running './magnum/serve.py start'.

# config.py
import magnum
import magnum.http
import magnum.http.wsgi
from pong import application

WORKER_PROCESSES = 1
WORKER_THREADS_PER_PROCESS = 1000
HOST = ('', 8050)
HANDLER_CLASS = magnum.http.wsgi.WSGIWrapper(application)
DEBUG = False
PID_FILE = '/tmp/magnum.pid'


Mod_WSGI

Mod_WSGI is the successor of Mod_Python; it allows you to easily integrate Python code with the Apache server. My first Python web app experience was with mod_python and PSP templates; WSGI and cool frameworks such as Pylons have really made life a lot easier. Mod_WSGI is a great way to get your application deployed quickly. Installing Mod_WSGI is really easy with most Linux distributions. For example:

aptitude install libapache2-mod-wsgi

is all you need to do on a pristine Debian distro to get a working Apache (MPM worker) server with Mod_WSGI enabled. To point Apache to your WSGI app, just add a single line to '/etc/apache2/httpd.conf':

WSGIScriptAlias / /home/nicholas/benchmark/wsgibench/pong.py

The problem is that most people already have Apache installed and that they are using it for *shudder* serving PHP. PHP is not thread-safe, meaning that you are forced to use a pre-forking Apache server. In this benchmark I am using the threaded Apache version and mod_wsgi in embedded mode (as it gave me the best performance).

I disabled all unnecessary modules and configured Apache to provide me with a single worker and lots of threads, and disabled logging (note: I tried various settings):

<IfModule mpm_worker_module>
    ServerLimit          1
    ThreadLimit          1000
    StartServers         1
    MaxClients           1000
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      1000
    MaxRequestsPerChild  0
</IfModule>

CustomLog /dev/null combined
ErrorLog /dev/null


Paster

The Paster web server is the web server provided with Python Paste; it is Pylons' default web server. You can run a WSGI application as follows:

from pong import application
from paste import httpserver

httpserver.serve(application, '0.0.0.0', request_queue_size=500)


Tornado



Tornado is the non-blocking web server that powers FriendFeed. It provides some WSGI server functionality, which can be used as described below. In the previous benchmark I showed that it provides excellent raw-socket performance.

import os
import sys

import tornado.httpserver
import tornado.ioloop
import tornado.wsgi

from pong import application

sys.path.append('/home/nicholas/benchmark/wsgibench/')

def main():
    container = tornado.wsgi.WSGIContainer(application)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(8000)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()


Twisted



After installing Twisted with pip you get a tool, 'twistd', which allows you to easily serve WSGI applications, for example:

twistd --pidfile=/tmp/twisted.pid -no web --wsgi=pong.application --logfile=/dev/null

But you can also run a WSGI application as follows:

from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from twisted.internet import reactor
from pong import application

resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8000, Site(resource))
reactor.run()


uWSGI



uWSGI is a server written in C; it is not meant to run stand-alone but has to be placed behind a web server. It provides modules for Apache, NGINX, Cherokee and Lighttpd. I have placed it behind NGINX, which I configured as follows:

worker_processes 1;

events {
    worker_connections 30000;
}

http {
    include           mime.types;
    default_type      application/octet-stream;
    keepalive_timeout 65;

    upstream pingpong {
        ip_hash;
        server unix:/var/nginx/uwsgi.sock;
    }

    server {
        listen      9090;
        server_name localhost;

        location / {
            uwsgi_pass pingpong;
            include    uwsgi_params;
        }

        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
This made NGINX pass requests to a unix socket; now all I needed to do was have uWSGI listen on that same unix socket, which I did with the following command:

./uwsgi -s /var/nginx/uwsgi.sock -i -H /home/nicholas/benchmark/wsgibench/ -M -p 1 -w pong -z 30 -l 500 -L


WsgiRef

WsgiRef is the default WSGI server included with Python since version 2.5. To have this server run my application I used the following code, which disables logging and increases the backlog.

from pong import application
from wsgiref import simple_server

class PimpedWSGIServer(simple_server.WSGIServer):
    # To increase the backlog
    request_queue_size = 500

class PimpedHandler(simple_server.WSGIRequestHandler):
    # To disable logging
    def log_message(self, *args):
        pass

httpd = PimpedWSGIServer(('', 8000), PimpedHandler)
httpd.set_app(application)
httpd.serve_forever()


Results

Below you will find the results as plotted with Highcharts; the line will thicken when hovered over and you can easily enable or disable plotted results by clicking on the legend.


HTTP 1.0 Server results

[Chart: Reply Rate on an increasing amount of requests (more is better); servers: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w]

Disqualified servers

From the above graph it should be clear that some of the web servers are missing; the reason is that I was unable to benchmark them completely, as they stopped replying when the request rate passed a certain critical value. The servers that are missing are:

MagnumPy: I was able to obtain a reply rate of 500 RPS, but when the request rate passed the 700 RPS mark, MagnumPy crashed.
Concurrence: I was able to obtain a successful reply rate of 700 RPS, but it stopped replying when we fired more than 800 requests a second at the server. However, since Concurrence does support HTTP/1.1 keep-alive connections and behaves correctly when benchmarked under a lower connection rate but higher request rate, you can find its results in the HTTP/1.1 benchmark.
Cogen: I was able to obtain a reply rate of 800 per second, but it stopped replying when the request rate went above 1500 per second. It does have a complete benchmark under the HTTP/1.1 test, though.
WSGIRef: I obtained a reply rate of 352, but it stopped reacting when we passed the 1900 RPS mark.
Paster: I obtained a reply rate of 500, but it failed when we passed the 2000 RPS mark.

Interpretation

From the servers that passed the benchmark we can see that they all deliver admirable performance. At the bottom we have Twisted and Gunicorn; the performance of Twisted is somewhat expected, as it isn't really tuned for WSGI performance. I find the performance of Gunicorn somewhat disappointing, also because, for example, Aspen, a pure-Python server from a few years back, shows significantly better performance. We can see, however, that increasing the worker count does in fact improve performance, as it is then able to obtain a reply rate competitive with Aspen.

The other pure-Python servers, CherryPy and Tornado, seem to perform on par with ModWSGI. It looks like CherryPy has a slight performance edge over Tornado. So, if you are thinking of changing from ModWSGI or CherryPy to Tornado because of increased performance, you should think again. Not only does this benchmark show that there isn't that much to gain, but you will also abandon the process/thread model, meaning that you should be cautious about code blocking your interpreter.

The top performers are clearly FAPWS3, uWSGI and Gevent. FAPWS3 has been designed to be fast and lives up to the expectations; this has been noted by others as well, as it looks like it is being used in production at eBay. uWSGI is used successfully in production at (and in development by) the Italian ISP Unbit. Gevent is a relatively young project but already very successful. Not only did it perform great in the previous async server benchmark, but its reliance on the Libevent HTTP server gives it performance beyond the other asynchronous frameworks.

I should note that the difference between these top 3 is too small to declare a clear winner of the 'reply rate contest'. However, I want to stress that with almost all servers I had to be careful to keep the number of concurrent connections low, since threaded servers aren't that fond of lots of concurrent connections. The async servers (Gevent, Eventlet, and Tornado) were happy to work on whatever was being thrown at them. This really gives a great feeling of stability, as you do not have to worry about settings such as pool size, worker count etc.

[Chart: Response Time (ms) on an increasing amount of requests (less is better); servers: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w]

Most of the servers have an acceptable response time. Twisted and Eventlet are somewhat on the slow side, but Gunicorn shows, unfortunately, a dramatic increase in latency when the request rate passes the 1000 RPS mark. Increasing the Gunicorn worker count lowers this latency by a lot, but it is still on the high side compared with, for example, Aspen or CherryPy.

[Chart: Error Rate on an increasing amount of requests (less is better); servers: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w]

The low error rates for CherryPy, ModWSGI, Tornado and uWSGI should give everybody confidence in their suitability for a production environment.


HTTP 1.1 Server results

In the HTTP/1.1 benchmark we have a different list of contestants, as not all servers were able to pipeline multiple requests over a single connection. In this test the connection rate is relatively low; for example, a request rate of 8000 per second is about 800 connections per second with 10 requests per connection. This means that some servers that were not able to complete the HTTP/1.0 benchmark (with connection rates up to 5000 per second) are able to complete the HTTP/1.1 benchmark (Cogen and Concurrence, for example).

[Chart: Successful Reply Rate on an increasing amount of requests (more is better); servers: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence]

This graph shows the achieved request rate of the servers, and we can clearly see that the achieved request rate is higher than in the HTTP/1.0 test. We could increase the total request rate even more by increasing the number of pipelined requests, but this would then lower the connection rate. I think that 10 pipelined requests is an OK generalization of a web browser opening an average page.

The graph shows a huge gap in performance: with the fastest server, Gevent, we are able to obtain about 9000 replies per second; with Twisted, Concurrence and Cogen we get about 1000. In the middle we have CherryPy and ModWSGI, with which we are able to obtain a reply rate around 4000. It is interesting that Tornado, while being close to CherryPy and ModWSGI, seems to have an edge in this benchmark compared to the edge CherryPy had in the HTTP/1.0 benchmark. This is along the lines of our expectations, as pipelined requests in Tornado are cheaper (since it is async) than in ModWSGI or CherryPy. We expect this gap to widen if we increase the number of pipelined requests. However, it remains to be seen how much of a performance boost this would provide in a deployment setup, as Tornado and CherryPy will then probably be sitting behind a reverse proxy, for example NGINX. In such a setting the connection between the proxy and the upstream is usually limited to HTTP/1.0; NGINX, for example, does not even support HTTP/1.1 keep-alive connections to its upstreams.

The best performers are clearly uWSGI and Gevent. I benchmarked Gevent with the 'spawn=None' option to prevent Gevent from spawning a greenlet, which seems fair in a benchmark like this. However, when you want to do something interesting with lots of concurrent connections you want each request to have its own greenlet, as this allows you to have thread-like flow control. Thus I also benchmarked that version, which can be seen in the graph under the name 'Gevent-Spawn'; from its results we can see that the performance penalty is small.
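For reference, the 'Gevent-Spawn' variant is simply the earlier Gevent script with the spawn=None argument left out, so that a greenlet is spawned per request; a minimal sketch against the same 0.12.2 API:

from gevent import wsgi
from pong import application

# Same server as before, but with the default spawn behaviour:
# Gevent spawns a greenlet for every incoming request.
wsgi.WSGIServer(('', 8088), application).serve_forever()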

[Chart: Response Time (ms) on an increasing amount of requests (less is better); servers: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence]

Cogen shows high latency after about 2000 requests per second; Eventlet and Twisted show increased latency fairly early as well.

[Chart: Error Rate on an increasing amount of requests (less is better); servers: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence]

The error rate shows that Twisted, Concurrence and Cogen have some trouble keeping up; I think all other error rates are acceptable.


Memory Usage

I also monitored the memory usage of the different frameworks during the benchmark. The figure noted below is the peak memory usage of all accumulated processes. As this benchmark does not really benefit from additional processes (since there is only one available processor), I limited the number of workers when possible.

[Chart: Accumulated Peak Memory Usage per WSGI server, in megabytes; servers: Aspen, CherryPy, Cogen, Concurrence, Eventlet, FAPWS3, Gevent, Gunicorn, Gunicorn-3w, MagnumPy, ModWSGI, Paster, Tornado, Twisted, uWSGI, WsgiRef]

From these results there is one thing that really stands out, and that is the absolutely low memory usage of uWSGI, Gevent and FAPWS3, especially if we take their performance into account. It looks like Cogen is leaking memory, but I haven't really looked into that. Gunicorn-3w shows relatively high memory usage compared with Gunicorn, but it should be noted that this is mainly caused by the switch from the naked deployment to the deployment behind NGINX, as we now also have to add the memory usage of NGINX. A single Gunicorn worker only takes about 7.5 MB of memory.


Let’s Kick it up a notch



The first part of this post focused purely on the RPS performance of the different frameworks under a high load. When the WSGI server was working hard enough it could simply answer all requests from a certain user and move on to the next user. This keeps the number of concurrent connections relatively low, making such a benchmark suitable for threaded web servers.

However, if we are going to increase the number of concurrent connections we will quickly run into system limits, as explained in the introduction. This is commonly known as the C10K problem. Asynchronous servers use a single thread to handle multiple connections and, when efficiently implemented with for example epoll or kqueue, are perfectly able to handle a large number of concurrent connections.

So that is what we are going to do: we are going to take the top 3 performing WSGI servers, namely Tornado, Gevent and uWSGI (FAPWS3's lack of HTTP/1.1 support made it unsuitable for this benchmark), and give them 5 minutes of ping-pong mayhem.

You see, ping-pong is a simple game, and it isn't really the complexity that makes it interesting; it is the speed and the reaction of the players. Now, what is 5 minutes of ping-pong mayhem? Imagine that for 5 minutes, every second an Airbus loaded with ping-pong players lands (500 clients), and each of those players is going to slam you exactly 12 balls (with a 5 second interval). This would mean that after 5 seconds you would already have to return the volleys of 2000 different players at once.
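To put some rough numbers on that description, here is a back-of-the-envelope estimate (my own, derived only from the figures above: 500 new players per second for 5 minutes, about 12 balls each, 5 seconds apart):

arrival_rate = 500                  # new clients per second
phase = 5 * 60                      # seconds during which clients keep arriving
requests_per_client = 12
think_time = 5                      # seconds between a client's requests

session_length = requests_per_client * think_time          # roughly 60 s per client
total_requests = arrival_rate * phase * requests_per_client

# Little's law gives an upper bound on clients active at the same time,
# assuming every client stays connected for its whole session.
concurrent_upper_bound = arrival_rate * session_length

print(total_requests)           # 1,800,000: in line with the ~2 million requests mentioned below
print(concurrent_upper_bound)   # up to ~30,000 players in flight at steady state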


Tsung Benchmark Setup

To perform this benchmark I am going to use Tsung, which is a multi-protocol distributed load testing tool written in Erlang. I will have 3 different machines simulating the ping-pong rampage. I used the following Tsung script.

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>
<tsung loglevel="warning">
  <clients>
    <client host="tsung2" use_controller_vm="false" maxusers="800"/>
    <client host="tsung3" use_controller_vm="false" maxusers="800"/>
    <client host="bastet" use_controller_vm="false" maxusers="800"/>
  </clients>
  <servers>
    <server host="tsung1" port="8000" type="tcp"/>
  </servers>
  <monitoring>
    <monitor host="tsung1" type="erlang"/>
  </monitoring>
  <load>
    <arrivalphase phase="1" duration="5" unit="minute">
      <users interarrival="0.002" unit="second"/>
    </arrivalphase>
  </load>
  <sessions>
    <session name='wsgitest' probability='100' type='ts_http'>
      <for from="0" to="12" incr="1" var="counter">
        <request>
          <http url='http://tsung1:8000/' version='1.1' method='GET'/>
        </request>
        <thinktime random='false' value='5'/>
      </for>
    </session>
  </sessions>
</tsung>


Tsung Benchmark Results

[Charts: Concurrent Connections measured over time (in seconds) for tornado, uwsgi and gevent; System Load; CPU Usage; Free Memory]

Let me first state that all three frameworks are perfectly capable of handling this kind of load; none of the frameworks dropped connections or ignored requests. Which, I must say, is already quite an achievement, considering that they had to handle about 2 million requests each.

Below the concurrent connection graph we can see the system load, the CPU usage and the free memory on the system during the benchmark. We can clearly see that Gevent put less strain on the system, as the CPU and load graphs indicate. In the memory graph we can see that all frameworks used a consistent amount of memory.

Readers who are still paying close attention to this article should note that the memory graph displays 4 lines instead of 3. The fourth line is Gevent compiled against Libevent 2.0.4a; the new release of Libevent has been said to show considerable performance improvements in its HTTP server. But it is still an alpha version, and the memory graph shows that this version is leaking memory, which is not something you want on your production site.

[Chart: Server Latency, response time (ms) measured over time (in seconds) for tornado, uwsgi and gevent]

The final graph shows the latency of the 3 frameworks; we can see a clear difference between Tornado and its competitors, as Tornado's response time hovers around 100 ms, uWSGI around 5 ms and Gevent around 3 ms. This is quite a difference, and I am really amazed by the low latency of both Gevent and uWSGI during this onslaught.


Summary and Remarks

The above results show that as Python web developers we have lots of different methods to deploy our applications. Some of these seem to perform better than others, but by focusing only on server performance I would not do justice to most of the tested servers, as they differ greatly in functionality. Also, if you are going to take some stock web framework and won't do any optimizations or caching, the performance of your web server is not going to matter, as it will not be the bottleneck. If there is one thing this benchmark made clear, it is that most Python web servers offer great performance, and if you feel things are slow the first thing to look at is really your own application.

When you are just interested in quickly hosting your threaded application you really can't go wrong with Apache ModWSGI. Even though Apache ModWSGI might put a little more strain on your memory requirements, there is a lot to go for in terms of functionality. For example, protecting part of your website by using an LDAP server is as easy as enabling a module. Standalone CherryPy also shows great performance and functionality and is really a viable (fully Python) alternative which can lower memory requirements.

When you are a little more adventurous you can look at uWSGI and FAPWS3. They are relatively new compared to CherryPy and ModWSGI, but they show a significant performance increase and have lower memory requirements.

Concerning Tornado and performance: I do not think Tornado is an alternative to CherryPy or even ModWSGI. Not only does it hardly show any increase in performance, but it also requires you to rethink your code. Tornado can be a great option, though, if you do not have any code using blocking connections or just want to look at something new.

And then there is Gevent. It really showed amazing performance at a low memory footprint. It might need some adjustments to your legacy code, but then again the monkey patching of the socket module could help, and I really love the cleanness of greenlets. There have already been some reports of deploying Gevent successfully, even with SQLAlchemy.

And if you want to dive into high-performance websockets with lots of concurrent connections, you really have to go with an asynchronous framework. Gevent seems like the perfect companion for that; at least, that is what we are going to use.
From http://nichol.as/benchmark-of-python-web-servers