Emiller's Advanced Topics In Nginx Module Development

2012-07-18 09:44

By Evan Miller (with Grzegorz Nosek)

DRAFT: August 13, 2009 (changes)

Whereas Emiller's Guide To Nginx Module Development describes the bread-and-butter issues of writing a simple handler, filter, or load-balancer for Nginx, this document covers three advanced topics for the ambitious Nginx developer: shared memory, subrequests, and parsing. Because these are subjects on the boundaries of the Nginx universe, the code here may be sparse. The examples may be out of date. But hopefully, you will make it out not only alive, but with a few extra tools in your belt.

Table of Contents

1. Shared Memory
    1.1. A (fore)word of caution
    1.2. Creating and using a shared memory segment
    1.3. Using the slab allocator
    1.4. Spinlocks, atomic memory access
    1.5. Using rbtrees

2. Subrequests
    2.1. Internal redirects
    2.2. A single subrequest
    2.3. Sequential subrequests
    2.4. Parallel subrequests

3. Parsing With Ragel *NEW*
    3.1. Installing Ragel
    3.2. Calling Ragel from Nginx
    3.3. Writing a grammar
    3.4. Writing some actions
    3.5. Putting it all together

4. TODO

1. Shared Memory

Guest chapter written by Grzegorz Nosek

Nginx, while being unthreaded, allows worker processes to share memory between them. However, this is quite different from the standard pool allocator as the shared segment has fixed size and cannot be resized without restarting nginx or destroying its contents in another way.

1.1. A (fore)word of caution

First of all, caveat hacker. This guide has been written several months after hands-on experience with shared memory in nginx and while I try my best to be accurate (and have spent some time refreshing my memory), in no way is it guaranteed. You've been warned.

Also, 100% of this knowledge comes from reading the source and reverse-engineering the core concepts, so there are probably better ways to do most of the stuff described.

Oh, and this guide is based on 0.6.31, though 0.5.x is 100% compatible AFAIK and 0.7.x also brings no compatibility-breaking changes that I know of.

For real-world usage of shared memory in nginx, see my upstream_fair module.

This probably does not work on Windows at all. Core dumps in the rear mirror are closer than they appear.

1.2. Creating and using a shared memory segment

To create a shared memory segment in nginx, you need to:

- provide a constructor function to initialise the segment
- call ngx_shared_memory_add

These two points contain the main gotchas (that I came across), namely:

Your constructor will be called multiple times and it's up to you to find out whether you're called the first time (and should set something up), or not (and should probably leave everything alone). The prototype for the shared memory constructor looks like:

static ngx_int_t init(ngx_shm_zone_t *shm_zone, void *data);


The data variable will contain the contents of oshm_zone->data, where oshm_zone is the "old" shm zone descriptor (more about it later). This variable is the only value that can survive a reload, so you must use it if you don't want to lose the contents of your shared memory.

Your constructor function will probably look roughly similar to the one in upstream_fair, i.e.:

static ngx_int_t
init(ngx_shm_zone_t *shm_zone, void *data)
{
    if (data) { /* we're being reloaded, propagate the data "cookie" */
        shm_zone->data = data;
        return NGX_OK;
    }

    /* set up whatever structures you wish to keep in the shm */

    /* initialise shm_zone->data so that we know we have
       been called; if nothing interesting comes to your mind, try
       shm_zone->shm.addr or, if you're desperate, (void*) 1, just set
       the value to something non-NULL for future invocations
    */
    shm_zone->data = something_interesting;

    return NGX_OK;
}


You must be careful about when you access the shm segment.

The interface for adding a shared memory segment looks like:

ngx_shm_zone_t *
ngx_shared_memory_add(ngx_conf_t *cf, ngx_str_t *name, size_t size,
    void *tag);


cf is the reference to the config file (you'll probably create the segment in response to a config option), name is the name of the segment (as a ngx_str_t, i.e. a counted string), size is the size in bytes (which will usually get rounded up to the nearest multiple of the page size, e.g. 4KB on many popular architectures) and tag is a, well, tag for detecting naming conflicts. If you call ngx_shared_memory_add multiple times with the same name, tag and size, you'll get only a single segment. If you specify different names, you'll get several distinct segments and if you specify the same name but different size or tag, you'll get an error. A good choice for the tag value could be e.g. the pointer to your module descriptor.
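
The rounding of the size argument mentioned above can be pictured with a few lines of plain C. This is only a sketch of the arithmetic, not nginx's own code; round_up_to_page is a hypothetical helper and it assumes the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of `page`.
 * Assumes `page` is a power of two (e.g. 4096). */
static size_t
round_up_to_page(size_t size, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);

    /* Adding page-1 and masking off the low bits rounds upwards. */
    return (size + page - 1) & ~(page - 1);
}
```

With a 4KB page, a request for 4097 bytes would therefore occupy two pages, while a request for exactly 4096 bytes stays at one.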

After you call ngx_shared_memory_add and receive the new shm_zone descriptor, you must set up the constructor in shm_zone->init. Wait... after you add the segment? Yes, and that's a major gotcha. This implies that the segment is not created while calling ngx_shared_memory_add (because you specify the constructor only later). What really happens looks like this (grossly simplified):

1. parse the whole config file, noting requested shm segments

2. afterwards, create/destroy all the segments in one go

   The constructors are called here. Note that every time your ctor is called, it is with another value of shm_zone. The reason is that the descriptor lives as long as the cycle (generation in Apache terms) while the segment lives as long as the master and all the workers. To let some data survive a reload, you have access to the old descriptor's ->data field (mentioned above).

3. (re)start workers which begin handling requests

4. upon receipt of SIGHUP, goto 1

Also, you really must set the constructor, otherwise nginx will consider your segment unused and won't create it at all.

Now that you know it, it's pretty clear that you cannot rely on having access to the shared memory while parsing the config. You can access the whole segment as shm_zone->shm.addr (which will be NULL before the segment gets really created). Any access after the first parsing run (e.g. inside request handlers or on subsequent reloads) should be fine.
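
To make the "constructor is called on every cycle" behaviour concrete, here is a small stand-alone simulation. It does not use the nginx headers: fake_zone_t, fake_init and simulate_cycles are hypothetical stand-ins for ngx_shm_zone_t, your constructor and the start/reload sequence, so treat it as a sketch of the protocol rather than of nginx internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ngx_shm_zone_t; only the fields used here. */
typedef struct {
    void *addr;   /* plays the role of shm_zone->shm.addr */
    void *data;   /* plays the role of shm_zone->data */
} fake_zone_t;

static int setup_count;   /* how many times real initialisation ran */

/* Same shape as the constructor: `data` is the old descriptor's ->data. */
static int
fake_init(fake_zone_t *zone, void *data)
{
    assert(zone != NULL);

    if (data) {                 /* reload: just propagate the cookie */
        zone->data = data;
        return 0;
    }

    setup_count++;              /* first call: set up the segment once */
    zone->data = zone->addr;    /* any non-NULL value will do */
    return 0;
}

/* Simulate the initial start plus `reloads` SIGHUPs on one segment.
 * Each cycle gets a fresh descriptor but the same underlying memory. */
static void *
simulate_cycles(void *segment, int reloads)
{
    fake_zone_t old_zone = { segment, NULL };
    int         i;

    setup_count = 0;
    fake_init(&old_zone, NULL);

    for (i = 0; i < reloads; i++) {
        fake_zone_t new_zone = { segment, NULL };
        fake_init(&new_zone, old_zone.data);
        old_zone = new_zone;
    }

    return old_zone.data;
}
```

However many reloads are simulated, the set-up branch runs once and the cookie handed to the last descriptor is still the one chosen on the first call.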

1.3. Using the slab allocator

Now that you have your new and shiny shm segment, how do you use it? The simplest way is to use another memory tool that nginx has at your disposal, namely the slab allocator. Nginx is nice enough to initialise the slab for you in every new shm segment, so you can either use it, or ignore the slab structures and overwrite them with your own data.

The interface consists of two functions:

void *ngx_slab_alloc(ngx_slab_pool_t *pool, size_t size);

void ngx_slab_free(ngx_slab_pool_t *pool, void *p);

The first argument is simply (ngx_slab_pool_t *) shm_zone->shm.addr and the other one is either the size of the block to allocate, or the pointer to the block to free. (trivia: not once is ngx_slab_free called in vanilla nginx code)
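
If you have not met this style of allocator before, the toy below may help. It is not the nginx slab allocator: it is a single-size block pool carved out of a fixed region, with hypothetical names (toy_pool_t, toy_alloc, toy_free), meant only to show why allocations from a fixed-size segment can fail and how freed blocks are reused.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE  64
#define BLOCK_COUNT 4

/* A free block stores the link to the next free block inside itself. */
typedef union block_u {
    union block_u *next;
    unsigned char  bytes[BLOCK_SIZE];
} block_t;

/* The whole pool has a fixed size, like a shared memory segment. */
typedef struct {
    block_t  blocks[BLOCK_COUNT];
    block_t *free_list;
} toy_pool_t;

static void
toy_pool_init(toy_pool_t *pool)
{
    size_t i;

    pool->free_list = NULL;
    for (i = 0; i < BLOCK_COUNT; i++) {
        pool->blocks[i].next = pool->free_list;
        pool->free_list = &pool->blocks[i];
    }
}

/* Returns NULL once the fixed region is exhausted; it cannot grow. */
static void *
toy_alloc(toy_pool_t *pool)
{
    block_t *b = pool->free_list;

    if (b == NULL) {
        return NULL;
    }
    pool->free_list = b->next;
    return b;
}

static void
toy_free(toy_pool_t *pool, void *p)
{
    block_t *b = p;

    assert(p != NULL);
    b->next = pool->free_list;
    pool->free_list = b;
}
```

The practical consequence for shared memory is the same as in the toy: always check the result of an allocation for NULL, because the segment cannot be enlarged on demand.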

1.4. Spinlocks, atomic memory access

Remember that shared memory is inherently dangerous because you can have multiple processes accessing it at the same time. The slab allocator has a per-segment lock (shpool->mutex) which is used to protect the segment against concurrent modifications.

You can also acquire and release the lock yourself, which is useful if you want to implement some more complicated operations on the segment, like searching or walking a tree. The two snippets below are essentially equivalent:

/*
void *new_block;
ngx_slab_pool_t *shpool = (ngx_slab_pool_t *)shm_zone->shm.addr;
*/

new_block = ngx_slab_alloc(shpool, ngx_pagesize);


ngx_shmtx_lock(&shpool->mutex);
new_block = ngx_slab_alloc_locked(shpool, ngx_pagesize);
ngx_shmtx_unlock(&shpool->mutex);


In fact, ngx_slab_alloc looks almost exactly like above.

If you perform any operations which depend on no new allocations (or, more to the point, frees), protect them with the slab mutex. However, remember that nginx mutexes are implemented as spinlocks (non-sleeping), so while they are very fast in the uncontended case, they can easily eat 100% CPU when waiting. So don't do any long-running operations while holding the mutex (especially I/O, but you should avoid any system calls at all).

You can also use your own mutexes for more fine-grained locking, via the ngx_mutex_init(), ngx_mutex_lock() and ngx_mutex_unlock() functions.

As an alternative for locks, you can use atomic variables which are guaranteed to be read or written in an uninterruptible way (no worker process may see the value halfway as it's being written by another one).

Atomic variables are defined with the type ngx_atomic_t or ngx_atomic_uint_t (depending on signedness). They should have at least 32 bits. To simply read or unconditionally set an atomic variable, you don't need any special constructs:

ngx_atomic_t i = an_atomic_var;
an_atomic_var = i + 5;


Note that anything can happen between the two lines; context switches, execution of code on other CPUs, etc.

To atomically read and modify a variable, you have two functions (very platform-specific) with their interface declared in src/os/unix/ngx_atomic.h:

ngx_atomic_cmp_set(lock, old, new)


Atomically compares *lock with old and, if they are equal, stores new under the same address. Returns 1 if *lock was equal to old (and has therefore been overwritten), 0 otherwise.

ngx_atomic_fetch_add(value, add)


Atomically adds add to *value and returns the old *value.

1.5. Using rbtrees

OK, you have your data neatly allocated, protected with a suitable lock but you'd also like to organise it somehow. Again, nginx has a very nice structure just for this purpose - a red-black tree.

Highlights (API-wise):

- requires an insertion callback, which inserts the element in the tree (probably according to some predefined order) and then calls ngx_rbt_red(the_newly_added_node) to rebalance the tree
- requires all leaves to be set to a predefined sentinel object (not NULL)

This chapter is about shared memory, not rbtrees so shoo! Go read the source for upstream_fair to see creating and walking an rbtree in action.
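
For readers who want a feel for the two highlights before opening the source, here is a stand-alone sketch of a sentinel-terminated binary search tree with an insertion callback. It deliberately omits red-black rebalancing and does not use the nginx types; node_t, insert_value, tree_insert and tree_find are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node_s node_t;

struct node_s {
    unsigned  key;
    node_t   *left, *right, *parent;
    int       red;
};

/* One shared sentinel stands in for every empty child (never NULL). */
static node_t sentinel = { 0, &sentinel, &sentinel, NULL, 0 };

/* Insertion callback: walk down from `temp` following the key order,
 * hang `node` in place of the sentinel, and colour it red. */
static void
insert_value(node_t *temp, node_t *node, node_t *nil)
{
    node_t **p;

    for ( ;; ) {
        p = (node->key < temp->key) ? &temp->left : &temp->right;
        if (*p == nil) {
            break;
        }
        temp = *p;
    }

    *p = node;
    node->parent = temp;
    node->left = nil;
    node->right = nil;
    node->red = 1;
}

/* Driver: handles the empty tree, otherwise defers to the callback.
 * A real red-black tree would rebalance here; this sketch does not. */
static void
tree_insert(node_t **root, node_t *node)
{
    assert(root != NULL && node != NULL);

    if (*root == &sentinel) {
        node->parent = NULL;
        node->left = &sentinel;
        node->right = &sentinel;
        node->red = 0;
        *root = node;
        return;
    }

    insert_value(*root, node, &sentinel);
}

/* Walking the tree stops at the sentinel, not at NULL. */
static node_t *
tree_find(node_t *root, unsigned key)
{
    node_t *n = root;

    while (n != &sentinel) {
        if (key == n->key) {
            return n;
        }
        n = (key < n->key) ? n->left : n->right;
    }

    return NULL;
}
```

When the tree lives in shared memory, the nodes would be allocated from the segment and every walk or insert would happen under the lock discussed in the previous section.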

2. Subrequests

Subrequests are one of the most powerful aspects of Nginx. With subrequests, you can return the results of a different URL than what the client originally requested. Some web frameworks call this an "internal redirect." But Nginx goes further: not only can modules perform multiple subrequests and combine the outputs into a single response, subrequests can perform their own sub-subrequests, and sub-subrequests can initiate sub-sub-subrequests, and... you get the idea. Subrequests can map to files on the hard disk, other handlers, or upstream servers; it doesn't matter from the perspective of Nginx. As far as I know, only filters can issue subrequests.

2.1. Internal redirects

If all you want to do is return a different URL than what the client originally requested, you will want to use the ngx_http_internal_redirect function. Its prototype is:

ngx_int_t
ngx_http_internal_redirect(ngx_http_request_t *r, ngx_str_t *uri, ngx_str_t *args)

Where r is the request struct, and uri and args are the new URI. Note that URIs must be locations already defined in nginx.conf; you cannot, for instance, redirect to an arbitrary domain. Handlers should return the return value of ngx_http_internal_redirect, i.e. redirecting handlers will typically end like

return ngx_http_internal_redirect(r, &uri, &args);


Internal redirects are used in the "index" module (which maps URLs that end in / to index.html) as well as Nginx's X-Accel-Redirect feature.

2.2. A single subrequest

Subrequests are most useful for inserting additional content based on data from the original response. For example, the SSI (server-side include) module uses a filter to scan the contents of the returned document, and then replaces "include" directives with the contents of the specified URLs.

We'll start with a simpler example. We'll make a filter that treats the entire contents of a document as a URL to be retrieved, and then appends the new document to the URL itself. Remember that the URL must be a location in nginx.conf.



static ngx_int_t
ngx_http_append_uri_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{
    int                  rc;
    ngx_str_t            uri;
    ngx_http_request_t  *sr;

    /* First copy the document buffer into the URI string */
    uri.len = in->buf->last - in->buf->pos;
    uri.data = ngx_palloc(r->pool, uri.len);
    if (uri.data == NULL)
        return NGX_ERROR;
    ngx_memcpy(uri.data, in->buf->pos, uri.len);

    /* Now return the original document (i.e. the URI) to the client */
    rc = ngx_http_next_body_filter(r, in);

    if (rc == NGX_ERROR)
        return rc;

    /* Finally issue the subrequest; sr receives the new request struct */
    return ngx_http_subrequest(r, &uri, NULL /* args */, &sr,
                               NULL /* callback */, 0 /* flags */);
}


The prototype of ngx_http_subrequest is:



ngx_int_t ngx_http_subrequest(ngx_http_request_t *r,
    ngx_str_t *uri, ngx_str_t *args, ngx_http_request_t **psr,
    ngx_http_post_subrequest_t *ps, ngx_uint_t flags)


Where:

- *r is the original request
- *uri and *args refer to the sub-request
- **psr is a reference to a NULL pointer that will point to the new (sub-)request structure
- *ps is a callback for when the subrequest is finished. I've never used this, but see http/ngx_http_request.h for details.
- flags can be a bitwise-OR'ed combination of:
    - NGX_HTTP_ZERO_IN_URI: the URI contains a character with ASCII code 0 (also known as '\0'), or contains "%00"
    - NGX_HTTP_SUBREQUEST_IN_MEMORY: store the result of the subrequest in a contiguous chunk of memory (usually not necessary)

The results of the subrequest will be inserted where you expect. If you want to modify the results of the subrequest, you can use another filter (or the same one!). You can tell whether a filter is operating on the primary request or a subrequest with this test:



if (r == r->main) {
    /* primary request */
} else {
    /* subrequest */
}


The simplest example of a module that issues a single subrequest is the "addition" module.

2.3. Sequential subrequests

Note, 8/13/2009: This section may be out of date due to changes in Nginx's subrequest processing introduced in Nginx 0.7.25. Follow at your own risk. -EM

You might think issuing multiple subrequests is as simple as:



int rc1, rc2, rc3;
rc1 = ngx_http_subrequest(r, uri1, ...);
rc2 = ngx_http_subrequest(r, uri2, ...);
rc3 = ngx_http_subrequest(r, uri3, ...);


You'd be wrong! Remember that Nginx is single-threaded. Subrequests might need to access the network, and if so, Nginx needs to return to its other work while it waits for a response. So we need to check the return value of ngx_http_subrequest, which can be one of:

- NGX_OK: the subrequest finished without touching the network
- NGX_DONE: the client reset the network connection
- NGX_ERROR: there was a server error of some sort
- NGX_AGAIN: the subrequest requires network activity

If your subrequest returns NGX_AGAIN, your filter should also immediately return NGX_AGAIN. When that subrequest finishes, and the results have been sent to the client, Nginx is nice enough to call your filter again, from which you can issue the next subrequest (or do some work in between subrequests). It helps, of course, to keep track of your planned subrequests in a context struct. You should also take care to return errors immediately, too.

Let's make a simple example. Suppose our context struct contains an array of URIs, and the index of the next subrequest:



typedef struct {
    ngx_array_t  uris;
    int          i;
} my_ctx_t;


Then a filter that simply concatenates the contents of these URIs together might look something like:



static ngx_int_t
ngx_http_multiple_uris_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{
    my_ctx_t            *ctx;
    int                  rc = NGX_OK;
    ngx_http_request_t  *sr;

    if (r != r->main) { /* subrequest */
        return ngx_http_next_body_filter(r, in);
    }

    ctx = ngx_http_get_module_ctx(r, my_module);
    if (ctx == NULL) {
        /* populate ctx and ctx->uris here */
    }
    while (rc == NGX_OK && ctx->i < ctx->uris.nelts) {
        rc = ngx_http_subrequest(r, &((ngx_str_t *)ctx->uris.elts)[ctx->i++],
                                 NULL /* args */, &sr, NULL /* cb */, 0 /* flags */);
    }

    return rc; /* NGX_OK/NGX_ERROR/NGX_DONE/NGX_AGAIN */
}


Let's think this code through. There might be more going on than you expect.

First, the filter is called on the original response. Based on this response we populate ctx and ctx->uris. Then we enter the while loop and call ngx_http_subrequest for the first time.

If ngx_http_subrequest returns NGX_OK then we move onto the next subrequest immediately. If it returns with NGX_AGAIN, we break out of the while loop and return NGX_AGAIN.

Suppose we've returned an NGX_AGAIN. The subrequest is pending some network activity, and Nginx has moved on to other things. But when that subrequest is finished, Nginx will call our filter at least two more times:

- once with r set to the subrequest, and in set to buffers from the subrequest's response
- once with r set to the original request, and in set to NULL

To distinguish these two cases, we must test whether r == r->main. In this example we call the next filter if we're filtering the subrequest. But if we're in the main request, we'll just pick up the while loop where we last left off. in will be set to NULL because there aren't actually any new buffers to process.

When the last subrequest finishes and all is well, we return NGX_OK.

This example is of course greatly simplified. You'll have to figure out how to populate ctx->uris on your own. But the example shows how simple it is to re-enter the subrequesting loop, and break out as soon as we get an error or NGX_AGAIN.
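
The re-entry pattern can be tried out without Nginx at all. In the sketch below everything is hypothetical: RC_OK, RC_AGAIN and RC_ERROR stand in for the NGX_* codes, issue() replaces ngx_http_subrequest with a scripted result, and seq_ctx_t plays the role of the context struct. Only the shape of the loop is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the return codes discussed in this section. */
enum { RC_OK = 0, RC_ERROR = -1, RC_AGAIN = -2 };

typedef struct {
    const int *results;   /* scripted return code of each "subrequest" */
    size_t     n;         /* number of planned subrequests */
    size_t     i;         /* index of the next subrequest to issue */
} seq_ctx_t;

/* Pretend to issue subrequest number idx. */
static int
issue(seq_ctx_t *ctx, size_t idx)
{
    return ctx->results[idx];
}

/* One invocation of the filter on the main request: keep issuing
 * subrequests while they complete immediately, and stop at the first
 * one that does not. Because the index lives in the context, the next
 * invocation resumes with the following subrequest. */
static int
filter_step(seq_ctx_t *ctx)
{
    int rc = RC_OK;

    assert(ctx != NULL);

    while (rc == RC_OK && ctx->i < ctx->n) {
        rc = issue(ctx, ctx->i++);
    }

    return rc;
}
```

With a script of OK, AGAIN, OK, AGAIN, the first call stops after two subrequests, the second call after the remaining two, and a third call has nothing left to do and reports success.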

2.4. Parallel subrequests

It's also possible to issue several subrequests at once without waiting for previous subrequests to finish. This technique is, in fact, too advanced even for Emiller's Advanced Topics in Nginx Module Development. See the SSI module for an example.

3. Parsing with Ragel

If your module is dealing with any kind of input, be it an incoming HTTP header or a full-blown template language, you will need to write a parser. Parsing is one of those things that seems easy (how hard can it be to convert a string into a struct?) but there is definitely a right way to parse and a wrong way to parse. Unfortunately, Nginx sets a bad example by choosing (what I feel is) the wrong way.

What's wrong with Nginx's parsing code?



/* Random snippet from Nginx parsing code */

for (p = ctx->pos; p < last; p++) {
    ch = *p;

    switch (state) {
    case ssi_tag_state:
        switch (ch) {
        case '!':
            /* do stuff */
            ...


Nginx does all of its parsing, whether of SSI includes, HTTP headers, or Nginx configuration files, using state machines. A state machine, you might recall from your college Theory of Computation class, reads a tape of characters, moves from state to state based on what it reads, and might perform some action based on what character it reads and what state it is in. So for example, if I wanted to parse positive decimal point numbers with a state machine, I might have a "reading stuff left of the period" state, a "just read a period" state, and a "reading stuff right of the period" state, and move among them as I read in each digit.
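
Written out by hand, the decimal-number machine from the paragraph above might look like the sketch below. The state names and the is_decimal function are hypothetical, and the machine is stricter than strictly necessary: it insists on at least one digit on each side of the period.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ST_LEFT,    /* reading stuff left of the period */
    ST_DOT,     /* just read a period */
    ST_RIGHT,   /* reading stuff right of the period */
    ST_BAD      /* input does not match */
} dec_state_t;

/* Returns 1 if s[0..len) looks like "digits.digits", 0 otherwise. */
static int
is_decimal(const char *s, size_t len)
{
    dec_state_t st = ST_LEFT;
    size_t      i, left = 0;

    assert(s != NULL);

    for (i = 0; i < len; i++) {
        char ch = s[i];
        int  digit = (ch >= '0' && ch <= '9');

        switch (st) {
        case ST_LEFT:
            if (digit) {
                left++;                 /* stay in ST_LEFT */
            } else if (ch == '.' && left > 0) {
                st = ST_DOT;
            } else {
                st = ST_BAD;
            }
            break;

        case ST_DOT:
        case ST_RIGHT:
            st = digit ? ST_RIGHT : ST_BAD;
            break;

        case ST_BAD:
            return 0;
        }
    }

    /* Only ST_RIGHT is an accepting state. */
    return st == ST_RIGHT;
}
```

Even for a language this small, the code is mostly bookkeeping about states, which is the complaint the rest of this chapter addresses.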

Unfortunately, state machine parsers are usually verbose, complex, hard to understand, and hard to modify. From a software development point of view, a better approach is to use a parser generator. A parser generator translates high-level, highly readable parsing rules into a low-level state machine. The compiled code from a parser generator is virtually the same as that of a handwritten state machine, but your code is much easier to work with.

There are a number of parser generators available, each with its own special syntax, but I am going to focus on one parser generator in particular: Ragel. Ragel is a good choice because it was designed to work with buffered inputs. Given Nginx's buffer-chain architecture, there is a very good chance that you will be parsing a buffered input, whether you really want to or not.

3.1. Installing Ragel

Use your system's package manager or else download Ragel from its project website.

3.2. Calling Ragel from Nginx

It's a good idea to put your parser functions in a separate file from the rest of the module. You will then need to:

1. Create a header (.h) file for the parser
2. Include the header from your module
3. Create a Ragel (.rl) file
4. Generate a C (.c) file from the Ragel file
5. Include the C file in your module config

The header file should just have prototypes for the parser functions, which you can include in your module via the usual #include "my_module_parser.h" directive. The real work is writing the Ragel file. We will work through a simple example. The official Ragel User Guide (available as a PDF) is fully 56 pages long and gives the programmer tremendous power, but we will just go through the parts of Ragel you really need for a simple parser.

Ragel files are C files interspersed with special Ragel commands and functions. Ragel commands are in blocks of code surrounded by %%{ and }%%. The first two Ragel commands you will want in your parser are:



%%{
    machine my_parser;
    write data;
}%%


These two commands should appear after any pre-processor directives but before your parser function. The machine command gives a name to the state machine Ragel is about to build for you. The write command will create the state definitions that the state machine will use. Don't worry about these commands too much.

Next you can start writing your parser function as regular C. It can take any arguments you want and should return an ngx_int_t with NGX_OK upon success and NGX_ERROR upon failure. You should pass in, if not a pointer to the input you want to parse, then at least some kind of context struct that contains the input data.

Ragel will create a number of variables implicitly for you. Other variables you need to define yourself in order to use Ragel. At the top of your function, you need to declare:

- u_char *p - pointer to the beginning of the input
- u_char *pe - pointer to the end of the input
- int cs - an integer which stores the state machine's state

Ragel will start its parsing wherever p points, and finish up as soon as it reaches pe. Therefore p and pe should both be pointers into the same contiguous chunk of memory. Note that when Ragel is finished running on a particular input, you can save the value of cs (the machine state) and resume parsing on additional input buffers exactly where you left off. In this way Ragel works across multiple input buffers and fits beautifully into Nginx's event-driven architecture.
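
The save-and-resume idea is easy to demonstrate with a tiny hand-written machine that follows the same p / pe / cs convention. This is not Ragel output: rp_exec, RP_ERROR and RP_FIRST_FINAL are hypothetical names, and the machine only recognises the literal bytes= followed by one or more digits.

```c
#include <assert.h>
#include <stddef.h>

#define RP_START        0
#define RP_ERROR       (-1)
#define RP_FIRST_FINAL  7    /* states >= this one are accepting */

/* Run the machine over [p, pe) starting in state cs and return the
 * new state. States 0..5 count matched characters of "bytes=", state 6
 * expects the first digit, state 7 means at least one digit was read. */
static int
rp_exec(int cs, const char *p, const char *pe)
{
    static const char prefix[] = "bytes=";

    assert(p != NULL && pe != NULL && p <= pe);

    for ( ; p < pe && cs != RP_ERROR; p++) {
        if (cs < 6) {
            cs = (*p == prefix[cs]) ? cs + 1 : RP_ERROR;
        } else {
            cs = (*p >= '0' && *p <= '9') ? RP_FIRST_FINAL : RP_ERROR;
        }
    }

    return cs;
}
```

Feeding "byt" and then "es=4" as two separate buffers ends in the same state as feeding "bytes=4" in one go, provided the state returned by the first call is passed into the second. In a module, that saved state would typically live in the request context.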

3.3. Writing a grammar

Next we want to write the Ragel grammar for our parser. A grammar is just a set of rules that specifies which kinds of input are allowed; a Ragel grammar is special because it allows us to perform actions as we scan each character. To take advantage of Ragel, you must learn the Ragel grammar syntax; it is not difficult, but it is not trivial, either.

Ragel grammars are defined by sets of rules. A rule has an arbitrary name on the left side of an equals sign and a specification on the right side, followed by a semicolon. The rule specification is a mixture of regular expressions and actions. We will get to actions in a minute.

The most important rule is called "main." All grammars must have a rule for main. The rule for main is special in that 1) the name is not arbitrary and 2) it uses := instead of = to separate the name from the specification.

Let's start with a simple example: a parser for processing Range requests. This code is adapted from my mod_zip module, which also includes a more complicated parser for processing lists of files, if you are interested.

The "main" rule for our byte range parser is quite simple:



main := "bytes=" byte_range_set;


That rule just says "the input should consist of the string bytes= followed by input which follows the rule called byte_range_set." So we need to define the rule byte_range_set:



byte_range_set = byte_range_specs ( "," byte_range_specs )*;


That rule just says "byte_range_set consists of a byte_range_specs followed by zero or more commas each followed by a byte_range_specs." In other words, a byte_range_set is a comma-separated list of byte_range_specs's. You might recognize the * as a Kleene star, or from regular expressions.

Next we need to define the byte_range_specs rule:



byte_range_specs = byte_range_spec >new_range;


The > character is special. It says that new_range is not the name of another rule, but the name of an action, and the action should be taken at the beginning of this rule, i.e. the beginning of byte_range_specs. The most important special characters are:

- > - action should be taken at the beginning of this rule
- $ - action should be taken as each character is processed
- % - action should be taken at the end of this rule

There are others as well, which you can read about in the Ragel User Guide. These are enough to get you started without being too confusing.

Before we get into actions, let's finish defining our rules. In the rule for byte_range_specs (plural), we referred to a rule called byte_range_spec (singular). It is defined as:



byte_range_spec = [0-9]+ $start_incr
                  "-"
                  [0-9]+ $end_incr;


This rule states "read one or more digits, executing the action start_incr for each, then read a dash, then read one or more digits, executing the action end_incr for each." Notice that no actions are taken at the beginning or end of byte_range_spec.

When you are actually writing a grammar, you should write the rules in reverse order of what I have here. Rules should refer only to other rules that have been previously defined. So "main" should always be the last rule in your grammar, not the first.

Our byte-range grammar is now finished; it's time to specify the actions.

3.4. Writing some actions

Actions are chunks of C code which have access to a few special variables. The most important special variables are:

- fc - the current character being read
- fpc - a pointer to the current character being read

fc is most useful for $ actions, i.e. actions performed on each character of a string or regular expression. fpc is more useful for > and % actions, that is, actions taken at the start or end of a rule.

To return to our byte-range example, here is the new_range action. It does not use any special variables.



action new_range {
    if ((range = ngx_array_push(&ctx->ranges)) == NULL) {
        return NGX_ERROR;
    }
    range->start = 0; range->end = 0;
}


new_range is surprisingly dull. It just allocates a new "range" struct on the "ranges" array stored in our context struct. Notice that as long as we include the right header files, Ragel actions have full access to the Nginx API.

Next we define the two remaining actions, start_incr and end_incr. These actions parse positive integers into the appropriate variables. As we read each digit of a number, we want to multiply the stored number by 10 and add the digit. Here we take advantage of the special variable fc described above:



action start_incr { range->start = range->start * 10 + (fc - '0'); }

action end_incr { range->end = range->end * 10 + (fc - '0'); }


Note the old parsing trick of subtracting '0' to convert a character to an integer.

That's it for actions. We are almost finished with our parser.
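
For comparison, here is roughly what the byte-range grammar and its three actions amount to when written by hand in plain C. This is a sketch and not Ragel output: range_t, range_ctx_t and parse_ranges are hypothetical names, a fixed array replaces the Nginx array, and integer overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RANGES 8

typedef struct { unsigned long start, end; } range_t;
typedef struct { range_t ranges[MAX_RANGES]; size_t n; } range_ctx_t;

static int
is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Parse "bytes=A-B,C-D,..." from [p, pe). Returns 0 on success and -1
 * on malformed input or too many ranges. */
static int
parse_ranges(range_ctx_t *ctx, const char *p, const char *pe)
{
    static const char prefix[] = "bytes=";
    size_t            plen = sizeof(prefix) - 1;
    range_t          *range;

    assert(ctx != NULL && p != NULL && p <= pe);

    ctx->n = 0;
    if ((size_t) (pe - p) < plen || memcmp(p, prefix, plen) != 0) {
        return -1;
    }
    p += plen;

    for ( ;; ) {
        /* new_range: taken at the beginning of each range */
        if (ctx->n == MAX_RANGES) {
            return -1;
        }
        range = &ctx->ranges[ctx->n++];
        range->start = 0;
        range->end = 0;

        /* [0-9]+ $start_incr */
        if (p == pe || !is_digit(*p)) {
            return -1;
        }
        while (p < pe && is_digit(*p)) {
            range->start = range->start * 10 + (unsigned long) (*p++ - '0');
        }

        /* "-" */
        if (p == pe || *p++ != '-') {
            return -1;
        }

        /* [0-9]+ $end_incr */
        if (p == pe || !is_digit(*p)) {
            return -1;
        }
        while (p < pe && is_digit(*p)) {
            range->end = range->end * 10 + (unsigned long) (*p++ - '0');
        }

        if (p == pe) {
            return 0;           /* input ended after a complete range */
        }
        if (*p++ != ',') {
            return -1;
        }
    }
}
```

The four grammar rules and three one-line actions express the same logic, and unlike this hand-written version they do not need the whole input in one buffer.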

3.5. Putting it all together

Actions and the grammar should go inside a Ragel block inside your parser function, but after the declarations of p, pe, and cs. I.e., something like:



ngx_int_t my_parser(/* some context, the request struct, etc. */)
{
    int cs;
    u_char *p = ctx->beginning_of_data;
    u_char *pe = ctx->end_of_data;

    %%{
        /* Actions */
        action some_action { /* C code goes here */ }
        action other_action { /* etc. */ }

        /* Grammar */
        my_rule = [0-9]+ "-" >some_action;
        main := ( my_rule )* >other_action;

        write init;
        write exec;
    }%%

    if (cs < my_parser_first_final) {
        return NGX_ERROR;
    }

    return NGX_OK;
}


We've added a few extra pieces here. The first are write init and write exec. These are commands to Ragel to insert the generated parser (written in C) right there.

The other extra bit is the comparison of cs to my_parser_first_final. Recall that cs stores the parser's state. This check ensures that the parser is in a valid state after it has finished processing input. If we are parsing across multiple input buffers, then instead of this check we will store cs somewhere and retrieve it when we want to continue parsing.

Finally, we are ready to generate the actual parser. The code we've written so far should be in a Ragel (.rl) file; when we're ready to compile, we just run the command:



ragel my_parser.rl


This command will produce a file called "my_parser.c". To ensure that it is compiled by Nginx, you then need to add a line to your module's "config" file, like this:



NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/my_parser.c"


Once you get the hang of parsing with Ragel, you will wonder how you ever did without it. You will actually want to write parsers in your Nginx modules. Ragel opens up a whole new set of possible modules to the imaginative developer.

4. TODO: Advanced Topics Not Yet Covered Here

Topics not yet covered in this guide:

- Parallel subrequests
- Built-in data structures (red-black trees, arrays, hash tables...)
- Access control modules