
Cluster File System MooseFS: Installation, Configuration and Disaster Recovery

2013-06-03 17:33

http://contrib.meharwal.com/home/moosefs


Large Scale Data Storage with MooseFS (MFS)

Last Updated: June/21/2011 (Comments to: contrib+moosefs@meharwal.com)
MooseFS is a fault-tolerant, large-scale, network distributed file system available for Unix/Linux compatible environments. It is a horizontally scalable file system that is easy to set up using commodity hardware. It supports POSIX compliance and mounts the file system using the FUSE driver, which is commonly supported by various Unix/Linux distributions. It uses any of your favorite native file systems (ext3, ext4, xfs, zfs, ...) underneath and builds on top of it. It fits right into your UNIX/Linux networked environment by mounting the file system using mount, the system fstab file or the ubiquitous automounter. Some of the exciting fault-tolerant features of MooseFS include N copies of data replication, a trash bin (recovery of deleted files) and snapshots, configurable at a very granular level of a file system. Visit the MooseFS web site, http://www.moosefs.org, for the detailed feature set and architecture.

Large-scale, network-based distributed file system clusters are an evolving field. These file systems have their own strengths and weaknesses. Some of the well-known distributed file systems are listed here. A network file system is a core component of large-scale application frameworks, and setup simplicity is one of the important factors for any storage administrator choosing a system. In our experience, MooseFS shines in terms of performance, massive scalability using commodity hardware, POSIX compliance, useful feature list, quick recovery and simplicity of setup.

Contents
1 MooseFS Architecture
2 MFS Cluster Setup
2.1 CGI status viewer Setup
2.2 Master Server Setup
2.3 Metalogger Server Setup
2.4 Chunk Server Setup
2.5 Mounting Filesystem on Client Hosts:
3 MFS Cluster Maintenance and Operation
3.1 Goals (Replication setup)
3.2 Trash bin and data quarantine time setup
3.3 Metadata backup, recovery and redundancy
3.4 Few things to know
3.5 Other Goodies
4 MFS Metadata Maintenance (Disaster Recovery)
4.1 Few Basics about metadata
4.2 Recovery from crashed master server
4.3 Moving master metadata server in a planned manner.
4.4 Recovery from Metalogger server data.
4.5 Additional copies of metadata.
4.6 Speeding up chunk replication and re-balancing.
5 Wrapping up
6 Conclusion.


MooseFS Architecture

MooseFS consists of Master, Metalogger, Chunk and Client hosts. If desired, these components can co-exist on a single server.

Master Server is responsible for the file system metadata. All other MFS components communicate with the master server at specified network ports. The current version (1.6.20) of MooseFS does not provide active/active redundancy for the master server, so it is a single point of failure. However, moving the master to a backup metalogger server (manually) is a quick and easy process. The master server keeps a copy of all current metadata in memory (RAM), which makes metadata access very fast. It is important to set up the master server on a computer with disk redundancy (such as RAID-1 or higher) and ECC memory.

Metalogger Server(s) are optional but absolutely recommended for disaster recovery. Metaloggers passively replicate file system metadata from the master server in real time. It is recommended to set up at least two metalogger servers, in addition to the master server. Data from metalogger servers can be copied around to create a new master server in case the original master server is lost.

Chunk Servers are the bulk data storage units of an MFS cluster. Chunk servers can be added to or removed from an MFS cluster dynamically, at any time. An MFS cluster can grow horizontally to a very large scale file system, just by adding chunk servers later. If enough replicas (MFS goals per file/folder) are set up, then a chunk server can be removed, rebooted or shut down without any disruption to file system operation.

Client computers use the FUSE driver to mount an MFS volume. It can also be auto-mounted via autofs or via the system-level fstab file.

CGI status server provides a web-based status interface. Although optional, it is absolutely recommended for viewing the health of MFS clusters. Either use the CGI server supplied with MooseFS or install the cgi package on any server running a web server such as Apache. The CGI program communicates with master server(s) at a given network port to get the status of MFS cluster(s). No additional package is needed. A single CGI status server is enough and can communicate with all MFS clusters, by providing different web URL arguments.


Figure 1: MFS Cluster


MFS Cluster Setup

The following sections describe the MooseFS file system cluster setup for CentOS, Scientific Linux (SL) or any RHEL-based OS using commodity hardware. MooseFS components can run on several Linux/Unix flavors, and a similar setup can be applied to those platforms as well.

Design considerations:
The following design considerations are suggested for better MFS management, per our experience. They are not mandatory and individual preferences may vary.

It is advisable to run MFS as a non-root user. Typically, create a dedicated user mfs and group mfs for all MFS servers.

Set up the MFS master server to listen on a separate VIP (Virtual IP address), and only on that VIP, using default ports; i.e. do not use the server's primary IP address for the MFS master server. This is useful when setting up several MFS clusters within the same network and using a single piece of hardware for all MFS master servers with default ports. Also, when moving the master MFS server from one physical server to another, moving the VIP helps a lot. In this case metalogger servers, chunk servers and client hosts don't have to re-bind to a different IP address of the master server and wait for DNS or a cache (such as nscd) to time out. This makes migration of the master server (either planned or emergency) less disruptive.

Client hosts can use an Automounter (autofs) setup to mount the MFS file system for easy scalability and setup.

Restrict root mounts [MFS_ROOT] with full root privileges to the MFS master (or any designated server) only, for MFS cluster maintenance and better security.

The following examples assume that an MFS cluster named foo is to be set up. The file system mount point is /net/mfs/foo (it can be any path where the client host can mount a network file system). The master server VIP name is mfsmaster-foo.example.com (with IP address 192.168.1.10).

Source code or pre-compiled binaries: In the following examples RPM installs are used; these are downloaded from the http://packages.sw.be/mfs/ site. For source code compile and install, refer to the document from the MooseFS site here.


CGI status viewer Setup

It is advisable to set up the CGI status server first. Later, when MFS cluster components (master, metalogger and chunk servers) are added, they can easily be verified from the web GUI. MooseFS also ships with a basic cgi httpd server that can be used. However, the following steps describe a setup using the Apache web server.

First install the MFS cgi RPM, for example rpm -ivh mfs-cgi-1.6.20-1.el6.rf.x86_64.rpm.

Install the /etc/httpd/conf.d/mfs.conf file as below:

Alias /mfs/ "/var/www/html/mfs/"
ScriptAlias /mfscgi/ "/var/www/cgi-bin/mfs/"
<Directory "/var/www/cgi-bin/mfs/">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>

Reload Apache. The MFS cluster can then be accessed via the URL http://[APACHE-SERVER-HOSTNAME]/mfscgi/mfs.cgi?masterhost=mfsmaster-foo.example.com
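Since one CGI status server can show several clusters just by changing the masterhost argument, a small helper can print the URL for each cluster. This is only a sketch: the function name mfs_status_url and the web server host name in the usage note are made up for illustration, while the URL layout is the one shown above.

```shell
#!/bin/sh
# Print the CGI status URL for one MFS cluster.
#   $1 = host name of the Apache server running mfs.cgi (hypothetical)
#   $2 = VIP name of that cluster's master server
mfs_status_url() {
    printf 'http://%s/mfscgi/mfs.cgi?masterhost=%s\n' "$1" "$2"
}
```

For example, mfs_status_url web1.example.com mfsmaster-foo.example.com prints the status page address for cluster foo; calling it again with another master VIP name gives the page for a second cluster on the same web server.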



Master Server Setup

Install the MFS RPM, for example rpm -ivh mfs-1.6.20-1.el6.rf.x86_64.rpm.

Create a separate VIP (Virtual IP) interface for mfsmaster-[foo].example.com. A Linux command such as 'ifconfig eth0:foo 192.168.1.10 netmask 255.255.255.0 up' or the file /etc/sysconfig/network-scripts/ifcfg-eth0:foo can be used to create a VIP interface like eth0:foo.

Edit the configuration files in the /etc/mfs/ area.

If you are planning to run multiple MFS master servers on a single host for more than one MFS cluster (e.g. foo1, foo2, ...), create separate VIPs (e.g. eth0:foo1, eth0:foo2, ...), corresponding configuration files under separate sub-directories (e.g. /etc/mfs/{foo1,foo2}), distinct master metadata directories (e.g. /var/mfs/{foo1,foo2,...}/) and separate boot scripts for each MFS cluster's master server (e.g. /etc/init.d/{mfsmaster-foo1,mfsmaster-foo2,...}).

mfsmaster.cfg: Master's primary configuration file.
#-----------------------------------------------------
# MFS master server for "foo" cluster
#-----------------------------------------------------
WORKING_USER = mfs
WORKING_GROUP = mfs
SYSLOG_IDENT = mfsmaster-foo
# LOCK_MEMORY = 0
# NICE_LEVEL = -1

# File path for exports definition for this cluster.
EXPORTS_FILENAME = /etc/mfs/mfsexports.cfg

# Metadata for this master stored here.
# -- Use distinct directory, if running multiple master servers on this host
DATA_PATH = /var/mfs/

# BACK_LOGS = 50
# REPLICATIONS_DELAY_INIT = 300
# REPLICATIONS_DELAY_DISCONNECT = 3600

# -- For MetaLogger server connections
# MATOML_LISTEN_HOST = *   # -- Change this default from all interfaces to just VIP interface.
MATOML_LISTEN_HOST = mfsmaster-foo.example.com
# MATOML_LISTEN_PORT = 9419

# -- For Chunk Server connection
# MATOCS_LISTEN_HOST = *   # -- Change this default from all interfaces to just VIP interface
MATOCS_LISTEN_HOST = mfsmaster-foo.example.com
# MATOCS_LISTEN_PORT = 9420

# -- For Client connection
# MATOCU_LISTEN_HOST = *   # -- Change this default from all interfaces to just VIP interface.
MATOCU_LISTEN_HOST = mfsmaster-foo.example.com
# MATOCU_LISTEN_PORT = 9421

# CHUNKS_LOOP_TIME = 300
# CHUNKS_DEL_LIMIT = 100
# CHUNKS_WRITE_REP_LIMIT = 1
# CHUNKS_READ_REP_LIMIT = 5
# REJECT_OLD_CLIENTS = 0

# deprecated, to be removed in MooseFS 1.7
# LOCK_FILE = /var/run/mfs/mfsmaster.lock
mfsexports.cfg: This file describes the share export definitions for the cluster. Clients' mount requests are controlled via this file. Refer to the example file supplied in the /etc/mfs directory. The exports file can be set up to restrict mounts based on IP addresses, to limit root access and to password-protect exported shares.

#-----------------------------------------------------------------------------
# -- [MFS_ROOT]: root location / of MFS and all paths are relative after that.
#-----------------------------------------------------------------------------
# -- Only master can see full [MFS_ROOT] and with full root priv. access.
# -- Mount on a separate path like:
# -- mfsmount /mfsfoo -H mfsmaster-foo.example.com
192.168.1.10 / rw,alldirs,ignoregid,maproot=0

# Allow "-o mfsmeta" from master
192.168.1.10 . rw

# -- [MFS_ROOT]/ Allow RW access from all, but root priv disabled.
192.168.1.0/24 / rw,alldirs,ignoregid,maproot=nobody

Before starting the master server for the first time, seed an empty metadata file: cd /var/mfs; cp metadata.mfs.empty metadata.mfs. Apply proper permissions: chown -Rh mfs:mfs /var/mfs. Test starting and stopping the master server by hand using the mfsmaster command: mfsmaster -c /etc/mfs/mfsmaster.cfg start/stop. Activate the boot script /etc/init.d/mfsmaster to start/stop the master server at boot/shutdown time. The master server is now ready to accept connections. View the master server status using the CGI viewer.
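Seeding the empty metadata file is only safe on a brand-new master, because copying metadata.mfs.empty over real metadata would wipe the file system index. The following is a minimal sketch of a guarded version of that step; the function name seed_metadata is hypothetical, and the file names are the ones used in this article.

```shell
#!/bin/sh
# Seed an empty metadata file for a brand-new master without ever
# overwriting existing metadata.
#   $1 = master DATA_PATH (for example /var/mfs)
seed_metadata() {
    dir=$1
    # Refuse to touch a directory that already holds metadata.
    if [ -e "$dir/metadata.mfs" ] || [ -e "$dir/metadata.mfs.back" ]; then
        echo "metadata already present in $dir; not seeding" >&2
        return 1
    fi
    # The empty template is expected to be in the data directory.
    if [ ! -f "$dir/metadata.mfs.empty" ]; then
        echo "no metadata.mfs.empty template in $dir" >&2
        return 2
    fi
    cp "$dir/metadata.mfs.empty" "$dir/metadata.mfs"
}
```

After seed_metadata /var/mfs succeeds, the chown and mfsmaster start steps described above still apply.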


Metalogger Server Setup

Although optional, it is absolutely recommended to set up at least two additional metalogger servers.

Install the MFS RPM: rpm -ivh mfs-1.6.20-1.el6.rf.x86_64.rpm.

Edit the configuration files in the /etc/mfs/ area. If you are planning to run multiple MFS clusters' metalogger servers on a single host, then create separate configuration files, distinct metalogger data directories and corresponding boot scripts for each MFS cluster's metalogger server.

mfsmetalogger.cfg: Metalogger's configuration file.

#-----------------------------------------------------
# MFS metalogger (backup) server for "foo" cluster
#-----------------------------------------------------
WORKING_USER = mfs
WORKING_GROUP = mfs
SYSLOG_IDENT = mfsmetalogger-foo
# LOCK_MEMORY = 0
# NICE_LEVEL = -19

# Data directory for metalogger data
# Use distinct directories, if running several metaloggers on this host.
DATA_PATH = /var/mfs

# BACK_LOGS = 50
# META_DOWNLOAD_FREQ = 24
# MASTER_RECONNECTION_DELAY = 5
MASTER_HOST = mfsmaster-foo.example.com
# MASTER_PORT = 9419
# MASTER_TIMEOUT = 60

# deprecated, to be removed in MooseFS 1.7
# LOCK_FILE = /var/run/mfs/mfsmetalogger.lock

Apply proper permissions: chown -Rh mfs:mfs /var/mfs. Test starting and stopping the metalogger server by hand using the command mfsmetalogger -c /etc/mfs/mfsmetalogger.cfg start/stop. Activate the boot script /etc/init.d/mfsmetalogger to start/stop the metalogger server at boot/shutdown time. The metalogger server is now ready and should be fetching data from the master periodically.


Chunk Server Setup

The bulk of storage data resides on chunk servers. Chunk servers are highly redundant in nature, and if enough replicas (goals) are set up for the data, then a chunk server can be removed or rebooted without any loss of data. Chunk servers can be added to the MFS cluster dynamically at any later time in order to expand storage capacity.

On each chunk server, install the MFS RPM, for example rpm -ivh mfs-1.6.20-1.el6.rf.x86_64.rpm.

Edit the configuration files in the /etc/mfs/ area. It is recommended to use a single chunk server for only one MFS cluster at a time.

mfschunkserver.cfg: Chunk server configuration file.

#-----------------------------------------------------
# MFS chunk server for "foo" cluster
#-----------------------------------------------------
WORKING_USER = mfs
WORKING_GROUP = mfs
SYSLOG_IDENT = mfschunkserver-foo
# LOCK_MEMORY = 0
# NICE_LEVEL = -19
# DATA_PATH = /var/mfs
# MASTER_RECONNECTION_DELAY = 5
# BIND_HOST = *
MASTER_HOST = mfsmaster-foo.example.com
# MASTER_PORT = 9420
# MASTER_TIMEOUT = 60
# CSSERV_LISTEN_HOST = *
# CSSERV_LISTEN_PORT = 9422
# HDD_CONF_FILENAME = /etc/mfs/mfshdd.cfg
# HDD_TEST_FREQ = 10

# deprecated, to be removed in MooseFS 1.7
# LOCK_FILE = /var/run/mfs/mfschunkserver.lock
# BACK_LOGS = 50
# CSSERV_TIMEOUT = 5

Data chunks are stored in a directory created on a native file system (ext3, ext4, zfs, ...) on a chunk server. Chunk directories are added to the mfshdd.cfg file. Multiple directories (chunk areas) can be specified. Normally there is no need to use a RAID setup for chunk storage if goals > 1 are set up for all MooseFS data. Data redundancy is handled at the MooseFS setup level. Data chunks are replicated on separate chunk servers to survive a complete hardware failure of a single chunk server. For critical data it is recommended to set up goals >= 3, or to set up appropriate RAID on chunk servers, to achieve a higher level of redundancy.

mfshdd.cfg: Chunk storage data directories

# -- mount points of HDD drives.
/data/sda3/mfschunk
/data/sdb4/mfschunk

Apply proper permissions: chown -Rh mfs:mfs /var/mfs. Test starting and stopping the chunk server by hand using the command mfschunkserver -c /etc/mfs/mfschunkserver.cfg start/stop. Activate the boot script /etc/init.d/mfschunkserver to start/stop the chunk server at boot/shutdown time. The chunk server is now ready and should be communicating with the master server. Verify via the CGI status viewer.

Note that at this time there is no built-in mechanism to prevent unauthorized chunk servers from joining an MFS cluster via the master server. In theory, any non-root user on the network can set up a chunk server, point it to a master server and join an existing MFS cluster. Although this makes the whole process very convenient when adding a new chunk server, it presents a small operational risk. Hopefully, in a future release MooseFS will have some authentication mechanism between chunk and master servers to avoid security risks and operational accidents, especially when running multiple MFS clusters in the same network segment.


Mounting Filesystem on Client Hosts:

Install the mfs-client RPM, for example rpm -ivh mfs-client-1.6.20-1.el6.rf.x86_64.rpm. Also install fuse and fuse-libs. For RHEL/CentOS type OSes, yum install fuse fuse-libs should install those. Clients can now mount a share of MFS cluster foo by following one of the methods below.

Mount the file system on /net/mfs/foo locally, using the mfsmount command.

[root]# mfsmount /net/mfs/foo -H mfsmaster-foo.example.com -o mfssubfolder=/

Mount the file system using the /etc/fstab file. Add an entry like the one below and then run the command mount /net/mfs/foo.

mfsmount /net/mfs/foo fuse mfsmaster=mfsmaster-foo.example.com,mfssubfolder=/ 0 0

Note: During the boot process, the MFS cluster may initialize later than the file system mount attempt from the /etc/fstab file. One workaround is to put a line like 'mount -a' in the /etc/rc.local file, which is executed as the last step in the boot process. This ensures that the MFS share is mounted across reboots.

Mount the file system using the Automounter (autofs). cd /net/mfs/foo should then auto-mount the MFS file share.

File /etc/auto.master:
/net/mfs /etc/auto.mfs --timeout=120

File /etc/auto.mfs:
foo -fstype=fuse,mfsmaster=mfsmaster-foo.example.com,mfssubfolder=/ \:mfsmount


MFS Cluster Maintenance and Operation

Once an MFS cluster is up and running, it requires very little maintenance effort. The following sections describe a few operation tips.


Goals (Replication setup)

Folder and file replication (or goals) can be set up at a very granular level. Use the commands mfsgetgoal and mfssetgoal for goals setup. MFS ensures that replicated chunks are kept on different physical servers; thus folders set up with N goals have N-1 redundancy at the chunk server level. By default, goal settings are inherited from the parent folder, but individual files and folders can be set up with different goals. The MFS cluster maintains the goal level on the data chunks at the time of writing, and it also re-balances from time to time to ensure N-1 level redundancy at the chunk server level. In case a chunk server disappears, the MFS cluster starts replicating chunks to attain the goal level set for the data.

The CGI web interface displays the chunk status matrix with any under-goal (orange) chunks. Data chunks at zero goal (red), meaning no valid copy is currently reachable, are a fatal condition and indicate that some chunks are not available at the moment. The file system will throw an I/O error when data hitting those missing chunks is requested. This can happen if more chunk servers go offline than the goal allows for. Normally, bringing those chunk servers back online will return the file system to a healthy condition. It is highly recommended to keep an eye on the CGI web interface while doing chunk server maintenance like rebooting, replacing or retiring chunk servers.

[root]# mfsgetgoal /net/mfs/foo
/net/mfs/foo/: 1

[root]# mfssetgoal -r 2 /net/mfs/foo   <== Setting up goals=2 recursively on existing files and folders.
2:
/net/mfs/foo:
files with goal 2: 10
directories with goal 2: 3
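The N-1 rule above also has a capacity cost, since every chunk is stored N times. As a rough back-of-the-envelope sketch (assuming, for simplicity, that all data uses the same goal; the function name goal_summary is made up for this example):

```shell
#!/bin/sh
# Rough planning figures for a given goal.
#   $1 = goal (number of copies kept of each chunk)
#   $2 = raw capacity summed over all chunk servers (any unit)
goal_summary() {
    goal=$1
    raw=$2
    if [ "$goal" -lt 1 ]; then
        echo "goal must be >= 1" >&2
        return 1
    fi
    # N copies on N different servers survive the loss of N-1 servers.
    echo "tolerated chunk server failures: $(( goal - 1 ))"
    # Integer division; assumes every file uses the same goal.
    echo "approx usable capacity: $(( raw / goal ))"
}
```

With goal 2 and 100 TB of raw chunk storage this reports one tolerated chunk server failure and roughly 50 TB usable, which is the trade-off to weigh when choosing a higher goal for critical folders only.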


Trash bin and data quarantine time setup

Accidents do happen. The powerful /bin/rm -rf * command is a friend and a foe. Last night's backup can give some relief, but may still cost a day's worth of lost effort. MooseFS enables a trash bin setup for all folders. By default, all file data deleted from the MFS cluster is kept for 1 day. Trash time can be set on individual folders or files, like goals, using the commands mfsgettrashtime and mfssettrashtime. When a file is deleted, the metadata related to the file is kept in a special .(META) area and the related data chunks are left on the chunk servers for the specified period of time. An administrator has to mount a special share with the option -o mfsmeta to access trash metadata. In order to perform an undelete operation, move the related metadata file to the trash/undel area and the MFS cluster will bring the related data back online.

To recover deleted data within the specified trash time, log in to an authorized server.

[root]# mfsmount /mnt/mfsmeta-foo -o mfsmeta -H mfsmaster-foo.example.com   <== will mount .(META) area on /mnt/mfsmeta-foo
[root]# cd /mnt/mfsmeta-foo/trash

Find the file path to be undeleted and move that entry to the trash/undel area.
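The find-and-move step can be wrapped in two small helpers. This is a sketch only: the function names are hypothetical, and the entry name passed to mfs_undelete must be copied exactly as it appears in the trash listing on your own meta mount.

```shell
#!/bin/sh
# List trash entries whose name contains a fixed string.
#   $1 = trash directory of the meta mount (e.g. /mnt/mfsmeta-foo/trash)
#   $2 = text to look for in the entry names
mfs_trash_find() {
    ls "$1" | grep -F -- "$2"
}

# Undelete one entry by moving it into the undel area of the trash.
#   $1 = trash directory, $2 = entry name exactly as listed
mfs_undelete() {
    trash=$1
    entry=$2
    if [ ! -d "$trash/undel" ]; then
        echo "$trash has no undel area; is the meta share mounted?" >&2
        return 1
    fi
    if [ ! -e "$trash/$entry" ]; then
        echo "no such trash entry: $entry" >&2
        return 2
    fi
    mv "$trash/$entry" "$trash/undel/"
}
```

A typical session would first run mfs_trash_find /mnt/mfsmeta-foo/trash report.txt, then pass the matching entry name to mfs_undelete. Quote the entry name, because trash entries can contain characters that are special to the shell.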


Metadata backup, recovery and redundancy

MFS metadata redundancy is a sweet and sour experience for operations folks. MFS metadata storage files are simple and stored in a single directory. Metadata can be moved around manually, and standing up a new master server is very easy and takes only a few minutes. Built-in automatic failover redundancy is a highly desired feature for an enterprise file system cluster setup. Currently, third party solutions such as UCARP or Linux heartbeat with a DRBD setup may fill this gap. These solutions may be overkill and prone to false positives for a MooseFS setup, compared to how quick and easy it is to manually recover the master metadata server. However, for 24x7 operations, automated failover redundancy is a hard-to-avoid feature. The good news is that MooseFS developers have indicated that it is on the radar for future releases.


Few things to know

Currently (MooseFS 1.6.20), global POSIX file locking is not supported in the way NFS supports it at the server level. However, MFS via FUSE supports kernel-level POSIX file locking within the same OS kernel. It looks like the author is planning to support global file locking in future releases.

It also seems that opening files with the O_DIRECT flag is not supported either; this is more of a FUSE driver issue. A dd command like the one below fails with an error.

[root]# dd if=/dev/zero of=testfile bs=1024 count=10 oflag=direct
dd: opening `testfile': Invalid argument

An strace of such a process further showed that the open call with O_DIRECT fails:

open("test1", O_WRONLY|O_CREAT|O_TRUNC|O_DIRECT, 0666) = -1 EINVAL (Invalid argument)

If you are writing your own application, these are not a major factor, as workarounds can be used. However, any third party application that already uses these calls may exhibit strange behavior on FUSE-based mounts such as those used by MooseFS and several other file systems.

MFS uses 64KB for block size and 64MB (max) for chunk file size on the disk. These are hard-coded limits and seem to work very well in general.

If you are planning to use an MFS cluster for lots of small files, such as for code development or version control (Subversion, CVS repository) etc., then you will notice a spike in the metadata size on the master server. Having additional RAM to support larger metadata on the master server would surely help in this case.


Other Goodies

Most of the MooseFS commands start with mfs.... MooseFS has a very nice set of commands, such as mfsdirinfo and mfsfileinfo, to get instant file system metadata. Given that a copy of the metadata is cached in RAM on the master metadata server, even a command like du -hs is extremely fast compared to several native file systems.

Removal of a chunk server can be performed simply by shutting down the mfschunkserver process, and that is immediately reflected in the CGI web GUI.

Recovery of MooseFS is generally very quick in case of a network blip or a master server crash/reboot cycle. Clients simply retry, and as soon as all necessary MooseFS components are online, the file system responds right away, which is very impressive.

Retiring a chunk server, or a chunk area from a particular chunk server, is a very easy process. In order to remove a chunk storage area, simply mark the file system area with * (asterisk) in front of the directory path (e.g. */data/sda3/mfschunk to retire the /data/sda3/mfschunk storage area) and restart mfschunkserver. The MooseFS cluster then replicates the affected chunks to other areas and prepares for storage area removal, while keeping the desired goals level for files and folders. In case a storage chunk area is lost due to hard disk failure or such, simply prepend # (hash) in front of the chunk storage area and start mfschunkserver.
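Marking an area for removal is a one-character edit to mfshdd.cfg, but it is easy to mistype on a busy chunk server. A minimal sketch that prints the edited file content for review (the function name mark_for_removal is hypothetical; it only matches a line that is exactly the given path):

```shell
#!/bin/sh
# Print the content of an mfshdd.cfg file with one storage area marked
# for removal by a leading "*". All other lines are printed unchanged.
#   $1 = path to mfshdd.cfg
#   $2 = storage directory to retire, exactly as written in the file
mark_for_removal() {
    awk -v area="$2" '
        $0 == area { print "*" $0; next }   # the area being retired
        { print }                           # everything else as-is
    ' "$1"
}
```

The output goes to standard output, so it can be written to a temporary file, inspected, and then moved over /etc/mfs/mfshdd.cfg before restarting mfschunkserver as described above.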


MFS Metadata Maintenance (Disaster Recovery)

Anyone administering a MooseFS cluster must understand how to recover MFS metadata in case of a hardware component failure or a sudden system crash. Without the associated metadata, data stored on the chunk servers is nothing but a heap of trash. It is very important to guard the metadata and keep enough backup provision to recover the file system in case of emergency.


Few Basics about metadata

metadata.mfs: This file is created on a master server when the master server process is shut down gracefully. All active metadata, including any pending change logs, is written to this file. It contains a full set of metadata at the time of graceful shutdown. This is the only file required at the next master metadata server startup. This file is not present while the master server is running normally.

metadata.mfs.back: When the master server is started, it reads the file metadata.mfs and initializes the metadata.mfs.back file from its last state. During normal operation on a master server, all pending change logs are written to this file on an hourly basis, at the top of the hour. It is highly recommended to make an hourly copy, preferably at [HH]:30, to keep the last 24 copies of this file, and if possible to back them up away from the master server. In very rare circumstances, if the master server and metalogger servers are beyond recovery for any reason, these hourly backup files can be used to recover the MFS file system to its last known good state.

changelog.*.mfs: These are change log files written to disk from time to time. These change logs are merged and written to the metadata.mfs.back file on an hourly basis. Backup metaloggers sync these files and keep themselves up to date on a regular basis.


Recovery from crashed master server

When a master server process gets killed unexpectedly, it is left with a metadata.mfs.back file and pending changelog files that have not been merged yet. At the next startup, the master server will fail because the metadata.mfs file is missing. Run the mfsmetarestore command on the master server to merge all the pending changes manually and then start the master server.

mfsmetarestore -a -d /var/mfs   # -- /var/mfs is the location where master metadata is written

It is highly recommended to always shut down the master server gracefully, whenever possible.
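Whether a restore is needed can be told from which files are present in the master data directory. The sketch below encodes that rule; the function name master_state is hypothetical, and the check is only meaningful while the master process is stopped, because a normally running master also has metadata.mfs.back without metadata.mfs.

```shell
#!/bin/sh
# Report what a stopped master's data directory needs before startup.
#   $1 = master DATA_PATH (for example /var/mfs)
# Prints one of:
#   ready   - metadata.mfs exists, the master can be started
#   restore - only metadata.mfs.back exists, merge change logs first
#   missing - no metadata found, recover from a metalogger or backup
master_state() {
    dir=$1
    if [ -f "$dir/metadata.mfs" ]; then
        echo ready
    elif [ -f "$dir/metadata.mfs.back" ]; then
        echo restore
    else
        echo missing
    fi
}
```

A boot script could call master_state /var/mfs and run the mfsmetarestore -a -d /var/mfs command shown above only when the answer is restore.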


Moving master metadata server in a planned manner.

Follow these steps when moving the master metadata server from one server to another. First, shut down all metalogger servers and chunk servers (if possible, although not required), and then shut down the old master metadata server. Ensure the master server is shut down gracefully. Prepare the new master server by moving the VIP (Virtual IP) to the new server. Copy the master metadata directory to the new server and start the master metadata server on the new host. Start the metalogger and chunk servers as well.

Note: Using a VIP (virtual IP address) for the master metadata server and moving it around has a great advantage. Chunk servers and client hosts that were mounting the MFS volume earlier will be able to rejoin with much ease in this case. If you change the IP address of the master metadata server, then chunk servers and client hosts may experience a long disruption due to DNS caching, nscd, failed mounts to the old IP etc.


Recovery from Metalogger server data.

In the event that the master metadata server hardware is completely lost and the latest master metadata cannot be retrieved from it, metadata from the metalogger servers can be used to recover. In this case, prepare a new master metadata server by moving its designated VIP. Copy all metalogger data to the new server at some temporary location (e.g. /tmp/metalogger/) and then run the following command to recreate the [MasterDir]/metadata.mfs file.

mfsmetarestore -o /var/mfs/metadata.mfs -m /tmp/metalogger/metadata.mfs.back /tmp/metalogger/changelog_ml.*.mfs

Once completed, start the master metadata server. This will bring the MFS file system to the latest state cached by the metalogger servers at the time the master server went down.


Additional copies of metadata.

Metadata is a very critical asset and there is every reason to be paranoid about not losing it. Two or more metaloggers are highly recommended for sure; in addition to that, a script like the one below can keep an hourly copy of the metadata for historical purposes. These older snapshots may be helpful if you want to move back in time and use older metadata (e.g. the current metadata got corrupted and was also copied over to the metaloggers). Using historical metadata won't bring everything current, but may be able to bring the file system up to the last known good state. Obviously, the corresponding chunks must be present on the chunk servers to match up with the metadata, or you will notice errors about missing chunks.

#!/bin/sh
#-----------------------------------------------------------------
# Backing up master metalogs locally. Run this script
# from a cron job frequently (at every HOUR:30 or so), to capture
# frequent snapshots. This will keep the last 24 snapshots of metadata
# and overwrite after that. For additional protection, back up this
# data using your backup system frequently.
#-----------------------------------------------------------------
MFS_BASE="/var/mfs"
MFS_LOCALBACKUP="$MFS_BASE/LocalBackup"

#-- Assuming active master metalogs are stored as
#-- $MFS_BASE/metadata.mfs.back
DATE=`date`
CURRENT_METADATA="$MFS_BASE/metadata.mfs.back"

if [ ! -d "${MFS_LOCALBACKUP}" ]; then
    mkdir "$MFS_LOCALBACKUP"
    if [ $? != 0 ]; then
        echo "Oops! Cannot create '$MFS_LOCALBACKUP' directory. Aborting ..."
        exit 1
    fi
fi

if [ -f "$CURRENT_METADATA" ]; then
    echo "$DATE: Backing up metadata $CURRENT_METADATA"
    #-- two-digit hour (00-23), so 24 files are kept in rotation
    HOUR_NUM=`date +%H`
    #-- copy and replace previous file.
    cp -u "$CURRENT_METADATA" "$MFS_LOCALBACKUP/metadata.mfs-hour:$HOUR_NUM"
fi


Speeding up chunk replication and re-balancing.

By default, MooseFS yields most of the I/O to file system operations for clients and uses very little I/O for chunk replication and re-balancing. This is mostly preferable in normal operation. However, when you replace an existing chunk server, some of the chunks may be under goal and need quick attention. If you want to speed up the chunk replication process by sacrificing I/O for other file system operations, then tweak the following two parameters in the mfsmaster.cfg file on the master server and restart the master server process. You may have to experiment a little bit to find the correct balance between chunk replication rate and available I/O for other file system operations.

# default value 1
CHUNKS_WRITE_REP_LIMIT = 5
# default value 5
CHUNKS_READ_REP_LIMIT = 25
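When these limits are raised temporarily and lowered again after replication catches up, editing mfsmaster.cfg by hand gets repetitive. The sketch below prints a copy of a configuration file with one setting changed; the function name set_cfg_value is hypothetical, and it assumes the simple KEY = VALUE line format used throughout this article.

```shell
#!/bin/sh
# Print a copy of a cfg file with KEY set to VALUE. The first active or
# commented-out "KEY = ..." line is replaced; if there is none, the
# setting is appended at the end.
#   $1 = cfg file, $2 = key, $3 = new value
set_cfg_value() {
    awk -v key="$2" -v val="$3" '
        {
            line = $0
            sub(/^[ \t]*#?[ \t]*/, "", line)   # drop leading "#" and blanks
            split(line, kv, "=")
            name = kv[1]
            gsub(/[ \t]/, "", name)            # key name without blanks
            if (!done && name == key && index(line, "=") > 0) {
                print key " = " val
                done = 1
                next
            }
            print
        }
        END { if (!done) print key " = " val }
    ' "$1"
}
```

For instance, set_cfg_value /etc/mfs/mfsmaster.cfg CHUNKS_WRITE_REP_LIMIT 5 writes the modified configuration to standard output; review it, put it in place, and restart the master server process as noted above.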


Wrapping up

Some of the missing but desired features are worth mentioning here.

Native redundancy for the master metadata server is a highly desired feature. Currently one can use uCARP and DRBD with heartbeat solutions; however, it would be nice to see built-in redundancy in future releases.

Although the cgi-bin web interface is sufficient for most information, more command line options to gather information about MooseFS would be nice, so that one can use them in scripts and for monitoring systems.

Global POSIX file system locking would be nice to have and looks like it is promised for the next release.

Advanced features such as file system compression, encryption and data de-duplication are also important for many data center environments. Currently some of these features can be used with native file systems such as ext4, zfs or lessfs. There are two zfs-linux ports in the works as well: zfs-fuse and native zfs for the Linux kernel. Btrfs is supposed to be the next Linux file system and is worth keeping an eye on.


Conclusion.

In our experience MooseFS turned out to be a great file system. Its simplicity and resiliency surpass many other distributed network file systems. MooseFS components such as the master server and chunk servers can be restarted gracefully without significant issues on the client side mounting MFS shares. MooseFS is able to recover very well if more than one critical component is down at a given point and comes back after a certain time. Replication and trash quarantine setup at a very granular level is superb. For example, one can set up a very high replication factor (say goals=4) for a mission-critical data folder, thus allowing more chunk server failures without affecting the availability of such data, while keeping goals=2 for less critical data to utilize disk space better. The cgi-bin status web interface is great and gives lots of the information needed for file system operation.