Trafodion Troubleshooting: Common Commands
2016-10-27 12:16
1 sqstart (start the database; supports warm start and cold start)
[trafodion@cent-1 scripts]$ sqstart -h
Usage: sqstart { -w | -c | -r } -x
    -w  Perform a Warm Start
    -c  Perform a Cold Start
    -r  Perform a Cold Start with volume recovery
    -x  Perform Extra Checks (chk_dbperms) before allowing startup
[trafodion@cent-1 scripts]$ sqstart
Checking orphan processes.
Checking if HBase is available
--------------------------------------
executing: hbcheck
...
Checking if peers are available
--------------------------------------
executing: check_peers
...
Executing sqipcrm (output to sqipcrm.out)
Starting the SQ Environment (Executing .../sql/scripts/gomon.cold)
Background SQ Startup job (pid: 55893)
*** Checking Trafodion Environment ***
...
Starting the DCS environment now
...
The Trafodion environment is up!
Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               2
RMS             4               4
DcsMaster       1               1
DcsServer       2               2
mxosrvr         4               0           4
Starting the REST environment now
...
2 sqstop (shut down the database; supports abrupt and immediate modes)
[trafodion@cent-1 ~]$ sqstop -h
Usage: sqstop [ abrupt | immediate ]
This command is used to perform a shutdown of the SQ environment.
If a parameter is not specified, then a normal shutdown is performed.
[trafodion@cent-1 ~]$ sqstop
Shutting down the REST environment now
Shutting down the DCS environment now
Shutting down (normal) the SQ environment!
[$Z00175H] %ps
[$Z00175H] NID,PID(os)  PRI TYPE STATES  NAME        PARENT      PROGRAM
[$Z00175H] ------------ --- ---- ------- ----------- ----------- ---------------
[$Z00175H] 000,00023532 000 WDG  ES--A-- $WDG000     NONE        sqwatchdog
[$Z00175H] 000,00023533 000 PSD  ES--A-- $PSD000     NONE        pstartd
[$Z00175H] 000,00023548 001 GEN  ES--A-- $TSID0      NONE        idtmsrv
[$Z00175H] 000,00023554 001 GEN  ES--A-- $CMON       NONE        service_monitor
[$Z00175H] 000,00023670 001 GEN  ES--A-- $NMON0      NONE        service_monitor
[$Z00175H] 000,00023774 001 DTM  ES--A-- $TM0        NONE        tm
[$Z00175H] 000,00027218 001 GEN  ES--A-- $ZSC000     NONE        mxsscp
[$Z00175H] 000,00027707 001 SSMP ES--A-- $ZSM000     NONE        mxssmp
[$Z00175H] 000,00051642 001 GEN  ES--A-- $Z00175H    NONE        shell
[$Z00175H] 001,00012526 000 PSD  ES--A-- $PSD001     NONE        pstartd
[$Z00175H] 001,00012525 000 WDG  ES--A-- $WDG001     NONE        sqwatchdog
[$Z00175H] 001,00012539 001 GEN  ES--A-- $NMON1      NONE        service_monitor
[$Z00175H] 001,00012754 001 DTM  ES--A-- $TM1        NONE        tm
[$Z00175H] 001,00015350 001 GEN  ES--A-- $ZSC001     NONE        mxsscp
[$Z00175H] 001,00015557 001 SSMP ES--A-- $ZSM001     NONE        mxssmp
[$Z00175H] 001,00019503 001 GEN  ES--A-- $Z010FX8    NONE        mxosrvr
[$Z00175H] 001,00020602 001 GEN  ES--A-- $Z010GTM    NONE        mxosrvr
[$Z00175H] %shutdown
Shutdown in progress
Stopping OpenTSD...
Stopping Tcollector...
Stopping DBMgr...
Stopping Bosun...
3 sqcheck/trafcheck (check the database processes)
[trafodion@cent-1 scripts]$ sqcheck -h
Usage: sqcheck [ -c <cc> | -i <nn> | -d <nn> | -h | -f | -q | -v | -r | -j ]
    -i <nn>  Number of times the check for Trafodion processes be done (Default 2)
    -d <nn>  Duration of sleep (in seconds) between each check (Default 1)
    -c <cc>  Which component to check: [all | dtm | dcs | rms ] (Default all)
    -f       fast - (it's like running .../sql/scripts/sqcheck -i 1 -d 0)
    -r       reset the iteration counter if the process count increases
             (as compared to the count in the last iteration)
    -j       Format the output in JSON. Used by REST server
    -h       Help
    -v       Verbose
    -q       Quiet
[trafodion@cent-1 scripts]$ sqcheck
*** Checking Trafodion Environment ***
Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
The Trafodion environment is up!
Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               2
RMS             4               4
DcsMaster       1               1
DcsServer       2               2
mxosrvr         4               4
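When sqcheck runs from a script, it can help to reduce the status table to just the components that are down. The following is a minimal sketch, not part of Trafodion: `down_components` is a hypothetical helper, and it assumes the four-column layout shown in the sample output, where the Down column is left blank for healthy components.

```shell
# Hypothetical helper: print the name of every component whose Down
# column holds a non-zero count. Assumes the table layout from the
# sample output (Process, Configured, Actual, Down), with Down left
# blank when nothing is down.
down_components() {
    awk 'NF == 4 && $2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $4 > 0 { print $1 }'
}

# Possible usage on a node where sqcheck is on the PATH:
#   sqcheck | down_components
```

Fed the table printed at the end of the sqstart run in section 1, this would print only mxosrvr.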
4 dcscheck (check the DCS service)
[trafodion@cent-1 scripts]$ dcscheck
Zookeeper listen port: 2181
DcsMaster listen port: 23400
Configured Primary DcsMaster: "cent-1"
Active DcsMaster : "cent-1"
Process         Configured      Actual      Down
---------       ----------      ------      ----
DcsMaster       1               1
DcsServer       2               2
mxosrvr         4               4
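dcscheck reports both the configured primary DcsMaster and the one that is currently active. A small sketch for comparing the two in a script follows; `dcs_master_is_primary` is a hypothetical name, and the parsing assumes the two quoted-hostname lines appear exactly as in the sample output. Whether a mismatch is actually a problem depends on the deployment, so treat the result as a hint only.

```shell
# Hypothetical helper: succeed (exit 0) only when the active DcsMaster
# equals the configured primary. Assumes the hostnames are wrapped in
# double quotes, as in the sample dcscheck output.
dcs_master_is_primary() {
    awk -F'"' '
        /^Configured Primary DcsMaster/ { primary = $2 }
        /^Active DcsMaster/             { active  = $2 }
        END { exit !(primary != "" && primary == active) }
    '
}

# Possible usage:
#   dcscheck | dcs_master_is_primary || echo "active DcsMaster differs from primary"
```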
5 dcsstart (start the DCS service)
[trafodion@cent-1 scripts]$ dcsstart
*** Checking Trafodion Environment ***
Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
The Trafodion environment is up!
Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               2
RMS             4               4
DcsMaster       1               0           1
DcsServer       2               0           2
mxosrvr         4               0           4
Starting the DCS environment now
6 dcsstop (stop the DCS service)
[trafodion@cent-1 scripts]$ dcsstop
Shutting down the DCS environment now
stopping master.
cent-2: stopping server.
cent-1: stopping server.
7 rmscheck (check the RMS service)
[trafodion@cent-1 scripts]$ rmscheck
Timestamp                   Id      Status
2016-10-27 06:23:09.218429  Node 0  OK
2016-10-27 06:23:09.241041  Node 1  OK
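For unattended checks, the per-node status lines from rmscheck can be turned into an exit code. This is a rough sketch under stated assumptions: `rms_all_ok` is a hypothetical helper, each node line is assumed to end with its status as in the sample, and any status other than OK is treated as a failure (the sample only shows OK, so the non-OK wording is unknown).

```shell
# Hypothetical helper: exit 0 only if at least one "Node <n>" line was
# seen and every such line ends in OK. Lines with another status are
# printed so the failing node is visible.
rms_all_ok() {
    awk '
        $3 == "Node" { seen++ }
        $3 == "Node" && $NF != "OK" { print "Node " $4 ": " $NF; bad = 1 }
        END { exit (bad || seen == 0) }
    '
}

# Possible usage:
#   rmscheck | rms_all_ok || echo "RMS problem detected"
```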
8 rmsstart (start the RMS service)
[trafodion@cent-1 scripts]$ rmsstart
exec {nowait, nid 0, name $ZSC000, out stdout_ZSC000 } mxsscp
...
exec {type ssmp, nowait, nid 0, name $ZSM000, out stdout_ZSM000 } mxssmp
...
9 rmsstop (stop the RMS service)
[trafodion@cent-1 scripts]$ rmsstop
!Stop the SSMP processes
...
!Stop the SSCP processes
...
10 hbcheck (check HBase status and the Trafodion metadata)
[trafodion@cent-1 scripts]$ hbcheck -h
Usage: hbstatus [[-h] | [-m] | [-p] <peer id> | [-t] <table name>]
    -h               : Help (this output).
    -m               : Check the status of Trafodion metadata tables.
    -p <peer id>     : Defaults to the local cluster.
    -t <table name>  : Check the status of the provided table.
    -v               : Verbose output.
[trafodion@cent-1 scripts]$ hbcheck -v
ZooKeeper Quorum: cent-2.novalocal, ZooKeeper Port : 2181
HBase is available!
HBase version: 1.0.0-cdh5.4.8
HMaster: cent-2.novalocal,60000,1477058112339
Number of RegionServers available:2
RegionServer #1: cent-2.novalocal,60020,1477058112366
RegionServer #2: cent-1.novalocal,60020,1477058082327
Number of Dead RegionServers:0
Number of regions: 115
Number of regions in transition: 0
Average load: 57.5
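The verbose hbcheck output is a list of "label: value" lines, so a single value such as the dead RegionServer count can be pulled out for monitoring. A minimal sketch, assuming the labels appear exactly as in the sample output; `hb_field` is a hypothetical helper, not a Trafodion command.

```shell
# Hypothetical helper: print the value of the line whose label (the
# text before the first colon) equals the first argument exactly.
hb_field() {
    awk -F: -v key="$1" '$1 == key { sub(/^[ \t]+/, "", $2); print $2 }'
}

# Possible usage:
#   dead=$(hbcheck -v | hb_field 'Number of Dead RegionServers')
#   [ "$dead" -eq 0 ] || echo "dead RegionServers: $dead"
```

Matching on the whole label keeps "Number of regions" from also matching "Number of regions in transition".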
11 sqcollectlogs (collect the relevant logs)
[trafodion@cent-1 scripts]$ sqcollectlogs -h
Usage: sqcollectlogs {-a | -d | -z | -h}
    -a  Collect all (logs and pstacks). Default is logs only
    -c  Compress (tar-zip) the logs/stdout files on a per-node basis and move the
        tar-zip files to the collection node. To extract these files, execute the
        xtract_logs program that can be found in the logs collection directory.
        On a cluster, recommend executing xtract_logs on the head node.
    -d  After zipping, delete the directory where the logs were collected.
        Note: This option (-d) only applies if the -z option is also used.
    -z  tar/zip up the directory (containing the collected logs/pstacks).
        Default is not to tar/zip
    -h  Help
[trafodion@cent-1 scripts]$ sqcollectlogs -a -z
Collection in progress...
Collecting monitor pstacks
[1] 54909
Collecting mxosrvr pstacks
[2] 54910
Collecting tdm_arkcmp pstacks
[3] 54911
Collecting tdm_arkesp pstacks
[4] 54912
Collecting tm pstacks
[5] 54913
[1]   Done    $SQPDSHA "sqnpstack monitor $PWD/monitor" 2> /dev/null
[3]   Done    $SQPDSHA "sqnpstack tdm_arkcmp $PWD/tdm_arkcmp" 2> /dev/null
[5]+  Done    $SQPDSHA "sqnpstack tm $PWD/tm" 2> /dev/null
[2]-  Done    $SQPDSHA "sqnpstack mxosrvr $PWD/mxosrvr" 2> /dev/null
[4]+  Done    $SQPDSHA "sqnpstack tdm_arkesp $PWD/tdm_arkesp" 2> /dev/null
Logs and pstacks collected in /home/trafodion/logs/sqinfo.20161027_0636
Creating a tar/zip file...
Created the .tgz file: /home/trafodion/logs/sqinfo.20161027_0636.tgz
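If the archive needs to be copied elsewhere afterwards, its path can be taken from the last line of the output. A small sketch, assuming the "Created the .tgz file:" message is worded as in the sample; `collected_archive` is a hypothetical helper.

```shell
# Hypothetical helper: print the archive path announced by
# sqcollectlogs -z. Prints nothing if the message is absent.
collected_archive() {
    sed -n 's/^Created the \.tgz file: //p'
}

# Possible usage:
#   archive=$(sqcollectlogs -a -z | collected_archive)
```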
12 reststart (start the REST service)
[trafodion@cent-1 scripts]$ reststart
*** Checking Trafodion Environment ***
Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
The Trafodion environment is up!
Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               2
RMS             4               4
DcsMaster       1               1
DcsServer       2               2
mxosrvr         4               4
Starting the REST environment now
13 reststop (stop the REST service)
[trafodion@cent-1 scripts]$ reststop
Shutting down the REST environment now
stopping rest.
14 restcheck (check the REST service)
[trafodion@cent-1 scripts]$ restcheck
TrafodionRest listen port : 4200
TrafodionRest is up on node: cent-1,pid: 28467
Process         Actual
---------       ------
TrafodionRest   1
15 lobstart (start the LOB service)
[trafodion@cent-1 scripts]$ lobstart
Starting lob server processes
Successfully started $zlobsrv0
Successfully started $zlobsrv1
16 lobstop (stop the LOB service)
[trafodion@cent-1 scripts]$ lobstop
stopped $zlobsrv0
stopped $zlobsrv1
17 sqps (show the currently running processes)
[trafodion@cent-1 scripts]$ sqps
[$Z000SUY] %ps
[$Z000SUY] NID,PID(os)  PRI TYPE STATES  NAME        PARENT      PROGRAM
[$Z000SUY] ------------ --- ---- ------- ----------- ----------- ---------------
[$Z000SUY] 000,00056370 000 WDG  ES--A-- $WDG000     NONE        sqwatchdog
[$Z000SUY] 000,00056371 000 PSD  ES--A-- $PSD000     NONE        pstartd
[$Z000SUY] 000,00056388 001 GEN  ES--A-- $TSID0      NONE        idtmsrv
[$Z000SUY] 000,00056394 001 GEN  ES--A-- $CMON       NONE        service_monitor
[$Z000SUY] 000,00056510 001 GEN  ES--A-- $NMON0      NONE        service_monitor
[$Z000SUY] 000,00056584 001 DTM  ES--A-- $TM0        NONE        tm
[$Z000SUY] 000,00030812 001 GEN  ES--A-- $Z000Q5C    NONE        java
[$Z000SUY] 000,00034275 001 GEN  ES--A-- $Z000SZA    NONE        mxosrvr
[$Z000SUY] 000,00035473 001 GEN  ES--A-- $Z000TYI    NONE        mxosrvr
[$Z000SUY] 000,00041871 001 GEN  ES--A-- $ZSC000     NONE        mxsscp
[$Z000SUY] 000,00041954 001 SSMP ES--A-- $ZSM000     NONE        mxssmp
[$Z000SUY] 000,00030358 001 GEN  ES--A-- $ZLOBSRV0   NONE        mxlobsrvr
[$Z000SUY] 000,00034123 001 GEN  ES--A-- $Z000SUY    NONE        shell
[$Z000SUY] 001,00044580 000 WDG  ES--A-- $WDG001     NONE        sqwatchdog
[$Z000SUY] 001,00044581 000 PSD  ES--A-- $PSD001     NONE        pstartd
[$Z000SUY] 001,00044593 001 GEN  ES--A-- $NMON1      NONE        service_monitor
[$Z000SUY] 001,00044794 001 DTM  ES--A-- $TM1        NONE        tm
[$Z000SUY] 001,00014474 001 GEN  ES--A-- $Z010BTJ    NONE        mxosrvr
[$Z000SUY] 001,00015554 001 GEN  ES--A-- $Z010CPE    NONE        mxosrvr
[$Z000SUY] 001,00023248 001 GEN  ES--A-- $ZSC001     NONE        mxsscp
[$Z000SUY] 001,00023252 001 SSMP ES--A-- $ZSM001     NONE        mxssmp
[$Z000SUY] 001,00008542 001 GEN  ES--A-- $ZLOBSRV1   NONE        mxlobsrvr
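The sqps listing can be summarized per program, which makes it quick to see, for example, how many mxosrvr processes are registered with the monitor. A minimal sketch, assuming each process row has the NID,PID pair in the second field and the program name in the last field, as in the sample; `count_by_program` is a hypothetical helper.

```shell
# Hypothetical helper: count the process rows of sqps output per
# PROGRAM. Header and separator rows are skipped because their second
# field is not a "nid,pid" pair.
count_by_program() {
    awk '$2 ~ /^[0-9]+,[0-9]+$/ { n[$NF]++ }
         END { for (p in n) print p, n[p] }' | sort
}

# Possible usage:
#   sqps | count_by_program
```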
18 pstat (show all core Trafodion processes on the current node)
[trafodion@cent-1 scripts]$ pstat
trafodion 56388 56203 futex_ 7960 108024 00:00:30 SNl idtmsrv SQMON1.1 00000 00000 056388 $TSID0 192.168.0.16:51336 00004 00000 00006 SPARE
trafodion 56203 1 hrtime 38460 377980 00:02:19 Ssl ../export/bin64/monitor COLD
trafodion 30358 56203 futex_ 37404 332480 00:00:00 SNl mxlobsrvr SQMON1.1 00000 00000 030358 $ZLOBSRV0 192.168.0.16:51336 00004 00000 00468 SPARE
trafodion 34275 33902 futex_ 42188 386484 00:00:34 Sl mxosrvr -ZKHOST cent-2.novalocal:2181 -RZ cent-1.novalocal:1:1 -ZKPNODE /trafodion -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 192.168.0.16 -MAXHEAPPCT 0 -STATISTICSINTERVAL 60 -STATISTICSLIMIT 60 -STATISTICSTYPE aggregated -STATISTICSENABLE true -SQLPLAN true -PORTMAPTOSECS -1 -PORTBINDTOSECS -1 -PUBLISHSTATSTOTSDB false -OPENTSDURL localhost:5242
trafodion 35473 35027 futex_ 42188 386484 00:00:33 Sl mxosrvr -ZKHOST cent-2.novalocal:2181 -RZ cent-1.novalocal:1:2 -ZKPNODE /trafodion -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 192.168.0.16 -MAXHEAPPCT 0 -STATISTICSINTERVAL 60 -STATISTICSLIMIT 60 -STATISTICSTYPE aggregated -STATISTICSENABLE true -SQLPLAN true -PORTMAPTOSECS -1 -PORTBINDTOSECS -1 -PUBLISHSTATSTOTSDB false -OPENTSDURL localhost:5242
trafodion 41871 56203 futex_ 37948 399064 00:00:17 SNl mxsscp SQMON1.1 00000 00000 041871 $ZSC000 192.168.0.16:51336 00004 00000 00250 SPARE
trafodion 41954 56203 futex_ 41772 409564 00:00:18 SNl mxssmp SQMON1.1 00000 00000 041954 $ZSM000 192.168.0.16:51336 00011 00000 00255 SPARE
trafodion 56371 56203 futex_ 5016 114808 00:00:35 Sl pstartd SQMON1.1 00000 00000 056371 $PSD000 192.168.0.16:51336 00012 00000 00002 SPARE
trafodion 56394 56203 futex_ 7924 97768 00:00:32 SNl service_monitor SQMON1.1 00000 00000 056394 $CMON 192.168.0.16:51336 00004 00000 00007 SPARE -t 60 -f cluster_monitor.cmd
trafodion 56510 56203 futex_ 7924 97768 00:00:31 SNl service_monitor SQMON1.1 00000 00000 056510 $NMON0 192.168.0.16:51336 00004 00000 00008 SPARE -t 60 -f node_monitor.cmd
trafodion 56370 56203 futex_ 4552 133856 00:00:38 Sl sqwatchdog SQMON1.1 00000 00000 056370 $WDG000 192.168.0.16:51336 00005 00000 00001 SPARE
trafodion 56584 56203 futex_ 278304 2670664 00:02:21 SNl tm SQMON1.1 00000 00000 056584 $TM0 192.168.0.16:51336 00002 00000 00011 SPARE
19 cstat (show all core Trafodion processes across the cluster)
[trafodion@cent-1 scripts]$ cstat
cent-2: uid pid ppid wchan rss vsz time stat cmd
cent-2: --- --- ---- ----- --- --- ---- ---- ---
cent-1: uid pid ppid wchan rss vsz time stat cmd
cent-1: --- --- ---- ----- --- --- ---- ---- ---
cent-1: trafodion 56388 56203 futex_ 7960 108024 00:00:30 SNl idtmsrv SQMON1.1 00000 00000 056388 $TSID0 192.168.0.16:51336 00004 00000 00006 SPARE
cent-1: trafodion 56203 1 hrtime 38460 377980 00:02:20 Ssl ../export/bin64/monitor COLD
cent-1: trafodion 30358 56203 futex_ 37404 332480 00:00:01 SNl mxlobsrvr SQMON1.1 00000 00000 030358 $ZLOBSRV0 192.168.0.16:51336 00004 00000 00468 SPARE
cent-1: trafodion 34275 33902 futex_ 42188 386484 00:00:34 Sl mxosrvr -ZKHOST cent-2.novalocal:2181 -RZ cent-1.novalocal:1:1 -ZKPNODE /trafodion -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 192.168.0.16 -MAXHEAPPCT 0 -STATISTICSINTERVAL 60 -STATISTICSLIMIT 60 -STATISTICSTYPE aggregated -STATISTICSENABLE true -SQLPLAN true -PORTMAPTOSECS -1 -PORTBINDTOSECS -1 -PUBLISHSTATSTOTSDB false -OPENTSDURL localhost:5242
cent-1: trafodion 35473 35027 futex_ 42188 386484 00:00:33 Sl mxosrvr -ZKHOST cent-2.novalocal:2181 -RZ cent-1.novalocal:1:2 -ZKPNODE /trafodion -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 192.168.0.16 -MAXHEAPPCT 0 -STATISTICSINTERVAL 60 -STATISTICSLIMIT 60 -STATISTICSTYPE aggregated -STATISTICSENABLE true -SQLPLAN true -PORTMAPTOSECS -1 -PORTBINDTOSECS -1 -PUBLISHSTATSTOTSDB false -OPENTSDURL localhost:5242
cent-1: trafodion 41871 56203 futex_ 37952 399064 00:00:17 SNl mxsscp SQMON1.1 00000 00000 041871 $ZSC000 192.168.0.16:51336 00004 00000 00250 SPARE
cent-1: trafodion 41954 56203 futex_ 41780 409564 00:00:18 SNl mxssmp SQMON1.1 00000 00000 041954 $ZSM000 192.168.0.16:51336 00011 00000 00255 SPARE
cent-1: trafodion 56371 56203 futex_ 5016 114808 00:00:35 Sl pstartd SQMON1.1 00000 00000 056371 $PSD000 192.168.0.16:51336 00012 00000 00002 SPARE
cent-1: trafodion 56394 56203 futex_ 7924 97768 00:00:32 SNl service_monitor SQMON1.1 00000 00000 056394 $CMON 192.168.0.16:51336 00004 00000 00007 SPARE -t 60 -f cluster_monitor.cmd
cent-1: trafodion 56510 56203 futex_ 7924 97768 00:00:31 SNl service_monitor SQMON1.1 00000 00000 056510 $NMON0 192.168.0.16:51336 00004 00000 00008 SPARE -t 60 -f node_monitor.cmd
cent-1: trafodion 56370 56203 futex_ 4552 133856 00:00:38 Sl sqwatchdog SQMON1.1 00000 00000 056370 $WDG000 192.168.0.16:51336 00005 00000 00001 SPARE
cent-1: trafodion 56584 56203 futex_ 278308 2670664 00:02:22 SNl tm SQMON1.1 00000 00000 056584 $TM0 192.168.0.16:51336 00002 00000 00011 SPARE
cent-2: trafodion 8542 44432 futex_ 37408 332476 00:00:01 SNl mxlobsrvr SQMON1.1 00001 00001 008542 $ZLOBSRV1 192.168.0.47:34220 00004 00001 00024 SPARE
cent-2: trafodion 14474 13989 futex_ 44200 386480 00:00:27 Sl mxosrvr -ZKHOST cent-2.novalocal:2181 -RZ cent-2.novalocal:2:1 -ZKPNODE /trafodion -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 192.168.0.47 -MAXHEAPPCT 0 -STATISTICSINTERVAL 60 -STATISTICSLIMIT 60 -STATISTICSTYPE aggregated -STATISTICSENABLE true -SQLPLAN true -PORTMAPTOSECS -1 -PORTBINDTOSECS -1 -PUBLISHSTATSTOTSDB false -OPENTSDURL localhost:5242
cent-2: trafodion 15554 15120 futex_ 46096 386480 00:00:27 Sl mxosrvr -ZKHOST cent-2.novalocal:2181 -RZ cent-2.novalocal:2:2 -ZKPNODE /trafodion -CNGTO 60 -ZKSTO 180 -EADSCO 0 -TCPADD 192.168.0.47 -MAXHEAPPCT 0 -STATISTICSINTERVAL 60 -STATISTICSLIMIT 60 -STATISTICSTYPE aggregated -STATISTICSENABLE true -SQLPLAN true -PORTMAPTOSECS -1 -PORTBINDTOSECS -1 -PUBLISHSTATSTOTSDB false -OPENTSDURL localhost:5242
cent-2: trafodion 23248 44432 futex_ 37940 399060 00:00:14 SNl mxsscp SQMON1.1 00001 00001 023248 $ZSC001 192.168.0.47:34220 00004 00001 00022 SPARE
cent-2: trafodion 23252 44432 futex_ 39828 409560 00:00:15 SNl mxssmp SQMON1.1 00001 00001 023252 $ZSM001 192.168.0.47:34220 00011 00001 00023 SPARE
cent-2: trafodion 44581 44432 futex_ 5008 114804 00:00:28 Sl pstartd SQMON1.1 00001 00001 044581 $PSD001 192.168.0.47:34220 00012 00001 00002 SPARE
cent-2: trafodion 44593 44432 futex_ 7908 97768 00:00:26 SNl service_monitor SQMON1.1 00001 00001 044593 $NMON1 192.168.0.47:34220 00004 00001 00003 SPARE -t 60 -f node_monitor.cmd
cent-2: trafodion 44580 44432 futex_ 4524 133852 00:00:32 Sl sqwatchdog SQMON1.1 00001 00001 044580 $WDG001 192.168.0.47:34220 00005 00001 00001 SPARE
cent-2: trafodion 44794 44432 futex_ 241652 2662688 00:01:57 SNl tm SQMON1.1 00001 00001 044794 $TM1 192.168.0.47:34220 00002 00001 00004 SPARE
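Because cstat prefixes every row with the node name, a per-node process count is easy to derive and gives a quick view of whether one node is missing processes. A minimal sketch, assuming one process per line with the layout "node: uid pid ..." as in the sample; `procs_per_node` is a hypothetical helper.

```shell
# Hypothetical helper: count process rows per node in cstat output.
# A row counts as a process when its third field (the pid column) is
# numeric, which skips the per-node header and separator rows.
procs_per_node() {
    awk '$3 ~ /^[0-9]+$/ { sub(/:$/, "", $1); n[$1]++ }
         END { for (h in n) print h, n[h] }' | sort
}

# Possible usage:
#   cstat | procs_per_node
```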
20 pkillall (kill the core Trafodion processes on the current node)
[trafodion@cent-1 scripts]$ pkillall
kill 23313: No such process
21 ckillall (kill the core Trafodion processes across the cluster)
[trafodion@cent-1 scripts]$ ckillall
Do you really want to continue? y/n : y
Shutting down the DCS environment now
stopping master.
cent-2: stopping server.
cent-1: stopping server.
Shutting down the REST environment now
stopping rest.
Stopping OpenTSD...
Stopping Tcollector...
Stopping DBMgr...
Stopping Bosun...