Recovering a failed segment in Greenplum
2016-07-18 11:41
[Foreword]
Segment failure detection and failover
The GP Master first probes the Primary; if the Primary is unreachable, it then probes the Mirror. There are four possible Primary/Mirror state combinations:
1. Primary up, Mirror up. The Master's probe of the Primary succeeds and it moves straight on to the next Segment.
2. Primary up, Mirror down. The Master's probe of the Primary succeeds, and the status the Primary reports shows that the Mirror is down (when the Mirror fails, the Primary detects it and puts itself into ChangeTracking mode). The Master updates its metadata and moves on to the next Segment.
3. Primary down, Mirror up. The Master's probe of the Primary fails, so it probes the Mirror. The Mirror is alive, so the Master updates its metadata and has the Mirror take over as Primary (failover), then moves on to the next Segment.
4. Primary down, Mirror down. The Master's probe of the Primary fails, and the probe of the Mirror fails as well. After the maximum number of retries the Master stops probing this Segment without updating its metadata, and moves on to the next Segment.
Cases 2-4 above require running gprecoverseg to recover the segment.
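The per-segment probe logic above can be summarized as a small decision function. This is only a conceptual model, not Greenplum's actual FTS code; the function name and return values are invented for illustration:

```python
# Conceptual sketch of the Master's probe of one primary/mirror pair.
# NOT Greenplum's actual FTS implementation; names and return values
# are illustrative only.

def probe_segment(primary_up, mirror_up):
    """Return the action the Master takes for one primary/mirror pair."""
    if primary_up and mirror_up:
        return "ok"                   # case 1: nothing to do
    if primary_up and not mirror_up:
        # case 2: the primary noticed the dead mirror and entered
        # ChangeTracking mode; the master records the mirror as down
        return "mark_mirror_down"
    if not primary_up and mirror_up:
        # case 3: failover -- the mirror takes over as acting primary
        return "promote_mirror"
    # case 4: both down -- retry up to a limit, then give up without
    # updating the master's metadata
    return "give_up"

# Cases 2-4 all leave the pair needing a later `gprecoverseg` run:
needs_recovery = [
    (p, m) for p in (True, False) for m in (True, False)
    if probe_segment(p, m) != "ok"
]
```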
At startup, failed segment instances are simply skipped and ignored:
[gpadmin@mdw ~]$ gpstart
20160718:18:43:27:002949 gpstart:mdw:gpadmin-[INFO]:-Starting gpstart with args:
20160718:18:43:27:002949 gpstart:mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20160718:18:43:27:002949 gpstart:mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160718:18:43:28:002949 gpstart:mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20160718:18:43:28:002949 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Setting new master era
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Master Started...
20160718:18:43:30:002949 gpstart:mdw:gpadmin-[INFO]:-Shutting down master
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on sdw2 directory /home/gpadmin/gpdata/gpdatam/gpseg0 <<<<<
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master instance parameters
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Database = template1
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master Port = 1921
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master directory = /home/gpadmin/gpdata/pgmaster/gpseg-1
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Timeout = 600 seconds
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Master standby = Off
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:-Segment instances that will be started
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:---------------------------------------
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:- Host Datadir Port Role
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatap/gpseg0 40000 Primary
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:- sdw2 /home/gpadmin/gpdata/gpdatap/gpseg1 40000 Primary
20160718:18:43:32:002949 gpstart:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 50000 Mirror
Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20160718:18:43:34:002949 gpstart:mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...........
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-Process results...
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:- Successful segment starts = 3
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:- Failed segment starts = 0
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 1 <<<<<<<<
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160718:18:43:45:002949 gpstart:mdw:gpadmin-[INFO]:-
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[INFO]:-Successfully started 3 of 3 segment instances, skipped 1 other segments
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[INFO]:-----------------------------------------------------
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-There are 1 segment(s) marked down in the database
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[WARNING]:-****************************************************************************
20160718:18:43:46:002949 gpstart:mdw:gpadmin-[INFO]:-Starting Master instance mdw directory /home/gpadmin/gpdata/pgmaster/gpseg-1
20160718:18:43:48:002949 gpstart:mdw:gpadmin-[INFO]:-Command pg_ctl reports Master mdw instance active
20160718:18:43:49:002949 gpstart:mdw:gpadmin-[INFO]:-No standby master configured. skipping...
20160718:18:43:49:002949 gpstart:mdw:gpadmin-[WARNING]:-Number of segments not attempted to start: 1
20160718:18:43:49:002949 gpstart:mdw:gpadmin-[INFO]:-Check status of database with gpstate utility
Check the startup status of the database's mirror segments:
[gpadmin@mdw ~]$ gpstate -m
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--Type = Spread
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:-------------------------------------------------------------
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[WARNING]:-sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Failed <<<<<<<<
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 50000 Passive Synchronized
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) have failed
The warning line "[WARNING]:-sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Failed" makes the failed mirror immediately visible.
So how do we recover this mirror segment? (A failed primary segment is recovered in exactly the same way.)
1. First, generate a recovery configuration file: gprecoverseg -o ./recov
[gpadmin@mdw ~]$ gprecoverseg -o ./recov
20160718:18:47:04:003134 gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -o ./recov
20160718:18:47:04:003134 gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160718:18:47:04:003134 gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
20160718:18:47:04:003134 gprecoverseg:mdw:gpadmin-[INFO]:-Checking if segments are ready
20160718:18:47:04:003134 gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:47:06:003134 gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:47:07:003134 gprecoverseg:mdw:gpadmin-[INFO]:-Configuration file output to ./recov successfully.
2. Inspect the generated configuration file to see which segments need to be recovered:
[gpadmin@mdw ~]$ cat recov
filespaceOrder=fastdisk
sdw2:50000:/home/gpadmin/gpdata/gpdatam/gpseg0
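For scripting around recovery, this simple file format can be parsed mechanically. The helper below is a hypothetical illustration, assuming exactly the layout shown above (a `filespaceOrder=` header plus one `host:port:datadir` entry per failed instance for in-place recovery); it is not part of the Greenplum tooling.

```python
# Hypothetical helper: parse a `gprecoverseg -o` output file of the
# shape shown above. Assumes in-place recovery entries only
# (host:port:data_directory), no relocation targets.

def parse_recov(text):
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("filespaceOrder="):
            continue  # skip blank lines and the filespace header
        host, port, datadir = line.split(":", 2)
        entries.append({"host": host, "port": int(port), "datadir": datadir})
    return entries

recov = """filespaceOrder=fastdisk
sdw2:50000:/home/gpadmin/gpdata/gpdatam/gpseg0
"""
print(parse_recov(recov))
# [{'host': 'sdw2', 'port': 50000, 'datadir': '/home/gpadmin/gpdata/gpdatam/gpseg0'}]
```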
3. Run the recovery using this configuration file: gprecoverseg -i ./recov
[gpadmin@mdw ~]$ gprecoverseg -i ./recov
20160718:18:47:33:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Starting gprecoverseg with args: -i ./recov
20160718:18:47:34:003187 gprecoverseg:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160718:18:47:34:003187 gprecoverseg:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
20160718:18:47:34:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Checking if segments are ready
20160718:18:47:34:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Greenplum instance recovery parameters
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Recovery from configuration -i option supplied
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Recovery 1 of 1
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:----------------------------------------------------------
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Synchronization mode = Incremental
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance host = sdw2
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance address = sdw2
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance directory = /home/gpadmin/gpdata/gpdatam/gpseg0
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance port = 50000
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance replication port = 51000
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Failed instance fastdisk directory = /data/gpdata/seg1/pg_mir_cdr/gpseg0
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance host = sdw1
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance address = sdw1
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance directory = /home/gpadmin/gpdata/gpdatap/gpseg0
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance port = 40000
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance replication port = 41000
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Source instance fastdisk directory = /data/gpdata/seg1/pg_pri_cdr/gpseg0
20160718:18:47:35:003187 gprecoverseg:mdw:gpadmin-[INFO]:- Recovery Target = in-place
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Process results...
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Done updating primaries
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20160718:18:48:06:003187 gprecoverseg:mdw:gpadmin-[INFO]:-******************************************************************
4. Check the recovery status:
[gpadmin@mdw ~]$ gpstate -m
20160718:18:48:39:003353 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -m
20160718:18:48:39:003353 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.8.1 build 1'
20160718:18:48:39:003353 gpstate:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.8.1 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Apr 20 2016 08:08:56'
20160718:18:48:39:003353 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:-------------------------------------------------------------
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:--Current GPDB mirror list and status
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:--Type = Spread
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:-------------------------------------------------------------
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:- sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Passive Resynchronizing
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 50000 Passive Synchronized
20160718:18:48:40:003353 gpstate:mdw:gpadmin-[INFO]:--------------------------------------------------------------
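When monitoring many clusters, the failed-mirror lines can be picked out of `gpstate -m` output programmatically. The parser below is a hypothetical sketch, assuming the log layout shown above (failed mirrors reported on `[WARNING]` lines ending in `Failed`); the exact column layout may differ in other Greenplum versions.

```python
# Hypothetical helper: scan `gpstate -m` output for mirrors whose
# Status column is "Failed". Assumes the line shape shown above:
#   <timestamp> gpstate:<host>:<user>-[WARNING]:-<host> <datadir> <port> Failed

def failed_mirrors(gpstate_output):
    failed = []
    for line in gpstate_output.splitlines():
        if "[WARNING]" not in line or not line.rstrip().endswith("Failed"):
            continue  # only per-mirror WARNING rows end in "Failed"
        # drop the log prefix up to the first ':-', then split the columns
        fields = line.split(":-", 1)[1].split()
        host, datadir, port = fields[0], fields[1], fields[2]
        failed.append((host, datadir, int(port)))
    return failed

sample = """20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:- Mirror Datadir Port Status Data Status
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[WARNING]:-sdw2 /home/gpadmin/gpdata/gpdatam/gpseg0 50000 Failed
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[INFO]:- sdw1 /home/gpadmin/gpdata/gpdatam/gpseg1 50000 Passive Synchronized
20160718:18:45:48:003084 gpstate:mdw:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) have failed"""
print(failed_mirrors(sample))
# [('sdw2', '/home/gpadmin/gpdata/gpdatam/gpseg0', 50000)]
```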
5. With the previous step done, the primary/mirror pairs are recovered. One more step remains, and it is optional:
decide whether to swap the primary and mirror roles back, because the current mirror and primary roles are the opposite of their preferred roles.
To swap them back, use the following command; note that it stops the database while it does so.
gprecoverseg -r
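Before deciding on a rebalance, you can check which segment pairs are acting outside their preferred roles. In Greenplum this information lives in the `gp_segment_configuration` catalog (columns `role` and `preferred_role`); the snippet below mocks those rows as plain tuples instead of querying the master, so it is only a sketch of the check.

```python
# Sketch: decide whether a rebalance (`gprecoverseg -r`) is worthwhile.
# In practice, run this query on the master and look for any rows:
#
#   SELECT content, role, preferred_role
#   FROM gp_segment_configuration
#   WHERE role <> preferred_role;
#
# Here the catalog rows are mocked for illustration.

def needs_rebalance(rows):
    """rows: (content, role, preferred_role) with 'p'=primary, 'm'=mirror."""
    return [content for content, role, pref in rows if role != pref]

# After a primary failure and failover, one pair acts in swapped roles:
rows = [
    (0, "m", "p"),  # former primary now acting as mirror
    (0, "p", "m"),  # former mirror promoted to acting primary
    (1, "p", "p"),
    (1, "m", "m"),
]
print(sorted(set(needs_rebalance(rows))))  # [0]
```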
[Summary]
gprecoverseg is the utility for repairing Segments. It is straightforward to use, with only a few main options:
-i : the main option; specifies a configuration file describing the Segments to repair and their target locations after repair.
-F : optional; when given, gprecoverseg deletes the instances specified with "-i" (or marked "d") and copies a complete new copy from the live Mirror to the target location.
-r : when FTS detects a failed Primary and fails over, the Mirror that took over as Primary does not switch back automatically after gprecoverseg completes the repair. This can leave too many active Segments on some hosts and create a performance bottleneck, so the Segments need to be returned to their original roles; this is called re-balance.