不要滥用UNLOGGED table 和 hash index
2015-10-23 11:12
411 查看
注意PostgreSQL的unlogged table是不记录xlog的,所以在备库上没有unlogged table的数据记录。
同时,数据库在进入恢复状态时,为了保证数据的一致性,postgresql会自动清除unlogged table的数据文件。
那么问题来了,万一你不小心误创建了unlogged table,你可能一开始会没有感知,但是一旦发生以下情况,你会发现数据不见了。
1. 数据库crash掉之后,重启。
2. 主备切换,备库变成主库,主库变成备库。
以上两种情况,都是数据库在启动并进入recovery状态后,原来主库下的unlogged table的datafile都会被清空。
代码见:
src/backend/access/transam/xlog.c
/* * This must be called ONCE during postmaster or standalone-backend startup */voidStartupXLOG(void){......
/* REDO */ if (InRecovery) { ...... /* * We're in recovery, so unlogged relations may be trashed and must be * reset. This should be done BEFORE allowing Hot Standby * connections, so that read-only backends don't try to read whatever * garbage is left over from before. */ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);...... /* * Reset unlogged relations to the contents of their INIT fork. This is * done AFTER recovery is complete so as to include any unlogged relations * created during recovery, but BEFORE recovery is marked as having * completed successfully. Otherwise we'd not retry if any of the post * end-of-recovery steps fail. */ if (InRecovery) ResetUnloggedRelations(UNLOGGED_RELATION_INIT);......
backend/storage/file/reinit.c
/* * Reset unlogged relations from before the last restart. * * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any * relation with an "init" fork, except for the "init" fork itself. * * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main * fork. */voidResetUnloggedRelations(int op){ char temp_path[MAXPGPATH]; DIR *spc_dir; struct dirent *spc_de; MemoryContext tmpctx, oldctx;
/* Log it. */ elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d", (op & UNLOGGED_RELATION_CLEANUP) != 0, (op & UNLOGGED_RELATION_INIT) != 0);
/* * Just to be sure we don't leak any memory, let's create a temporary * memory context for this operation. */ tmpctx = AllocSetContextCreate(CurrentMemoryContext, "ResetUnloggedRelations", ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE); oldctx = MemoryContextSwitchTo(tmpctx);
/* * First process unlogged files in pg_default ($PGDATA/base) */ ResetUnloggedRelationsInTablespaceDir("base", op);
/* * Cycle through directories for all non-default tablespaces. */ spc_dir = AllocateDir("pg_tblspc");
while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL) { if (strcmp(spc_de->d_name, ".") == 0 || strcmp(spc_de->d_name, "..") == 0) continue;
snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s", spc_de->d_name, TABLESPACE_VERSION_DIRECTORY); ResetUnloggedRelationsInTablespaceDir(temp_path, op); } FreeDir(spc_dir);
/* * Restore memory context. */ MemoryContextSwitchTo(oldctx); MemoryContextDelete(tmpctx);}
src/include/common/relpath.h
/* * Stuff for fork names. * * The physical storage of a relation consists of one or more forks. * The main fork is always created, but in addition to that there can be * additional forks for storing various metadata. ForkNumber is used when * we need to refer to a specific fork in a relation. */typedef enum ForkNumber{ InvalidForkNumber = -1, MAIN_FORKNUM = 0, FSM_FORKNUM, VISIBILITYMAP_FORKNUM, INIT_FORKNUM
/* * NOTE: if you add a new fork, change MAX_FORKNUM and possibly * FORKNAMECHARS below, and update the forkNames array in * src/common/relpath.c */} ForkNumber;
那么问题来了,如果真的这样了,有办法恢复吗?
1. 首先,如果你在原来的主库上有基础备份,你可以从基础备份恢复。为什么需要主库的备份呢,因为备库上没有unlogged table的数据文件内容,所以在备库备份是备不到unlogged table的datafile的。
但是这种恢复方法也务必要小心,你需要在启动数据库前,先把_init的文件都删掉,这样启动数据库时数据文件就不会被清除。
2. 从主库的逻辑备份中恢复。
3. 从审计日志中回放SQL恢复。
4. 如果你没有主库的基础备份,那么可有从文件系统中去恢复删掉的数据文件。例如ext4文件系统的恢复方式如下:
http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
例子:
创建一个unlogged table,并记录它的filenode
postgres=# create unlogged table utbl1(id int);CREATE TABLEpostgres=# insert into utbl1 select generate_series(1,1000);INSERT 0 1000postgres=# select pg_relation_filepath('utbl1'::regclass); pg_relation_filepath ---------------------- base/151898/33822(1 row)
查看到PG对unlogged table做了_init的后缀标记
http://www.postgresql.org/docs/9.4/static/storage-init.html
postgres@digoal-> cd $PGDATApostgres@digoal-> ll base/151898/33822*-rw------- 1 postgres postgres 40K Sep 26 11:39 base/151898/33822-rw------- 1 postgres postgres 24K Sep 26 11:39 base/151898/33822_fsm-rw------- 1 postgres postgres 0 Sep 26 11:38 base/151898/33822_init
现在,我们进入恢复模式
postgres@digoal-> mv recovery.done recovery.confpostgres@digoal-> pg_ctl start
数据库启动后,其实已经将unlogged table的数据文件清理掉了,见前面的源码。
postgres=# select count(*) from utbl1 ;ERROR: cannot access temporary or unlogged relations during recovery
现在再起来,unlogged table就没有数据了。
postgres@digoal-> mv recovery.conf recovery.donepostgres@digoal-> pg_ctl startpostgres=# select count(*) from utbl1 ; count ------- 0(1 row)
恢复过程,先停库,然后umount数据文件所在的文件系统。
pg_ctl stop -m fastroot@digoal-> umount /data01
假设我已经安装了extundelete
http://sourceforge.net/projects/extundelete/files/extundelete/
http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
root@digoal-> cd /opt/extundelete/root@digoal-> lltotal 4.0Kdrwxr-xr-x 2 root root 4.0K Sep 26 13:43 binroot@digoal-> cd bin/root@digoal-> lltotal 1.2M-rwxr-xr-x 1 root root 1.2M Sep 26 13:43 extundelete
查看数据文件所在的文件系统的块设备的inode信息。
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01./extundelete: unrecognized option '--ls'NOTICE: Extended attributes are not restored.Loading filesystem metadata ... 80 groups loaded.Group: 0Contents of inode 2:0000 | ed 41 00 00 00 10 00 00 6d 08 fa 55 bc 08 fa 55 | .A......m..U...U0010 | bc 08 fa 55 00 00 00 00 00 00 05 00 08 00 00 00 | ...U............0020 | 00 00 00 00 02 00 00 00 a1 22 00 00 00 00 00 00 | ........."......0030 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0040 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0050 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0060 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0070 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0080 | 1c 00 00 00 88 90 cc 84 88 90 cc 84 00 00 00 00 | ................0090 | 6d 08 fa 55 00 00 00 00 00 00 00 00 00 00 00 00 | m..U............00a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00b0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00c0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00d0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00e0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00f0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
Inode is AllocatedFile mode: 16877Low 16 bits of Owner Uid: 0Size in bytes: 4096Access time: 1442449517Creation time: 1442449596Modification time: 1442449596Deletion Time: 0Low 16 bits of Group Id: 0Links count: 5Blocks count: 8File flags: 0File version (for NFS): 0File ACL: 0Directory ACL: 0Fragment address: 0Direct blocks: 8865, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0Indirect block: 0Double indirect block: 0Triple indirect block: 0
File name | Inode number | Deleted status. 2.. 2lost+found 11pg_root_1921 131073pg_root_1922 393217
我们的数据文件在pg_root_1921目录下,对应的inode=131073,根据这个inode继续查询下一级目录的inode
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131073找到了base的inodebase 131077
继续找,找到了unlogged table所在的database的inode
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131077151898
131078
继续找,找对应的_init文件,找到了它的前缀,根据前缀过滤
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep _init33822_init 131152
找到了被删除的文件
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep 33822./extundelete: unrecognized option '--ls'33822 131116 Deleted33822_fsm 131147 Deleted33822_init 131152
使用inode恢复,恢复删除的datafile
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --restore-inode 131116
恢复后放在这里
root@digoal-> ll RECOVERED_FILES/ total 40K-rw-r--r-- 1 root root 40K Sep 26 13:50 file.131116
将文件拷贝到原来的位置
root@digoal-> mount /data01root@digoal-> cp file.131116 /data01/pg_root_1921/base/151898/33822root@digoal-> chown -R postgres:postgres /data01/root@digoal-> su - postgres
启动数据库前,切记,删除_init后缀的文件,否则数据文件可能又会被清理掉。
cd $PGDATApostgres@digoal-> rm -f base/151898/33822_init
启动数据库,数据已经回来了。
postgres@digoal-> pg_ctl startpostgres=# select count(*) from utbl1 ; count ------- 1000(1 row)
大家千万要切记,不要滥用unlogged table,除非你可以为你的行为负责。
hash index也是这样,因为不写XLOG,所以如果你使用流复制来实施HA的话,在切换到备库后,如果走hash index scan,你会发现数据凭空"消失",实际上是因为hash index的变更没有复制过去的原因。所以就不要使用hash index了吧。
同时,数据库在进入恢复状态时,为了保证数据的一致性,postgresql会自动清除unlogged table的数据文件。
那么问题来了,万一你不小心误创建了unlogged table,你可能一开始会没有感知,但是一旦发生以下情况,你会发现数据不见了。
1. 数据库crash掉之后,重启。
2. 主备切换,备库变成主库,主库变成备库。
以上两种情况,都是数据库在启动并进入recovery状态后,原来主库下的unlogged table的datafile都会被清空。
代码见:
src/backend/access/transam/xlog.c
/* * This must be called ONCE during postmaster or standalone-backend startup */voidStartupXLOG(void){......
/* REDO */ if (InRecovery) { ...... /* * We're in recovery, so unlogged relations may be trashed and must be * reset. This should be done BEFORE allowing Hot Standby * connections, so that read-only backends don't try to read whatever * garbage is left over from before. */ ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);...... /* * Reset unlogged relations to the contents of their INIT fork. This is * done AFTER recovery is complete so as to include any unlogged relations * created during recovery, but BEFORE recovery is marked as having * completed successfully. Otherwise we'd not retry if any of the post * end-of-recovery steps fail. */ if (InRecovery) ResetUnloggedRelations(UNLOGGED_RELATION_INIT);......
backend/storage/file/reinit.c
/* * Reset unlogged relations from before the last restart. * * If op includes UNLOGGED_RELATION_CLEANUP, we remove all forks of any * relation with an "init" fork, except for the "init" fork itself. * * If op includes UNLOGGED_RELATION_INIT, we copy the "init" fork to the main * fork. */voidResetUnloggedRelations(int op){ char temp_path[MAXPGPATH]; DIR *spc_dir; struct dirent *spc_de; MemoryContext tmpctx, oldctx;
/* Log it. */ elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d", (op & UNLOGGED_RELATION_CLEANUP) != 0, (op & UNLOGGED_RELATION_INIT) != 0);
/* * Just to be sure we don't leak any memory, let's create a temporary * memory context for this operation. */ tmpctx = AllocSetContextCreate(CurrentMemoryContext, "ResetUnloggedRelations", ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE); oldctx = MemoryContextSwitchTo(tmpctx);
/* * First process unlogged files in pg_default ($PGDATA/base) */ ResetUnloggedRelationsInTablespaceDir("base", op);
/* * Cycle through directories for all non-default tablespaces. */ spc_dir = AllocateDir("pg_tblspc");
while ((spc_de = ReadDir(spc_dir, "pg_tblspc")) != NULL) { if (strcmp(spc_de->d_name, ".") == 0 || strcmp(spc_de->d_name, "..") == 0) continue;
snprintf(temp_path, sizeof(temp_path), "pg_tblspc/%s/%s", spc_de->d_name, TABLESPACE_VERSION_DIRECTORY); ResetUnloggedRelationsInTablespaceDir(temp_path, op); } FreeDir(spc_dir);
/* * Restore memory context. */ MemoryContextSwitchTo(oldctx); MemoryContextDelete(tmpctx);}
src/include/common/relpath.h
/* * Stuff for fork names. * * The physical storage of a relation consists of one or more forks. * The main fork is always created, but in addition to that there can be * additional forks for storing various metadata. ForkNumber is used when * we need to refer to a specific fork in a relation. */typedef enum ForkNumber{ InvalidForkNumber = -1, MAIN_FORKNUM = 0, FSM_FORKNUM, VISIBILITYMAP_FORKNUM, INIT_FORKNUM
/* * NOTE: if you add a new fork, change MAX_FORKNUM and possibly * FORKNAMECHARS below, and update the forkNames array in * src/common/relpath.c */} ForkNumber;
那么问题来了,如果真的这样了,有办法恢复吗?
1. 首先,如果你在原来的主库上有基础备份,你可以从基础备份恢复。为什么需要主库的备份呢,因为备库上没有unlogged table的数据文件内容,所以在备库备份是备不到unlogged table的datafile的。
但是这种恢复方法也务必要小心,你需要在启动数据库前,先把_init的文件都删掉,这样启动数据库时数据文件就不会被清除。
2. 从主库的逻辑备份中恢复。
3. 从审计日志中回放SQL恢复。
4. 如果你没有主库的基础备份,那么可有从文件系统中去恢复删掉的数据文件。例如ext4文件系统的恢复方式如下:
http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
例子:
创建一个unlogged table,并记录它的filenode
postgres=# create unlogged table utbl1(id int);CREATE TABLEpostgres=# insert into utbl1 select generate_series(1,1000);INSERT 0 1000postgres=# select pg_relation_filepath('utbl1'::regclass); pg_relation_filepath ---------------------- base/151898/33822(1 row)
查看到PG对unlogged table做了_init的后缀标记
http://www.postgresql.org/docs/9.4/static/storage-init.html
postgres@digoal-> cd $PGDATApostgres@digoal-> ll base/151898/33822*-rw------- 1 postgres postgres 40K Sep 26 11:39 base/151898/33822-rw------- 1 postgres postgres 24K Sep 26 11:39 base/151898/33822_fsm-rw------- 1 postgres postgres 0 Sep 26 11:38 base/151898/33822_init
现在,我们进入恢复模式
postgres@digoal-> mv recovery.done recovery.confpostgres@digoal-> pg_ctl start
数据库启动后,其实已经将unlogged table的数据文件清理掉了,见前面的源码。
postgres=# select count(*) from utbl1 ;ERROR: cannot access temporary or unlogged relations during recovery
现在再起来,unlogged table就没有数据了。
postgres@digoal-> mv recovery.conf recovery.donepostgres@digoal-> pg_ctl startpostgres=# select count(*) from utbl1 ; count ------- 0(1 row)
恢复过程,先停库,然后umount数据文件所在的文件系统。
pg_ctl stop -m fastroot@digoal-> umount /data01
假设我已经安装了extundelete
http://sourceforge.net/projects/extundelete/files/extundelete/
http://blog.163.com/digoal@126/blog/static/16387704020142124032866/
root@digoal-> cd /opt/extundelete/root@digoal-> lltotal 4.0Kdrwxr-xr-x 2 root root 4.0K Sep 26 13:43 binroot@digoal-> cd bin/root@digoal-> lltotal 1.2M-rwxr-xr-x 1 root root 1.2M Sep 26 13:43 extundelete
查看数据文件所在的文件系统的块设备的inode信息。
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01./extundelete: unrecognized option '--ls'NOTICE: Extended attributes are not restored.Loading filesystem metadata ... 80 groups loaded.Group: 0Contents of inode 2:0000 | ed 41 00 00 00 10 00 00 6d 08 fa 55 bc 08 fa 55 | .A......m..U...U0010 | bc 08 fa 55 00 00 00 00 00 00 05 00 08 00 00 00 | ...U............0020 | 00 00 00 00 02 00 00 00 a1 22 00 00 00 00 00 00 | ........."......0030 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0040 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0050 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0060 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0070 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................0080 | 1c 00 00 00 88 90 cc 84 88 90 cc 84 00 00 00 00 | ................0090 | 6d 08 fa 55 00 00 00 00 00 00 00 00 00 00 00 00 | m..U............00a0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00b0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00c0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00d0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00e0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................00f0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................
Inode is AllocatedFile mode: 16877Low 16 bits of Owner Uid: 0Size in bytes: 4096Access time: 1442449517Creation time: 1442449596Modification time: 1442449596Deletion Time: 0Low 16 bits of Group Id: 0Links count: 5Blocks count: 8File flags: 0File version (for NFS): 0File ACL: 0Directory ACL: 0Fragment address: 0Direct blocks: 8865, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0Indirect block: 0Double indirect block: 0Triple indirect block: 0
File name | Inode number | Deleted status. 2.. 2lost+found 11pg_root_1921 131073pg_root_1922 393217
我们的数据文件在pg_root_1921目录下,对应的inode=131073,根据这个inode继续查询下一级目录的inode
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131073找到了base的inodebase 131077
继续找,找到了unlogged table所在的database的inode
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131077151898
131078
继续找,找对应的_init文件,找到了它的前缀,根据前缀过滤
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep _init33822_init 131152
找到了被删除的文件
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --inode 131078|grep 33822./extundelete: unrecognized option '--ls'33822 131116 Deleted33822_fsm 131147 Deleted33822_init 131152
使用inode恢复,恢复删除的datafile
root@digoal-> ./extundelete --ls --inode 2 /dev/mapper/vgdata01-lv01 --restore-inode 131116
恢复后放在这里
root@digoal-> ll RECOVERED_FILES/ total 40K-rw-r--r-- 1 root root 40K Sep 26 13:50 file.131116
将文件拷贝到原来的位置
root@digoal-> mount /data01root@digoal-> cp file.131116 /data01/pg_root_1921/base/151898/33822root@digoal-> chown -R postgres:postgres /data01/root@digoal-> su - postgres
启动数据库前,切记,删除_init后缀的文件,否则数据文件可能又会被清理掉。
cd $PGDATApostgres@digoal-> rm -f base/151898/33822_init
启动数据库,数据已经回来了。
postgres@digoal-> pg_ctl startpostgres=# select count(*) from utbl1 ; count ------- 1000(1 row)
大家千万要切记,不要滥用unlogged table,除非你可以为你的行为负责。
hash index也是这样,因为不写XLOG,所以如果你使用流复制来实施HA的话,在切换到备库后,如果走hash index scan,你会发现数据凭空"消失",实际上是因为hash index的变更没有复制过去的原因。所以就不要使用hash index了吧。
相关文章推荐
- 2015/10/23 回顾
- 虚拟机磁盘 centos 扩展空间
- 深入理解jQuery.css和jQuery.style源码
- JAVA EL表达式详细介绍使用手册
- LoadRunner下载文件脚本
- Loadrunner的自定义监控器
- LoadRunner常见函数分析
- LoadRunner如何监控Tomcat性能
- Android 版本自动更新
- 企业网站首页结构布局几点心得感悟
- Leetcode || Reverse Integer
- 特殊的排序(思维好题)
- vs2010汇编开发环境搭建
- 一些容易出错的细节
- Javascript规范
- linux使用top命令查看系统状态
- java常用术语
- Markdown有关技术浅析
- 通信协议——Http、TCP、UDP
- codeforces 551 D. GukiZ and Binary Operations