
HBase Backup: Export and Import

2016-02-17 11:28

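In the previous article, "HBase Replication", we described how to set up master/slave clusters for real-time data backup. However, HBase replication only applies to data written after replication has been configured: only data inserted into the master cluster from that point on is replicated to the slave cluster. It cannot help with historical data written before that point. This article describes how to use HBase's export and import tools to back up that historical data.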

1) Export the HBase table data to a specified directory in HDFS. The command is as follows:


$ cd $HBASE_HOME/

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export test_table /data/test_table

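Here, $HBASE_HOME is the HBase home directory, test_table is the name of the table to export, and /data/test_table is the target directory in HDFS.
The output is long, so only the last part is shown here: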


2014-08-11 16:49:44,484 INFO  [main] mapreduce.Job: Running job: job_1407491918245_0021

2014-08-11 16:49:51,658 INFO  [main] mapreduce.Job: Job job_1407491918245_0021 running in uber mode : false

2014-08-11 16:49:51,659 INFO  [main] mapreduce.Job:  map 0% reduce 0%

2014-08-11 16:49:57,706 INFO  [main] mapreduce.Job:  map 100% reduce 0%

2014-08-11 16:49:57,715 INFO  [main] mapreduce.Job: Job job_1407491918245_0021 completed successfully

2014-08-11 16:49:57,789 INFO  [main] mapreduce.Job: Counters: 37

    File System Counters

        FILE: Number of bytes read=0

        FILE: Number of bytes written=118223

        FILE: Number of read operations=0

        FILE: Number of large read operations=0

        FILE: Number of write operations=0

        HDFS: Number of bytes read=84

        HDFS: Number of bytes written=243

        HDFS: Number of read operations=4

        HDFS: Number of large read operations=0

        HDFS: Number of write operations=2

    Job Counters

        Launched map tasks=1

        Rack-local map tasks=1

        Total time spent by all maps in occupied slots (ms)=9152

        Total time spent by all reduces in occupied slots (ms)=0

    Map-Reduce Framework

        Map input records=3

        Map output records=3

        Input split bytes=84

        Spilled Records=0

        Failed Shuffles=0

        Merged Map outputs=0

        GC time elapsed (ms)=201

        CPU time spent (ms)=5210

        Physical memory (bytes) snapshot=377470976

        Virtual memory (bytes) snapshot=1863364608

        Total committed heap usage (bytes)=1029177344

    HBase Counters

        BYTES_IN_REMOTE_RESULTS=87

        BYTES_IN_RESULTS=87

        MILLIS_BETWEEN_NEXTS=444

        NOT_SERVING_REGION_EXCEPTION=0

        NUM_SCANNER_RESTARTS=0

        REGIONS_SCANNED=1

        REMOTE_RPC_CALLS=3

        REMOTE_RPC_RETRIES=0

        RPC_CALLS=3

        RPC_RETRIES=0

    File Input Format Counters

        Bytes Read=0

    File Output Format Counters

        Bytes Written=243
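
The Export tool also accepts optional positional arguments after the output directory: the number of versions, a start time, and an end time. The sketch below only builds and prints the command line so it can be inspected before running. `build_export_cmd`, the `/data/test_table_range` directory, and the timestamp values are assumptions made for illustration and do not come from the original article.

```shell
#!/usr/bin/env bash
# Sketch: build an Export command line with the optional positional arguments.
# Usage, per the Export tool's help text:
#   Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
# build_export_cmd is a hypothetical helper; it prints the command, it does not run it.
build_export_cmd() {
  local table="$1" outdir="$2" versions="${3:-}" start="${4:-}" end="${5:-}"
  local cmd="bin/hbase org.apache.hadoop.hbase.mapreduce.Export ${table} ${outdir}"
  # The optional arguments are positional, so each one requires the previous one.
  if [ -n "$versions" ]; then cmd="${cmd} ${versions}"; fi
  if [ -n "$versions" ] && [ -n "$start" ]; then cmd="${cmd} ${start}"; fi
  if [ -n "$versions" ] && [ -n "$start" ] && [ -n "$end" ]; then cmd="${cmd} ${end}"; fi
  printf '%s\n' "$cmd"
}

# Same command as in the article:
build_export_cmd test_table /data/test_table
# One version per cell, limited to a time range (epoch milliseconds, example values only):
build_export_cmd test_table /data/test_table_range 1 1406788000000 1406789000000
```

Printing the command first is a deliberate choice here: Export launches a MapReduce job, so it is worth checking the arguments before submitting it.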

Check the export directory with the following command:


$ cd $HADOOP_HOME/

$ bin/hadoop fs -ls /data/test_table

Here, $HADOOP_HOME is the Hadoop home directory. The result is as follows:


Found 2 items

-rw-r--r--   3 hbase supergroup          0 2014-08-11 16:49 /data/test_table/_SUCCESS

-rw-r--r--   3 hbase supergroup        243 2014-08-11 16:49 /data/test_table/part-m-00000
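
The empty _SUCCESS file in the listing is the marker that the MapReduce job finished. A script can test for it before going on to the import step. This is a minimal sketch that assumes a `hadoop` client is on the PATH (the article runs `bin/hadoop` from $HADOOP_HOME instead); `export_succeeded` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# export_succeeded is a hypothetical helper. It relies on `hadoop fs -test -e`,
# which exits with status 0 when the given path exists.
export_succeeded() {
  local outdir="$1"
  if hadoop fs -test -e "${outdir}/_SUCCESS"; then
    echo "export complete: ${outdir}"
  else
    echo "no _SUCCESS marker in ${outdir}" >&2
    return 1
  fi
}

# Only run the check when a hadoop client is actually available.
if command -v hadoop >/dev/null 2>&1; then
  export_succeeded /data/test_table || echo "export looks incomplete"
fi
```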

Run the following hbase shell commands to look at the data in test_table:


$ cd $HBASE_HOME/

$ bin/hbase shell

2014-08-11 17:05:52,589 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014



hbase(main):001:0> describe 'test_table'

DESCRIPTION                                                                                                                               ENABLED

'test_table', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '1', COMPRESSION => 'NONE', VERSIONS => true

  '1', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>

  'true'}

1 row(s) in 1.3400 seconds



hbase(main):002:0> scan 'test_table'

ROW                                                    COLUMN+CELL

r1                                                    column=cf:q1, timestamp=1406788229440, value=va1

r2                                                    column=cf:q1, timestamp=1406788265646, value=va2

r3                                                    column=cf:q1, timestamp=1406788474301, value=va3

3 row(s) in 0.0560 seconds

This completes the export of the HBase table data. Next comes the import.

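2) Import the data that was exported to HDFS into a table that has already been created in HBase. Note that this table may have a different name from the original, but its schema must be the same. Here we pick a different name and use test_copy. The command to create the table is as follows: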


$ cd $HBASE_HOME/

$ bin/hbase shell

2014-08-11 17:05:52,589 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014



hbase(main):001:0> create 'test_copy', 'cf'

0 row(s) in 1.1980 seconds



=> Hbase::Table - test_copy
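
When the target table has to be created from a script rather than interactively, the `create` statement can be generated and piped into the HBase shell. The sketch below assumes the shell reads commands from standard input and that `bin/hbase` is reachable from the current directory. `make_create_stmt` is a hypothetical helper that covers column family names only, so any non-default family attributes would still need to be added by hand.

```shell
#!/usr/bin/env bash
# make_create_stmt is a hypothetical helper that prints an HBase shell `create`
# statement for a table name followed by one or more column family names.
make_create_stmt() {
  local table="$1"; shift
  local stmt="create '${table}'"
  local cf
  for cf in "$@"; do
    stmt="${stmt}, '${cf}'"
  done
  printf '%s\n' "$stmt"
}

# Prints the same statement the article types into the shell.
make_create_stmt test_copy cf

# To actually run it (assumption: bin/hbase exists relative to the current directory):
# make_create_stmt test_copy cf | bin/hbase shell
```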

Next, run the import command:


$ cd $HBASE_HOME/

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import test_copy hdfs://l-master.data/data/test_table

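Here, test_copy is the name of the table to import into, and hdfs://l-master.data/data/test_table is the full path, in the master cluster's HDFS, of the directory to which test_table was exported earlier.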

The output of the import command is long, so only the last part is shown:


2014-08-11 17:13:08,706 INFO  [main] mapreduce.Job:  map 100% reduce 0%

2014-08-11 17:13:08,710 INFO  [main] mapreduce.Job: Job job_1407728839061_0014 completed successfully

2014-08-11 17:13:08,715 INFO  [main] mapreduce.Job: Counters: 27

    File System Counters

        FILE: Number of bytes read=0

        FILE: Number of bytes written=117256

        FILE: Number of read operations=0

        FILE: Number of large read operations=0

        FILE: Number of write operations=0

        HDFS: Number of bytes read=356

        HDFS: Number of bytes written=0

        HDFS: Number of read operations=3

        HDFS: Number of large read operations=0

        HDFS: Number of write operations=0

    Job Counters

        Launched map tasks=1

        Rack-local map tasks=1

        Total time spent by all maps in occupied slots (ms)=6510

        Total time spent by all reduces in occupied slots (ms)=0

    Map-Reduce Framework

        Map input records=3

        Map output records=3

        Input split bytes=113

        Spilled Records=0

        Failed Shuffles=0

        Merged Map outputs=0

        GC time elapsed (ms)=21

        CPU time spent (ms)=1110

        Physical memory (bytes) snapshot=379494400

        Virtual memory (bytes) snapshot=1855762432

        Total committed heap usage (bytes)=1029177344

    File Input Format Counters

        Bytes Read=243

    File Output Format Counters

        Bytes Written=0
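
One quick sanity check is to compare the record counters of the two jobs: the export reported "Map output records=3" and the import reported "Map input records=3". The sketch below extracts a counter from saved job output. `counter_value` is a hypothetical helper, and the sample text is copied from the counter lines shown in this article rather than read from real log files. It assumes the counter name contains no regular-expression metacharacters.

```shell
#!/usr/bin/env bash
# counter_value is a hypothetical helper: it reads job output on stdin and prints
# the numeric value of the first counter line whose name equals $1.
counter_value() {
  local name="$1"
  sed -n "s/^[[:space:]]*${name}=\([0-9][0-9]*\)[[:space:]]*$/\1/p" | head -n 1
}

# Sample counter lines copied from the export and import outputs in the article.
export_log='        Map input records=3
        Map output records=3'
import_log='        Map input records=3
        Map output records=3'

exported=$(printf '%s\n' "$export_log" | counter_value "Map output records")
imported=$(printf '%s\n' "$import_log" | counter_value "Map input records")

if [ "$exported" = "$imported" ]; then
  echo "record counts match: ${exported}"
else
  echo "record counts differ: export=${exported} import=${imported}"
fi
```

With real jobs, the output of each command would first be saved to a file. The job log may be written to standard error, so both streams would need to be captured.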

Next, check whether the data in the slave cluster's test_copy table matches the data in the master cluster's test_table table by running hbase shell:


$ cd $HBASE_HOME/

$ bin/hbase shell

2014-08-11 17:15:52,117 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014



hbase(main):001:0> scan 'test_copy'

ROW                                                    COLUMN+CELL

r1                                                    column=cf:q1, timestamp=1406788229440, value=va1

r2                                                    column=cf:q1, timestamp=1406788265646, value=va2

r3                                                    column=cf:q1, timestamp=1406788474301, value=va3

3 row(s) in 0.3640 seconds

Comparing the two scans shows that the data in the two tables is identical.
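
For tables too large to compare by eye, the two scans can be saved to files and compared mechanically. The following is a rough sketch, not a general-purpose tool. `scan_cells` and `same_cells` are hypothetical helpers, the sample lines are shortened copies of the scan output shown in this article, and squeezing blanks would also alter cell values that contain consecutive spaces.

```shell
#!/usr/bin/env bash
# scan_cells is a hypothetical helper: it keeps only the data lines of a saved
# `hbase shell` scan (those containing "column=") and squeezes runs of spaces,
# so banner, header and timing lines do not affect the comparison.
scan_cells() {
  grep 'column=' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# same_cells FILE_A FILE_B succeeds when both saved scans contain the same cells.
same_cells() {
  [ "$(scan_cells < "$1" | sort)" = "$(scan_cells < "$2" | sort)" ]
}

# Saved scans could be produced with, for example (assumes bin/hbase is reachable):
#   echo "scan 'test_table'" | bin/hbase shell > test_table.scan
#   echo "scan 'test_copy'"  | bin/hbase shell > test_copy.scan

# Demonstration on sample lines taken from the article.
a=$(mktemp); b=$(mktemp)
printf '%s\n' "ROW    COLUMN+CELL" \
  " r1    column=cf:q1, timestamp=1406788229440, value=va1" \
  "3 row(s) in 0.0560 seconds" > "$a"
printf '%s\n' " r1 column=cf:q1, timestamp=1406788229440, value=va1" \
  "3 row(s) in 0.3640 seconds" > "$b"
if same_cells "$a" "$b"; then echo "tables match"; else echo "tables differ"; fi
rm -f "$a" "$b"
```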
