Sqoop: a Migration Tool between RDBMS and HDFS, Hive, and HBase
2014-11-11 10:37
Sqoop Overview
The Apache Sqoop project aims to enable efficient bulk data transfer between RDBMSs and Hadoop. With Sqoop, users can easily import data from a relational database into Hadoop and its related systems (such as HBase and Hive), and can likewise extract data from the Hadoop side and export it back into a relational database. Beyond these core functions, Sqoop also provides a number of small utilities, such as listing the tables in a database.

Databases supported by Sqoop
In principle, Sqoop supports any database that provides a JDBC-compliant driver, such as DB2 or MySQL. Before connecting Sqoop to a relational database, first copy the corresponding JDBC driver into the $SQOOP_HOME/lib directory, then supply the database connection URL via the --connect parameter, e.g. "--connect jdbc:db2://localhost:50000/SAMPLE". For DB2 in particular, Sqoop currently supports the vast majority of DB2 data types, and most Sqoop tools run well against DB2.

File formats supported by Sqoop
Sqoop can import data from a DB2 database into HDFS and store it in several file formats. The common ones are delimited text, Avro binary, and SequenceFiles. Delimited text is used throughout this article.

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
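To make the driver setup above concrete, here is a minimal sketch; the jar file names and the db2inst1 username are illustrative, so use the driver jars shipped with your own database version:

  # Copy the JDBC driver(s) into Sqoop's lib directory:
  cp mysql-connector-java-5.1.27-bin.jar $SQOOP_HOME/lib/   # MySQL
  cp db2jcc.jar db2jcc_license_cu.jar $SQOOP_HOME/lib/      # DB2

  # Point Sqoop at the database through --connect; -P prompts for the
  # password instead of exposing it on the command line:
  ./sqoop list-tables --connect jdbc:db2://localhost:50000/SAMPLE --username db2inst1 -P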
List RDBMS databases:
./sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root --password root123
List RDBMS tables:
./sqoop list-tables --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123
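The eval command from the list above is a quick way to sanity-check the source data before committing to a full import; a small sketch (the query is illustrative):

./sqoop eval --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123 --query "SELECT COUNT(*) FROM USERS"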
Import an RDBMS table into HDFS:
./sqoop import --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123 --table USERS
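By default the import runs 4 parallel map tasks and writes comma-delimited text files to a directory named after the table under the user's HDFS home directory. Both can be overridden; a sketch with illustrative paths and values, matching the delimited-text format used in this article:

./sqoop import --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123 --table USERS --target-dir /user/hadoop/users_import --fields-terminated-by '\t' -m 2

A complete run of the basic import against a RATES_BK table produces log output like the following: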
hadoop@feixu-master1:/usr/local/sqoop/bin$ ./sqoop import --connect jdbc:mysql://192.168.56.102:3306/feixu --username root --password root123 --table RATES_BK;
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
13/12/20 05:03:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/12/20 05:03:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/12/20 05:03:52 INFO tool.CodeGenTool: Beginning code generation
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/12/20 05:03:53 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.jar
13/12/20 05:03:53 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/12/20 05:03:53 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/12/20 05:03:53 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/12/20 05:03:53 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 INFO mapreduce.ImportJobBase: Beginning import of RATES_BK
13/12/20 05:03:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`USER_ID`), MAX(`USER_ID`) FROM `RATES_BK`
13/12/20 05:03:57 INFO mapred.JobClient: Running job: job_201312200120_0010
13/12/20 05:03:58 INFO mapred.JobClient:  map 0% reduce 0%
13/12/20 05:04:13 INFO mapred.JobClient:  map 25% reduce 0%
13/12/20 05:04:14 INFO mapred.JobClient:  map 50% reduce 0%
13/12/20 05:04:15 INFO mapred.JobClient:  map 100% reduce 0%
13/12/20 05:04:17 INFO mapred.JobClient: Job complete: job_201312200120_0010
13/12/20 05:04:17 INFO mapred.JobClient: Counters: 18
13/12/20 05:04:17 INFO mapred.JobClient:   Job Counters
13/12/20 05:04:17 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=36108
13/12/20 05:04:17 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient:     Launched map tasks=4
13/12/20 05:04:17 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/12/20 05:04:17 INFO mapred.JobClient:   File Output Format Counters
13/12/20 05:04:17 INFO mapred.JobClient:     Bytes Written=216525
13/12/20 05:04:17 INFO mapred.JobClient:   FileSystemCounters
13/12/20 05:04:17 INFO mapred.JobClient:     HDFS_BYTES_READ=440
13/12/20 05:04:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=268158
13/12/20 05:04:17 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=216525
13/12/20 05:04:17 INFO mapred.JobClient:   File Input Format Counters
13/12/20 05:04:17 INFO mapred.JobClient:     Bytes Read=0
13/12/20 05:04:17 INFO mapred.JobClient:   Map-Reduce Framework
13/12/20 05:04:17 INFO mapred.JobClient:     Map input records=10000
13/12/20 05:04:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=152514560
13/12/20 05:04:17 INFO mapred.JobClient:     Spilled Records=0
13/12/20 05:04:17 INFO mapred.JobClient:     CPU time spent (ms)=3350
13/12/20 05:04:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=37519360
13/12/20 05:04:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1391800320
13/12/20 05:04:17 INFO mapred.JobClient:     Map output records=10000
13/12/20 05:04:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=440
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Transferred 211.4502 KB in 23.3888 seconds (9.0407 KB/sec)
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Retrieved 10000 records.
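Three things in this log are worth acting on. First, the job split the 10000 rows across 4 map tasks, as the Launched map tasks=4 counter shows. Second, Sqoop warns that passing --password on the command line is insecure; -P prompts for it instead. Third, RATES_BK has a multi-column primary key, so Sqoop fell back to splitting on USER_ID alone; the split column can be chosen explicitly, and the --direct MySQL fast path the log suggests can be enabled at the same time. A sketch of the adjusted command (the split column should be a reasonably evenly distributed key, and --direct requires mysqldump on the task nodes):

./sqoop import --connect jdbc:mysql://192.168.56.102:3306/feixu --username root -P --table RATES_BK --split-by USER_ID --direct

The imported files land in a RATES_BK directory under the user's HDFS home and can be inspected with hadoop fs -cat RATES_BK/part-m-00000.

The reverse direction described in the overview goes through the export tool. A minimal sketch, assuming a target table (RATES_BK_COPY here is hypothetical) already exists in MySQL and the HDFS files use the default comma delimiter:

./sqoop export --connect jdbc:mysql://192.168.56.102:3306/feixu --username root -P --table RATES_BK_COPY --export-dir RATES_BK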