Sqoop usage notes
2012-03-01 13:30
So what is Sqoop, anyway? Sqoop is just a tool that wraps copying data back and forth between an RDB and HDFS. Without it we could of course write a pile of scripts and run them by hand, but Sqoop is simply more convenient, so...
Installation steps:
1. Download the release from http://archive.cloudera.com/cdh/3/ ; these days that usually means cdh3u3.
2. Download the JDBC driver jar for your RDB; I tested with the MySQL driver.
3. After unpacking Sqoop, put the MySQL driver jar into Sqoop's lib directory.
4. Configure the environment. This assumes Hadoop, Hive, HBase and ZooKeeper are already installed and correctly configured; then SQOOP_HOME is the only variable you need to set yourself, because I found that bin/configure-sqoop already sets
export SQOOP_CLASSPATH
export SQOOP_CONF_DIR
export SQOOP_JAR_DIR
export HADOOP_CLASSPATH
export HADOOP_HOME
export HBASE_HOME
SQOOP_HOME is referenced directly in that script, so it must be set, otherwise the jar files won't be found.
If HBase or ZooKeeper is not installed, comment out the corresponding checks in configure-sqoop.
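A minimal sketch of step 4 (the install path below is an assumption; use whatever directory you unpacked the tarball into):

```shell
# Append to ~/.bashrc -- /usr/local/sqoop is an assumed install path
export SQOOP_HOME=/usr/local/sqoop      # directory the cdh3u3 tarball was unpacked into
export PATH=$SQOOP_HOME/bin:$PATH       # puts the `sqoop` command on the PATH

# bin/configure-sqoop derives SQOOP_CLASSPATH, SQOOP_CONF_DIR, SQOOP_JAR_DIR,
# HADOOP_CLASSPATH, HADOOP_HOME and HBASE_HOME itself; only SQOOP_HOME must
# already be set before it runs.
```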
Testing:
RDB to HDFS:
[hadoop@master cuirong]$ sqoop import --connect jdbc:mysql://master:3306/sqoop --username sqoop --password sqoop123 --table t_etl_cmd_relation -m 1
12/03/01 12:12:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/03/01 12:12:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/03/01 12:12:56 INFO tool.CodeGenTool: Beginning code generation
12/03/01 12:12:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_etl_cmd_relation` AS t LIMIT 1
12/03/01 12:12:56 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop
12/03/01 12:12:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/9ced3ca3be0a259acc993a93328c40c8/t_etl_cmd_relation.jar
12/03/01 12:12:58 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/03/01 12:12:58 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/03/01 12:12:58 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/03/01 12:12:58 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/03/01 12:12:58 INFO mapreduce.ImportJobBase: Beginning import of t_etl_cmd_relation
12/03/01 12:13:00 INFO mapred.JobClient: Running job: job_201202212041_15467
12/03/01 12:13:01 INFO mapred.JobClient:  map 0% reduce 0%
12/03/01 12:13:06 INFO mapred.JobClient:  map 100% reduce 0%
12/03/01 12:13:06 INFO mapred.JobClient: Job complete: job_201202212041_15467
12/03/01 12:13:06 INFO mapred.JobClient: Counters: 16
12/03/01 12:13:06 INFO mapred.JobClient:   Job Counters
12/03/01 12:13:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4925
12/03/01 12:13:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/01 12:13:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/01 12:13:06 INFO mapred.JobClient:     Launched map tasks=1
12/03/01 12:13:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/01 12:13:06 INFO mapred.JobClient:   FileSystemCounters
12/03/01 12:13:06 INFO mapred.JobClient:     HDFS_BYTES_READ=87
12/03/01 12:13:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56161
12/03/01 12:13:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1131
12/03/01 12:13:06 INFO mapred.JobClient:   Map-Reduce Framework
12/03/01 12:13:06 INFO mapred.JobClient:     Map input records=16
12/03/01 12:13:06 INFO mapred.JobClient:     Physical memory (bytes) snapshot=130781184
12/03/01 12:13:06 INFO mapred.JobClient:     Spilled Records=0
12/03/01 12:13:06 INFO mapred.JobClient:     CPU time spent (ms)=640
12/03/01 12:13:06 INFO mapred.JobClient:     Total committed heap usage (bytes)=505348096
12/03/01 12:13:06 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1533403136
12/03/01 12:13:06 INFO mapred.JobClient:     Map output records=16
12/03/01 12:13:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
12/03/01 12:13:06 INFO mapreduce.ImportJobBase: Transferred 1.1045 KB in 8.0517 seconds (140.4665 bytes/sec)
12/03/01 12:13:06 INFO mapreduce.ImportJobBase: Retrieved 16 records.
By default the imported data lands under the user's home directory on HDFS; a command-line option can redirect it to another directory.
The data can also be loaded straight into Hive; see the official documentation for details.
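The note above leaves the option name out; in Sqoop 1 the usual flags are --target-dir (alternate HDFS directory) and --hive-import (load straight into Hive). A sketch of both variations, reusing the connection details from the log above (cluster, database and table names are from that example):

```shell
# Import into a specific HDFS directory instead of the user's home directory
sqoop import --connect jdbc:mysql://master:3306/sqoop \
  --username sqoop -P \
  --table t_etl_cmd_relation -m 1 \
  --target-dir /tmp/cui1

# Import straight into a Hive table instead
sqoop import --connect jdbc:mysql://master:3306/sqoop \
  --username sqoop -P \
  --table t_etl_cmd_relation -m 1 \
  --hive-import
```

Note -P prompts for the password interactively, which the WARN line in the log above recommends over --password.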
HDFS to RDB:
[hadoop@master cuirong]$ sqoop export --connect jdbc:mysql://master:3306/sqoop --username sqoop --password sqoop123 --table t_etl_cmd_relation --export-dir /tmp/cui1/part-m-00000
12/03/01 12:52:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/03/01 12:52:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/03/01 12:52:10 INFO tool.CodeGenTool: Beginning code generation
12/03/01 12:52:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_etl_cmd_relation` AS t LIMIT 1
12/03/01 12:52:10 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop
12/03/01 12:52:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/44fdcaa5fbd99233f89008548b08b370/t_etl_cmd_relation.jar
12/03/01 12:52:11 INFO mapreduce.ExportJobBase: Beginning export of t_etl_cmd_relation
12/03/01 12:52:13 INFO input.FileInputFormat: Total input paths to process : 1
12/03/01 12:52:13 INFO input.FileInputFormat: Total input paths to process : 1
12/03/01 12:52:13 INFO mapred.JobClient: Running job: job_201202212041_15480
12/03/01 12:52:14 INFO mapred.JobClient:  map 0% reduce 0%
12/03/01 12:52:19 INFO mapred.JobClient:  map 100% reduce 0%
12/03/01 12:52:19 INFO mapred.JobClient: Job complete: job_201202212041_15480
12/03/01 12:52:19 INFO mapred.JobClient: Counters: 16
12/03/01 12:52:19 INFO mapred.JobClient:   Job Counters
12/03/01 12:52:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4093
12/03/01 12:52:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/01 12:52:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/01 12:52:19 INFO mapred.JobClient:     Rack-local map tasks=1
12/03/01 12:52:19 INFO mapred.JobClient:     Launched map tasks=1
12/03/01 12:52:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/01 12:52:19 INFO mapred.JobClient:   FileSystemCounters
12/03/01 12:52:19 INFO mapred.JobClient:     HDFS_BYTES_READ=1239
12/03/01 12:52:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=55964
12/03/01 12:52:19 INFO mapred.JobClient:   Map-Reduce Framework
12/03/01 12:52:19 INFO mapred.JobClient:     Map input records=16
12/03/01 12:52:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=123195392
12/03/01 12:52:19 INFO mapred.JobClient:     Spilled Records=0
12/03/01 12:52:19 INFO mapred.JobClient:     CPU time spent (ms)=680
12/03/01 12:52:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=505348096
12/03/01 12:52:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1457479680
12/03/01 12:52:19 INFO mapred.JobClient:     Map output records=16
12/03/01 12:52:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=114
12/03/01 12:52:19 INFO mapreduce.ExportJobBase: Transferred 1.21 KB in 7.2897 seconds (169.9657 bytes/sec)
12/03/01 12:52:19 INFO mapreduce.ExportJobBase: Exported 16 records.
Watch out for the field delimiter in the HDFS files: MySQL's default is the comma, so if the HDFS files use a different delimiter you must pass it explicitly, e.g. --input-fields-terminated-by '\001'
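For example, exporting Hive-style ^A (\001)-delimited files back into MySQL might look like this (the directory and table names reuse the earlier example; --input-fields-terminated-by is the flag from the note above):

```shell
# Export \001-delimited records (Hive's default field terminator) back to MySQL
sqoop export --connect jdbc:mysql://master:3306/sqoop \
  --username sqoop -P \
  --table t_etl_cmd_relation \
  --export-dir /tmp/cui1 \
  --input-fields-terminated-by '\001'
```

Without this flag the export job parses the input as comma-delimited and will either fail or load garbage columns.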