Sqoop Data Migration (Transferring Data Between Hadoop and Relational Database Servers)
2017-12-15 14:15
1: Sqoop Overview:
(1) Sqoop is an Apache tool for "transferring data between Hadoop and relational database servers".
(2) Importing data: bringing data from MySQL or Oracle into Hadoop storage systems such as HDFS, Hive, and HBase.
(3) Exporting data: moving data from the Hadoop file system out to a relational database.
(4) How it works:
The import or export command is translated into a MapReduce program for execution;
in the generated MapReduce job, the main work is customizing the InputFormat and OutputFormat.
(5) Sqoop's principle:
Sqoop works by converting import and export commands into MapReduce programs; whenever it receives a command, it generates a MapReduce program.
Sqoop's code generation tool makes it easy to inspect the Java code Sqoop generates, which you can then use as a base for deeper customization (see the sketch below).
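For example, Sqoop's codegen tool regenerates the record class for a table without running an import. A minimal sketch, assuming the same sample MySQL database (test) and emp table used throughout this post:
bin/sqoop codegen \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --table emp
The generated emp.java and its compiled jar are written under /tmp/sqoop-root/compile/, as the import logs later in this post also show.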
2: Installing Sqoop:
The prerequisite for installing Sqoop is a working Java and Hadoop environment.
Step 1: download and extract. After downloading, upload the tarball to your virtual machine (process omitted), then extract it:
Download address for this release: http://ftp.wayne.edu/apache/sqoop/1.4.6/
Sqoop's official website: http://sqoop.apache.org/
[root@master package]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /home/hadoop/
Step 2: edit the configuration files:
You can rename the Sqoop directory, since the extracted name is rather long. You can also set up environment variables for Sqoop, which makes it easier to invoke:
[root@master hadoop]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop
Configure Sqoop's environment variables as follows:
[root@master hadoop]# vim /etc/profile
[root@master hadoop]# source /etc/profile
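The lines added to /etc/profile were not captured above; they would look roughly like this (a sketch, with the path assumed from the tar and mv commands above):
# Append to /etc/profile, then run `source /etc/profile`
export SQOOP_HOME=/home/hadoop/sqoop
export PATH=$PATH:$SQOOP_HOME/bin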
Rename Sqoop's configuration file template as follows:
[root@master hadoop]# cd $SQOOP_HOME/conf
[root@master conf]# ls
oraoop-site-template.xml sqoop-env-template.sh sqoop-site.xml
sqoop-env-template.cmd sqoop-site-template.xml
[root@master conf]# mv sqoop-env-template.sh sqoop-env.sh
Edit Sqoop's configuration file as follows: open sqoop-env.sh and edit the lines below (depending on your needs, point them at your Hadoop, Hive, and HBase installations):
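The edited lines themselves were not captured; in sqoop-env.sh they would look roughly like this (the Hadoop path matches the HADOOP_MAPRED_HOME reported in the logs below; the Hive and HBase paths are assumptions):
# Set these in conf/sqoop-env.sh to match your installation
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.4.1
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.4.1
#export HBASE_HOME=/home/hadoop/hbase   # optional; left unset here, hence the HBase warnings below
#export HIVE_HOME=/home/hadoop/hive     # optional; needed for --hive-import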
Step 3: add the MySQL JDBC driver jar (you must upload the MySQL jar to the virtual machine beforehand):
[root@master package]# cp mysql-connector-java-5.1.28.jar $SQOOP_HOME/lib/
Step 4: verify the installation, as shown below (the messages appear because $HBASE_HOME and similar variables are not configured; they are warnings, not errors):
[root@master conf]# cd $SQOOP_HOME/bin
[root@master bin]# sqoop-version
Warning: /home/hadoop/soft/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/soft/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/soft/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/12/14 04:29:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
At this point the Sqoop installation is complete. Now you can play with Sqoop to your heart's content.
3: Sqoop Data Import:
The import tool imports a single table from an RDBMS into HDFS. Each row of the table is treated as one record in HDFS. All records are stored as text in text files (or as binary data in Avro or sequence files).
The following syntax is used to import data into HDFS:
$ sqoop import (generic-args) (import-args)
Importing table data into HDFS
The following command imports the emp table from a MySQL database server into HDFS:
# --connect  : JDBC URL giving the host name and database
# --username : MySQL account
# --password : MySQL password
# --table    : table to import
# --m        : number of map tasks to run; a single map task here
bin/sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password root \
  --table emp \
  --m 1
Or use the single-line form of the command to import:
[root@master sqoop]# bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table emp --m 1
When I first tried to import MySQL data with Sqoop, I hit the error below. I am pasting it here in the hope that it helps anyone who runs into the same thing:
[root@master sqoop]# bin/sqoop import \
> --connect jdbc:mysql://localhost:3306/test \
> --username root \
> --password 123456 \
> --table emp \
> --m 1
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 10:37:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 10:37:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 10:37:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 10:37:21 INFO tool.CodeGenTool: Beginning code generation
17/12/15 10:37:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
17/12/15 10:37:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
17/12/15 10:37:24 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.4.1
Note: /tmp/sqoop-root/compile/2df9072831c26203712cd4da683c50d9/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/15 10:37:46 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/2df9072831c26203712cd4da683c50d9/emp.jar
17/12/15 10:37:46 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/15 10:37:46 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/15 10:37:46 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/15 10:37:46 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/15 10:37:46 INFO mapreduce.ImportJobBase: Beginning import of emp
17/12/15 10:37:48 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/15 10:37:52 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/15 10:37:54 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.3.129:8032
17/12/15 10:37:56 ERROR tool.ImportTool: Encountered IOException running import job: java.net.ConnectException: Call From master/192.168.3.129 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1414)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1762)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
    at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
    at org.apache.hadoop.ipc.Client.call(Client.java:1381)
    ... 40 more
[root@master sqoop]# service iptables status
iptables: Firewall is not running.
[root@master sqoop]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.3.129 master
192.168.3.130 slaver1
192.168.3.131 slaver2
192.168.3.132 slaver3
192.168.3.133 slaver4
192.168.3.134 slaver5
192.168.3.135 slaver6
192.168.3.136 slaver7
[root@master sqoop]# jps
2813 Jps
[root@master sqoop]# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table emp --m 1
(The second attempt printed the same warnings and failed with the identical ConnectException stack trace; note that jps shows no Hadoop daemons running.)
The error above occurs because your cluster is not running; start it with start-dfs.sh and start-yarn.sh and the import will go through:
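For completeness, the recovery looks roughly like this (a sketch; it assumes Hadoop's sbin scripts are on PATH):
start-dfs.sh
start-yarn.sh
jps   # should now list NameNode, DataNode, ResourceManager, NodeManager, etc., not just Jps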
A normal run looks like this:
[root@master sqoop]# bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table emp --m 1
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/12/15 10:49:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/12/15 10:49:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/12/15 10:49:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/12/15 10:49:34 INFO tool.CodeGenTool: Beginning code generation
17/12/15 10:49:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
17/12/15 10:49:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
17/12/15 10:49:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.4.1
Note: /tmp/sqoop-root/compile/60ee51250c5c6b5f5598392e068ce2d0/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/12/15 10:49:41 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/60ee51250c5c6b5f5598392e068ce2d0/emp.jar
17/12/15 10:49:41 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/12/15 10:49:41 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/12/15 10:49:41 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/12/15 10:49:41 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/12/15 10:49:41 INFO mapreduce.ImportJobBase: Beginning import of emp
17/12/15 10:49:42 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/12/15 10:49:44 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/12/15 10:49:44 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.3.129:8032
17/12/15 10:49:52 INFO db.DBInputFormat: Using read commited transaction isolation
17/12/15 10:49:53 INFO mapreduce.JobSubmitter: number of splits:1
17/12/15 10:49:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1513306156301_0001
17/12/15 10:49:55 INFO impl.YarnClientImpl: Submitted application application_1513306156301_0001
17/12/15 10:49:56 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1513306156301_0001/
17/12/15 10:49:56 INFO mapreduce.Job: Running job: job_1513306156301_0001
17/12/15 10:50:42 INFO mapreduce.Job: Job job_1513306156301_0001 running in uber mode : false
17/12/15 10:50:42 INFO mapreduce.Job: map 0% reduce 0%
17/12/15 10:51:12 INFO mapreduce.Job: map 100% reduce 0%
17/12/15 10:51:13 INFO mapreduce.Job: Job job_1513306156301_0001 completed successfully
17/12/15 10:51:14 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=110675
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=46
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=25363
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=25363
Total vcore-seconds taken by all map tasks=25363
Total megabyte-seconds taken by all map tasks=25971712
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=151
CPU time spent (ms)=730
Physical memory (bytes) snapshot=49471488
Virtual memory (bytes) snapshot=365268992
Total committed heap usage (bytes)=11317248
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=46
17/12/15 10:51:14 INFO mapreduce.ImportJobBase: Transferred 46 bytes in 90.4093 seconds (0.5088 bytes/sec)
17/12/15 10:51:14 INFO mapreduce.ImportJobBase: Retrieved 3 records.
[root@master sqoop]#
To verify the data imported into HDFS, use the following command to view it:
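The exact command was lost from my notes. Since no --target-dir was given, Sqoop writes under the running user's HDFS home directory by default, so the check would look roughly like this (the path assumes the default layout of /user/<user>/<table>):
[root@master sqoop]# hadoop fs -cat /user/root/emp/part-m-00000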
In short, I ran into plenty of problems. When I did not specify a target directory, I could not find my imported MySQL table anywhere in HDFS, which was frustrating, so I am recording it here. If the import succeeded, the fields of each record in the table data are separated by commas (,).
4: Importing a Relational Table into HIVE:
bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table emp --hive-import --m 1
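To confirm the Hive import, you can query the table from the Hive shell (a sketch; it assumes Hive is configured and the table landed in the default database):
hive> show tables;
hive> select * from emp;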
5: Importing into a Specified HDFS Directory:
When importing table data into HDFS with the Sqoop import tool, we can specify a target directory.
The syntax for the target-directory option of the Sqoop import command is:
--target-dir <new directory in HDFS>
The following command imports the emp table's data into the '/sqoop' directory:
bin/sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --target-dir /sqoop \
  --table emp \
  --m 1
Or as a single line:
[root@master sqoop]# bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --target-dir /sqoop --table emp --m 1
If the directory you specify already exists, you will get an error like the following:
17/12/15 11:13:52 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://master:9000/wordcount already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
    at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
    at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
If the import into the specified directory succeeds, the MySQL table's data is written out with fields separated by commas. The following command verifies the data imported from the emp table into the /sqoop directory; the fields of each emp record are separated by commas:
[root@master sqoop]# hadoop fs -cat /sqoop/part-m-00000
1,zhangsan,22,yi
2,lisi,23,er
3,wangwu,24,san
[root@master sqoop]#
6: Importing a Subset of Table Data:
With the Sqoop import tool we can import a subset of a table using the "where" clause. Sqoop executes the corresponding SQL query on the database server and stores the result in the target directory in HDFS.
The syntax for the where clause is:
--where <condition>
# The following command imports a subset of the emp table's data:
# the query retrieves the employees whose city is 'zhengzhou'.
# Method one:
bin/sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --where "city ='zhengzhou'" \
  --target-dir /sqoop02 \
  --table emp \
  --m 1
# Method two (single line):
[root@master sqoop]# bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --where "city ='zhengzhou'" --target-dir /sqoop02 --table emp --m 1
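As with the earlier imports, you can inspect the result in HDFS (a sketch; the part file name assumes a single map task from --m 1):
[root@master sqoop]# hadoop fs -cat /sqoop02/part-m-00000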
7: Sqoop Free-form Query Import:
The command below imports the result of an arbitrary SQL query on the emp table into the /sqoop03 directory; it can be verified the same way as the imports above (see the check after the command).
Here the emp data and fields are separated by a tab ('\t') rather than the default comma.
# Sqoop free-form query import
# Method one:
# --target-dir           : directory in HDFS to store the result
# --query                : a flexible MySQL statement; it must end with "and $CONDITIONS"
# --split-by             : column used to split the work into slices (id here)
# --fields-terminated-by : field separator in the output files ('\t' here; the default is a comma)
bin/sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --target-dir /sqoop03 \
  --query 'select id,name,age,dept from emp WHERE id>0 and $CONDITIONS' \
  --split-by id \
  --fields-terminated-by '\t' \
  --m 1
# Method two (single line):
[root@master sqoop]# bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --target-dir /sqoop03 --query 'select id,name,age,dept from emp WHERE id>0 and $CONDITIONS' --split-by id --fields-terminated-by '\t' --m 1
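And a quick check of the tab-separated output (again assuming a single part file from --m 1):
[root@master sqoop]# hadoop fs -cat /sqoop03/part-m-00000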
8: Sqoop Incremental Import. Incremental import is a technique that imports only the rows newly added to a table:
It requires the 'incremental', 'check-column', and 'last-value' options to perform the incremental import.
The syntax for the incremental options of the Sqoop import command is:
--incremental <mode> --check-column <column name> --last-value <last check column value>
# The following command performs an incremental import on the emp table.
# Method one:
# --incremental append : append-mode import
# --check-column id    : use the id column to decide where to resume the import
# --last-value 6       : based on the id column, start importing from id 7
bin/sqoop import \
  --connect jdbc:mysql://localhost:3306/test \
  --username root \
  --password 123456 \
  --table emp \
  --m 1 \
  --incremental append \
  --check-column id \
  --last-value 6
# Method two (single line):
[root@master sqoop]# bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table emp --m 1 --incremental append --check-column id --last-value 6
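To see the incremental behavior, you could add a row with a higher id to MySQL and re-run the command (a sketch; the 'zhaoliu' row and its values are hypothetical):
mysql> insert into emp(name, age, dept) values ('zhaoliu', 25, 'si');
Re-running the import above then appends only the rows with id greater than 6 to the existing HDFS data, and at the end of the job Sqoop logs the value to pass as --last-value on the next run.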
9: Sqoop Data Export:
Exporting data from HDFS to an RDBMS database. Before exporting, the target table must already exist in the target database. The default mode inserts the data from the files into the table using INSERT statements; in update mode, UPDATE statements are generated to update the table's data instead.
The export command syntax is:
$ sqoop export (generic-args) (export-args)
The concrete steps are as follows:
First, the data sits in the part-m-00000 file under the "/sqoop" directory in HDFS.
Step 1: manually create the target table in MySQL:
[root@master hadoop]# mysql -uroot -p123456
mysql> show databases;
mysql> use test;
mysql> CREATE TABLE `emp` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(255) DEFAULT NULL,
    ->   `age` int(11) DEFAULT NULL,
    ->   `dept` varchar(255) DEFAULT NULL,
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
If you do not create the table beforehand, you will get the following error:
17/12/15 13:56:21 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'test.emp' doesn't exist
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'test.emp' doesn't exist
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4237)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4169)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2617)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2825)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2156)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2313)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:758)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
    at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
    at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:227)
    at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
    at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
    at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
    at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Step 2: run the export command (if localhost does not work, try the IP address instead):
# Method one:
# --table      : table to export into
# --export-dir : HDFS directory to export from
bin/sqoop export \
  --connect jdbc:mysql://192.168.3.129:3306/test \
  --username root \
  --password 123456 \
  --table emp \
  --export-dir /sqoop/
# Method two (single line):
bin/sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table emp --export-dir /sqoop
Step 3: verify the table from the mysql command line (or use a GUI tool to check whether the data was imported):
mysql> select * from emp;
If the given data was stored successfully, you will find the matching rows, as sketched below.
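The expected result, mirroring the /sqoop/part-m-00000 contents shown earlier (the table layout is illustrative):
+----+----------+------+------+
| id | name     | age  | dept |
+----+----------+------+------+
|  1 | zhangsan |   22 | yi   |
|  2 | lisi     |   23 | er   |
|  3 | wangwu   |   24 | san  |
+----+----------+------+------+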
To be continued......