Installing Hive with Metadata Stored in MySQL
2016-08-28 14:09
Downloading Hive
Official Hive website: http://hive.apache.org/
Official Hive downloads: http://hive.apache.org/downloads.html
Apache archive: Apache Software Foundation Distribution Directory
Version used here: apache-hive-0.13.1-bin.tar.gz
Extracting Hive
$ tar zxvf apache-hive-0.13.1-bin.tar.gz -C /opt/modules/
$ cd /opt/modules/
$ mv apache-hive-0.13.1-bin/ hive-0.13.1
Configuring Hive
$ cd /opt/modules/hive-0.13.1/conf
$ cp hive-env.sh.template hive-env.sh
Edit hive-env.sh and change the following two lines:
$ vim hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/modules/hadoop-2.5.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/modules/hive-0.13.1/conf
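The same two edits can be applied without opening an editor. The following is a minimal sketch, assuming the template ships both settings as commented-out assignment lines; the helper name set_hive_env is hypothetical, and the demo runs against a scratch file rather than the real hive-env.sh.

```shell
#!/usr/bin/env bash
# Sketch: set HADOOP_HOME and HIVE_CONF_DIR in a hive-env.sh file with sed.
# set_hive_env is a hypothetical helper, not part of Hive.
set_hive_env() {
  local file="$1" hadoop_home="$2" conf_dir="$3"
  # Replace the (possibly commented-out) assignment lines in place,
  # keeping a .bak copy of the previous version.
  sed -i.bak \
    -e "s|^#* *HADOOP_HOME=.*|HADOOP_HOME=${hadoop_home}|" \
    -e "s|^#* *export HIVE_CONF_DIR=.*|export HIVE_CONF_DIR=${conf_dir}|" \
    "$file"
}

# Demo against a scratch file that mimics the assumed template lines.
scratch="$(mktemp)"
printf '%s\n' '# HADOOP_HOME=${bin}/../../hadoop' '# export HIVE_CONF_DIR=' > "$scratch"
set_hive_env "$scratch" /opt/modules/hadoop-2.5.0 /opt/modules/hive-0.13.1/conf
cat "$scratch"
```

To apply it for real, pass /opt/modules/hive-0.13.1/conf/hive-env.sh as the first argument and check the result before starting Hive.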
Verifying Hive
Before running Hive, start Hadoop, create the /tmp and /user/hive/warehouse directories on HDFS, and grant group write permission on both:
$ cd /opt/modules/hadoop-2.5.0/
$ bin/hdfs dfs -mkdir /tmp
$ bin/hdfs dfs -mkdir -p /user/hive/warehouse
$ bin/hdfs dfs -chmod g+w /tmp
$ bin/hdfs dfs -chmod g+w /user/hive/warehouse
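If these commands need to be repeated on several machines, they can be wrapped in a small loop. This is a sketch only: the HDFS_CMD variable and the prepare_hive_dirs function are hypothetical names, and by default the script performs a dry run that prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: prepare the HDFS directories Hive needs, idempotently.
# HDFS_CMD defaults to "echo hdfs" (dry run that only prints the commands);
# set HDFS_CMD=bin/hdfs from the Hadoop directory to execute them.
HDFS_CMD="${HDFS_CMD:-echo hdfs}"

prepare_hive_dirs() {
  local dir
  for dir in /tmp /user/hive/warehouse; do
    # -p creates parent directories and does not fail if the path exists.
    $HDFS_CMD dfs -mkdir -p "$dir"
    # Grant group write permission, as in the manual steps.
    $HDFS_CMD dfs -chmod g+w "$dir"
  done
}

prepare_hive_dirs
```

Using -mkdir -p for both directories means the script can be rerun safely when /tmp already exists.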
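The embedded-mode installation of Hive (metadata kept in the bundled Derby database) is now complete. Start the Hive CLI to verify it:
$ cd /opt/modules/hive-0.13.1/
$ bin/hive
Output like the following means the embedded-mode installation works.
Logging initialized using configuration in jar:file:/opt/modules/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.576 seconds, Fetched: 1 row(s)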
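The same check can be scripted instead of typed at the prompt. The sketch below relies on the Hive CLI's -e option, which runs a statement and exits; the function name hive_smoke_test is hypothetical, and the check simply looks for the built-in default database in the output.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive smoke test for a Hive installation.
# hive_smoke_test is a hypothetical helper; its argument is the hive launcher.
hive_smoke_test() {
  local hive_bin="$1" out
  # -e runs the quoted statement and exits instead of opening the prompt.
  out="$("$hive_bin" -e 'show databases;' 2>/dev/null)" || return 1
  # A working installation should list the built-in "default" database.
  printf '%s\n' "$out" | grep -qx 'default'
}

# Usage (from /opt/modules/hive-0.13.1/):
#   hive_smoke_test bin/hive && echo "Hive is working"
```

The function only reports success or failure through its exit status, so it can be dropped into a larger provisioning script.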
Storing Metadata in MySQL
Download the MySQL repository package
$ wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
Install the mysql-community-release-el7-5.noarch.rpm package
$ sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
Install MySQL
$ sudo yum install -y mysql-server
Start MySQL
$ sudo service mysqld start
Enable MySQL at boot
$ sudo chkconfig mysqld on
Set the MySQL root password
$ mysqladmin -u root password 'hive'
Log in to MySQL
$ mysql -uroot -p
Allow remote logins
mysql> grant all privileges on *.* to 'root'@'%' identified by 'hive' with grant option;
Delete the original user record
mysql> use mysql;
mysql> delete from user where host='localhost' and user='root';
After removing any other leftover rows the same way, only the following root record remains
mysql> select host, user, password from user;
+------+------+-------------------------------------------+
| host | user | password                                  |
+------+------+-------------------------------------------+
| %    | root | *4DF1D66463C18D44E3B001A8FB1BBFBEA13E27FC |
+------+------+-------------------------------------------+
Restart the MySQL service
mysql> quit;
$ sudo service mysqld restart
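Granting every privilege to root from any host is convenient for a test machine but broad. As a hedged alternative, the sketch below generates SQL for a dedicated account that can only touch the metastore database. The account name hive, the password placeholder, and the SQL_FILE and HIVE_DB_PASSWORD variables are assumptions for illustration; metastore is the database name used in the JDBC URL configured in the next step. If this account is used, the username and password in hive-site.xml would have to be changed to match.

```shell
#!/usr/bin/env bash
# Sketch: generate SQL for a dedicated metastore account instead of root@'%'.
# The account name "hive", the password placeholder, and the file location
# are assumptions; "metastore" matches the database in the JDBC URL.
SQL_FILE="${SQL_FILE:-$(mktemp)}"
HIVE_DB_PASSWORD="${HIVE_DB_PASSWORD:-change-me}"

cat > "$SQL_FILE" <<EOF
CREATE DATABASE IF NOT EXISTS metastore;
CREATE USER 'hive'@'%' IDENTIFIED BY '${HIVE_DB_PASSWORD}';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
EOF

echo "Wrote $SQL_FILE; apply it with: mysql -uroot -p < $SQL_FILE"
```

The script only writes the file, so the statements can be reviewed before they are run against the server.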
Configure Hive to use MySQL for storage
$ cd /opt/modules/hive-0.13.1/
$ cp conf/hive-default.xml.template conf/hive-site.xml
Edit hive-site.xml and set the following properties
$ vim conf/hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop01.malone.com:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
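Because hive-site.xml copied from the template is long, it helps to confirm that the four JDBC properties really carry the new values. The following sketch prints the value of one property; it assumes each <value> element sits on the line directly after its <name>, and the helper name get_hive_property is hypothetical. The demo uses a scratch file, not the real configuration.

```shell
#!/usr/bin/env bash
# Sketch: print the value of a property from a Hadoop-style XML config file.
# Assumes <value> is on the line directly after the matching <name>;
# get_hive_property is a hypothetical helper.
get_hive_property() {
  local file="$1" name="$2"
  grep -F -A1 "<name>${name}</name>" "$file" \
    | sed -n 's|.*<value>\(.*\)</value>.*|\1|p'
}

# Demo against a scratch file with one of the properties from this post.
scratch_xml="$(mktemp)"
cat > "$scratch_xml" <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>
EOF
get_hive_property "$scratch_xml" javax.jdo.option.ConnectionDriverName
```

Against the real file the call would look like get_hive_property conf/hive-site.xml javax.jdo.option.ConnectionURL. A proper XML parser is more robust; grep and sed are used here only to stay within the shell.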
Install the MySQL JDBC driver
$ mv mysql-connector-java-5.1.27-bin.jar /opt/modules/hive-0.13.1/lib/
Testing with HQL statements
$ cd /opt/modules/hive-0.13.1/
$ bin/hive
hive> show databases;
OK
default
Time taken: 1.418 seconds, Fetched: 1 row(s)
hive> create database if not exists hive_testdb;
OK
Time taken: 1.084 seconds
hive> use hive_testdb;
OK
Time taken: 0.027 seconds
hive> show tables;
OK
Time taken: 0.029 seconds
hive> create table employee(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 1.542 seconds
hive> load data local inpath '/opt/datas/hive/employee.txt' into table employee;
Copying data from file:/opt/datas/hive/employee.txt
Copying file: file:/opt/datas/hive/employee.txt
Loading data to table hive_testdb.employee
Table hive_testdb.employee stats: [numFiles=1, numRows=0, totalSize=52, rawDataSize=0]
OK
Time taken: 1.939 seconds
hive> desc employee;
OK
id int
name string
Time taken: 0.185 seconds, Fetched: 2 row(s)
hive> desc extended employee;
OK
id int
name string
Detailed Table Information Table(tableName:employee, dbName:hive_testdb, owner:hadoop, createTime:1472398263, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://hadoop01.malone.com:8020/user/hive/warehouse/hive_testdb.db/employee, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim=
Time taken: 0.161 seconds, Fetched: 4 row(s)
hive> desc formatted employee;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: hive_testdb
Owner: hadoop
CreateTime: Sun Aug 28 23:31:03 CST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop01.malone.com:8020/user/hive/warehouse/hive_testdb.db/employee
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 0
rawDataSize 0
totalSize 52
transient_lastDdlTime 1472398294
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.264 seconds, Fetched: 33 row(s)
hive> select * from employee;
OK
1 burce.lee
2 jacky.chen
3 elbert.malone
4 andy.lau
Time taken: 0.817 seconds, Fetched: 4 row(s)
hive> select id from employee;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1472391663133_0001, Tracking URL = http://hadoop01.malone.com:8088/proxy/application_1472391663133_0001/
Kill Command = /opt/modules/hadoop-2.5.0/bin/hadoop job -kill job_1472391663133_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-08-28 23:35:16,716 Stage-1 map = 0%, reduce = 0%
2016-08-28 23:35:50,749 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.84 sec
MapReduce Total cumulative CPU time: 1 seconds 840 msec
Ended Job = job_1472391663133_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.84 sec HDFS Read: 294 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 840 msec
OK
1
2
3
4
Time taken: 86.453 seconds, Fetched: 4 row(s)
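The session above loads /opt/datas/hive/employee.txt, but the post never shows that file. The sketch below recreates it from the query output; the contents are a reconstruction, cross-checked only against the totalSize of 52 bytes that Hive reported. The DATA_DIR variable is a hypothetical name and defaults to a scratch directory.

```shell
#!/usr/bin/env bash
# Sketch: recreate the sample file loaded by the HQL session.
# DATA_DIR defaults to a scratch directory; set DATA_DIR=/opt/datas/hive
# to match the path used in the LOAD DATA statement.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"
mkdir -p "$DATA_DIR"

# Fields are separated by a tab, matching FIELDS TERMINATED BY '\t'.
# printf reuses the format for each id/name pair, giving one row per line.
printf '%s\t%s\n' \
  1 burce.lee \
  2 jacky.chen \
  3 elbert.malone \
  4 andy.lau > "$DATA_DIR/employee.txt"

# Print the byte count for comparison with the totalSize reported by Hive.
wc -c < "$DATA_DIR/employee.txt"
```

The four rows add up to 12 + 13 + 16 + 11 = 52 bytes including tabs and newlines, which agrees with the table statistics printed after the load.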
Common Hive Configuration Properties
Show the current database name and column headers in the CLI
$ cd /opt/modules/hive-0.13.1/
$ vim conf/hive-site.xml
Add the following properties
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
  <description>Whether to include the current database in the Hive prompt.</description>
</property>
Result after the change
$ bin/hive
Logging initialized using configuration in jar:file:/opt/modules/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive (default)> show databases;
OK
database_name
default
hive_testdb
Time taken: 0.768 seconds, Fetched: 2 row(s)
hive (default)> use hive_testdb;
OK
Time taken: 0.028 seconds
hive (hive_testdb)> show tables;
OK
tab_name
employee
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive (hive_testdb)> select * from employee;
OK
employee.id employee.name
1 burce.lee
2 jacky.chen
3 elbert.malone
4 andy.lau
Time taken: 0.917 seconds, Fetched: 4 row(s)
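The two display options can also be enabled per user rather than in hive-site.xml, because the Hive CLI runs the statements in $HOME/.hiverc at startup. The sketch below writes such an init file; the HIVERC variable is a hypothetical name and defaults to a scratch file so that nothing in the home directory is overwritten by accident.

```shell
#!/usr/bin/env bash
# Sketch: enable the same two CLI options through an init file instead of
# hive-site.xml. HIVERC defaults to a scratch file; set HIVERC="$HOME/.hiverc"
# to have the Hive CLI pick it up. Note that this overwrites the target file.
HIVERC="${HIVERC:-$(mktemp)}"

cat > "$HIVERC" <<'EOF'
set hive.cli.print.header=true;
set hive.cli.print.current.db=true;
EOF

cat "$HIVERC"
```

This keeps the shared configuration untouched, which can be preferable when several users with different preferences work on the same installation.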
Configure Hive logging
$ cd /opt/modules/hive-0.13.1/conf
$ cp hive-log4j.properties.template hive-log4j.properties
$ vim hive-log4j.properties
Change the following settings
# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/opt/modules/hive-0.13.1/logs
hive.log.file=hive.log