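3.1 Environment preparation: use a CentOS 6.x or 7.x system with internet access (NAT networking is recommended), disable the firewall, and install the required packages: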
yum install -y svn ncurses-devel gcc* lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel  
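Before starting the build, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch; the helper name `missing_cmds` and the particular list of commands checked are illustrative assumptions, not part of the original setup.

```shell
# Print every command from the argument list that is NOT found on PATH.
# missing_cmds is a hypothetical helper name chosen for this sketch.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Assumed subset of the tools provided by the yum packages above.
missing="$(missing_cmds svn gcc autoconf automake libtool cmake)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing build tools:" $missing
else
  echo "All checked build tools are present."
fi
```

If anything is reported as missing, rerunning the yum command above is the simplest fix.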
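Install a JDK. Any version from 1.7 up works. It is best to set the global environment variable as well: export JAVA_HOME=xxxxx (replace xxxxx with your JDK path)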
Configure a mirror for Maven in settings.xml (inside the <mirrors> element):
<mirror>
  <id>nexus-osc</id>
  <mirrorOf>central</mirrorOf>
  <name>Nexus osc</name>
  <url>https://repo.maven.apache.org/maven2</url>
</mirror>
<!-- Note: the old oschina mirror has been shut down; if it is used, the build warns about failed connections
<mirror>
  <id>CN</id>
  <mirrorOf>central</mirrorOf>
  <name>OSChina Central</name>
  <url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
-->
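The mirror entry above belongs inside the `<mirrors>` element of a Maven settings file. As a rough sketch, the snippet below writes a minimal settings file containing only that mirror. The helper name `write_mirror_settings` is hypothetical, and the demo deliberately writes to a scratch location rather than to `~/.m2/settings.xml`, so an existing configuration is not overwritten.

```shell
# Write a minimal Maven settings file containing the central mirror shown above.
# write_mirror_settings is a hypothetical helper; the target path is passed
# explicitly so an existing settings.xml is never overwritten by accident.
write_mirror_settings() {
  local target="$1"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>nexus-osc</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus osc</name>
      <url>https://repo.maven.apache.org/maven2</url>
    </mirror>
  </mirrors>
</settings>
EOF
}

# Demo: write to a scratch location and show the result for review.
demo_settings="$(mktemp -d)/settings.xml"
write_mirror_settings "$demo_settings"
cat "$demo_settings"
```

Once the generated file looks right, it can be copied to the per-user location `~/.m2/settings.xml` or merged into an existing file by hand.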
 It is best to set the global environment variable for Maven too: run vi /etc/profile and add export MAVEN_HOME=xxxxx
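Editing the profile by hand works, but repeated edits tend to leave duplicate lines behind. Below is a small sketch of an idempotent alternative. The helper name `add_export` and the `/path/to/...` values are placeholders, and the demo writes to a scratch file instead of `/etc/profile`.

```shell
# Append "export NAME=value" to a profile file unless NAME is already exported there.
# add_export is a hypothetical helper; the paths used below are placeholders.
add_export() {
  local file="$1" name="$2" value="$3"
  touch "$file"
  if ! grep -q "^export ${name}=" "$file"; then
    printf 'export %s=%s\n' "$name" "$value" >> "$file"
  fi
}

# Demo against a scratch file; on a real machine the target would be /etc/profile.
profile="$(mktemp)"
add_export "$profile" JAVA_HOME /path/to/jdk
add_export "$profile" MAVEN_HOME /path/to/maven
add_export "$profile" MAVEN_HOME /path/to/maven   # second call changes nothing
cat "$profile"
```

After changing the real /etc/profile, run `source /etc/profile` (or log in again) so the current shell picks up the new variables.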
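 Download the Hive source package: http://mirrors.cnnic.cn/apache/hive/hive-2.1.0/  Since this is for learning, I recommend using the latest version; every pitfall you hit is something learned.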
 mvn clean install  -Pdist -DskipTests -Dhadoop-23.version=2.7.3 -Dspark.version=2.0.2
 With the Hive 2.1.0 source, this command fails in the spark-client module, because Spark 2.x no longer provides the JavaSparkListener class:
 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project spark-client: Compilation failure: Compilation failure:
 [ERROR] /opt/modules/apache-hive-2.1.0-src/spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java:[46,24] cannot find symbol
 [ERROR] symbol: class JavaSparkListener
 [ERROR] location: package org.apache.spark
 [ERROR] /opt/modules/apache-hive-2.1.0-src/spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java:[444,40] cannot find symbol
 [ERROR] symbol: class JavaSparkListener
 [ERROR] location: class org.apache.hive.spark.client.RemoteDriver
 [ERROR] -> [Help 1]
 [ERROR]
 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
 [ERROR]
 [ERROR] For more information about the errors and possible solutions, please read the following articles:
 [ERROR] [Help 1] http://cwiki.apache.org/confluen ... ojoFailureException
 [ERROR]
 [ERROR] After correcting the problems, you can resume the build with the command
 [ERROR] mvn <goals> -rf :spark-client
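Since the failure above comes from source code that references `JavaSparkListener`, one cheap pre-flight check is to grep the source tree for that class before choosing a Spark version. This is only a sketch based on the error log shown here. The helper name `uses_java_spark_listener` is hypothetical, and the demo runs against a scratch directory that imitates the failing import rather than a real Hive checkout.

```shell
# Return 0 if the given source tree still references JavaSparkListener,
# i.e. it can be expected to fail against Spark 2.x as in the log above.
# uses_java_spark_listener is a hypothetical helper name.
uses_java_spark_listener() {
  local src_root="$1"
  grep -rqs 'JavaSparkListener' "$src_root/spark-client/src"
}

# Demo on a scratch tree that mimics the failing import in RemoteDriver.java.
tree="$(mktemp -d)"
mkdir -p "$tree/spark-client/src/main/java"
echo 'import org.apache.spark.JavaSparkListener;' \
  > "$tree/spark-client/src/main/java/RemoteDriver.java"

if uses_java_spark_listener "$tree"; then
  echo "Source references JavaSparkListener: use a Spark 1.x version or newer Hive source."
fi
```

On a real checkout, the argument would be the source directory, for example /opt/modules/apache-hive-2.1.0-src as seen in the log.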
The resulting tarball ends up in packaging/target. For 2.2.0 it is named like this: apache-hive-2.2.0-SNAPSHOT-bin.tar.gz
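Because the tarball name depends on the version being built, a glob is more robust than a hard-coded file name. The sketch below shows one way to locate it. `find_dist_tarball` is a hypothetical helper, and the demo uses a scratch directory in place of the real source tree.

```shell
# Print the path of the binary tarball under <src_root>/packaging/target, if any.
# find_dist_tarball is a hypothetical helper name.
find_dist_tarball() {
  local src_root="$1" f
  for f in "$src_root"/packaging/target/apache-hive-*-bin.tar.gz; do
    [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
  done
  return 1
}

# Demo with a scratch directory standing in for the Hive source tree.
src="$(mktemp -d)"
mkdir -p "$src/packaging/target"
touch "$src/packaging/target/apache-hive-2.2.0-SNAPSHOT-bin.tar.gz"
find_dist_tarball "$src"
```

The printed path can then be passed straight to `tar -zxvf`.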


tar -zxvf apache-hive-2.2.0-SNAPSHOT-bin.tar.gz
cd apache-hive-2.2.0-SNAPSHOT-bin/conf
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
--><configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive. -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead. -->
  <!-- Hive Execution Parameters -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.long.polling.timeout</name>
    <value>5000</value>
    <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>com.cloudera.archive.slave02</value>
    <description>Bind host on which to run the HiveServer2 Thrift interface.
    Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://com.cloudera.archive.slave02:3306/hive22db?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <!-- Compress the final output -->
  <property>
    <name>hive.exec.compress.output</name>
    <value>true</value>
  </property>
  <!-- Compress intermediate data; this speeds up the shuffle stage somewhat -->
  <property>
    <name>hive.exec.compress.intermediate</name>
    <value>true</value>
  </property>
  <!-- Merge small files at the end of a map-reduce job (default: false) -->
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>true</value>
  </property>
  <!-- This flag should be set to true to enable vectorized mode of query execution. The default value is false. -->
  <property>
    <name>hive.vectorized.execution.enabled</name>
    <value>true</value>
  </property>
  <!-- Whether to execute jobs in parallel. Applies to MapReduce jobs that can run in parallel, for example jobs processing different source tables before a join. As of Hive 0.14, also applies to move tasks that can run in parallel, for example moving files to insert targets during multi-insert -->
  <property>
    <name>hive.exec.parallel</name>
    <value>true</value>
  </property>
  <!-- Chooses execution engine. Options are: mr (Map reduce, default), tez (Tez execution, for Hadoop 2 only), or spark (Spark execution, for Hive 1.1.0 onward). -->
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
</configuration>
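With this many properties, it is easy to lose track of what a given key is set to. The following sketch reads a single value back out of a Hadoop-style configuration file. `get_hive_prop` is a hypothetical helper, it assumes every `<name>` and `<value>` element sits on its own line as in the file above, and it is not a general XML parser.

```shell
# Print the <value> that follows <name>NAME</name> in a Hadoop-style XML file.
# get_hive_prop is a hypothetical helper; it assumes each <name> and <value>
# is on its own line and does not handle arbitrary XML.
get_hive_prop() {
  local file="$1" name="$2"
  awk -v key="<name>${name}</name>" '
    index($0, key) { found = 1; next }
    found && /<value>/ {
      sub(/^.*<value>/, "")
      sub(/<\/value>.*$/, "")
      print
      exit
    }
  ' "$file"
}

# Demo on a scratch file holding two of the properties from above.
site="$(mktemp)"
cat > "$site" <<'EOF'
<configuration>
  <property>
    <name>hive.execution.engine</name>
    <value>tez</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF
get_hive_prop "$site" hive.execution.engine
get_hive_prop "$site" hive.metastore.warehouse.dir
```

Pointing the first argument at the real conf/hive-site.xml gives a quick way to confirm settings such as the metastore connection URL before starting Hive.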
vi hive-env.sh (create it from hive-env.sh.template if it does not exist), then add the Hadoop- and Hive-related settings:
export HADOOP_HOME=/opt/modules/hadoop-2.7.3
export HADOOP_CONF_DIR=/opt/modules/hadoop-2.7.3/etc/hadoop
export HIVE_CONF_DIR=/opt/modules/apache-hive-2.2.0-SNAPSHOT-bin/conf
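A typo in any of these paths usually surfaces later as a confusing startup error, so it may be worth checking them right away. The sketch below reports which of the named variables do not point to an existing directory. The helper name `check_dirs` is hypothetical.

```shell
# Report which of the named environment variables do not point to an existing directory.
# check_dirs is a hypothetical helper name; it returns non-zero if any check fails.
check_dirs() {
  local var val status=0
  for var in "$@"; do
    # Look up the value of the variable whose name is stored in $var.
    eval "val=\${$var:-}"
    if [ ! -d "$val" ]; then
      echo "$var is unset or not a directory: ${val:-<unset>}"
      status=1
    fi
  done
  return "$status"
}

check_dirs HADOOP_HOME HADOOP_CONF_DIR HIVE_CONF_DIR \
  || echo "Fix the paths in hive-env.sh before starting Hive."
```

The check only sees variables that are set in the current shell, so source hive-env.sh first when trying it by hand.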

yum install -y mysql-server
# Edit the configuration and add default-character-set=utf8
vi /etc/my.cnf
chkconfig mysqld on
chkconfig mysqld --list
# Expected output; runlevels 2, 3, 4 and 5 showing "on" means mysqld starts at boot:
# mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off
service mysqld start
# Set the root password once the server is running (equivalent: mysqladmin -u root password 123456)
/usr/bin/mysqladmin -uroot password '123456'


mysql> create database hive;
Set the character set of the hive database:
mysql> alter database hive character set latin1;
Open up access for the root user from other machines (replace hostname and ip with the real values):
mysql> grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;
mysql> grant all privileges on *.* to 'root'@'hostname' identified by '123456' with grant option;
mysql> grant all privileges on *.* to 'root'@'ip' identified by '123456' with grant option;
mysql> flush privileges;
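The three grant statements differ only in the host part, so they can be generated instead of typed. The sketch below prints the SQL without executing it. `grant_sql` is a hypothetical helper, the host names in the demo are placeholders, and the `IDENTIFIED BY` form inside `GRANT` matches the MySQL 5.x syntax used above (MySQL 8.0 no longer accepts it).

```shell
# Print one GRANT statement per host, followed by FLUSH PRIVILEGES.
# grant_sql is a hypothetical helper; it only prints SQL and does not run it.
grant_sql() {
  local user="$1" password="$2" host
  shift 2
  for host in "$@"; do
    printf "GRANT ALL PRIVILEGES ON *.* TO '%s'@'%s' IDENTIFIED BY '%s' WITH GRANT OPTION;\n" \
      "$user" "$host" "$password"
  done
  printf 'FLUSH PRIVILEGES;\n'
}

# Placeholder host names; substitute the real hostname and IP of the Hive machine.
grant_sql root 123456 '%' hive-host.example 192.0.2.10
```

After reviewing the output, it can be piped into the client, for example `grant_sql ... | mysql -uroot -p`.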


Install the MySQL connector with yum:
yum install -y mysql-connector-java
cp /usr/share/java/mysql-connector-java-5.1.17.jar /usr/local/hive/lib
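The exact jar version installed by yum varies between systems, so copying by glob avoids a hard-coded file name. This is a minimal sketch: `install_connector` is a hypothetical helper, and the demo uses scratch directories in place of /usr/share/java and Hive's lib directory.

```shell
# Copy every mysql-connector-java*.jar from a source directory into Hive's lib directory.
# install_connector is a hypothetical helper; it returns non-zero if nothing was copied.
install_connector() {
  local from="$1" hive_lib="$2" jar copied=0
  for jar in "$from"/mysql-connector-java*.jar; do
    [ -e "$jar" ] || continue
    cp "$jar" "$hive_lib"/ && copied=$((copied + 1))
  done
  [ "$copied" -gt 0 ]
}

# Demo with scratch directories standing in for the real locations.
from="$(mktemp -d)"
lib="$(mktemp -d)"
touch "$from/mysql-connector-java-5.1.17.jar"
install_connector "$from" "$lib" && ls "$lib"
```

On a real machine the call would look like `install_connector /usr/share/java /usr/local/hive/lib`, with the second argument adjusted to wherever Hive is installed.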


hive (default)> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.


which: no hbase in xxx
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
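Following the warning's advice means connecting through HiveServer2 with Beeline, which takes a JDBC URL. The sketch below assembles that URL from a host, port, and database. `hs2_url` is a hypothetical helper, the default port 10000 and the host name are taken from the hive-site.xml in this article, and the command is only printed here because it needs a running HiveServer2 to succeed.

```shell
# Build a HiveServer2 JDBC URL for Beeline from host, port and database.
# hs2_url is a hypothetical helper; port 10000 and database "default" are fallbacks.
hs2_url() {
  local host="$1" port="${2:-10000}" db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Print the Beeline command instead of running it.
url="$(hs2_url com.cloudera.archive.slave02)"
echo "beeline -u \"$url\""
```

HiveServer2 itself can be started with `hive --service hiveserver2` before the Beeline command is run.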



insert overwrite table tb_name
select site,
       product_id,
       case when site='ae' and wish_product_id is null then product_id else wish_product_id end as wish_product_id
from tb_name;