@duguyiren3476 2015-08-06T02:40:19.000000Z

Configuring and testing the Hive storage plugin in Drill 1.0

drill,hive

At the time of writing, the latest Apache Drill release is 1.0.0. This version supports the following data sources and file formats:

avro
parquet
hive
hbase
csv tsv psv

File system
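My current requirement is to query HDFS data stored as Snappy-compressed SequenceFiles, which Drill does not support directly. Hive can query Snappy + SequenceFile data, and Drill supports Hive, which raised the question: can Drill read Snappy + SequenceFile data through the Hive storage plugin? It turns out that it can. The configuration is as follows: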

1. Enable the Thrift service of the Hive metastore by adding the following to hive-site.xml:

 <property>
    <name>hive.metastore.uris</name>
    <value>thrift://10.170.250.47:9083</value>
</property>
<property>
    <name>hive.metastore.local</name>
    <value>false</value>
</property>

Start the metastore service:

[hadoop@gateway local]$ ../hive-1.2.1/bin/hive --service metastore &
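
Before moving on to the Drill side, it can help to confirm that the metastore port is actually accepting connections. The following is only a sketch: `check_port` is a hypothetical helper, and it assumes bash (with its `/dev/tcp` redirection) and the coreutils `timeout` command are available.

```shell
# check_port HOST PORT
# Hypothetical helper: exits 0 if a TCP connection to HOST:PORT
# can be opened within 3 seconds, non-zero otherwise.
check_port() {
  host="$1"
  port="$2"
  # /dev/tcp is a bash feature, so the probe runs in an explicit bash process.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example, using the address and port from hive-site.xml above:
#   check_port 10.170.250.47 9083 && echo "metastore reachable"
```

If the check fails, the metastore log is the first place to look before touching the Drill configuration.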

2. Configure the Hive plugin in the Drill web UI. Here hive.metastore.uris is the address and port of the Hive metastore service, and hive.metastore.warehouse.dir is Hive's warehouse directory on HDFS:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://10.170.250.47:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxx:3306/hive_database",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "fs.default.name": "hdfs://xxx:9000",
    "hive.metastore.sasl.enabled": "false"
  }
}
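
To keep the plugin definition reproducible across environments, the same JSON could be generated from a few variables and then pasted into the web UI. This is a sketch only: `hive_plugin_json` is a hypothetical helper, and the argument order (metastore URI, JDBC URL, filesystem URI) is an assumption made for illustration.

```shell
# hive_plugin_json METASTORE_URI JDBC_URL FS_URI
# Hypothetical helper: prints a Hive storage plugin definition with the
# same five properties used in this post.
hive_plugin_json() {
  metastore_uri="$1"
  jdbc_url="$2"
  fs_uri="$3"
  cat <<EOF
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "${metastore_uri}",
    "javax.jdo.option.ConnectionURL": "${jdbc_url}",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "fs.default.name": "${fs_uri}",
    "hive.metastore.sasl.enabled": "false"
  }
}
EOF
}

# Example:
#   hive_plugin_json thrift://10.170.250.47:9083 \
#     jdbc:mysql://xxx:3306/hive_database hdfs://xxx:9000 > hive-plugin.json
```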

After saving and exiting, restart the drillbit service:

``` shell
[hadoop@gateway drill-1.1.0]$ bin/drillbit.sh restart
```
3. Test a query against a SequenceFile table:
 ``` shell
 [hadoop@gateway drill-1.1.0]$ bin/sqlline -u jdbc:drill:zk=10.172.171.229:2181
apache drill 1.0.0 
"the only truly happy people are children, the creative minority and drill users"
0: jdbc:drill:zk=10.172.171.229:2181> use hive.ai;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | Default schema changed to [hive.ai]  |
+-------+--------------------------------------+
1 row selected (0.188 seconds)
0: jdbc:drill:zk=10.172.171.229:2181> !table
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| TABLE_CAT  |     TABLE_SCHEM     |     TABLE_NAME      | TABLE_TYPE  | REMARKS  | TYPE_CAT  | TYPE_SCHEM  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  |
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| DRILL      | INFORMATION_SCHEMA  | CATALOGS            | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | COLUMNS             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | SCHEMATA            | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | TABLES              | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | VIEWS               | TABLE       |          |           |             |            |                            |                 |
| DRILL      | hive.ai             | metric_data_entity  | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | boot                | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | drillbits           | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | memory              | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | options             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | threads             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | version             | TABLE       |          |           |             |            |                            |                 |
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
0: jdbc:drill:zk=10.172.171.229:2181> SELECT count(1) FROM metric_data_entity where pt='2015080510' ;
+-----------+
|  EXPR$0   |
+-----------+
| 40455402  |
+-----------+
1 row selected (14.482 seconds)
0: jdbc:drill:zk=10.172.171.229:2181> 
 ```
The query above shows that SequenceFile tables can now be queried, but querying Snappy-compressed files fails with the following error:

```
2015-08-05 16:34:49,067 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_85]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_85]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
2015-08-05 16:39:05,781 [UserServer-1] INFO  o.a.drill.exec.work.foreman.Foreman - State change requested.  RUNNING --> CANCELLATION_REQUESTED
```

The UnsatisfiedLinkError means the JVM cannot find Hadoop's native library with Snappy support, so the native library directory has to be exposed through the LD_LIBRARY_PATH environment variable; see step 4 below.

4. Set the system environment variable LD_LIBRARY_PATH=/oneapm/local/hadoop-2.7.1/lib/native and add it to the CLASSPATH.
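
A minimal sketch of this step is shown here. It assumes the lines are placed somewhere that is sourced before drillbit starts (for example the profile of the user running Drill); the variable name `HADOOP_NATIVE_DIR` and the libhadoop check are illustrative additions, not part of the original setup.

```shell
# Directory holding Hadoop's native libraries (path taken from this post).
HADOOP_NATIVE_DIR="/oneapm/local/hadoop-2.7.1/lib/native"

# Prepend it to LD_LIBRARY_PATH while keeping any existing entries.
export LD_LIBRARY_PATH="${HADOOP_NATIVE_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Optional sanity check: warn if no libhadoop shared object is found there.
if ! ls "${HADOOP_NATIVE_DIR}"/libhadoop.so* >/dev/null 2>&1; then
  echo "warning: no libhadoop.so found in ${HADOOP_NATIVE_DIR}" >&2
fi
```

After exporting the variable, restart drillbit so the new environment takes effect. On a machine with Hadoop installed, `hadoop checknative -a` can also be used to see whether the native build reports Snappy support.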

References:
https://drill.apache.org/docs/hive-storage-plugin/
https://gist.github.com/vicenteg/7e060e79603f1e7ed3b4
http://blog.csdn.net/reesun/article/details/8556078
