@duguyiren3476
2015-08-06T02:40:19.000000Z
Tags: drill, hive
As of the time this post was published, the latest Apache Drill release was 1.0.0. The data sources and file formats supported by this version include:
- File system
My requirement is HDFS data stored as Snappy-compressed SequenceFiles (snappy+sequencefile), which Drill does not support directly. Since Hive can query snappy+sequencefile, and Drill supports Hive, the question was whether snappy+sequencefile could be read through the Hive storage plugin. After verification, it can; the configuration is as follows:
Configure the Hive metastore properties in hive-site.xml:

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://10.170.250.47:9083</value>
</property>
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
```
Start the metastore service:
```shell
[hadoop@gateway local]$ ../hive-1.2.1/bin/hive --service metastore &
```
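For context, a snappy+sequencefile table of the kind queried later in this post would have been created in Hive along these lines. This is a hedged sketch: the table name, columns, and codec settings below are illustrative examples (Hive ~1.x property names), not the actual schema of the table used in this post.

```shell
# Hypothetical Hive DDL sketch for a Snappy-compressed SequenceFile table.
# Names and schema are examples only. Written to a script file here;
# it would be run against a live cluster with: hive -f create_metric_demo.hql
cat > create_metric_demo.hql <<'EOF'
-- Compression is applied at write time via these session settings.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

CREATE TABLE IF NOT EXISTS metric_demo (
  metric_name  STRING,
  metric_value DOUBLE
)
PARTITIONED BY (pt STRING)
STORED AS SEQUENCEFILE;
EOF
```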
Next, configure Drill's `hive` storage plugin:

```json
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://10.170.250.47:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxx:3306/hive_database",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "fs.default.name": "hdfs://xxx:9000",
    "hive.metastore.sasl.enabled": "false"
  }
}
```

Here `hive.metastore.uris` is the address and port of the Hive metastore service, and `hive.metastore.warehouse.dir` is Hive's warehouse directory on HDFS.
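It is easy to break the plugin with a stray comma or quote, so it can help to sanity-check the JSON locally before saving it in Drill. A small sketch (the file name `hive_plugin.json` is just a local scratch file; the `xxx` placeholders are kept as-is from the config above):

```shell
# Validate the storage-plugin JSON before pasting it into Drill.
cat > hive_plugin.json <<'EOF'
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://10.170.250.47:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxx:3306/hive_database",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "fs.default.name": "hdfs://xxx:9000",
    "hive.metastore.sasl.enabled": "false"
  }
}
EOF
# json.tool exits non-zero on malformed JSON.
python3 -m json.tool hive_plugin.json > /dev/null && echo "valid JSON"
```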
After saving and exiting, restart the drillbit service:
```shell
[hadoop@gateway drill-1.1.0]$ bin/drillbit.sh restart
```

3. Query the SequenceFile data to test:

```shell
[hadoop@gateway drill-1.1.0]$ bin/sqlline -u jdbc:drill:zk=10.172.171.229:2181
apache drill 1.0.0
"the only truly happy people are children, the creative minority and drill users"
0: jdbc:drill:zk=10.172.171.229:2181> use hive.ai;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | Default schema changed to [hive.ai]  |
+-------+--------------------------------------+
1 row selected (0.188 seconds)
0: jdbc:drill:zk=10.172.171.229:2181> !table
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| TABLE_CAT  |     TABLE_SCHEM     |     TABLE_NAME      | TABLE_TYPE  | REMARKS  | TYPE_CAT  | TYPE_SCHEM  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  |
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| DRILL      | INFORMATION_SCHEMA  | CATALOGS            | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | COLUMNS             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | SCHEMATA            | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | TABLES              | TABLE       |          |           |             |            |                            |                 |
| DRILL      | INFORMATION_SCHEMA  | VIEWS               | TABLE       |          |           |             |            |                            |                 |
| DRILL      | hive.ai             | metric_data_entity  | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | boot                | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | drillbits           | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | memory              | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | options             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | threads             | TABLE       |          |           |             |            |                            |                 |
| DRILL      | sys                 | version             | TABLE       |          |           |             |            |                            |                 |
+------------+---------------------+---------------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
0: jdbc:drill:zk=10.172.171.229:2181> SELECT count(1) FROM metric_data_entity where pt='2015080510' ;
+-----------+
|  EXPR$0   |
+-----------+
| 40455402  |
+-----------+
1 row selected (14.482 seconds)
0: jdbc:drill:zk=10.172.171.229:2181>
```

The queries above show that plain SequenceFile data can already be queried, but querying Snappy-compressed files fails with this error:

```
2015-08-05 16:34:49,067 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
	at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_85]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_85]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
2015-08-05 16:39:05,781 [UserServer-1] INFO o.a.drill.exec.work.foreman.Foreman - State change requested. RUNNING --> CANCELLATION_REQUESTED
```
Clearly, the Snappy native library needs to be configured via the LD_LIBRARY_PATH environment variable; see step 4 below.
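As a sketch of that fix: the `UnsatisfiedLinkError` on `NativeCodeLoader.buildSupportsSnappy()` means the drillbit JVM cannot find Hadoop's native libraries, so LD_LIBRARY_PATH must include the directory containing `libhadoop.so`/`libsnappy.so` before the drillbit starts, e.g. in Drill's `conf/drill-env.sh`. The Hadoop path below is an assumption for a typical install; point it at your actual native-library directory.

```shell
# conf/drill-env.sh -- make the Hadoop native libraries (libhadoop.so,
# libsnappy.so) visible to the drillbit JVM.
# The HADOOP_HOME default below is an assumption; adjust to your environment.
export HADOOP_HOME=${HADOOP_HOME:-/home/hadoop/hadoop}
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
# Then restart the drillbit so the new environment takes effect:
#   bin/drillbit.sh restart
```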
References:
https://drill.apache.org/docs/hive-storage-plugin/
https://gist.github.com/vicenteg/7e060e79603f1e7ed3b4
http://blog.csdn.net/reesun/article/details/8556078