@tsing1226
2016-05-20T13:47:04.000000Z
hive
Common compression formats: bzip2, gzip, lzo, snappy
Benefits of data compression in Hadoop
1 Reduces the I/O of Hadoop jobs
2 Reduces the amount of data transferred over the network
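The flow of compression in an MR job is shown in the diagram below:
[Figure: MR compression flow diagram]
MR compression configuration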
mapreduce.output.fileoutputformat.compress
mapreduce.output.fileoutputformat.compress.codec
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount /user/grc/wordcount/wc.input /user/grc/wordcount/nocompressoutpress
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount -D mapreduce.output.fileoutputformat.compress=true -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec /user/grc/wordcount/wc.input /user/grc/wordcount/compressoutpress
MapReduce parameters can be set on the command line at run time (with -D, as in the second command above), or configured in mapred-site.xml.
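As a rough sketch of the mapred-site.xml route, the two output-compression properties used on the command line could be declared as below. This is an illustrative fragment, not a complete file; it assumes the Snappy native library is available on the cluster, and any existing properties in your mapred-site.xml would stay alongside these.

```xml
<!-- Illustrative mapred-site.xml fragment: compress the final job output. -->
<!-- Assumption: Snappy native libraries are installed on every node.     -->
<configuration>
  <property>
    <!-- Turn on compression of the job's output files -->
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
  </property>
  <property>
    <!-- Codec class used for the output; any codec class listed in this post can be substituted -->
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
```

Values set here act as defaults for jobs submitted with this configuration; a -D option on the command line can still override them for a single job, unless the property is marked final.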
Codec class for each compression format
zlib:org.apache.hadoop.io.compress.DefaultCodec
Gzip:org.apache.hadoop.io.compress.GzipCodec
Bzip2:org.apache.hadoop.io.compress.BZip2Codec
Lzo:com.hadoop.compression.lzo.LzoCodec (provided by the separate hadoop-lzo library, not bundled with Hadoop)
Lz4:org.apache.hadoop.io.compress.Lz4Codec
Snappy:org.apache.hadoop.io.compress.SnappyCodec
set hive.exec.compress.intermediate=true ;
set mapreduce.map.output.compress=true ;
set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec ;
set hive.exec.compress.output=true ;
set mapreduce.output.fileoutputformat.compress=true ;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec ;
Likewise, compression on the Hive side can be set at run time with set statements, or configured in hive-site.xml.
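A minimal sketch of the hive-site.xml variant might look like the following. It covers only the two Hive-level switches from the set statements above; the fragment assumes the mapreduce.* codec properties are supplied elsewhere (for example in mapred-site.xml or per session), and it is not a complete hive-site.xml.

```xml
<!-- Illustrative hive-site.xml fragment: enable Hive-side compression. -->
<!-- Assumption: the codec itself is chosen via the mapreduce.* codec   -->
<!-- properties, set in mapred-site.xml or with "set" in the session.   -->
<configuration>
  <property>
    <!-- Compress intermediate data passed between the stages of a query -->
    <name>hive.exec.compress.intermediate</name>
    <value>true</value>
  </property>
  <property>
    <!-- Compress the final output written by a query -->
    <name>hive.exec.compress.output</name>
    <value>true</value>
  </property>
</configuration>
```

Settings placed in the file apply to every session by default, while a set statement changes the value only for the current session.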