@tsing1226
2016-01-02T15:40:10.000000Z
1. Stream Hive's log file (hive.log) into HDFS in real time;
2. Automatically place the collected log files under date/hour directories, e.g. 20151215/00, 20151215/11;
3. Control the size of the files Flume writes via the HDFS sink's roll settings.
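With the configuration below, collected files should land under per-day/per-hour directories roughly like this (a sketch; the exact numeric suffix on each file is a timestamp counter generated by the HDFS sink):

/flume/events/hivelogs/20151215/00/events-.1450166400123
/flume/events/hivelogs/20151215/11/events-.1450206000456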
The agent configuration, hive.org-collect.properties:
## define agent
a2.sources = r2
a2.channels = c2
a2.sinks = k2
## define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/cdh3.5.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a2.sources.r2.shell = /bin/bash -c
## batchSize must not exceed the channel's transactionCapacity (100000 below),
## otherwise puts to the memory channel will fail
a2.sources.r2.batchSize = 100000
## define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000000
a2.channels.c2.transactionCapacity = 100000
## define sinks
a2.sinks.k2.type = hdfs
## place collected logs under date/hour directories
a2.sinks.k2.hdfs.path = hdfs://hadoop-senior01.grc.com:8020/flume/events/hivelogs/%Y%m%d/%H/
## file type
a2.sinks.k2.hdfs.fileType = DataStream
## write format
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.batchSize = 100000
## never roll on a time interval
a2.sinks.k2.hdfs.rollInterval = 0
## roll when a file reaches this size (bytes)
a2.sinks.k2.hdfs.rollSize = 102400000
## roll after this many events
a2.sinks.k2.hdfs.rollCount = 10000
## use the local time (rather than an event header timestamp) for the path escapes
a2.sinks.k2.hdfs.useLocalTimeStamp = true
a2.sinks.k2.hdfs.timeZone = America/Los_Angeles
## log file prefix
a2.sinks.k2.hdfs.filePrefix = events-
## round the timestamp down to a multiple of 10 minutes
a2.sinks.k2.hdfs.round = true
a2.sinks.k2.hdfs.roundValue = 10
a2.sinks.k2.hdfs.roundUnit = minute
## bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
Start the agent:
bin/flume-ng agent --conf conf --conf-file conf/hive.org-collect.properties --name a2 -Dflume.root.logger=INFO,console
To generate log activity, run any statement in Hive, e.g.:
select * from df_log_comm ;
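After the query runs, the new log lines should appear under the current date/hour directory; a quick check (the date/hour path here is an example matching hdfs.path above):

bin/hdfs dfs -ls /flume/events/hivelogs/20151215/11/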
When the HDFS sink's roll size is set larger, the error below appears. It remains unresolved; it is unclear whether the larger file size is the cause:
ERROR - org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:553)] Hit max consecutive under-replication rotations (30); will not continue rolling files under this path due to under-replication
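A workaround often suggested for this error (not verified here) is to lower the sink's expected number of block replicas, so Flume stops force-rotating files whenever the cluster reports under-replication:

## assumption: stop the under-replication rotation loop by expecting only 1 replica
a2.sinks.k2.hdfs.minBlockReplicas = 1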
Reference: http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.5.0-cdh5.3.6/FlumeUserGuide.html#hdfs-sink