@tsing1226 2016-01-02T15:40:10.000000Z

flume

Case Study: Collecting Log Files into HDFS with Flume in Real Time

1. Requirements Analysis

1. Collect Hive's log file into the HDFS file system in real time;

2. The collected log files should automatically land in the directory for their date and hour, e.g. 20151215/00 or 20151215/11 (see the layout sketch after this list);

3. When Flume extracts the data, control the size of the files the HDFS sink writes out.
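
With the sink configuration in the next section, the collected files land under date/hour directories such as the following (file names are illustrative; the HDFS sink appends a millisecond counter to the configured prefix):

/flume/events/hivelogs/20151215/00/events-.1450166400000
/flume/events/hivelogs/20151215/11/events-.1450206000000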

2. Modify the Configuration File

hive.org-collect.properties

## define agent
a2.sources = r2
a2.channels = c2
a2.sinks = k2

## define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/cdh3.5.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a2.sources.r2.shell = /bin/bash -c
a2.sources.r2.batchSize=800000

## define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000000
a2.channels.c2.transactionCapacity = 100000

## define sinks
a2.sinks.k2.type = hdfs
## place collected log files in the directory for their date and hour
a2.sinks.k2.hdfs.path = hdfs://hadoop-senior01.grc.com:8020/flume/events/hivelogs/%Y%m%d/%H/
## file type
a2.sinks.k2.hdfs.fileType = DataStream
## file write format
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.batchSize = 100000
a2.sinks.k2.hdfs.rollInterval=0
## roll files once they reach this size in bytes
a2.sinks.k2.hdfs.rollSize=102400000
a2.sinks.k2.hdfs.rollCount=10000
## use the local time (instead of an event header timestamp) for the path escape sequences
a2.sinks.k2.hdfs.useLocalTimeStamp=true
a2.sinks.k2.hdfs.timeZone=America/Los_Angeles
## prefix for the log file names
a2.sinks.k2.hdfs.filePrefix = events-
a2.sinks.k2.hdfs.round = true
a2.sinks.k2.hdfs.roundValue = 10
a2.sinks.k2.hdfs.roundUnit = minute

### bind the sources and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
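
As a worked example of how the escape sequences in hdfs.path resolve: with useLocalTimeStamp=true, timeZone=America/Los_Angeles, round=true, roundValue=10 and roundUnit=minute, an event received at 2015-12-15 11:23 local time is rounded down to 11:20, so it is written under hdfs://hadoop-senior01.grc.com:8020/flume/events/hivelogs/20151215/11/ (the 10-minute rounding would only be visible in a path that includes minutes, which this one does not).

One caveat worth noting: the source batchSize (800000) is larger than the channel transactionCapacity (100000). Each source batch is put into the channel in a single transaction, so if tail -F ever delivers a full batch the put will fail; keeping the source batchSize at or below the transactionCapacity is safer.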

3. Start Flume

bin/flume-ng agent --conf conf --conf-file conf/hive.org-collect.properties --name a2 -Dflume.root.logger=INFO,console
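
This keeps the agent in the foreground, logging to the console. If you want it to survive the shell session, one option (an addition, not from the original write-up; the output file name is arbitrary) is to run it under nohup:

nohup bin/flume-ng agent --conf conf --conf-file conf/hive.org-collect.properties --name a2 > flume-a2.out 2>&1 &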

4. Perform Some Operations in Hive

select * from df_log_comm ;
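
The table df_log_comm comes from the original environment; any HiveQL statement works for this test, since running it makes Hive append to hive.log, which tail -F then picks up. For example:

bin/hive -e "show databases;"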

5. Check the Logs Flume Collected on HDFS
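
For example, list the directory for the current local date and hour (the base path follows the sink configuration above; substitute the actual date and hour):

bin/hdfs dfs -ls /flume/events/hivelogs/20151215/11/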

6. An Outstanding Problem

When the HDFS sink's roll size is set larger, the error below appears. It is still unresolved at the time of writing; it may be that larger files cannot reach full replication before the sink's next roll check.

ERROR - org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:553)] Hit max consecutive under-replication rotations (30); will not continue rolling files under this path due to under-replication
Reference: http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.5.0-cdh5.3.6/FlumeUserGuide.html#hdfs-sink
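
A workaround often suggested for this error (an assumption here, not verified in the original write-up) is to stop the sink from considering block replication when deciding whether to roll, by adding the HDFS sink's minBlockReplicas setting to the same configuration file:

## workaround (untested here): do not roll files because of under-replication
a2.sinks.k2.hdfs.minBlockReplicas = 1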
