@awsekfozc 2015-12-18

Flume


1) Introduction

Flume is a log-data collection tool whose key component is the agent. Each agent has three parts (a minimal wiring sketch follows the list):
1. source: the origin of the data to collect; it actively pushes data into the channel.
2. channel: a data pipeline that buffers the data pushed by the source until the sink takes it.
3. sink: the destination where the collected data is stored; it actively polls data from the channel.
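As a minimal sketch of how these three parts are wired together, here is the classic netcat-to-logger agent from the Flume user guide (the agent name a1 and component names r1/c1/k1 are arbitrary):

  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1
  # source: listen on a TCP port, one event per line of input
  a1.sources.r1.type = netcat
  a1.sources.r1.bind = localhost
  a1.sources.r1.port = 44444
  # channel: in-memory buffer between source and sink
  a1.channels.c1.type = memory
  # sink: print events to the Flume log (handy for smoke tests)
  a1.sinks.k1.type = logger
  # bind source and sink to the channel
  a1.sources.r1.channels = c1
  a1.sinks.k1.channel = c1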

2) Installation

Download

http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.5.0-cdh5.3.6.tar.gz

Official documentation

http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.5.0-cdh5.3.6/FlumeUserGuide.html

Install the JDK

Extract the archive

  $ tar -zxvf flume-ng-1.5.0-cdh5.3.6.tar.gz -C /opt/cdh/

Configure JAVA_HOME

In flume-env.sh:

  export JAVA_HOME=/opt/moduels/jdk1.7.0_67
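flume-env.sh is not present in a fresh extraction; it is created from the shipped template. A minimal sketch (the extracted directory name is an assumption, adjust it to whatever tar actually produced):

  $ cd /opt/cdh/flume-ng-1.5.0-cdh5.3.6
  $ cp conf/flume-env.sh.template conf/flume-env.sh
  # confirm the binary runs and the JDK is picked up
  $ bin/flume-ng version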

3) Simple Examples

HDFS Sink, partitioned by hour[1]

Put the Hadoop jars into flume_home/lib (see the sketch below).
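The exact jar set depends on the Hadoop version. For CDH 5.3.x, something like the following is typically enough (the HADOOP_HOME path, jar versions, and file locations here are assumptions based on a standard CDH tarball layout; add further dependencies if the agent later fails with ClassNotFoundException):

  $ export HADOOP_HOME=/opt/cdh/hadoop-2.5.0-cdh5.3.6   # hypothetical install path
  $ cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar \
       $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar \
       $HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar \
       $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.5.0-cdh5.3.6.jar \
       /opt/cdh/flume-ng-1.5.0-cdh5.3.6/lib/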
Run

  bin/flume-ng agent --conf conf --name a1 --conf-file conf/hdfs-sink.conf -Dflume.root.logger=INFO,console

Configuration

  # The configuration file needs to define the sources,
  # the channels and the sinks.
  # define the agent
  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1
  # define the source: tail the Hive log with an exec source
  a1.sources.r1.type = exec
  a1.sources.r1.command = tail -f /opt/cdh/hive-0.13.1-cdh5.3.6/logs/hive.log
  a1.sources.r1.shell = /bin/bash -c
  # define the channel: a durable file channel
  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /opt/datas/flume/checkpoint
  a1.channels.c1.dataDirs = /opt/datas/flume/data
  # define the sink: write to HDFS, one directory per day/hour
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = hdfs://hadoop.zc.com:8020/user/zc/flume/hive-log/%Y%m%d/%H
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.filePrefix = zc
  a1.sinks.k1.hdfs.writeFormat = Text
  a1.sinks.k1.hdfs.batchSize = 1000
  a1.sinks.k1.hdfs.rollSize = 128000000
  a1.sinks.k1.hdfs.rollInterval = 3600
  a1.sinks.k1.hdfs.rollCount = 0
  # round the event timestamp down to the hour so the %H buckets are stable
  a1.sinks.k1.hdfs.round = true
  a1.sinks.k1.hdfs.roundValue = 1
  a1.sinks.k1.hdfs.roundUnit = hour
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  # bind the source and sink to the channel
  a1.sources.r1.channels = c1
  a1.sinks.k1.channel = c1
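Once events are flowing, the hourly output can be inspected from the HDFS side (the date and hour below are illustrative):

  $ hdfs dfs -ls /user/zc/flume/hive-log/20151216/
  $ hdfs dfs -cat /user/zc/flume/hive-log/20151216/00/zc.*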

Spooling Directory Source[2]

This source is often used together with a logging framework (such as log4j): when the framework finishes writing a file, it marks it as complete, for example by changing the extension. When Flume detects a completed file it reads it, and after the read finishes it renames the file with a marker suffix of its own.

Configuration

  # The configuration file needs to define the sources,
  # the channels and the sinks.
  # define the agent
  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1
  # define the source: watch a spooling directory for completed files
  a1.sources.r1.type = spooldir
  a1.sources.r1.spoolDir = /opt/datas/flume/spdr
  # rename fully ingested files with this suffix
  a1.sources.r1.fileSuffix = .delete
  # skip files that are still being written (*.log.tmp)
  a1.sources.r1.ignorePattern = ^(.)*\\.log.tmp$
  # define the channel
  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /opt/datas/flume/checkpoint
  a1.channels.c1.dataDirs = /opt/datas/flume/data
  # define the sink
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.path = hdfs://hadoop.zc.com:8020/user/zc/flume/hive-log
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.filePrefix = zc
  a1.sinks.k1.hdfs.writeFormat = Text
  a1.sinks.k1.hdfs.batchSize = 1000
  a1.sinks.k1.hdfs.rollSize = 128000000
  a1.sinks.k1.hdfs.rollInterval = 3600
  a1.sinks.k1.hdfs.rollCount = 0
  a1.sinks.k1.hdfs.round = true
  a1.sinks.k1.hdfs.useLocalTimeStamp = true
  # bind the source and sink to the channel
  a1.sources.r1.channels = c1
  a1.sinks.k1.channel = c1
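A quick way to exercise this configuration (file names are illustrative): write to a temporary name that ignorePattern excludes, then rename it once the write is finished, mimicking a log framework's rotation:

  $ mkdir -p /opt/datas/flume/spdr
  # still being written: matches ignorePattern, so Flume skips it
  $ echo "2015-12-16 01:43:18 INFO test line" >> /opt/datas/flume/spdr/app.log.tmp
  # writing finished: rename it; Flume ingests it, then renames it to app.log.delete
  $ mv /opt/datas/flume/spdr/app.log.tmp /opt/datas/flume/spdr/app.log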


