@zhangyy 2021-05-26T16:35:42.000000Z

Flume cluster deployment

Collaboration frameworks



1: Introduction to Flume

1.1 What is Flume

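  1. Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting large volumes of log data. It was originally developed by Cloudera and is now an Apache project. Flume lets you plug custom data senders into a logging system to collect data. It can also apply simple processing to that data and write it to a variety of (customizable) destinations.
  2. Flume exists in two generations. The 0.9.x releases are collectively called Flume-og, and the 1.x releases are collectively called Flume-ng. Flume-ng was a major rewrite and differs substantially from Flume-og, so be careful to tell them apart.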

1.2 Flume in single-node mode

1.2.1 System initialization

  OS: CentOS 7.9 x64
  Hostnames:

    cat /etc/hosts
    ----
    192.168.100.11 node01.flyfish.cn
    192.168.100.12 node02.flyfish.cn
    192.168.100.13 node03.flyfish.cn
    192.168.100.14 node04.flyfish.cn
    192.168.100.15 node05.flyfish.cn
    192.168.100.16 node06.flyfish.cn
    192.168.100.17 node07.flyfish.cn
    192.168.100.18 node08.flyfish.cn
    ----
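
The later sections refer to the nodes by these names, so each node needs to resolve them. Below is a minimal sketch for adding the entries idempotently, assuming the same eight-node layout. `add_cluster_hosts` is a hypothetical helper name. It takes the hosts file as an argument, so you can try it on a scratch copy before pointing it at /etc/hosts.

```shell
# Hypothetical helper: append the cluster's host entries to a hosts file,
# skipping any "ip name" line that is already present (safe to re-run).
add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local i ip name
  for i in 1 2 3 4 5 6 7 8; do
    ip="192.168.100.$((10 + i))"   # node01 -> .11, ..., node08 -> .18
    name="node0${i}.flyfish.cn"
    # -x: whole-line match, -F: fixed string (dots are not regex wildcards)
    if ! grep -qxF "${ip} ${name}" "$hosts_file" 2>/dev/null; then
      printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
  done
}
```

Run `add_cluster_hosts` as root on each node; it defaults to /etc/hosts. It only recognizes entries written with a single space, so an existing tab-separated entry for the same host would be added a second time.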

1.2.2 Install Flume 1.9.0

  On node01.flyfish.cn, upload apache-flume-1.9.0-bin.tar.gz to /opt/bigdata and extract it:

    cd /opt/bigdata
    tar -zxvf apache-flume-1.9.0-bin.tar.gz
    mv apache-flume-1.9.0-bin /opt/bigdata/flume

    cd /opt/bigdata/flume/conf
    cp -p flume-env.sh.template flume-env.sh

1.2.3 Configure the JDK and environment variables for Flume

  Point Flume at the JDK in flume-env.sh:

    echo "export JAVA_HOME=/opt/bigdata/jdk" >> flume-env.sh
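
Appending with `>>` adds another line every time the command is re-run. The minimal sketch below sets the value once and rewrites an existing active `JAVA_HOME` assignment instead of appending a new one. `set_flume_java_home` is a hypothetical helper, and it relies on GNU `sed -i` as found on CentOS.

```shell
# Hypothetical helper: ensure flume-env.sh has exactly one active JAVA_HOME line.
set_flume_java_home() {
  local env_file="$1" java_home="$2"
  # Match active assignments only; commented-out "# export JAVA_HOME=" lines are left alone.
  if grep -qE '^(export[[:space:]]+)?JAVA_HOME=' "$env_file"; then
    sed -i -E "s|^(export[[:space:]]+)?JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
  else
    printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
  fi
}
```

Usage: `set_flume_java_home /opt/bigdata/flume/conf/flume-env.sh /opt/bigdata/jdk`.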

  Add the Flume environment variables to /etc/profile, reload it, and check the installed version:

    vim /etc/profile
    -----
    #### flume
    export FLUME_HOME=/opt/bigdata/flume
    PATH=$PATH:$HOME/bin:$FLUME_HOME/bin:$FLUME_HOME/sbin
    ----
    source /etc/profile
    flume-ng version
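
As an alternative to editing /etc/profile by hand, the same variables can live in their own file under /etc/profile.d. On CentOS 7, /etc/profile sources that directory for login shells. Below is a minimal sketch. `write_flume_profile` is a hypothetical helper, and it leaves out `$FLUME_HOME/sbin` on the assumption that only `bin/` is needed to run `flume-ng`.

```shell
# Hypothetical helper: write the Flume environment into its own profile file.
# \$PATH and \$FLUME_HOME are escaped so they expand at login, not when writing.
write_flume_profile() {
  local target="${1:-/etc/profile.d/flume.sh}"
  local flume_home="${2:-/opt/bigdata/flume}"
  cat > "$target" <<EOF
# flume
export FLUME_HOME=${flume_home}
export PATH=\$PATH:\$FLUME_HOME/bin
EOF
}
```

Usage, as root: `write_flume_profile && source /etc/profile.d/flume.sh && flume-ng version`.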

1.2.4 Configure a single-node Flume test agent

    cd /opt/bigdata/flume/conf
    vim test-flume.properties
    ---
    # example.conf: A single-node Flume configuration
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    # Describe the sink
    a1.sinks.k1.type = logger
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    ----
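
A common mistake in these files is a component that is declared but never bound to a channel. Note that a source uses the plural key `channels`, while a sink uses the singular `channel`. Below is a rough sketch of a pre-flight check, assuming one `key = value` pair per line. `check_flume_bindings` is a hypothetical helper, not part of Flume.

```shell
# Hypothetical helper: verify that every source listed in <agent>.sources has a
# <agent>.sources.<name>.channels key and every sink listed in <agent>.sinks
# has a <agent>.sinks.<name>.channel key. Returns non-zero if any are missing.
check_flume_bindings() {
  local conf="$1" agent="$2" status=0 name
  for name in $(sed -n -E "s/^${agent}\.sources[[:space:]]*=[[:space:]]*//p" "$conf"); do
    grep -qE "^${agent}\.sources\.${name}\.channels[[:space:]]*=" "$conf" \
      || { echo "source ${name}: no channels binding" >&2; status=1; }
  done
  for name in $(sed -n -E "s/^${agent}\.sinks[[:space:]]*=[[:space:]]*//p" "$conf"); do
    grep -qE "^${agent}\.sinks\.${name}\.channel[[:space:]]*=" "$conf" \
      || { echo "sink ${name}: no channel binding" >&2; status=1; }
  done
  return "$status"
}
```

Usage: `check_flume_bindings conf/test-flume.properties a1 && echo ok`.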

1.2.5 Test the agent

    yum install -y telnet-* netcat-*

  Start an agent:

    cd /opt/bigdata/flume/
    bin/flume-ng agent --conf conf --conf-file conf/test-flume.properties --name a1 -Dflume.root.logger=INFO,console

  Test it by connecting to the netcat source and typing a few lines:

    telnet localhost 44444

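  Verify: each line typed into telnet is printed as an event in the agent's console output by the logger sink.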

2: Building a multi-node Flume cluster

2.1 Multi-node Flume architecture

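  The biggest change in Flume-ng is that there are no longer dedicated roles. Everything is an agent, and agents can be connected to one another. When several agents send to a single agent, that agent effectively acts as a collector. Flume-ng also supports load balancing.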

2.2 Multi-node Flume configuration

  node02.flyfish.cn and node03.flyfish.cn collect log data and send it to node01.flyfish.cn, which then uploads it to HDFS.

  Package Flume on node01.flyfish.cn and copy it to the other two nodes:

    cd /opt/bigdata/
    tar -zcvf flume.tar.gz flume
    scp flume.tar.gz root@node02.flyfish.cn:/opt/bigdata/
    scp flume.tar.gz root@node03.flyfish.cn:/opt/bigdata/
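
With more collector nodes, a loop can drive the scp lines. Below is a minimal sketch. `distribute_flume` and its `DRY_RUN` switch are hypothetical, and the sketch assumes the same `root@<host>:/opt/bigdata/` target used above.

```shell
# Hypothetical helper: copy the Flume tarball to each host given as an argument.
# With DRY_RUN=1 it only prints the scp commands it would run.
distribute_flume() {
  local tarball="$1"; shift
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo scp "$tarball" "root@${host}:/opt/bigdata/"
    else
      # Stop at the first failed copy instead of silently continuing.
      scp "$tarball" "root@${host}:/opt/bigdata/" || return 1
    fi
  done
}
```

Usage: `distribute_flume /opt/bigdata/flume.tar.gz node02.flyfish.cn node03.flyfish.cn`.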

2.3 Configure the Flume slave nodes

  On node02.flyfish.cn and node03.flyfish.cn:

    cd /opt/bigdata/
    tar -zxvf flume.tar.gz
    cd /opt/bigdata/flume/conf
    vim slave.conf
    -----
    # Watch a directory for new files and forward the collected events over Avro (to the master agent)
    # Note: running a Flume agent mostly comes down to configuring its source, channel, and sink
    # a1 below is the agent name; the source is r1, the channel is c1, and the sink is k1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Define the source
    a1.sources.r1.type = spooldir
    # Create this directory beforehand and make sure it is empty
    a1.sources.r1.spoolDir = /opt/bigdata/flume/logs
    # Sink: send the events over Avro
    a1.sinks.k1.type = avro
    # hostname is the host name or IP address of the agent that receives the events (the master)
    a1.sinks.k1.hostname = node01.flyfish.cn
    a1.sinks.k1.port = 44444
    # Channel: buffer events in files on disk; safer than a memory channel because buffered events survive an agent restart
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /opt/bigdata/flume/checkpoint
    a1.channels.c1.dataDirs = /opt/bigdata/flume/data
    # Bind source r1 and sink k1 through channel c1
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    -----
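
The spooling directory must exist before the agent starts, and any file already in it will be ingested. The minimal sketch below creates the directories slave.conf refers to and warns when the spool directory is not empty. `prepare_slave_dirs` is a hypothetical helper. Creating `checkpoint/` and `data/` up front is a precaution, not a documented requirement.

```shell
# Hypothetical helper: create the spool, checkpoint, and data directories
# used by slave.conf under the given Flume home.
prepare_slave_dirs() {
  local flume_home="${1:-/opt/bigdata/flume}"
  mkdir -p "$flume_home/logs" "$flume_home/checkpoint" "$flume_home/data" || return 1
  # Anything already sitting in the spool directory will be picked up by the agent.
  if [ -n "$(ls -A "$flume_home/logs")" ]; then
    echo "warning: $flume_home/logs is not empty; existing files will be ingested" >&2
  fi
}
```

Usage: run `prepare_slave_dirs` on node02.flyfish.cn and node03.flyfish.cn before starting the slave agents.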

2.4 Configure the Flume master

  Configure the master on node01.flyfish.cn:

    cd /opt/bigdata/flume/conf
    vim master.conf
    ----
    # Receive the data from slave 1 and slave 2, aggregate it, and write it to HDFS
    # Note: running a Flume agent mostly comes down to configuring its source, channel, and sink
    # a1 below is the agent name; the source is r1, the channel is c1, and the sink is k1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Source: listen for Avro events
    a1.sources.r1.type = avro
    # bind is the host name or IP address this agent listens on
    a1.sources.r1.bind = node01.flyfish.cn
    a1.sources.r1.port = 44444
    # Interceptor that adds a timestamp header to every event (the %Y%m%d escape in hdfs.path needs it)
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
    # Sink: write to HDFS
    a1.sinks.k1.type = hdfs
    # On an HA cluster, use the nameservice (ns) name instead of a host:port
    # On a single-NameNode cluster, give the address directly: hdfs://192.168.100.11:8020
    # (This assumes a Hadoop 2.7.7 cluster is already set up)
    a1.sinks.k1.hdfs.path = hdfs://192.168.100.11:8020/flume-test/%Y%m%d
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.fileType = DataStream
    # Do not roll files based on event count
    a1.sinks.k1.hdfs.rollCount = 0
    # Start a new file when the current HDFS file reaches 128 MB
    a1.sinks.k1.hdfs.rollSize = 134217728
    # Start a new file every 60 seconds
    a1.sinks.k1.hdfs.rollInterval = 60
    # Channel: buffer events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    # Bind source r1 and sink k1 through channel c1
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    ----
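
The TimestampInterceptor stamps each event when it reaches the master. As a result, the `%Y%m%d` escape in `hdfs.path` resolves to the day the event arrived, in the agent's local time zone by default. Below is a small sketch for locating today's output and double-checking the `rollSize` arithmetic. `expected_flume_dir` and `roll_size_bytes` are hypothetical helpers. The directory computation assumes this shell uses the same time zone as the agent.

```shell
# Hypothetical helper: HDFS directory that events stamped "today" land in.
expected_flume_dir() {
  printf '/flume-test/%s\n' "$(date +%Y%m%d)"
}

# Hypothetical helper: convert MiB to the byte count hdfs.rollSize expects.
roll_size_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}
```

`hdfs dfs -ls "$(expected_flume_dir)"` lists the `events-.*` files. A file that is still being written carries the sink's default `.tmp` in-use suffix. `roll_size_bytes 128` prints 134217728, the value used above.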

2.5 Start and test

  On node01.flyfish.cn (start the master first, so the slaves' Avro sinks have an endpoint to connect to):

    cd /opt/bigdata/flume/
    mkdir logs
    nohup bin/flume-ng agent -n a1 -c conf -f conf/master.conf -Dflume.root.logger=INFO,console >> flume.logs &

  On node02.flyfish.cn and node03.flyfish.cn:

    cd /opt/bigdata/flume/
    mkdir logs
    nohup bin/flume-ng agent -n a1 -c conf -f conf/slave.conf -Dflume.root.logger=INFO,console >> flume.logs &

  On node01.flyfish.cn, create the target directory in HDFS:

    hdfs dfs -mkdir /flume-test/
    hdfs dfs -chmod 777 /flume-test/

  On node02.flyfish.cn, create a test file and copy it into the spooling directory:

    vim test-flume.txt
    -----
    11111
    22222
    33333
    44444
    55555
    -----
    cp -p test-flume.txt /opt/bigdata/flume/logs
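
The spooling directory source expects a file to be complete and unchanged once it appears, and file names must not be reused. After ingesting a file, Flume renames it with a `.COMPLETED` suffix by default. For anything larger than this test file, a safer pattern is to copy into a staging directory on the same filesystem and then `mv` the file in, because a rename within one filesystem is atomic. Below is a minimal sketch. `spool_file` and the sibling `logs.staging` directory are hypothetical, and the nanosecond suffix relies on GNU `date`.

```shell
# Hypothetical helper: hand a file to the spooling directory atomically.
spool_file() {
  local src="$1" spool_dir="${2:-/opt/bigdata/flume/logs}"
  # Staging dir sits next to the spool dir (same filesystem assumed), not inside it.
  local staging="${spool_dir%/}.staging"
  local name
  # Unique target name so a file name is never reused (GNU date %N = nanoseconds).
  name="$(basename "$src").$(date +%s%N)"
  mkdir -p "$staging" || return 1
  # The copy may take time; only the final rename makes the file visible to Flume.
  cp -p "$src" "$staging/$name" && mv "$staging/$name" "$spool_dir/$name"
}
```

Usage on node02.flyfish.cn: `spool_file test-flume.txt`.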

  1. hdfs的页面上查看

  Download the data on node01.flyfish.cn and inspect it:

    hdfs dfs -get /flume-test/20210525
    cat 20210525/events-.1621928807491
