Flume log collection


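1.1.2. Client-side Log4j configuration file (the text highlighted in yellow is what needs to be configured): the log appender is changed to the Log4jAppender provided by Flume; the port the logs are sent to must have an Avro-type source listening on it; the host IP the logs are sent to must be running an Avro-type source.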


Property Name	Default	Description
type	–	The component type name has to be REGEX_FILTER
regex	".*"	Regular expression for matching against events
excludeRegex	false	If true, regex determines events to exclude, otherwise regex determines events to include.

Name	Default	Description
channel	–
type	–	The component type name, needs to be hdfs
hdfs.path	–	HDFS directory path (eg hdfs://namenode/flume/webdata/)
hdfs.filePrefix	FlumeData	Name prefixed to files created by Flume in hdfs directory
hdfs.rollInterval	30	Number of seconds to wait before rolling current file (0 = never roll based on time interval)
hdfs.rollSize	1024	File size to trigger roll, in bytes (0: never roll based on file size)
hdfs.rollCount	10	Number of events written to file before it is rolled (0 = never roll based on number of events)
hdfs.batchSize	1	Number of events written to file before it is flushed to HDFS
hdfs.txnEventMax	100
hdfs.codeC	–	Compression codec, one of: gzip, bzip2, lzo, snappy
hdfs.fileType	SequenceFile	File format: currently SequenceFile, DataStream or CompressedStream. (1) DataStream will not compress the output file, so do not set hdfs.codeC. (2) CompressedStream requires hdfs.codeC to be set to an available codec.
hdfs.maxOpenFiles	5000
hdfs.writeFormat	–	"Text" or "Writable"
hdfs.appendTimeout	1000
hdfs.callTimeout	10000
hdfs.threadsPoolSize	10	Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)
hdfs.rollTimerPoolSize	1	Number of threads per HDFS sink for scheduling timed file rolling
hdfs.kerberosPrincipal	–	Kerberos user principal for accessing secure HDFS
hdfs.kerberosKeytab	–	Kerberos keytab for accessing secure HDFS
hdfs.round	false	Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)
hdfs.roundValue	1	Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time.
hdfs.roundUnit	second	The unit of the round down value - second, minute or hour.
serializer	TEXT	Other possible options include AVRO_EVENT or the fully-qualified class name of an implementation of the EventSerializer.Builder interface.
serializer.*


1. Log4j Appender

1.1. Usage

1.1.2. Client-side Log4j configuration file
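A minimal sketch of what such a client-side log4j.properties might look like, assuming Log4j 1.x and the Log4jAppender class that ships with Flume NG. The host and port values are assumptions, borrowed from the commented-out Avro source settings in the Exec source configuration later in this post; the appender name "flume" is arbitrary.

```properties
# Sketch only: route all INFO-and-above logs to an appender named "flume"
log4j.rootLogger = INFO, flume

# Appender provided by Flume (flume-ng-log4jappender module)
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender

# Host running the Flume agent whose Avro source receives the events (assumed address)
log4j.appender.flume.Hostname = 192.168.0.146

# Port that Avro source is listening on (assumed)
log4j.appender.flume.Port = 44444
```

The Hostname and Port here have to match the bind address and port of an Avro source on the receiving agent, otherwise the appender has nothing to connect to.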

1.1.3. Flume agent configuration
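A sketch of a receiving agent for the Log4j appender, assuming an Avro source on the same address and port that appear commented out in the Exec source configuration later in this post. The logger sink is an assumption chosen to keep the example short; any other sink type could take its place.

```properties
# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Avro source: this is what the Log4jAppender connects to
agent1.sources.source1.type = avro
agent1.sources.source1.bind = 192.168.0.146
agent1.sources.source1.port = 44444

# Logger sink (assumption): writes received events to the agent's own log
agent1.sinks.sink1.type = logger

# In-memory channel between source and sink
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
```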

1.2. Analysis

1.3. Logging code

Log.info("this message has DEBUG in it");

1.4. Sample of collected data

this message has DEBUG in it

this message has DEBUG in it

2. Exec source (abandoned)

2.1. Usage

2.1.1. Flume agent configuration

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent1'

# example.conf: A single-node Flume configuration

# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Describe/configure source1
#agent1.sources.source1.type = avro
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /home/yubojie/logs/ultraIDCPServer.log
#agent1.sources.source1.bind = 192.168.0.146
#agent1.sources.source1.port = 44444

# Host interceptor: adds the agent's host to each event under the "hostname" header
agent1.sources.source1.interceptors = a
agent1.sources.source1.interceptors.a.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent1.sources.source1.interceptors.a.preserveExisting = false
agent1.sources.source1.interceptors.a.hostHeader = hostname

# Describe sink1
#agent1.sinks.sink1.type = FILE_ROLL
#agent1.sinks.sink1.sink.directory = /home/yubojie/flume/apache-flume-1.2.0/flume-out
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://localhost:9000/user/
agent1.sinks.sink1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

2.2. Analysis

2.3. Sample of collected data

2012/10/26 02:36:34 INFO LogTest this message has DEBUG 中文 in it

2012/10/26 02:40:12 INFO LogTest this message has DEBUG 中文 in it

2.4. Logging code

Log.info("this message has DEBUG 中文 in it");

3. Syslog

3.1. Usage

3.1.1. Client-side sample code

3.1.2. Flume agent configuration for receiving logs

agent1.sinks.sink1.channel = channel1
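A sketch of a complete agent that receives syslog messages, assuming Flume's syslogtcp source. The listen address, the port and the logger sink are assumptions made for illustration; a syslogudp source takes the same host and port properties.

```properties
# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# Syslog TCP source; address and port are assumed values
agent1.sources.source1.type = syslogtcp
agent1.sources.source1.host = 0.0.0.0
agent1.sources.source1.port = 5140

# Logger sink (assumption): prints received events to the agent's log
agent1.sinks.sink1.type = logger

# In-memory channel between source and sink
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
```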

3.2. Analysis

4. Log-filtering Interceptor (FLUME-1358)

4.1. Regex Filtering Interceptor description

4.2. Usage (test configuration)

4.2.1. Flume agent configuration for receiving logs
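A sketch of how the interceptor might be attached to a source, using the property names from the Regex Filtering Interceptor table in this post. The interceptor name filter1 and the regex are hypothetical, chosen so that the sample message "this message has DEBUG in it" would be dropped. Note that the table's excludeRegex name appears to come from the FLUME-1358 patch; the released Flume user guide documents this option as excludeEvents, so it is worth checking against the version in use.

```properties
# Fragment only: merge into an agent definition that already declares source1
agent1.sources.source1.interceptors = filter1

# Type name as given in the property table above
agent1.sources.source1.interceptors.filter1.type = REGEX_FILTER

# Events whose body matches this pattern are selected (hypothetical pattern)
agent1.sources.source1.interceptors.filter1.regex = .*DEBUG.*

# true = drop the matching events, false = keep only the matching events
# (property name follows the table in this post; released Flume documents it as excludeEvents)
agent1.sources.source1.interceptors.filter1.excludeRegex = true
```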

5. HDFS SINK

5.1. Usage

5.2. Configurable properties

5.3. Sample agent configuration
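A sketch of an HDFS sink definition built from the properties listed in the HDFS sink table in this post. The path is the example value from that table; the roll and batch values are assumptions chosen for illustration, not recommended settings.

```properties
# Fragment only: assumes agent1 already declares sink1 and channel1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = channel1

# Target directory and file name prefix
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/webdata/
agent1.sinks.sink1.hdfs.filePrefix = FlumeData

# Plain, uncompressed output; hdfs.codeC is left unset for DataStream
agent1.sinks.sink1.hdfs.fileType = DataStream

# Roll on time only: a new file every 300 seconds (assumed value),
# with size-based and count-based rolling disabled
agent1.sinks.sink1.hdfs.rollInterval = 300
agent1.sinks.sink1.hdfs.rollSize = 0
agent1.sinks.sink1.hdfs.rollCount = 0

# Flush to HDFS every 100 events instead of after every event (assumed value)
agent1.sinks.sink1.hdfs.batchSize = 100
```

With the defaults from the table (30 seconds, 1024 bytes, 10 events), a file is rolled as soon as any one of the three thresholds is reached, which tends to produce many small files; setting a threshold to 0 disables that trigger.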