Masahiro Nakagawa
Mar 14, 2015
Fossasia 2015
Fluentd
Unified logging layer
Who am I
> Masahiro Nakagawa
> github: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> Living in OSS :)
> D language - committer for Phobos, the standard library
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of several meetups (Presto, DTM, etc…)
> etc…
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
github:fluent/fluentd
What’s Fluentd?
> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Simple core + plugins written in Ruby
> Various plugins distributed as gems
> http://www.fluentd.org/plugins
> List of users
> http://www.fluentd.org/testimonials
Before
✓ duplicated code for error handling...
✓ messy code for retrying mechanism...
So painful!
After
Concept / Design
Core (common concerns)
> Divide & Conquer
> Buffering & Retrying
> Error handling
> Message routing
> Parallelism
Plugins (use case specific)
> Read / receive data
> Parse data
> Filter data
> Buffer data
> Format data
> Write / send data
Event structure (log message)
✓ Time
> second resolution by default
> from data source
✓ Tag
> for message routing
> where is it from?
✓ Record
> JSON format
> MessagePack internally
> schema-free
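The three parts of an event (time, tag, record) can be sketched in Python as follows. This is a minimal illustration, not Fluentd's internal representation: the names Event and make_event are hypothetical, and the event is serialized as JSON only because the json module ships with Python, whereas the slide says MessagePack is used internally.

```python
import json
import time
from typing import Any, Dict, NamedTuple


class Event(NamedTuple):
    """Hypothetical model of one event; not Fluentd's internal class."""
    tag: str                # where the event came from; used for routing
    time: int               # Unix time in seconds
    record: Dict[str, Any]  # schema-free, JSON-like payload


def make_event(tag, record, timestamp=None):
    # Fall back to the current time when the data source gives no timestamp.
    seconds = int(time.time()) if timestamp is None else int(timestamp)
    return Event(tag, seconds, record)


event = make_event("backend.apache", {"method": "GET", "code": 200},
                   timestamp=1426327200)

# Assumed wire shape for this sketch: a [tag, time, record] array.
line = json.dumps([event.tag, event.time, event.record])
```

Because the record is a plain dictionary, two events with the same tag may carry different keys, which is what "schema-free" means here.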
Reliable streaming data transfer
[Figure: error and retry behaviour compared for Batch, Stream, and Other stream (micro batch)]
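The error / retry idea can be sketched as a write that is repeated with a growing wait until it succeeds. This is a minimal sketch assuming exponential backoff; the function flush_with_retry and its parameters are hypothetical names chosen for this example and are not taken from Fluentd's code.

```python
import time


def flush_with_retry(write, chunk, retry_wait=1.0, retry_limit=5, sleep=time.sleep):
    """Call write(chunk) until it succeeds or retry_limit retries are used up.

    Returns the list of waits (in seconds) applied between attempts.
    """
    waits = []
    wait = retry_wait
    for attempt in range(retry_limit + 1):
        try:
            write(chunk)
            return waits
        except IOError:
            if attempt == retry_limit:
                raise  # give up and let the caller handle the error
            waits.append(wait)
            sleep(wait)
            wait *= 2  # assumption of this sketch: double the wait each time


# Usage: a destination that fails twice, then recovers.
calls = {"n": 0}
delivered = []


def flaky_write(chunk):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise IOError("destination unavailable")
    delivered.append(chunk)


# sleep is replaced with a no-op so the example finishes immediately.
waits = flush_with_retry(flaky_write, "chunk-1", sleep=lambda seconds: None)
```

The chunk is only considered done once write returns without error, so a temporary outage delays delivery instead of losing data.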
Data sources: Apache, frontend, backend, access logs, app logs, system logs, syslogd, databases
Destinations: Nagios, PostgreSQL, Hadoop, Amazon S3, Elasticsearch (alerting, analysis, archiving)
Fluentd in between: buffering / retrying / routing
M x N → M + N plugins
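The M x N → M + N claim is simple arithmetic. A small worked example, using category names from the diagram as illustrative sources and destinations:

```python
sources = ["access logs", "app logs", "system logs", "databases"]
destinations = ["alerting", "analysis", "archiving"]

# Without a shared layer: one custom integration per (source, destination) pair.
point_to_point = len(sources) * len(destinations)

# With a shared layer: one input plugin per source and one output plugin
# per destination.
via_plugins = len(sources) + len(destinations)
```

With 4 sources and 3 destinations that is 12 integrations versus 7 plugins, and the gap widens as either side grows.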
Use case
Simple forwarding
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
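How a tag such as backend.apache reaches the <match backend.*> block can be sketched as pattern matching on dot-separated tag parts. This is a simplified sketch, not Fluentd's matcher: it handles only `*` (exactly one tag part) and `**` (zero or more parts), and it assumes the first matching block wins. The functions tag_matches and route are hypothetical names.

```python
def tag_matches(pattern, tag):
    """Return True if the dot-separated tag fits the pattern."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts, tag_parts):
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # "**" may absorb zero or more tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        return _match(rest, tag_parts[1:])
    return False


def route(tag, match_blocks):
    """Return the output of the first block whose pattern matches, else None."""
    for pattern, output in match_blocks:
        if tag_matches(pattern, tag):
            return output
    return None


# Mirrors the single <match backend.*> block in the config above.
match_blocks = [("backend.*", "mongo")]
chosen = route("backend.apache", match_blocks)
unrouted = route("frontend.nginx", match_blocks)
```

Events arriving through the forward input carry whatever tag the client library sets, so only those tagged backend.something end up in MongoDB with this config.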
Less Simple Forwarding
- At-most-once / At-least-once
- HA (failover)
- Load-balancing
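The difference between at-most-once and at-least-once delivery can be sketched with a receiver that sometimes loses data or acknowledgements. Everything here (FlakyReceiver, send_at_most_once, send_at_least_once) is a hypothetical model for illustration and does not reflect the actual forward protocol.

```python
class FlakyReceiver:
    """Hypothetical receiver that misbehaves on chosen attempt numbers."""

    def __init__(self, lose_data_on=(), lose_ack_on=()):
        self.lose_data_on = set(lose_data_on)
        self.lose_ack_on = set(lose_ack_on)
        self.attempt = 0
        self.stored = []

    def receive(self, chunk):
        """Return True when the sender gets an acknowledgement."""
        self.attempt += 1
        if self.attempt in self.lose_data_on:
            return False           # chunk lost in transit, nothing stored
        self.stored.append(chunk)
        if self.attempt in self.lose_ack_on:
            return False           # chunk stored, but the ack never arrives
        return True


def send_at_most_once(receiver, chunk):
    # Fire and forget: never resend, so a lost chunk stays lost.
    receiver.receive(chunk)


def send_at_least_once(receiver, chunk, max_attempts=5):
    # Resend until acknowledged: no loss, but duplicates are possible.
    for _ in range(max_attempts):
        if receiver.receive(chunk):
            return True
    return False


lossy = FlakyReceiver(lose_data_on={1})
send_at_most_once(lossy, "chunk-1")

ack_dropper = FlakyReceiver(lose_ack_on={1})
acked = send_at_least_once(ack_dropper, "chunk-1")
```

In the second case the receiver ends up with the chunk twice, which is why at-least-once pipelines usually need deduplication or idempotent writes downstream.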
Near realtime and batch combo!
Hot data
All data
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
<match web.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
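The copy output in the config above hands every matched event to each <store>. A minimal sketch of that fan-out, with hypothetical classes ListStore and CopyOutput standing in for the real Elasticsearch and WebHDFS outputs:

```python
class ListStore:
    """Hypothetical stand-in for a real output such as Elasticsearch or HDFS."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def emit(self, tag, time, record):
        # Copy the record so one store cannot mutate what another store sees.
        self.events.append((tag, time, dict(record)))


class CopyOutput:
    """Pass each event to every configured store, in order."""

    def __init__(self, stores):
        self.stores = list(stores)

    def emit(self, tag, time, record):
        for store in self.stores:
            store.emit(tag, time, record)


hot = ListStore("elasticsearch")   # near-realtime search
cold = ListStore("webhdfs")        # batch analysis over all data
output = CopyOutput([hot, cold])
output.emit("web.access", 1426327200, {"path": "/", "code": 200})
```

Both stores receive the same event, which is what makes the near-realtime and batch combination possible from a single stream.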
CEP for Stream Processing
Norikra is a SQL-based CEP engine: http://norikra.github.io/
Container Logging
> Kubernetes
> Google Compute Engine
> https://cloud.google.com/logging/docs/install/compute_install
Fluentd on Kubernetes / GCE
Slideshare
http://engineering.slideshare.net/2014/04/skynet-project-monitor-scale-and-auto-heal-a-system-in-the-cloud/
Log Analysis System And its designs in LINE Corp. 2014 early
Architecture
Internal Architecture
Input → Parser → Filter → Buffer → Formatter → Output
“input-ish” “output-ish”
Input plugins
✓ Receive logs
✓ Or pull logs from data sources
✓ non-blocking
File tail (in_tail)
Syslog (in_syslog)
HTTP (in_http)
HTTP/2 (in_http2 WIP)
...
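A tail-style input that "pulls" from a file needs to remember how far it has read, which is the role of pos_file in the earlier config. The following is a simplified sketch of that idea only: read_new_lines is a hypothetical function, the position file here holds a single byte offset (not in_tail's real format), and log rotation is not handled.

```python
import os
import tempfile


def read_new_lines(path, pos_path):
    """Return complete lines appended to `path` since the previous call."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as pos:
            offset = int(pos.read().strip() or 0)
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Consume only up to the last newline so a half-written line waits
    # for the next call.
    end = data.rfind(b"\n") + 1
    with open(pos_path, "w") as pos:
        pos.write(str(offset + end))
    return data[:end].decode("utf-8").splitlines()


# Usage with temporary files.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "httpd.log")
pos_path = os.path.join(workdir, "pos_file")

with open(log_path, "w", newline="\n") as f:
    f.write("first\nsecond\n")
batch1 = read_new_lines(log_path, pos_path)

with open(log_path, "a", newline="\n") as f:
    f.write("third\npart")          # "part" has no newline yet
batch2 = read_new_lines(log_path, pos_path)
batch3 = read_new_lines(log_path, pos_path)
```

Because the offset is persisted, a restart resumes from the saved position instead of re-reading or skipping lines.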
Parser plugins
✓ Parse into JSON
✓ Common formats out of the box
✓ Some input plugins depend on parser plugins
✓ v0.10.46 and above
JSON
Regexp
Apache/Nginx/Syslog
CSV/TSV
etc.
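What a regexp-style parser does can be sketched as turning one text line into a timestamp plus a structured record. The pattern below is a simplified access-log pattern written for this example; it is not the expression behind Fluentd's apache2 format, and parse_line is a hypothetical name.

```python
import re
from datetime import datetime

# Simplified pattern: host, user, time, request line, status code, size.
ACCESS_LOG = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\d+|-)$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_line(line):
    """Return (unix_time, record), or None if the line does not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    timestamp = int(datetime.strptime(record.pop("time"), TIME_FORMAT).timestamp())
    record["code"] = int(record["code"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return timestamp, record


parsed = parse_line(
    '192.168.0.1 - - [14/Mar/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 777'
)
unparsed = parse_line("not an access log line")
```

Named groups become record keys, and the time field is lifted out of the record to become the event time.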
Filter plugins
✓ Filter / Mutate record
✓ Record level and Stream level
✓ v0.12 and above
grep
record_transformer
suppress
…
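Record-level filtering and mutation can be sketched as a chain of small functions, each of which returns a (possibly changed) record or None to drop it. The helpers grep and add_fields are hypothetical and only loosely modelled on the grep and record_transformer plugins named above; they do not reproduce those plugins' options.

```python
import re


def grep(key, pattern):
    """Keep a record only if record[key] matches the pattern."""
    regex = re.compile(pattern)

    def _filter(record):
        return record if regex.search(str(record.get(key, ""))) else None

    return _filter


def add_fields(**fields):
    """Return a copy of the record with extra fields merged in."""

    def _filter(record):
        merged = dict(record)
        merged.update(fields)
        return merged

    return _filter


def apply_filters(filters, record):
    """Run the record through each filter; stop as soon as one drops it."""
    for one_filter in filters:
        record = one_filter(record)
        if record is None:
            return None
    return record


chain = [grep("path", r"^/api/"), add_fields(service="web")]
kept = apply_filters(chain, {"path": "/api/users", "code": 200})
dropped = apply_filters(chain, {"path": "/favicon.ico", "code": 404})
```

Filters run before buffering and output, so dropped records never consume buffer space or reach a destination.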
Buffer plugins
✓ Improve performance
✓ Provide reliability
✓ Provide thread-safety
Memory (buf_memory)
File (buf_file)
Buffer internal
✓ Chunk = adjustable unit of data
✓ Buffer = Queue of chunks
Input → [chunk] [chunk] [chunk] → output
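"Buffer = queue of chunks" can be sketched as follows. ChunkBuffer is a hypothetical in-memory model for illustration, not the buf_memory or buf_file implementation: incoming data fills the current chunk, full chunks wait in a queue, and a chunk leaves the queue only after it has been written successfully.

```python
from collections import deque


class ChunkBuffer:
    """Hypothetical in-memory buffer: a queue of size-limited chunks."""

    def __init__(self, chunk_limit):
        self.chunk_limit = chunk_limit  # max bytes per chunk
        self.current = []               # chunk currently being filled
        self.current_size = 0
        self.queue = deque()            # full chunks waiting to be written

    def append(self, data):
        # Start a new chunk when this data would overflow the current one.
        if self.current and self.current_size + len(data) > self.chunk_limit:
            self._enqueue_current()
        self.current.append(data)
        self.current_size += len(data)

    def _enqueue_current(self):
        self.queue.append(b"".join(self.current))
        self.current, self.current_size = [], 0

    def flush(self, write):
        """Write queued chunks in order; a failed chunk stays queued for retry."""
        if self.current:
            self._enqueue_current()
        written = 0
        while self.queue:
            try:
                write(self.queue[0])
            except IOError:
                break
            self.queue.popleft()
            written += 1
        return written


buf = ChunkBuffer(chunk_limit=10)
for line in (b"aaaa\n", b"bbbb\n", b"cccc\n"):
    buf.append(line)


def failing_write(chunk):
    raise IOError("destination down")


failed_count = buf.flush(failing_write)   # nothing written, chunks kept
queued_after_failure = len(buf.queue)

written_chunks = []
ok_count = buf.flush(written_chunks.append)
```

Writing whole chunks rather than single events is where the performance benefit comes from, and keeping a failed chunk at the head of the queue is what provides the reliability.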
Formatter plugins
✓ Format output
✓ Some plugins depend on formatter plugins
✓ v0.10.46 and above
JSON
CSV/TSV
“single value”
msgpack
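The formatter idea, one record rendered in several output formats, can be sketched with the standard library. The function names, the field order for CSV, and the default key "message" are assumptions of this sketch; msgpack is left out because it is not part of the Python standard library.

```python
import csv
import io
import json


def format_json(record):
    """One JSON object per line."""
    return json.dumps(record) + "\n"


def format_csv(record, fields):
    """Selected fields in a fixed order, as one CSV row."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(
        [record.get(field, "") for field in fields]
    )
    return out.getvalue()


def format_single_value(record, key="message"):
    """Only the value of one key, with no structure around it."""
    return str(record[key]) + "\n"


record = {"message": "GET /index.html", "code": 200}
as_json = format_json(record)
as_csv = format_csv(record, ["code", "message"])
as_single = format_single_value(record)
```

Keeping formatting separate from the output means the same destination, for example a file, can hold JSON, CSV, or raw lines without changing the code that writes it.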
Output plugins
✓ Write to external systems
✓ Buffered & Non-buffered
✓ 200+ plugins
File (out_file)
Amazon S3 (out_s3)
MongoDB (out_mongo)
...
Roadmap
> v0.10 (old stable)
> v0.12 (current stable)
> Filter / Label / At-least-once
> v0.14 (spring, 2015)
> New plugin APIs, ServerEngine, Time…
> v1 (summer, 2015)
> Finalize new features / APIs
https://github.com/fluent/fluentd/wiki/V1-Roadmap
Goodies
fluent-bit
> Made for Embedded Linux
> OpenEmbedded & Yocto Project
> Intel Edison, Raspberry Pi & BeagleBone Black boards
> https://github.com/fluent/fluent-bit 
> Standalone application or Library mode
> Built-in plugins
> input: cpu, kmsg, output: fluentd
> First release at the end of Mar 2015
fluentd-ui
> Manage Fluentd instance via Web UI
> https://github.com/fluent/fluentd-ui
Treasure Agent (td-agent)
> Treasure Data distribution of Fluentd
> including Ruby and QA’ed plugins
> Treasure Agent 2 is current stable
> We recommend using v2, not v1
> including fluentd-ui
> Next release, 2.2.0, uses fluentd v0.12
Embulk
> Bulk Loader version of Fluentd
> Pluggable architecture
> JRuby, JVM languages
> High performance parallel processing
> Share your script as a plugin
> https://github.com/embulk
http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed
Embulk bulk-loads data between systems through plugins:
HDFS, MySQL, Amazon S3, CSV Files, SequenceFile, Salesforce.com, Elasticsearch, Cassandra, Hive, Redis
✓ Parallel execution
✓ Data validation
✓ Error recovery
✓ Deterministic behaviour
✓ Idempotent retrying
Check: treasuredata.com
Cloud service for the entire data pipeline
