Deep Dive Into Apache Apex Application
Chaitanya Chebolu
Application Development Model
▪ A Stream is a sequence of data tuples
▪ A typical Operator takes one or more input streams, performs computations, and emits one or more output streams
• Each Operator is your custom business logic in Java, or a built-in operator from our open source library
• An Operator can have many instances that run in parallel, and each instance is single-threaded
▪ A Directed Acyclic Graph (DAG) is made up of operators and streams
[Figure: Directed Acyclic Graph (DAG). Operators connected by streams of tuples, ending in an output stream]
Typical application example
DAG Types
• Logical Plan
● Logical representation of computation
● Defines operators, streams and dataflow
• Physical Plan
● Deployable plan on cluster
● Contains partition information of operators
● Has ready-to-deploy serialized operator instances
[Figure: Logical DAG with operators O1, O2, O3, O4, O5]
[Figure: Physical DAG with partitions P1, P2, P3 of O1 and of O2, a unifier (U), and O3, O4, O5]
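The difference between the two plans can be illustrated with a toy expansion. This is a minimal sketch, not how the platform builds its physical plan: `PlanSketch`, `toPhysical` and the `O1/P1` naming are hypothetical, and unifiers are left out. It only shows that a logical operator with N partitions becomes N physical instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: PlanSketch is a hypothetical helper, not part of the Apex API.
class PlanSketch {

    /**
     * Expands a logical plan (operator name -> partition count) into the list
     * of physical operator instances, keeping the logical order.
     */
    static List<String> toPhysical(Map<String, Integer> partitionsByOperator) {
        List<String> instances = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : partitionsByOperator.entrySet()) {
            String name = entry.getKey();
            int partitions = entry.getValue();
            if (partitions <= 1) {
                // Unpartitioned operators map to a single physical instance.
                instances.add(name);
            } else {
                for (int p = 1; p <= partitions; p++) {
                    instances.add(name + "/P" + p);
                }
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        // Mirrors the slide: O1 and O2 run as three partitions each.
        Map<String, Integer> logical = new LinkedHashMap<>();
        logical.put("O1", 3);
        logical.put("O2", 3);
        logical.put("O3", 1);
        logical.put("O4", 1);
        logical.put("O5", 1);
        System.out.println(toPhysical(logical));
    }
}
```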
Operator Lifecycle
➔ All operators in the DAG go through this life-cycle
➔ Managed by the Apex platform
➔ Governed by control tuples
Operator Lifecycle (contd...)
➔ setup
◆ Start of operator lifecycle
◆ Do any initialization here
➔ beginWindow
◆ Marks the start of a window
➔ endWindow
◆ Marks the end of a window
➔ teardown
◆ Do any finalization here
◆ End of operator lifecycle
Operator Lifecycle (contd...)
➔ emitTuples
◆ Called for Input Adapters
◆ Called in an infinite while loop by the platform
➔ process
◆ Called for Generic Operators and Output Adapters
◆ Associated with a port
◆ Called for every incoming tuple
Operator Lifecycle (contd...)
➔ OutputPort::emit
◆ Special method, not part of the operator lifecycle
◆ To be called by operator code
◆ Emits tuples to the next operator
◆ Bound by window
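The order of the lifecycle callbacks can be traced with a small stand-alone model. This is a sketch under assumptions: `ToyOperator`, `RecordingOperator` and `LifecycleDriver` are hypothetical local types, not the Apex interfaces (for instance, the real setup method receives a context object, and tuples arrive through ports). The driver plays the role of the platform and shows setup once, then beginWindow, process per tuple and endWindow for every window, then teardown once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for illustration; the real Apex interfaces differ.
interface ToyOperator {
    void setup();
    void beginWindow(long windowId);
    void process(String tuple);
    void endWindow();
    void teardown();
}

// Records every callback so the call order can be inspected.
class RecordingOperator implements ToyOperator {
    final List<String> calls = new ArrayList<>();

    @Override public void setup() { calls.add("setup"); }
    @Override public void beginWindow(long windowId) { calls.add("beginWindow(" + windowId + ")"); }
    @Override public void process(String tuple) { calls.add("process(" + tuple + ")"); }
    @Override public void endWindow() { calls.add("endWindow"); }
    @Override public void teardown() { calls.add("teardown"); }
}

// Plays the role of the platform: it drives the callbacks in lifecycle order.
class LifecycleDriver {
    static void run(ToyOperator operator, List<List<String>> windows) {
        operator.setup();
        long windowId = 0;
        for (List<String> tuples : windows) {
            operator.beginWindow(windowId++);
            for (String tuple : tuples) {
                operator.process(tuple);
            }
            operator.endWindow();
        }
        operator.teardown();
    }

    public static void main(String[] args) {
        RecordingOperator operator = new RecordingOperator();
        // Two windows: the first carries two tuples, the second carries one.
        run(operator, Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
        operator.calls.forEach(System.out::println);
    }
}
```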
Defining DAG
[Figure: example pipeline LOGS → Reader → Parser → Counter → Output → HDFS. Labels: Input Operator (Adapter), Generic Operators, Output Operator (Adapter)]
APIs : Application
• MyApplication implements StreamingApplication
ᵒ Provide implementation for populateDAG
ᵒ Stitch the DAG
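What "stitching" means can be shown with a self-contained model. This is a sketch, not the Apex API: `ToyDag` and `ToyStreamingApplication` are hypothetical stand-ins defined here so the example runs without the Apex libraries, and the stream names are invented. The toy connects operators by name, whereas a real application connects an output port of one operator to an input port of the next. The operator names follow the Reader, Parser, Counter, Output pipeline from the deck.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the DAG object handed to populateDAG.
class ToyDag {
    final List<String> operators = new ArrayList<>();
    final List<String> streams = new ArrayList<>();

    void addOperator(String name) {
        operators.add(name);
    }

    void addStream(String streamName, String from, String to) {
        // A stream may only connect operators that were added first.
        if (!operators.contains(from) || !operators.contains(to)) {
            throw new IllegalArgumentException("unknown operator in stream " + streamName);
        }
        streams.add(streamName + ": " + from + " -> " + to);
    }
}

// Stand-in for the application contract: one method that stitches the DAG.
interface ToyStreamingApplication {
    void populateDAG(ToyDag dag);
}

class MyApplication implements ToyStreamingApplication {
    @Override
    public void populateDAG(ToyDag dag) {
        // Step 1: declare the operators.
        dag.addOperator("Reader");
        dag.addOperator("Parser");
        dag.addOperator("Counter");
        dag.addOperator("Output");
        // Step 2: connect them with streams (stream names are illustrative).
        dag.addStream("lines", "Reader", "Parser");
        dag.addStream("records", "Parser", "Counter");
        dag.addStream("counts", "Counter", "Output");
    }

    public static void main(String[] args) {
        ToyDag dag = new ToyDag();
        new MyApplication().populateDAG(dag);
        dag.streams.forEach(System.out::println);
    }
}
```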
APIs : InputOperator
• SampleInputOperator implements InputOperator
ᵒ Define output ports
ᵒ Define emitTuples method
ᵒ Define beginWindow, endWindow, setup, teardown
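The shape of an input operator can be sketched without the Apex libraries. Everything below is an assumption made for illustration: `ToyOutputPort` is a hypothetical stand-in for a real output port, and this `SampleInputOperator` does not implement the real InputOperator interface. It shows only the pattern from the list above: an output port, an emitTuples method that the platform calls repeatedly, and the four lifecycle callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an output port: forwards emitted tuples downstream.
class ToyOutputPort<T> {
    private Consumer<T> sink = tuple -> { };

    void connect(Consumer<T> downstream) {
        sink = downstream;
    }

    void emit(T tuple) {
        sink.accept(tuple);
    }
}

// Toy input operator: produces an increasing sequence of integers.
class SampleInputOperator {
    final ToyOutputPort<Integer> output = new ToyOutputPort<>();
    private int next;

    void setup() { next = 0; }              // initialization
    void beginWindow(long windowId) { }     // nothing to do per window here
    void emitTuples() { output.emit(next++); } // called repeatedly by the driver
    void endWindow() { }
    void teardown() { }                     // finalization
}

class InputOperatorDemo {
    public static void main(String[] args) {
        SampleInputOperator operator = new SampleInputOperator();
        List<Integer> received = new ArrayList<>();
        operator.output.connect(received::add);

        operator.setup();
        operator.beginWindow(0);
        // The platform calls emitTuples in a loop; three iterations suffice here.
        for (int i = 0; i < 3; i++) {
            operator.emitTuples();
        }
        operator.endWindow();
        operator.teardown();
        System.out.println(received);
    }
}
```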
APIs : GenericOperator, OutputOperator
• SampleOperator extends BaseOperator
ᵒ Define input ports, output ports
ᵒ Define process methods
ᵒ Optional: Define beginWindow, endWindow, setup, teardown
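A generic operator typically combines a process method with the window callbacks. The following is a minimal sketch with hypothetical stand-ins: `ToyOutputPort` replaces a real output port, the plain `process` method replaces the process method of an input port, and this `SampleOperator` does not extend the real BaseOperator. It counts tuples per window and emits the count when the window ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an output port: forwards emitted tuples downstream.
class ToyOutputPort<T> {
    private Consumer<T> sink = tuple -> { };

    void connect(Consumer<T> downstream) {
        sink = downstream;
    }

    void emit(T tuple) {
        sink.accept(tuple);
    }
}

// Toy generic operator: counts the tuples seen in each window.
class SampleOperator {
    final ToyOutputPort<Integer> output = new ToyOutputPort<>();
    private int count;

    // Reset per-window state at the start of every window.
    void beginWindow(long windowId) { count = 0; }

    // Stands in for an input port's process method: called once per tuple.
    void process(String tuple) { count++; }

    // Emit the aggregate once the window is complete.
    void endWindow() { output.emit(count); }
}

class SampleOperatorDemo {
    public static void main(String[] args) {
        SampleOperator operator = new SampleOperator();
        List<Integer> received = new ArrayList<>();
        operator.output.connect(received::add);

        operator.beginWindow(0);
        operator.process("a");
        operator.process("b");
        operator.endWindow();

        operator.beginWindow(1);
        operator.process("c");
        operator.endWindow();

        System.out.println(received);
    }
}
```

Keeping the counter as per-window state and emitting in endWindow is one common design; emitting from process for every tuple is equally possible when no aggregation is needed.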
Application Specification (Java)
DAG API (compositional)
Writing an Operator
Operator Library
RDBMS
• Vertica
• MySQL
• Oracle
• JDBC
NoSQL
• Cassandra, HBase
• Aerospike, Accumulo
• Couchbase/ CouchDB
• Redis, MongoDB
• Geode
Messaging
• Kafka
• Solace
• Flume, ActiveMQ
• Kinesis, NiFi
File Systems
• HDFS/ Hive
• NFS
• S3
Parsers
• XML
• JSON
• CSV
• Avro
• Parquet
Transformations
• Filters
• Rules
• Expression
• Dedup
• Enrich
Analytics
• Dimensional Aggregations (with state management for historical data + query)
Protocols
• HTTP
• FTP
• WebSocket
• MQTT
• SMTP
Other
• Elasticsearch
• Script (JavaScript, Python, R)
• Solr
• Twitter
Building Apache Apex
Java : 1.7.x
mvn : 3.0 +
git : 1.7 +
Apache Hadoop : How to : Single node cluster
Apache Apex Core
ᵒ git clone git@github.com:apache/apex-core.git
ᵒ cd apex-core/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
Apache Apex Malhar
ᵒ git clone git@github.com:apache/apex-malhar.git
ᵒ cd apex-malhar/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
DataTorrent RTS community edition
Monitoring Console
Logical View
Physical View
Real-Time Dashboards
Q&A
Resources
• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups – http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/
