Apache Apex Architecture
Apex Platform Overview
Apache Malhar Library
Native Hadoop Integration
• YARN is the resource manager
• HDFS is used for storing any persistent state
Application Programming Model
Directed Acyclic Graph (DAG)
• A Stream is a sequence of data tuples
• An Operator takes one or more input streams, performs computations, and emits one or more output streams
• Each Operator is either YOUR custom business logic written in Java or a built-in operator from our open-source library
• An Operator can have many instances that run in parallel, and each instance is single-threaded
• A Directed Acyclic Graph (DAG) is made up of Operators and Streams
[Figure: a DAG of Operators connected by output Streams, each Stream carrying Tuples from one Operator to the next]
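To make the programming model concrete, here is a minimal, self-contained Java sketch of a DAG: an operator turns one input tuple into zero or more output tuples, and a stream connects one operator's output to another operator's input. It is not the Apex API; the names `Operator`, `Dag`, `addStream` and `emit` are hypothetical, and everything runs sequentially in one thread, unlike a real deployment where operator instances run in parallel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DagSketch {
    // Hypothetical operator: consumes one input tuple, emits zero or more output tuples.
    interface Operator {
        List<String> process(String tuple);
    }

    // Hypothetical DAG: named operators plus streams (directed edges) between them.
    static final class Dag {
        private final Map<String, Operator> operators = new LinkedHashMap<>();
        private final Map<String, List<String>> downstream = new HashMap<>();

        void addOperator(String name, Operator op) {
            operators.put(name, op);
            downstream.put(name, new ArrayList<>());
        }

        // A stream delivers every tuple emitted by 'from' to the input of 'to'.
        void addStream(String from, String to) {
            if (!operators.containsKey(from) || !operators.containsKey(to)) {
                throw new IllegalArgumentException("unknown operator");
            }
            // Adding from -> to closes a cycle if 'from' is already reachable from 'to'.
            if (reachable(to, from)) {
                throw new IllegalArgumentException("stream would create a cycle");
            }
            downstream.get(from).add(to);
        }

        // Iterative depth-first search over the streams.
        private boolean reachable(String start, String target) {
            Set<String> seen = new HashSet<>();
            List<String> stack = new ArrayList<>(List.of(start));
            while (!stack.isEmpty()) {
                String cur = stack.remove(stack.size() - 1);
                if (cur.equals(target)) return true;
                if (seen.add(cur)) stack.addAll(downstream.get(cur));
            }
            return false;
        }

        // Feed one tuple to 'name' and push each output along every outgoing stream.
        void emit(String name, String tuple) {
            for (String out : operators.get(name).process(tuple)) {
                for (String next : downstream.get(name)) emit(next, out);
            }
        }
    }

    static List<String> run(List<String> lines) {
        List<String> sink = new ArrayList<>();
        Dag dag = new Dag();
        // One input tuple (a line) becomes many output tuples (words).
        dag.addOperator("split", line -> List.of(line.split("\\s+")));
        // Keeps words longer than three characters and upper-cases them.
        dag.addOperator("filter", w -> w.length() > 3 ? List.of(w.toUpperCase()) : List.of());
        // Terminal operator: collects tuples and emits nothing.
        dag.addOperator("sink", w -> { sink.add(w); return List.of(); });
        dag.addStream("split", "filter");
        dag.addStream("filter", "sink");
        for (String line : lines) dag.emit("split", line);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("apex runs on yarn", "streams carry tuples")));
    }
}
```

Rejecting a stream that would close a cycle is what keeps the graph acyclic; the example run prints `[APEX, RUNS, YARN, STREAMS, CARRY, TUPLES]`.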
Application Specification
Apex Engine
Core Features
Partitioning and Scaling Out
• Operators can be dynamically scaled
• Flexible stream splitting
• Parallel partitioning
• MxN partitioning
• Unifiers
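Conceptually, parallel partitioning routes each tuple to one of several operator instances (here by hashing its key), each partition computes a partial result on its own sub-stream, and a unifier merges the partials back into a single stream. The plain-Java word count below illustrates that flow; `partitionFor`, `countPartition` and `unify` are hypothetical names, and this is not how the Apex engine implements partitioning internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionSketch {
    // Route a key to one of n partitions; floorMod keeps the index non-negative.
    static int partitionFor(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // Work done by a single partition: count the words it received.
    static Map<String, Integer> countPartition(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // Unifier: merge the partial results of all partitions into one output.
    static Map<String, Integer> unify(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }

    static Map<String, Integer> run(List<String> words, int n) {
        // Split the input stream into n sub-streams.
        List<List<String>> inputs = new ArrayList<>();
        for (int i = 0; i < n; i++) inputs.add(new ArrayList<>());
        for (String w : words) inputs.get(partitionFor(w, n)).add(w);
        // Each partition processes its own sub-stream independently.
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> in : inputs) partials.add(countPartition(in));
        return unify(partials);
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "a", "c", "b", "a");
        System.out.println(run(words, 3)); // {a=3, b=2, c=1}
    }
}
```

The result must not depend on the partition count. With key-hash routing every key lands in one partition, but with other routing (for example round-robin) partial counts for the same key come from several partitions, which is exactly what the unifier's merge handles.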
Advanced Windowing Support
• Application window
• Sliding window and tumbling window
• Checkpoint window
• No artificial latency
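As an illustration, assume each streaming window (a fixed time slice) has already been reduced to one number, such as a tuple count. An application window then groups several consecutive streaming windows: a tumbling window advances by its full size, while a sliding window advances by a smaller step, so successive windows overlap. The method names and the choice to drop an incomplete trailing window are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowSketch {
    // Sliding: groups of 'size' streaming windows that advance by 'slide'.
    // When slide < size, consecutive groups overlap. Incomplete trailing groups are dropped.
    static List<Integer> sliding(int[] perStreamingWindow, int size, int slide) {
        if (size <= 0 || slide <= 0) throw new IllegalArgumentException("size and slide must be > 0");
        List<Integer> sums = new ArrayList<>();
        for (int start = 0; start + size <= perStreamingWindow.length; start += slide) {
            sums.add(sum(perStreamingWindow, start, size));
        }
        return sums;
    }

    // Tumbling: the special case where the window slides by its own size (no overlap).
    static List<Integer> tumbling(int[] perStreamingWindow, int size) {
        return sliding(perStreamingWindow, size, size);
    }

    private static int sum(int[] a, int start, int len) {
        int s = 0;
        for (int i = start; i < start + len; i++) s += a[i];
        return s;
    }

    public static void main(String[] args) {
        int[] counts = {3, 1, 4, 1, 5, 9};           // one value per streaming window
        System.out.println(tumbling(counts, 2));     // [4, 5, 14]
        System.out.println(sliding(counts, 3, 1));   // [8, 6, 10, 15]
    }
}
```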
Stateful Fault Tolerance
• Supported out of the box
– Application state
– Application master state
– No data loss
• Automatic recovery
• Lunch test
• Buffer server
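The recovery idea can be shown with a toy model, assuming a single stateful operator whose whole state is a running sum. The state is checkpointed periodically, the upstream side retains every tuple sent since the last checkpoint (the role the slide assigns to the buffer server), and after a crash the operator is restored from its checkpoint and the retained tuples are replayed. The in-memory "checkpoint" and the `run` parameters are simplifications for illustration; a real deployment persists checkpoints to HDFS.

```java
import java.util.ArrayList;
import java.util.List;

final class RecoverySketch {
    // Hypothetical stateful operator: its whole state is a running sum.
    static final class SumOperator {
        long sum;
        void process(int tuple) { sum += tuple; }
    }

    // Processes 'tuples', checkpointing every 'checkpointEvery' tuples, and simulates one
    // crash right after tuple number 'failAfter' (use -1 for no crash). With 'replay'
    // false, tuples processed since the last checkpoint are lost from the state.
    static long run(int[] tuples, int checkpointEvery, int failAfter, boolean replay) {
        SumOperator op = new SumOperator();
        long checkpoint = 0;                        // persisted copy of the operator state
        List<Integer> buffered = new ArrayList<>(); // tuples sent since the last checkpoint
        for (int pos = 1; pos <= tuples.length; pos++) {
            int tuple = tuples[pos - 1];
            buffered.add(tuple);
            op.process(tuple);
            if (pos % checkpointEvery == 0) {
                checkpoint = op.sum;
                buffered.clear();                   // older tuples are not needed for recovery
            }
            if (pos == failAfter) {
                op = new SumOperator();             // in-memory state of the failed instance is gone
                op.sum = checkpoint;                // restore the last checkpoint
                if (replay) {
                    for (int t : buffered) op.process(t); // replay the retained tuples
                }
            }
        }
        return op.sum;
    }

    public static void main(String[] args) {
        int[] tuples = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(run(tuples, 3, -1, false)); // 55: no failure
        System.out.println(run(tuples, 3, 5, true));   // 55: recovered with replay
        System.out.println(run(tuples, 3, 5, false));  // 46: tuples 4 and 5 lost
    }
}
```

Restoring the checkpoint alone is not enough: without the retained tuples the state silently drops everything processed since the last checkpoint.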
Processing Semantics
• At least once
• At most once
• Exactly once
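The three guarantees differ in what happens to tuples between the last checkpoint and a failure. The toy model below, with hypothetical parameters, lists what an external sink receives: at-least-once replays from the checkpoint and may deliver duplicates, at-most-once skips tuples that arrived while the operator was down and may lose them, and exactly-once is modelled as replay plus a sink that discards tuples it has already seen. That deduplicating (idempotent) sink is one common way to get exactly-once output, assumed here for illustration; it is not a description of Apex internals.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class SemanticsSketch {
    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE }

    // Tuples are 0..n-1. The operator crashes after delivering 'failAfter' tuples.
    // Tuples before 'lastCheckpoint' are covered by the last checkpoint.
    // 'lostWhileDown' tuples arrive while the operator restarts.
    // Assumes 0 <= lastCheckpoint <= failAfter <= n.
    static List<Integer> deliver(int n, int lastCheckpoint, int failAfter, int lostWhileDown, Mode mode) {
        List<Integer> sink = new ArrayList<>();
        for (int i = 0; i < failAfter; i++) sink.add(i); // delivered before the crash
        switch (mode) {
            case AT_LEAST_ONCE:
                // Replay everything since the checkpoint: some tuples arrive twice.
                for (int i = lastCheckpoint; i < n; i++) sink.add(i);
                break;
            case AT_MOST_ONCE:
                // No replay: resume with newer data, skipping tuples missed while down.
                for (int i = failAfter + lostWhileDown; i < n; i++) sink.add(i);
                break;
            case EXACTLY_ONCE:
                // Replay as above, but the sink ignores tuples it has already stored.
                LinkedHashSet<Integer> seen = new LinkedHashSet<>(sink);
                for (int i = lastCheckpoint; i < n; i++) seen.add(i);
                sink = new ArrayList<>(seen);
                break;
        }
        return sink;
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + deliver(10, 4, 7, 2, m));
        }
    }
}
```

With ten tuples, a checkpoint before tuple 4 and a crash after seven deliveries, at-least-once yields 13 deliveries (tuples 4 to 6 twice), at-most-once yields 8 (tuples 7 and 8 missing), and exactly-once yields each of the 10 tuples once.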
Data Locality
• Stream locality for placement of operators
– Rack local – Distributed deployment
– Node local – Data does not traverse NIC
– Container local – Data doesn’t need to be serialized
– Thread local – Operators run in same thread
• Data locality
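The four levels form a hierarchy in which each tighter level rules out one more cost: crossing the network (node local), serializing tuples (container local), and handing tuples between threads (thread local). The sketch below encodes that hierarchy as a Java enum and derives the tightest level two operator placements satisfy. The level names match the slide (and, to our knowledge, the values of Apex's `DAG.Locality` enum), but `Placement` and `localityOf` are hypothetical helpers for illustration.

```java
final class LocalitySketch {
    // Ordered from loosest to tightest placement constraint.
    enum Locality {
        RACK_LOCAL(true, true, false),        // different nodes in one rack: may cross the NIC
        NODE_LOCAL(false, true, false),       // different containers on one node: no NIC, still serialized
        CONTAINER_LOCAL(false, false, false), // one container, different threads: no serialization
        THREAD_LOCAL(false, false, true);     // same thread: direct hand-off

        final boolean mayCrossNetwork;
        final boolean needsSerialization;
        final boolean sameThread;

        Locality(boolean mayCrossNetwork, boolean needsSerialization, boolean sameThread) {
            this.mayCrossNetwork = mayCrossNetwork;
            this.needsSerialization = needsSerialization;
            this.sameThread = sameThread;
        }
    }

    // Hypothetical physical placement of one operator instance.
    static final class Placement {
        final String rack, node, container, thread;

        Placement(String rack, String node, String container, String thread) {
            this.rack = rack;
            this.node = node;
            this.container = container;
            this.thread = thread;
        }
    }

    // Tightest locality two placements satisfy; null means not even rack local.
    static Locality localityOf(Placement a, Placement b) {
        if (!a.rack.equals(b.rack)) return null;
        if (!a.node.equals(b.node)) return Locality.RACK_LOCAL;
        if (!a.container.equals(b.container)) return Locality.NODE_LOCAL;
        if (!a.thread.equals(b.thread)) return Locality.CONTAINER_LOCAL;
        return Locality.THREAD_LOCAL;
    }

    public static void main(String[] args) {
        Placement a = new Placement("r1", "n1", "c1", "t1");
        Placement b = new Placement("r1", "n1", "c2", "t2");
        Locality l = localityOf(a, b);
        System.out.println(l + " serialized=" + l.needsSerialization); // NODE_LOCAL serialized=true
    }
}
```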
Dynamic Updates
• Dynamic topology updates
– Properties of operators can be changed
– New operators can be added
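A minimal single-process sketch of the two kinds of update, assuming a linear pipeline in place of a full DAG: an operator property (the hypothetical `factor`) is changed between tuples, and a new stage is appended without rebuilding the pipeline. The copy-on-write list lets stages be added while another thread iterates over them, but the demo itself is single-threaded, and the sketch does not show how Apex coordinates such changes across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.IntUnaryOperator;

final class DynamicUpdateSketch {
    // Hypothetical operator with a property that can be changed while the pipeline runs.
    static final class MultiplyOperator implements IntUnaryOperator {
        volatile int factor;

        MultiplyOperator(int factor) { this.factor = factor; }

        @Override
        public int applyAsInt(int x) { return x * factor; }
    }

    // The topology is a list of stages; copy-on-write allows adding stages while running.
    final List<IntUnaryOperator> stages = new CopyOnWriteArrayList<>();

    int process(int tuple) {
        int v = tuple;
        for (IntUnaryOperator s : stages) v = s.applyAsInt(v);
        return v;
    }

    static List<Integer> demo() {
        DynamicUpdateSketch pipeline = new DynamicUpdateSketch();
        MultiplyOperator multiply = new MultiplyOperator(2);
        pipeline.stages.add(multiply);
        List<Integer> out = new ArrayList<>();
        out.add(pipeline.process(5));    // 5 * 2 = 10
        multiply.factor = 3;             // change an operator property at runtime
        out.add(pipeline.process(5));    // 5 * 3 = 15
        pipeline.stages.add(x -> x + 1); // add a new operator to the running pipeline
        out.add(pipeline.process(5));    // 5 * 3 + 1 = 16
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [10, 15, 16]
    }
}
```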
Resources
Apache Apex Community Page
Apache Apex LinkedIn Group
Help Us Name the Apex Mascot
Poll on Meetup Page

Apache Apex Introduction with PubMatic