1
Introducing Exactly Once
Semantics in Apache Kafka™
Apurva Mehta, Software Engineer,
Gehrig Kunz, Technical Product Marketing Manager
2
Agenda
• Why exactly-once?
• An overview of messaging semantics
• Why are duplicates introduced?
• What is exactly-once semantics?
• Exactly-once semantics in Kafka: Is it Practical?
• Next Steps
3
Exactly Once Semantics is a hard problem
4
An overview of messaging semantics
• At-most once
• At-least once
• Exactly-once
5
Why exactly-once?
• Stream processing is becoming the norm; it’s more natural.
• Apache Kafka is the most popular streaming platform.
• Mission critical applications require stronger guarantees.
6
Why exactly-once?
• Stream processing is becoming the norm; it’s more natural.
• Apache Kafka is the most popular streaming platform.
• Mission critical applications require stronger guarantees.
In other words: make stream processing easy,
simple, and reliable enough for everyone.
7
Apache Kafka’s existing semantics
At Least Once
8
Kafka’s Existing Semantics
14
What do we do now???
Kafka’s Existing Semantics
15
Kafka’s Existing Semantics: At Least Once
18
Why are duplicates introduced?
Various failures must be handled correctly:
• Broker can fail
• Producer-to-Broker RPC can fail
• Producer or Consumer client can fail
19
TL;DR – What we have today
• At least once in order delivery per partition.
• Producer retries can introduce duplicates and headaches.
20
The age old engineering question
Before we make this work, are we sure we should?
21
KafkaCash: A Peer-to-Peer Lending App
A peer-to-peer lending platform.
22
Help Bob reach $1000, send him $10
23
KafkaCash, powered by Kafka
24
Offset commits
25
Reprocessed transfer, eek!
26
Lost money! Eek eek!
27
How did Kafka add exactly once semantics?
28
Exactly-once semantics in Kafka, explained
Apache Kafka’s guarantees are stronger in 3 ways:
• Idempotent producer: Exactly-once, in-order, delivery
per partition.
• Transactions: Atomic writes across partitions.
• Exactly-once stream processing across read-process-
write tasks.
29
Part 1/3 : Idempotent Producer
Exactly-once, in-order, delivery per partition
30
Idempotent Producer Semantics
A single (successful!) producer.send will result in
exactly one copy of the message in the log, in all
circumstances.
31
Producer Configs
• enable.idempotence = true
• max.in.flight.requests.per.connection = 1
• acks = “all”
• retries > 0 (preferably MAX_INT)
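The four settings above go into the Properties object handed to the producer. A minimal sketch using only java.util.Properties (the class name IdempotentProducerConfig is illustrative; the keys are the standard producer config strings):

```java
import java.util.Properties;

public class IdempotentProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Broker de-duplicates retried batches using per-producer
        // sequence numbers.
        props.put("enable.idempotence", "true");
        // One in-flight request per connection preserves ordering on retry.
        props.put("max.in.flight.requests.per.connection", "1");
        // Wait for the full in-sync replica set to acknowledge each write.
        props.put("acks", "all");
        // Retry aggressively; idempotence makes retries safe.
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        return props;
    }
}
```

These properties would then be passed to the KafkaProducer constructor alongside the usual serializer settings.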
32
The idempotent producer
40
TL;DR: idempotent producer
• Works transparently: only one config change.
• Sequence numbers and producer ids are in the log.
• Resilient to broker failures, producer retries, etc.
41
Part 2/3 : Transactions
Atomic writes across multiple partitions.
42
Transactions semantics
• Atomic writes across multiple partitions.
• All messages in a transaction are made visible together,
or none are.
• Consumers must be configured to skip uncommitted
messages.
43
Producer config for transactions
• transactional.id = ‘some string’
• Typically based on the partition identifier in a partitioned,
stateful, app.
• Enables transaction recovery across producer sessions.
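Following the slide's suggestion of basing the id on the partition identifier, a sketch of the producer setup (the "transfer-processor-" naming scheme is a hypothetical example, not a Kafka convention):

```java
import java.util.Properties;

public class TransactionalProducerConfig {
    public static Properties build(String partitionId) {
        Properties props = new Properties();
        // transactional.id must be stable across restarts of the same
        // logical producer, so the broker can fence zombie instances
        // and recover transactions left open by a crashed session.
        props.put("transactional.id", "transfer-processor-" + partitionId);
        // Transactions build on the idempotent producer.
        props.put("enable.idempotence", "true");
        return props;
    }
}
```

A producer created with these properties is then driven through the initTransactions / beginTransaction / commitTransaction API shown on the next slide.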
44
The transaction API
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.commitTransaction();
} catch (KafkaException e) {
producer.abortTransaction();
}
45
Transactions
46
1. Initialize the producer
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.commitTransaction();
} catch (KafkaException e) {
producer.abortTransaction();
}
47
Initializing ‘transactions’
48
2. Begin transactions and send data
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.commitTransaction();
} catch (KafkaException e) {
producer.abortTransaction();
}
49
Transactional sends
51
3. Commit transaction
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.commitTransaction();
} catch (KafkaException e) {
producer.abortTransaction();
}
52
Commit
55
Success!
56
Consumer configs
• isolation.level:
• “read_committed”, or
• “read_uncommitted”
57
What do you get with isolation levels?
• read_committed: consumers read to the point where there
are no open transactions.
• read_uncommitted: will read everything.
• Messages read in offset order.
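On the consumer side this is again a single config entry; a minimal sketch with java.util.Properties (the class name ReadCommittedConsumerConfig is illustrative):

```java
import java.util.Properties;

public class ReadCommittedConsumerConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Only return messages from committed transactions; the consumer
        // never reads past the first still-open transaction.
        props.put("isolation.level", "read_committed");
        // Transactional apps typically commit offsets together with the
        // transaction, so automatic offset commits are disabled.
        props.put("enable.auto.commit", "false");
        return props;
    }
}
```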
58
TL;DR: Transactions
• Atomic, multi-partition, writes.
• Use the new producer APIs for transactions.
• Consumers can filter out uncommitted or aborted
transactional messages.
59
Part 3/3 : Stream Processing
Stream Processing with
Exactly Once Semantics
60
Streams config
• processing.guarantee = “exactly_once”
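As a Streams configuration this is one entry; a sketch using plain java.util.Properties (in the released StreamsConfig the key is `processing.guarantee`):

```java
import java.util.Properties;

public class ExactlyOnceStreamsConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Switches Kafka Streams from the default at_least_once mode to
        // transactional read-process-write: each task's input offsets,
        // state updates, and output records commit atomically.
        props.put("processing.guarantee", "exactly_once");
        return props;
    }
}
```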
61
End-to-end exactly-once semantics
• The read-process-write operation is atomic.
• Thus streams tasks produce valid answers even when
failures happen.
62
Back to KafkaCash
63
Exactly Once Semantics in Kafka
Is it practical?
64
Performance boost for Apache Kafka 0.11!
• Up to +20% producer throughput
• Up to +50% consumer throughput
• Up to -20% disk utilization
• Details: https://bit.ly/kafka-eos-perf
65
Gains due to more efficient message format
66
What about the idempotent producer and transactions?
• Transactions: 3-5% overhead for 100ms transactions, 1KB
messages.
• Longer transactions and better batching result in better
performance.
• 20% overhead relative to at-most-once delivery without
ordering guarantees.
• Idempotent producer alone has negligible overhead.
67
Putting it together
• We talked through an idempotent producer
• How we added transactions with atomic writes
• The impact it has on stream processing
68
When is it available?
Available to use in Kafka 0.11, June 2017.
69
Where we’ve come
• 2007: High-throughput messaging broker
• 2008: Highly available replicated log
• 2012: Top-level Apache project
• 2016: Streams API, Connect API
• 2017: Exactly-once semantics
70
San Francisco
August 28, 2017
Organized by Confluent
71
What’s next for you
• Try it: Download Confluent Open Source
• Join the community: slackpass.io/confluentcommunity
• Let us know what you think: @Confluent
72
Thank You!

Exactly-once Semantics in Apache Kafka