Taming Pythons with ZooKeeper




PyCon Finland 2012
@nailor
Content Squad
Day to day operations

• Consume boatloads of XML
• Build a database out of it
• Transcode audio
• Enrich data
• Build indexes
• Ship indexes
Shipping
Pre-defined order
The naïve solution

• Central orchestrator
• Assume all machines are running
• Run remote commands
• We need to know all the quirks
• Fail fast
Guess what? It breaks
...and replacing isn't simple
A new tool is needed

• No central orchestrator
• Assume some machines are always down
• Run locally
• Shift responsibility to system owners
• Fail gracefully
CAP Theorem
The CAP Theorem
1. Consistency
2. Availability
3. Partition tolerance
Pick two. Any two will do.
We went for CA
Why?

• Users are mostly contained in a single DC
• Inside a single DC, connections are quite robust
• Remember the order? We need consistency
• Availability is everything
How?
Apache ZooKeeper
What?

• A distributed tree-like data structure
• Simple primitives
• Automatic leader elections
• Guaranteed hard consistency
• Ephemeral nodes
Tree-like structure

/
  /dir/
    /subdir/
  /dir2/
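To make the tree concrete, here is a toy in-memory model in Python. It is not the ZooKeeper or Kazoo API; `ZNodeTree` and its methods are hypothetical names. It only illustrates that every node (a "znode") lives at a slash-separated path, can hold data, and can only be created under an existing parent.

```python
class ZNodeTree:
    """Toy in-memory model of ZooKeeper's hierarchical namespace (not a real client)."""

    def __init__(self):
        # Every path maps to its data; the root always exists.
        self._data = {"/": b""}

    @staticmethod
    def _parent(path):
        # "/dir/subdir" -> "/dir", "/dir" -> "/"
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"node exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"no parent for: {path}")
        self._data[path] = data

    def get(self, path):
        return self._data[path]

    def get_children(self, path):
        # Direct children only, returned as bare names like ZooKeeper does.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

Building the tree from the slide: create `/dir` and `/dir2` under the root, then `/dir/subdir`; `get_children("/")` then lists `dir` and `dir2`, and creating `/missing/child` fails because its parent does not exist.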
Simple primitives

• Guaranteed atomic operations
• Counters
• Change notifications
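A rough single-process sketch of these three primitives, using only the standard library. `ToyStore` and its method names are hypothetical, and parent-existence checks are left out, but the semantics follow ZooKeeper's: a conditional write succeeds only if the caller's version matches (with -1 meaning "any version"), sequential nodes get a 10-digit zero-padded counter suffix, and watches fire once and must be re-registered.

```python
from collections import defaultdict


class VersionMismatch(Exception):
    """Raised when a conditional write sees a stale version."""


class ToyStore:
    """Toy model of atomic versioned writes, sequential nodes and one-shot watches."""

    def __init__(self):
        self._nodes = {}                    # path -> (data, version)
        self._seq = defaultdict(int)        # parent path -> next sequence number
        self._watches = defaultdict(list)   # path -> pending one-shot callbacks

    def create(self, path, data=b"", sequential=False):
        if sequential:
            parent = path.rsplit("/", 1)[0] or "/"
            # Sequential nodes get a monotonically increasing, 10-digit suffix.
            path = f"{path}{self._seq[parent]:010d}"
            self._seq[parent] += 1
        if path in self._nodes:
            raise KeyError(f"node exists: {path}")
        self._nodes[path] = (data, 0)
        return path

    def get(self, path, watch=None):
        if watch is not None:
            self._watches[path].append(watch)
        return self._nodes[path]

    def set(self, path, data, version=-1):
        _, current = self._nodes[path]
        # Compare-and-set: -1 means unconditional, as in ZooKeeper.
        if version != -1 and version != current:
            raise VersionMismatch(f"{path}: expected {version}, found {current}")
        self._nodes[path] = (data, current + 1)
        # Watches fire once; the client has to register again to hear more.
        for callback in self._watches.pop(path, []):
            callback(path)
```

The version check is what turns "read, modify, write" into a safe atomic update: two clients that read the same version cannot both succeed.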
Automatic leader election

• Nodes know who is the most up to date
• If no leader can be picked, ZooKeeper refuses to work
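Both points can be reduced to a toy function, loosely modelled on ZooKeeper's fast leader election: among the servers that can reach each other, the one with the highest transaction id (zxid) wins, ties go to the higher server id, and nobody is elected without a majority. The real protocol also compares epochs and exchanges votes over the network; this sketch skips all of that, and `elect_leader` is a hypothetical name.

```python
def elect_leader(ensemble_size, reachable):
    """Toy model of ZooKeeper-style leader election.

    reachable maps server id -> last zxid (transaction id) that server has seen.
    Returns the elected server id, or None when no quorum is reachable.
    """
    quorum = ensemble_size // 2 + 1
    if len(reachable) < quorum:
        # Without a majority nobody can be elected, so the ensemble stops serving.
        return None
    # Prefer the most up-to-date server; break ties on the higher server id.
    return max(reachable, key=lambda sid: (reachable[sid], sid))
```

For a five-server ensemble, `elect_leader(5, {1: 100, 2: 105, 3: 103})` picks server 2, while `elect_leader(5, {1: 100, 2: 105})` returns `None` because two servers are not a majority.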
Guaranteed hard consistency

• Every change is sent to every node!
• Every write must be acknowledged by a quorum; reads are served by the server the client is connected to
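The quorum rule is plain arithmetic. A minimal sketch with hypothetical helper names:

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority stays up."""
    return ensemble_size - quorum_size(ensemble_size)


def write_committed(ensemble_size, acks):
    """A proposed write commits once a majority (leader included) has acknowledged it."""
    return acks >= quorum_size(ensemble_size)
```

With these numbers a four-server ensemble tolerates only one failure, the same as three servers, which is why ensembles are usually sized 3 or 5.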
Ephemeral nodes

• Node exists only as long as the client's session is alive
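Ephemeral nodes are what make "assume some machines are always down" workable: a machine can announce itself with an ephemeral node and drops out of the tree when its session dies. The sketch below is a deterministic toy with explicit timestamps; `ToySessions` and the `/workers/...` paths are hypothetical, and in real ZooKeeper session expiry is decided by the ensemble, not by the client.

```python
class ToySessions:
    """Toy model of ephemeral znodes tied to client sessions (not a real client)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_heartbeat = {}  # session id -> time of last heartbeat
        self._ephemeral = {}       # path -> owning session id

    def heartbeat(self, session, now):
        self._last_heartbeat[session] = now

    def create_ephemeral(self, path, session, now):
        # Creating a node also counts as activity on the session.
        self.heartbeat(session, now)
        self._ephemeral[path] = session

    def expire(self, now):
        """Drop sessions that missed the timeout, together with their ephemeral nodes."""
        dead = {s for s, t in self._last_heartbeat.items() if now - t > self.timeout}
        for s in dead:
            del self._last_heartbeat[s]
        self._ephemeral = {p: s for p, s in self._ephemeral.items() if s not in dead}

    def exists(self, path):
        return path in self._ephemeral
```

If host A keeps sending heartbeats and host B goes silent, only `/workers/host-a` survives the next expiry check, so anyone listing `/workers` sees just the live machines.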
Library: zkPython
The Good:

• Thin
• Comes with ZooKeeper
• Maintained by the Apache ZooKeeper project
The Bad:

• Thin
• C bindings only, no PyPy for you
The Ugly:

• No documentation :(
There are others: Kazoo
The Good:

• Pure Python
• Recipes implemented
• Used by many (Quora, Mozilla, reddit, Zope)
The Bad:

• Not many recipes done yet
• Not part of the mainline Apache ZooKeeper project
The Ugly:

• Own implementation of the protocol
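A distributed lock is a typical example of a ZooKeeper recipe that Kazoo packages for you. The standard recipe, modelled in memory here (this is not Kazoo's code; `ToyLock` is a hypothetical name), uses ephemeral sequential nodes: the contender with the lowest sequence number holds the lock, and each waiter watches only its immediate predecessor, so a release wakes one client instead of all of them.

```python
class ToyLock:
    """In-memory model of the ZooKeeper lock recipe.

    Each contender creates an (ephemeral, sequential) child under the lock node;
    the lowest sequence number owns the lock.
    """

    def __init__(self):
        self._next_seq = 0
        self._children = {}  # node name -> contender id

    def enter(self, contender):
        # Zero padding makes lexicographic order match numeric order.
        name = f"lock-{self._next_seq:010d}"
        self._next_seq += 1
        self._children[name] = contender
        return name

    def leave(self, name):
        # In ZooKeeper this is an explicit delete, or the session dying.
        self._children.pop(name, None)

    def holder(self):
        return self._children[min(self._children)] if self._children else None

    def predecessor(self, name):
        """The node a waiting contender would watch (avoids the herd effect)."""
        earlier = [n for n in self._children if n < name]
        return max(earlier) if earlier else None
```

Because the children are ephemeral, a lock holder that crashes releases the lock automatically once its session expires.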
Dos and Don'ts
Don't ship large chunks of data through it
Monitor ZooKeeper
Don't write to it all the time
Stay in one DC
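Two of these rules can be enforced in a thin client-side wrapper. A sketch, assuming a caller-supplied `write(path, data)` function; `WriteGuard` is hypothetical. The size limit uses ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes (just under 1 MB), which a deployment may configure differently.

```python
# ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1 MB.
DEFAULT_MAX_ZNODE_BYTES = 0xFFFFF


class WriteGuard:
    """Hypothetical wrapper: reject oversized payloads, skip redundant writes."""

    def __init__(self, write, max_bytes=DEFAULT_MAX_ZNODE_BYTES):
        self._write = write        # callable(path, data) doing the real write
        self._max_bytes = max_bytes
        self._last = {}            # path -> last data written through this guard
        self.skipped = 0

    def put(self, path, data):
        if len(data) > self._max_bytes:
            raise ValueError(f"{len(data)} bytes for {path}: store a pointer, not the payload")
        if self._last.get(path) == data:
            # Unchanged value: skip the write instead of loading the ensemble.
            self.skipped += 1
            return False
        self._write(path, data)
        self._last[path] = data
        return True
```

Large artifacts such as the indexes themselves belong elsewhere; ZooKeeper is better suited to small coordination records that point at them.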
Spotify & ZooKeeper
Summary time!
Concurrency == hard
Distributed consistency == hard
No partitions? Go ZooKeeper!
Pick your weapon (library)
Remember tradeoffs
Thank you
