Concurrency Constructs Overview
Let’s jump in
Who is there
Stas Shevchenko
Accenture

Based on
Survey of Concurrency Constructs
Ted Leung
Sun Microsystems
Mainstream is 10 years in the past
Something happened in 2002
Fraction of chip reachable in one clock cycle
[Figure. Source: A Wire-Delay Scalable Microprocessor Architecture for High Performance Systems]
Processor Clock Speed
Growing CPU performance
Programming style impact on performance
   Sequences
   Multithreading
Mobile (up to 4)
Mainstream (4)
Office (12)
SUN Unit (up to 128)
Ignoring cores…
2 cores won't hurt you

4 cores will hurt a little

8 cores will hurt a bit

16 will start hurting

32 cores will hurt a lot (2009)

...

1 M cores ouch (2019)
Java is so Java
Java Memory Model
wait - notify
Thread.yield()
Thread.sleep()

synchronized

Executor Framework -> NIO (2)
Books
We DO
• Threads
  – Program counter
  – Own stack
  – Shared Memory
• Locks
Shared memory
Unpredictable state
Crash in critical region
Issues
• Locks
  – manually lock and unlock
  – lock ordering is a big problem
  – locks are not compositional
• How do we decide what is concurrent?
• Need to pre-design, but now we have to
  retrofit concurrency via new requirements
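The lock-ordering issue can be made concrete with a small Java sketch. It is an illustration only: LockedAccount and transfer are hypothetical names, not taken from the slides. Two opposite transfers deadlock if each grabs its own "from" lock first, so the sketch always takes the lock of the account with the smaller id first.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account type, used only to illustrate lock ordering.
final class LockedAccount {
    final int id;                                   // unique id, doubles as the lock-ordering key
    private long balance;
    final ReentrantLock lock = new ReentrantLock();

    LockedAccount(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Always lock the account with the smaller id first, so two opposite
    // transfers can never each hold one lock while waiting for the other.
    static void transfer(LockedAccount from, LockedAccount to, long amount) {
        LockedAccount first = from.id < to.id ? from : to;
        LockedAccount second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();               // manual unlock, in reverse order
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Note that the ordering rule lives in transfer, not in the account. Any other code that takes two account locks has to follow the same rule, which is what "locks are not compositional" means in practice.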
Design Goals/Space
Mutual Exclusion
Serialization / Ordering
Inherent / Implicit vs Explicit
Fine / Medium / Coarse grained
Composability
A good solution
• Substantially less error prone
• Makes it much easier to identify concurrency
• Runs on today’s (and future) parallel hardware
  – Works if you keep adding cores/threads
Theory
Functional Programming
Actors
CSP
CCS
Petri nets
pi-calculus
join-calculus
The models
• Transactional Memory
  – Persistent data structures
• Actors
• Dataflow
• Tuple spaces
Transactional memory
analogous to database transactions
Hardware vs. software implementations
Idea goes as far back as 1986
First appearance in a programming language:
Concurrent Haskell 2005
Example Clojure
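(defn deposit [account amount]
  (dosync
    (let [owner       (account :owner)
          balance-ref (account :balance-ref)]
      ;; alter runs inside the transaction and may be retried
      (alter balance-ref + amount)
      ;; side effect: may print more than once if the transaction retries
      (println "depositing" amount owner))))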
STM Design Space
STM Algorithms / Strategies
  Granularity
      word vs block
Locks vs Optimistic concurrency
Conflict detection
  eager vs lazy
Contention management
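As a rough illustration of the "optimistic concurrency" corner of this design space, here is a minimal Java sketch, assuming a single shared cell rather than a full STM. OptimisticCell and alter are hypothetical names. The update reads a snapshot, computes a new value without holding a lock, and commits with compare-and-set; a conflict is detected lazily at commit time and handled by retrying.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical one-cell "transaction": snapshot, compute, commit with CAS, retry on conflict.
final class OptimisticCell<T> {
    private final AtomicReference<T> ref;

    OptimisticCell(T initial) {
        ref = new AtomicReference<>(initial);
    }

    T get() {
        return ref.get();
    }

    // The update function must be side-effect free because it may run more than once.
    T alter(UnaryOperator<T> update) {
        while (true) {
            T seen = ref.get();                     // read the snapshot
            T next = update.apply(seen);            // compute outside any lock
            if (ref.compareAndSet(seen, next)) {    // commit only if nobody else committed
                return next;
            }
            // conflict: another thread committed first, so retry with a fresh snapshot
        }
    }
}
```

A real STM tracks many locations per transaction and needs a contention manager to decide who retries; this sketch only shows the read, compute, commit-or-retry cycle.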
STM Problems
• Non abortable operations
  – I/O
• STM Overhead
  – read/write barrier elimination
• Where to place transaction boundaries?
• Still need condition variables
  – ordering problems are important
Implementations
Clojure STM - via Refs
Akka/Scala
   copied from Clojure
Haskell/GHC
   uses logs and aborts transactions
Persistent Data Structures
In Clojure, combined with STM
Motivated by copy on write
hash-map, vector, sorted map
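A minimal Java sketch of what "persistent" means here, assuming the simplest possible structure, an immutable linked list. PList is a hypothetical name; Clojure's real hash-map and vector use wide trees, not this. An update returns a new version that shares the old version as its tail, so readers of the old version are never disturbed.

```java
// Hypothetical persistent (immutable) singly linked list with structural sharing.
final class PList<T> {
    final T head;
    final PList<T> tail;    // null marks the end of the list
    final int size;

    private PList(T head, PList<T> tail) {
        this.head = head;
        this.tail = tail;
        this.size = (tail == null ? 0 : tail.size) + 1;
    }

    static <T> PList<T> of(T value) {
        return new PList<>(value, null);
    }

    // "Modification" allocates one node and reuses the existing list as the tail.
    PList<T> prepend(T value) {
        return new PList<>(value, this);
    }
}
```

Because nothing is ever mutated, an STM ref can hold such a value and swap in the new version atomically without copying the whole structure.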
Available Data Structures
Lists, Vectors, Maps
hash list based on Vlists
VDList - deques based on Vlists
red-black trees
Real Time Queues and Deques
deques, output-restricted deques
binary random access lists
binomial heaps
skew binary random access lists
skew binomial heaps
catenable lists
heaps with efficient merging
catenable deques
Problem
• Not really a full model
• Oriented towards functional programming
Actors
Invented by Carl Hewitt at MIT (1973)
Formal Model
   Programming languages
   Hardware
   Led to continuations, Scheme
Recently revived by Erlang
   Erlang’s model is not derived explicitly from Actors
Simple
Example
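import scala.actors.Actor

// Message types (assumed; the slide does not show their definitions)
case class Withdraw(amount: Int)
case class Deposit(amount: Int)
case class Balance(amount: Int)
case object BalanceRequest
case object TerminateRequest

object account extends Actor
{
     private var balance = 0

     def act() {
          loop {
                 react {
                       case Withdraw(amount) =>
                            balance -= amount
                            sender ! Balance(balance)
                       case Deposit(amount) =>
                            balance += amount
                            sender ! Balance(balance)
                       case BalanceRequest =>
                            sender ! Balance(balance)
                       case TerminateRequest =>
                            exit()   // assumed: stop the actor
                 }
          }
     }
}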
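The same account can be sketched in plain Java to show what an actor library does under the hood. This is an illustration under assumptions, not how Scala Actors or Akka are implemented: AccountActor, send and terminate are hypothetical names, and one dedicated thread stands in for the scheduler. The balance is touched only by the actor thread; everyone else communicates through the mailbox.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical mailbox actor: one thread owns the state, callers only send messages.
final class AccountActor {
    private static final class Message {
        final long delta;                           // > 0 deposit, < 0 withdraw, 0 balance request
        final boolean terminate;
        final CompletableFuture<Long> reply = new CompletableFuture<>();

        Message(long delta, boolean terminate) {
            this.delta = delta;
            this.terminate = terminate;
        }
    }

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private long balance = 0;                       // accessed only by the actor thread

    AccountActor() {
        Thread t = new Thread(this::run, "account-actor");
        t.setDaemon(true);
        t.start();
    }

    private void run() {
        try {
            while (true) {
                Message m = mailbox.take();         // handle one message at a time
                if (m.terminate) {
                    m.reply.complete(balance);
                    return;
                }
                balance += m.delta;
                m.reply.complete(balance);          // plays the role of "sender ! Balance(balance)"
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Asynchronous send: returns immediately, the future carries the reply.
    CompletableFuture<Long> send(long delta) {
        Message m = new Message(delta, false);
        mailbox.add(m);
        return m.reply;
    }

    CompletableFuture<Long> terminate() {
        Message m = new Message(0, true);
        mailbox.add(m);
        return m.reply;
    }
}
```

The mailbox here is unbounded, so a fast sender can fill it faster than the actor drains it.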
Problems
DoS (denial of service) of the actor mail queue
Multiple actor coordination
   reinvent transactions?
Actors can still deadlock and starve
Programmer defines granularity
   by choosing what is an actor
Implementations
Scala
   Akka Actors
   Scala Actors
   Lift Actors
Erlang
   OTP
CLR
   F# / Axum
Performance
Actor create/destroy
Message passing
Memory usage
Erlang vs JVM
Erlang
   per process GC heap
   tail call
   distributed
JVM
   per JVM heap
   no tail call elimination in the JVM (JSR-292 did not add it; scalac only optimizes self-recursive calls)
   not distributed
   few kinds of actors (Scala)
Actor Variants
Clojure Agents
   Designed for loosely coupled stuff
   Code/actions sent to agents
   Code is queued when it hits the agent
   Agent framework guarantees serialization
   State of agent is always available for read (unlike actors, which could be busy processing when you send a read message)
   not in favor of transparent distribution
   Clojure agents can operate in an ‘open world’ - actors answer a specific set of messages
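The agent idea (actions are queued and run one at a time, while the state stays readable) can be sketched in Java as follows. This is not Clojure's implementation: Agent, send and deref are borrowed as hypothetical method names, and a single-threaded executor stands in for the agent framework.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.UnaryOperator;

// Hypothetical agent: code is sent to the state, not messages to a receive loop.
final class Agent<T> {
    private volatile T state;                       // always readable, even while an action runs

    // One worker thread, so queued actions are applied strictly one after another.
    private final ExecutorService queue = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "agent");
        t.setDaemon(true);
        return t;
    });

    Agent(T initial) {
        state = initial;
    }

    // Reading never blocks and never goes through the queue.
    T deref() {
        return state;
    }

    // Queue an action; the returned future completes once it has been applied.
    CompletableFuture<Void> send(UnaryOperator<T> action) {
        return CompletableFuture.runAsync(() -> state = action.apply(state), queue);
    }
}
```

Any function of the state can be sent, which is the "open world" point: the agent does not define a fixed set of messages it understands.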
Dataflow
Dataflow is a software architecture based on the idea
that changing the value of a variable should automatically
force recalculation of the values of variables which
depend on its value
Bill Ackerman’s PhD Thesis at MIT (1984)
Declarative Concurrency in functional languages
Research in the 1980’s and 90’s
Inherent concurrency
    Turns out to be very difficult to implement
Interest in declarative concurrency is slowly returning
The Model
Dataflow Variables
   create variable
   bind value
   read value or block
Threads
Dataflow Streams
   List whose tail is an unbound dataflow variable
Deterministic computation!
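A dataflow variable can be approximated in Java with CompletableFuture, as in the sketch below. DataflowSketch and largerOf are hypothetical names. Creating the future creates the variable, complete binds it (a second complete has no effect, which gives single assignment), and join reads the value or blocks until it is bound.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: CompletableFuture used as a single-assignment dataflow variable.
final class DataflowSketch {
    static int largerOf(int xValue, int yValue) {
        CompletableFuture<Integer> x = new CompletableFuture<>();   // create variable
        CompletableFuture<Integer> y = new CompletableFuture<>();
        CompletableFuture<Integer> z = new CompletableFuture<>();

        // Reader: join() blocks until x and y are bound, then z is bound.
        Thread reader = new Thread(() -> z.complete(Math.max(x.join(), y.join())));
        // Writer: binds y from another thread.
        Thread writer = new Thread(() -> y.complete(yValue));

        reader.start();                 // started first on purpose: it simply waits
        x.complete(xValue);             // bind value
        writer.start();
        return z.join();                // read value or block
    }
}
```

The result does not depend on which thread runs first, which is the determinism the slide refers to, as long as the computation itself has no side effects.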
Example: Variables 1
object Test5 extends App {
  val x, y, z = new DataFlowVariable[Int]

  val main = thread {
    x << 1
    // x() and y() block until the variables are bound
    if (x() > y()) { z << x } else { z << y }
  }

  val setY = thread {
    Thread.sleep(5000)
    y << 2
  }

  main ! 'exit
  setY ! 'exit
}
Example: Streams (Oz)
fun {Ints N Max}
      if N == Max then nil
      else
            {Delay 1000}
            N|{Ints N+1 Max}
      end
end
fun {Sum S Stream}
      case Stream of nil then [S]
      [] H|T then S|{Sum H+S T} end
end
local X Y in
      thread X = {Ints 0 1000} end
      thread Y = {Sum 0 X} end
      {Browse Y}
end
Implementations
Mozart Oz
  http://www.mozart-oz.org/
Akka
  http://github.com/jboner/scala-dataflow
  dataflow variables and streams
Ruby library
  http://github.com/larrytheliquid/dataflow
  dataflow variables and streams
Groovy
  http://code.google.com/p/gparallelizer/
Problems
Can’t handle non-determinism
  like a server
  Need ports
      this leads to actor like things
Tuple Spaces Model
Originated in Linda (1984)
Popularized by Jini
The Model
Three operations
  write() (out)
  take() (in)
  read()
The Model
Space uncoupling
Time uncoupling
Readers are decoupled from Writers
Content addressable by pattern matching
Can emulate
   Actor like continuations
   CSP
   Message Passing
   Semaphores
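The three operations and the pattern matching can be sketched with a small in-memory space in Java. This is an illustration under assumptions: TupleSpace, readIfExists and takeIfExists are hypothetical names for this sketch, the space lives in one JVM, and the operations return null instead of blocking when nothing matches. A null field in the template acts as a wildcard.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory tuple space; non-blocking variants of read and take.
final class TupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    // write (out): put a tuple into the space
    synchronized void write(Object... tuple) {
        tuples.add(tuple.clone());
    }

    // read: return a copy of a matching tuple and leave it in the space
    synchronized Object[] readIfExists(Object... template) {
        for (Object[] t : tuples) {
            if (matches(template, t)) {
                return t.clone();
            }
        }
        return null;
    }

    // take (in): remove a matching tuple from the space and return it
    synchronized Object[] takeIfExists(Object... template) {
        Iterator<Object[]> it = tuples.iterator();
        while (it.hasNext()) {
            Object[] t = it.next();
            if (matches(template, t)) {
                it.remove();
                return t;
            }
        }
        return null;
    }

    // Content addressing: same length, and every non-null template field equals the tuple field.
    private static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Writers and readers never reference each other, only the space and the shape of the tuple, which is the space and time uncoupling listed above. A take on a one-tuple "token" behaves like acquiring a semaphore, and writing it back releases it.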
Example
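import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class Account implements Entry {
     // Entry fields must be public object references, not primitives
     public Integer accountNo;
     public Integer value;
     public Account() { /* ... */ }   // entries need a public no-arg constructor
     public Account(int accountNo, int value) {
          this.accountNo = Integer.valueOf(accountNo);
          this.value = Integer.valueOf(value);
     }
}

// space is a JavaSpace obtained elsewhere
try {
        Account newAccount = new Account(accountNo, value);
        space.write(newAccount, null, Lease.FOREVER);

        // Read by template: null fields act as wildcards
        Account template = new Account();
        template.accountNo = Integer.valueOf(accountNo);
        Account found = (Account) space.read(template, null, JavaSpace.NO_WAIT);
} catch (Exception e) {   // e.g. RemoteException, TransactionException
        e.printStackTrace();
}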
Implementations
Jini/JavaSpaces
   http://incubator.apache.org/river/RIVER/index.html
BlitzSpaces
   http://www.dancres.org/blitz/blitz_js.html
PyLinda
   http://code.google.com/p/pylinda/
Rinda
   built in to Ruby
Problems
Low level
High latency to the space - the space is a contention point / hot spot
Scalability
More for distribution than concurrency
Projects
Scala
Erlang
Clojure
Kamaelia
Haskell
Axum/F#
Mozart/Oz
Akka
Work to be done
More in-depth comparisons on 4+ core platforms
Higher level frameworks
Application architectures/patterns
   Web
   Middleware
Outcomes
F..ck the shared state,
Mutable means non-scalable

It is not too early!
QA

Editor's Notes

  • #19 The program counter, sometimes called the instruction address register (IAR) or just part of the instruction sequencer, is a processor register that indicates where a computer is in its program sequence.
  • #24 Mutual exclusion: the program object that negotiates mutual exclusion among threads is a lock (mutex). Coarse-grained systems consist of fewer, larger components than fine-grained systems; a coarse-grained description of a system regards large subcomponents while a fine-grained description regards smaller components of which the larger ones are composed.
  • #26 The process calculi (or process algebras) are a diverse family of related approaches for formally modelling concurrent systems. Communicating Sequential Processes (CSP; used in Limbo and Go) is a process calculus. The approach taken in developing CSP into a process algebra was influenced by Robin Milner's work on the Calculus of Communicating Systems (CCS), and vice versa. CCS is useful for evaluating the qualitative correctness of properties of a system such as deadlock or livelock. A Petri net (also known as a place/transition net or P/T net) is one of several mathematical modeling languages for the description of distributed systems. The π-calculus is a continuation of work on CCS. It allows channel names to be communicated along the channels themselves, and in this way it is able to describe concurrent computations whose network configuration may change during the computation. The join-calculus is a member of the π-calculus family of process calculi, and can be considered, at its core, an asynchronous π-calculus with several strong restrictions: scope restriction, reception, and replicated reception are syntactically merged into a single construct, the definition; communication occurs only on defined names; for every defined name there is exactly one replicated reception.
  • #27 Transactional memory attempts to simplify parallel programming by allowing a group of load and store instructions to execute in an atomic way. It is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. A dataflow network is a network of concurrently executing processes or automata that can communicate by sending data over channels (see message passing). A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing. It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider that there are a group of processors that produce pieces of data and a group of processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that match a certain pattern. This is also known as the blackboard metaphor. Tuple space may be thought of as a form of distributed shared memory. Tuple spaces were the theoretical underpinning of the Linda language developed by David Gelernter and Nicholas Carriero at Yale University. Implementations of tuple spaces have also been developed for Java (JavaSpaces), Lisp, Lua, Prolog, Python, Ruby, Smalltalk, Tcl, and the .NET framework.
  • #30 A contention management process is invoked when a conflict occurs between a first transaction and a second transaction. The pre-determined commit order is used in the contention management process to aid in determining whether the first transaction or the second transaction should win the conflict and be allowed to proceed.
  • #45 Dataflow concurrency is deterministic. This means that it will always behave the same. If you run it once and it yields output 5, then it will do that every time; run it 10 million times, same result. If, on the other hand, it deadlocks the first time you run it, then it will deadlock every single time you run it. Also, there is no difference between sequential code and concurrent code. These properties make it very easy to reason about concurrency. The limitation is that the code needs to be side-effect free, i.e. deterministic. You can’t use exceptions, time, random etc., but need to treat the part of your program that uses dataflow concurrency as a pure function with input and output. The best way to learn how to program with dataflow variables is to read the fantastic book Concepts, Techniques, and Models of Computer Programming by Peter Van Roy and Seif Haridi.