Taskflow

Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++.

Why Taskflow?

Taskflow is faster, more expressive, and easier to drop into an existing project than many existing task programming frameworks when handling complex parallel workloads.

Taskflow lets you quickly implement task decomposition strategies that incorporate both regular and irregular compute patterns, together with an efficient work-stealing scheduler to optimize your multithreaded performance.

Static Tasking	Subflow Tasking

Taskflow supports conditional tasking, which lets you make rapid control-flow decisions across dependent tasks and implement cycles and conditions that are otherwise difficult to express with existing tools.

Conditional Tasking

Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.

Taskflow Composition

Taskflow supports heterogeneous tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing.

Concurrent CPU-GPU Tasking

Taskflow provides visualization and tooling needed for profiling Taskflow programs.

Taskflow Profiler

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel computing. Check out Who is Using Taskflow and what our users say:

"Taskflow is the cleanest Task API I've ever seen." Damien Hocking @Corelium Inc
"Taskflow has a very simple and elegant tasking interface. The performance also scales very well." Glen Fraser
"Taskflow lets me handle parallel processing in a smart way." Hayabusa @Learning
"Taskflow improves the throughput of our graph engine in just a few hours of coding." Jean-Michaël @KDAB
"Best poster award for open-source parallel programming library." Cpp Conference 2018
"Second Prize of Open-source Software Competition." ACM Multimedia Conference 2019

See the quick poster presentation and visit the documentation to learn more about Taskflow. For technical details, refer to our IEEE TPDS paper.

Start Your First Taskflow Program

The following program (simple.cpp) creates a taskflow of four tasks A, B, C, and D, where A runs before B and C, and D runs after B and C. When A finishes, B and C can run in parallel. Try it live on Compiler Explorer (godbolt)!

#include <taskflow/taskflow.hpp>  // Taskflow is header-only

int main(){
  
  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(  // create four tasks
    [] () { std::cout << "TaskA\n"; },
    [] () { std::cout << "TaskB\n"; },
    [] () { std::cout << "TaskC\n"; },
    [] () { std::cout << "TaskD\n"; } 
  );                                  
                                      
  A.precede(B, C);  // A runs before B and C
  D.succeed(B, C);  // D runs after  B and C
                                      
  executor.run(taskflow).wait(); 

  return 0;
}

Taskflow is header-only, so there is nothing to install. To compile the program, clone the Taskflow project and tell the compiler where to find the headers.

~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
~$ cd taskflow
~$ g++ -std=c++20 examples/simple.cpp -I. -O2 -pthread -o simple
~$ ./simple
TaskA
TaskC 
TaskB 
TaskD

Visualize Your First Taskflow Program

Taskflow comes with a built-in profiler, TFProf, for you to profile and visualize taskflow programs in an easy-to-use web-based interface.

# run the program with the environment variable TF_ENABLE_PROFILER set to the output file
~$ TF_ENABLE_PROFILER=simple.json ./simple
~$ cat simple.json
[
{"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
]
# paste the profiling json data to https://taskflow.github.io/tfprof/

In addition to the execution diagram, you can dump the graph in DOT format and visualize it with any of a number of free GraphViz tools.

// dump the taskflow graph to a DOT format through std::cout
taskflow.dump(std::cout);

Express Task Graph Parallelism

Taskflow empowers users with both static and dynamic task graph constructions to express end-to-end parallelism in a task graph that embeds in-graph control flow.

Create a Subflow Graph

Taskflow supports dynamic tasking for you to create a subflow graph from the execution of a task to perform dynamic parallelism. The following program spawns a task dependency graph parented at task B.

tf::Task A = taskflow.emplace([](){}).name("A");  
tf::Task C = taskflow.emplace([](){}).name("C");  
tf::Task D = taskflow.emplace([](){}).name("D");  

tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) { 
  tf::Task B1 = subflow.emplace([](){}).name("B1");  
  tf::Task B2 = subflow.emplace([](){}).name("B2");  
  tf::Task B3 = subflow.emplace([](){}).name("B3");  
  B3.succeed(B1, B2);  // B3 runs after B1 and B2
}).name("B");

A.precede(B, C);  // A runs before B and C
D.succeed(B, C);  // D runs after  B and C

Integrate Control Flow to a Task Graph

Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks to implement cycles and conditions in an end-to-end task graph.

tf::Task init = taskflow.emplace([](){}).name("init");
tf::Task stop = taskflow.emplace([](){}).name("stop");

// creates a condition task that returns a random binary
tf::Task cond = taskflow.emplace(
  [](){ return std::rand() % 2; }
).name("cond");

init.precede(cond);

// creates a feedback loop {0: cond, 1: stop}
cond.precede(cond, stop);

Offload a Task to a GPU

Taskflow supports GPU tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing using Nvidia CUDA Graph.

// single-precision a*x + y, one element per thread
__global__ void saxpy(int n, float a, float* x, float* y) {
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) {
    y[i] = a*x[i] + y[i];
  }
}
  
// create a CUDA Graph task
tf::Task cudaflow = taskflow.emplace([&]() {
  tf::cudaGraph cg;
  tf::cudaTask h2d_x = cg.copy(dx, hx.data(), N);
  tf::cudaTask h2d_y = cg.copy(dy, hy.data(), N);
  tf::cudaTask d2h_x = cg.copy(hx.data(), dx, N);
  tf::cudaTask d2h_y = cg.copy(hy.data(), dy, N);
  // the task is named `kernel` so it does not shadow the saxpy function it launches
  tf::cudaTask kernel = cg.kernel((N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy);
  kernel.succeed(h2d_x, h2d_y)
        .precede(d2h_x, d2h_y);
  
  // instantiate an executable CUDA graph and run it through a stream
  tf::cudaGraphExec exec(cg);
  tf::cudaStream stream;
  stream.run(exec).synchronize();
}).name("CUDA Graph Task");

Compose Task Graphs

Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.

tf::Taskflow f1, f2;

// create taskflow f1 of two tasks
tf::Task f1A = f1.emplace([]() { std::cout << "Task f1A\n"; })
                 .name("f1A");
tf::Task f1B = f1.emplace([]() { std::cout << "Task f1B\n"; })
                 .name("f1B");

// create taskflow f2 with one module task composed of f1
tf::Task f2A = f2.emplace([]() { std::cout << "Task f2A\n"; })
                 .name("f2A");
tf::Task f2B = f2.emplace([]() { std::cout << "Task f2B\n"; })
                 .name("f2B");
tf::Task f2C = f2.emplace([]() { std::cout << "Task f2C\n"; })
                 .name("f2C");

tf::Task f1_module_task = f2.composed_of(f1)
                            .name("module");

f1_module_task.succeed(f2A, f2B)
              .precede(f2C);

Launch Asynchronous Tasks

Taskflow supports asynchronous tasking. You can launch tasks asynchronously to dynamically explore task graph parallelism.

tf::Executor executor;

// create asynchronous tasks directly from an executor
std::future<int> future = executor.async([](){ 
  std::cout << "async task returns 1\n";
  return 1;
}); 
executor.silent_async([](){ std::cout << "async task does not return\n"; });

// create asynchronous tasks with dynamic dependencies
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);

executor.wait_for_all();

Execute a Taskflow

The executor provides several thread-safe methods to run a taskflow. You can run a taskflow once, multiple times, or until a stopping criterion is met. These methods are non-blocking and return a tf::Future<void> that lets you query the execution status.

// run the taskflow once
tf::Future<void> run_once = executor.run(taskflow); 

// wait for this run to finish
run_once.get();

// run the taskflow four times
executor.run_n(taskflow, 4);

// run the taskflow repeatedly until the predicate returns true;
// `mutable` is required because the lambda modifies its captured counter
executor.run_until(taskflow, [counter=5]() mutable { return --counter == 0; });

// block the executor until all submitted taskflows complete
executor.wait_for_all();

Leverage Standard Parallel Algorithms

Taskflow defines algorithms for you to quickly express common parallel patterns using standard C++ syntax, such as parallel iterations, parallel reductions, and parallel sort.

tf::Task task1 = taskflow.for_each( // assign each element to 100 in parallel
  first, last, [] (auto& i) { i = 100; }    
);
tf::Task task2 = taskflow.reduce(   // reduce a range of items in parallel
  first, last, init, [] (auto a, auto b) { return a + b; }
);
tf::Task task3 = taskflow.sort(     // sort a range of items in parallel
  first, last, [] (auto a, auto b) { return a < b; }
);

Additionally, Taskflow provides composable graph building blocks for you to efficiently implement common parallel algorithms, such as a parallel pipeline.

// create a pipeline to propagate five tokens through three serial stages
tf::Pipeline pl(num_parallel_lines,
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    if(pf.token() == 5) {
      pf.stop();
    }
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 2: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 3: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }}
);
taskflow.composed_of(pl);
executor.run(taskflow).wait();

Workflow: High-Level Declarative Dataflow Library

The Workflow library is a high-level declarative dataflow abstraction built on top of Taskflow, providing:

🔑 Key-based I/O: All inputs/outputs accessed via string keys for clarity and flexibility
🚀 Declarative API: Create nodes with input specifications; dependencies auto-inferred
⚡ Type Safety: Compile-time type-safe nodes (TypedNode) for zero-overhead performance
🔀 Runtime Flexibility: Dynamic type handling (AnyNode) for heterogeneous data
🔗 Unified Interface: Polymorphic INode base class for all node types
🎨 Graph Builder: High-level API managing construction, execution, and visualization

What's New (Workflow)

create_subtask(name, builder_fn): build-and-run a fresh subgraph at task execution (recommended for loop bodies).
Sink callbacks: create_any_sink(..., callback) and create_typed_sink<...>(..., callback) to collect/process final results.
Cleaner console: internal node prints removed; use callbacks for precise logs.

Quick Example (Declarative API)

#include <workflow/nodeflow.hpp>
#include <taskflow/taskflow.hpp>

int main() {
  namespace wf = workflow;
  tf::Executor executor;
  wf::GraphBuilder builder("my_workflow");

  // Create source node with output keys
  auto [A, tA] = builder.create_typed_source("A",
    std::make_tuple(3.5, 7),
    std::vector<std::string>{"x", "k"}
  );

  // Create typed nodes with automatic dependency inference
  auto [B, tB] = builder.create_typed_node<double>("B",
    {{"A", "x"}},  // Input: from A's "x" output
    [](const std::tuple<double>& in) {
      return std::make_tuple(std::get<0>(in) + 1.0);
    },
    {"b"}  // Output key
  );

  auto [C, tC] = builder.create_typed_node<double>("C",
    {{"A", "x"}},
    [](const std::tuple<double>& in) {
      return std::make_tuple(2.0 * std::get<0>(in));
    },
    {"c"}
  );

  auto [D, tD] = builder.create_typed_node<double, double>("D",
    {{"B", "b"}, {"C", "c"}},  // Multiple inputs
    [](const std::tuple<double, double>& in) {
      return std::make_tuple(std::get<0>(in) * std::get<1>(in));
    },
    {"prod"}
  );

  // Create sink node
  auto [H, tH] = builder.create_any_sink("H",
    {{"D", "prod"}}
  );

  // No manual dependencies needed! Auto-inferred from input specs:
  // - B depends on A (via {"A", "x"})
  // - C depends on A (via {"A", "x"})
  // - D depends on B, C (via {{"B", "b"}, {"C", "c"}})
  // - H depends on D (via {"D", "prod"})

  builder.run(executor);
  builder.dump(std::cout);  // Visualize graph
  return 0;
}

Key Features:

✅ Key-based I/O: Inputs/outputs accessed via string keys ({"A", "x"})
✅ Automatic Dependencies: Dependencies inferred from input specifications
✅ Type Inference: Output types auto-inferred from functor return type
✅ Adapter Tasks: Automatically created for Typed → Any connections

Building Workflow Examples

Enable the workflow library during CMake configuration:

mkdir build && cd build
cmake .. -DTF_BUILD_WORKFLOW=ON
cmake --build . --target declarative_example
./workflow/declarative_example

# Or build advanced control flow example
cmake --build . --target advanced_control_flow
./workflow/advanced_control_flow

Advanced Control Flow: The workflow library also supports condition nodes, multi-condition nodes, pipeline nodes, and loop nodes through a declarative API. See workflow/examples/advanced_control_flow.cpp for examples.

DOT snapshots (examples):

loop_only: Loop (diamond) -> LoopBody [0], LoopExit [1]
advanced_control_flow: B (diamond) -> C/D; F (diamond) -> G/H/I; Loop (diamond) -> LoopBody/LoopExit

See workflow/README.md and readme/guide_workflow.md for detailed documentation and technical roadmap.

Supported Compilers

To use Taskflow, you only need a compiler that supports C++17:

GNU C++ Compiler at least v8.4 with -std=c++17
Clang C++ Compiler at least v6.0 with -std=c++17
Microsoft Visual Studio at least v19.14 with /std:c++17
AppleClang Xcode Version at least v12.0 with -std=c++17
Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
Intel C++ Compiler at least v19.0.1 with -std=c++17
Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

Taskflow works on Linux, Windows, and macOS.

Although Taskflow primarily targets C++17, you can compile with -std=c++20 (or /std:c++20 for MSVC) to achieve better performance from new C++20 features.

Learn More about Taskflow

Visit our project website and documentation to learn more about Taskflow. To get involved:

See release notes to stay up-to-date with the newest versions
Read the step-by-step tutorial at cookbook
Submit an issue at GitHub issues
Find our technical details at references
Watch our technical talks at YouTube

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel and heterogeneous computing. If you are using Taskflow, please cite the following paper we published in IEEE TPDS:

Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, and Yibo Lin, "Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 33, no. 6, pp. 1303-1320, June 2022

More importantly, we appreciate all Taskflow contributors and the following organizations for sponsoring the Taskflow project!

The Taskflow project is also supported by ADS.FUND.

License

Taskflow is licensed under the MIT License. You are completely free to redistribute work derived from Taskflow.
