The Feldera team welcomes contributions from the community. Before you start working with Feldera, please
read our Developer Certificate of Origin (DCO).
To acknowledge the DCO, sign your commits by adding `Signed-off-by: Your Name <your@email.com>` as the last
line of each Git commit message. Your signature certifies that you wrote the patch or have the right to pass
it on as an open-source patch. The e-mail address used to sign must match the e-mail address of the Git
author. If you have set your `user.name` and `user.email` git config values, you can sign your commits
automatically with `git commit -s`.
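For example, a signed commit message ends with the trailer (the name, e-mail address, and subject line below are illustrative):

```text
Fix off-by-one error in window aggregation

Signed-off-by: Jane Doe <jane@example.com>
```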
Our team develops and tests on Linux and macOS. Windows Subsystem for Linux works fine too.
The Feldera container images and CI workflows use Linux; you can see our setup in our Dockerfile.
Our dependencies for building the project are:
- a C and C++ compiler toolchain (e.g., gcc and g++)
- cmake
- libssl-dev
- libsasl2-dev
- zlib1g-dev
- a Rust toolchain (install rustup and the default toolchain)
- a Java Virtual Machine (at least Java 19)
- maven
- graphviz
- Python 3.10
- Bun (https://bun.sh/docs/installation)
Additional dependencies are automatically installed by the Rust, maven, Python, and TypeScript build tools.
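On Debian or Ubuntu, most of the system packages can be installed in one step. The package names below are indicative and may vary by distribution and release (in particular, the available OpenJDK version); rustup and Bun are installed via their own installers as noted above:

```bash
sudo apt-get install build-essential cmake libssl-dev libsasl2-dev \
    zlib1g-dev openjdk-19-jdk maven graphviz python3.10
```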
We recommend forking the Feldera repository and contributing from a fork; GitHub's documentation has instructions on how to fork a repository. After forking, do not forget to add Feldera as a remote repository:

```bash
git remote add upstream https://github.com/feldera/feldera.git
```

This is a rough outline of what a contributor's workflow looks like:
- Create a topic branch from where you want to base your work
- Make commits of logical units
- Make sure your commit messages are in the proper format (see below)
- Push your changes to a topic branch. If you have commit access to the Feldera repository, prefer pushing the branch there, because CI can then add benchmark results to the PR as comments; otherwise, push to your fork.
- Submit a pull request
Example:

```bash
git checkout -b my-new-feature main
git commit -a
git push origin my-new-feature
```

When your branch gets out of sync with the feldera/main branch, use the following to update:
```bash
git checkout my-new-feature
git fetch -a
git pull --rebase upstream main
git push --force-with-lease upstream my-new-feature
```

If you don't have push permissions, replace the last command with:

```bash
git push --force-with-lease origin my-new-feature
```
If your PR fails to pass CI or needs changes based on code review, you'll most likely want to squash these changes into existing commits.
If your pull request contains a single commit or your changes are related to the most recent commit, you can simply amend the commit.
```bash
git add <files to add>
git commit --amend
git push --force-with-lease origin my-new-feature
```

If you need to squash changes into an earlier commit, you can use:
```bash
git add <files to add>
git commit --fixup <commit>
git rebase -i --autosquash main
git push --force-with-lease origin my-new-feature
```

Be sure to add a comment to the PR indicating your new changes are ready to review, as GitHub does not generate a notification when you `git push`.
Since we run benchmarks as part of CI, it's good practice to preserve the commit IDs of the feature branch we've worked on (and benchmarked). Unfortunately, the GitHub UI does not support this (it only allows rebase, squash, and merge commits to close PRs). Therefore, we recommend merging PRs using the following git CLI invocation:
```bash
git checkout main
git merge --ff-only feature-branch-name
git push upstream main
```

Execute the following commands to make `git push` check the code for formatting issues:
```bash
GITDIR=$(git rev-parse --git-dir)
ln -sf $(pwd)/scripts/pre-push ${GITDIR}/hooks/pre-push
```

We follow the conventions in How to Write a Git Commit Message.
Be sure to include any related GitHub issue references in the commit message. See GFM syntax for referencing issues and commits.
When opening a new issue, try to roughly follow the commit message format conventions above.
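As a sketch, a commit message following these conventions might look like the following (the subject, body, and issue number are placeholders, not a real Feldera issue):

```text
Summarize the change in about 50 characters

Explain what the change does and why, wrapped at 72 characters.
Use the imperative mood in the subject line.

Fixes #1234

Signed-off-by: Jane Doe <jane@example.com>
```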
Feldera is implemented in Rust and uses Rust's cargo build system. The SQL
to DBSP compiler is implemented in Java and uses maven as its build system.
You can build the Rust sources by running the following at the top level of this tree:

```bash
cargo build
```

To build the SQL to DBSP compiler, run the following from `sql-to-dbsp-compiler`:

```bash
./build.sh
```
If you want to develop Feldera without installing the required toolchains
locally, you can use GitHub Codespaces: from
https://github.com/feldera/feldera, click on the green "<> Code" button,
then select "Codespaces" and click on "Create codespace on main".
DBSP is a key crate that powers Feldera's pipelines. To learn how the DBSP core works, we recommend starting with the tutorial.
From the project root:

```bash
cargo doc --open
```

Then search for `dbsp::tutorial`.
Another good place to start is the circuit::circuit_builder module documentation,
or the examples folder. For more sophisticated examples, try looking
at the nexmark benchmark in the benches directory.
The repository has a number of benchmarks available in the benches directory that provide a comparison of DBSP's
performance against a known set of tests.
Each benchmark has its own options and behavior, as outlined below.
You can run the complete set of Nexmark queries, with the default settings, with:
```bash
cargo bench --bench nexmark
```

By default this will run each query with a total of 100 million events emitted at 10M per second (by two event generator threads), using 2 CPU cores for processing the data.
To run just one query, q3, with only 10 million events, but using 8 CPU cores to process the data and 6 event generator threads, you can run:

```bash
cargo bench --bench nexmark -- --query q3 --max-events 10000000 --cpu-cores 8 --num-event-generators 6
```

For further options that you can use with the Nexmark benchmark, run:

```bash
cargo bench --bench nexmark -- --help
```

An extensive blog post about the implementation of Nexmark in DBSP:
https://liveandletlearn.net/post/vmware-take-3-experience-with-rust-and-dbsp/
The pipeline manager serves as the API server for Feldera. It persists API state in a Postgres DB instance. Here are some guidelines when contributing code that affects this database's schema.
- We use SQL migrations to apply the schema to a live database to facilitate upgrades. We use refinery to manage migrations.
- The migration files can be found in `crates/pipeline-manager/migrations`.
- Do not modify an existing migration file. If you want to evolve the schema, add a new SQL or Rust file to the `migrations` folder following refinery's versioning and naming scheme. The migration script should update an existing schema as opposed to assuming a clean slate. For example, use `ALTER TABLE` to add a new column to an existing table and fill that column for existing rows with the appropriate defaults.
- If you add a new migration script `V{i}`, add tests for migrations from `V{i-1}` to `V{i}`. For example, add tests that invoke the pipeline manager APIs before and after the migration.
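As an illustration, a new migration script following these rules might look like the sketch below. The file name, table, and column are hypothetical; Feldera's actual schema and next version number will differ:

```sql
-- crates/pipeline-manager/migrations/V42__add_pipeline_description.sql (hypothetical)
-- Evolve the existing table rather than assuming a clean slate, and
-- backfill existing rows with an appropriate default.
ALTER TABLE pipeline
    ADD COLUMN description VARCHAR NOT NULL DEFAULT '';
```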
By default, the pipeline-manager and pipelines install a tracing subscriber which logs
the Feldera crates at INFO level and all other crates at WARN level.
This can be overridden by setting the RUST_LOG environment variable.
For example, the following would be the same as the default, but with backtraces additionally enabled:

```bash
RUST_BACKTRACE=1 RUST_LOG=warn,pipeline_manager=info,feldera_types=info,project=info,dbsp=info,dbsp_adapters=info,dbsp_nexmark=info cargo run --package=pipeline-manager --bin pipeline-manager -- --dev-mode
```

The release process is done through GitHub Actions. Launch the "Create a release" action manually from the GitHub Actions UI. You have to provide the Git SHA of the commit you want to release as the new version. The release CI scripts will then run in this order:
- ci-release.yml
  - Publishes a new release on GitHub for the commit
  - Adds the binaries that we built during the merge queue
  - Tags the docker image that we also built during the merge queue as `$version` and `latest`
- ci-post-release.yml
  - Releases the Python library to PyPI
  - Releases the Rust crates to crates.io
  - Determines the next version (controlled by the `RELEASE_NEXT_VERSION` variable in the repo settings)
  - Bumps the versions in Cargo.toml, pyproject.toml, and openapi.yaml to the next version
  - Commits and pushes the changes to main
Note that the release process requires that the commit you want to release was merged into main through the merge queue; otherwise the build artifacts will not be available.