
@benjamin-awd
Contributor

Summary

This PR adds a generic Arrow codec that supports Apache Arrow IPC serialization. This will enable sinks (e.g. ClickHouse) to serialize and transmit batches of structured events in a columnar binary format, which is more efficient than row-oriented, text-based formats. By introducing a unified Arrow serialization layer, Vector can interoperate more easily with Arrow-native systems and improve performance for columnar workflows.
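To illustrate what "columnar" means here, the sketch below transposes a batch of row-shaped events into per-field columns, loosely following Arrow's in-memory layout: fixed-width values in a contiguous buffer, strings as an `i32` offsets buffer plus a byte buffer, and a validity vector for nulls. It is a minimal, std-only sketch and not how this PR's codec is implemented; the `Event` struct, its fields, and `to_columns` are hypothetical, since real Vector events are dynamic maps rather than fixed structs, and a `Vec<bool>` stands in for Arrow's packed validity bitmap.

```rust
/// Hypothetical fixed-shape event, used only for illustration.
struct Event {
    timestamp_ms: i64,
    host: String,
    status: Option<u16>,
}

/// Column-oriented view of a batch: one buffer (or buffer pair) per field.
#[derive(Debug, Default)]
struct Columns {
    timestamp_ms: Vec<i64>,
    // Arrow-style string column: offsets has rows + 1 entries, and the
    // bytes of row i live in host_data[offsets[i]..offsets[i + 1]].
    host_offsets: Vec<i32>,
    host_data: Vec<u8>,
    // Nullable column: every row keeps a slot; status_valid marks real values.
    status_values: Vec<u16>,
    status_valid: Vec<bool>,
}

fn to_columns(batch: &[Event]) -> Columns {
    let mut cols = Columns::default();
    cols.host_offsets.push(0); // the offsets buffer always starts at 0
    for e in batch {
        cols.timestamp_ms.push(e.timestamp_ms);
        cols.host_data.extend_from_slice(e.host.as_bytes());
        cols.host_offsets.push(cols.host_data.len() as i32);
        // The value stored in a null slot is unspecified in Arrow; 0 is used here.
        cols.status_values.push(e.status.unwrap_or(0));
        cols.status_valid.push(e.status.is_some());
    }
    cols
}

fn main() {
    let batch = vec![
        Event { timestamp_ms: 1, host: "ab".to_string(), status: Some(200) },
        Event { timestamp_ms: 2, host: "c".to_string(), status: None },
    ];
    let cols = to_columns(&batch);
    println!("timestamp_ms: {:?}", cols.timestamp_ms);
    println!(
        "host offsets: {:?}, data: {:?}",
        cols.host_offsets,
        String::from_utf8_lossy(&cols.host_data)
    );
    println!("status: {:?}, valid: {:?}", cols.status_values, cols.status_valid);
}
```

Because each field ends up in its own contiguous buffer, a columnar consumer can read or compress one column without touching the others, which is the efficiency argument made above.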

Vector configuration

An example of how this configuration would look with a sink:

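sinks:
  clickhouse_out:            # hypothetical sink ID
    type: clickhouse
    inputs: ["my_source"]    # hypothetical upstream component ID
    endpoint: http://localhost:8123
    table: my_table
    batch_encoding:
      codec: arrow_stream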
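The `arrow_stream` codec targets the Arrow IPC streaming format: a schema message, then record batch (and dictionary batch) messages, optionally ending with an end-of-stream marker. The std-only sketch below shows only the framing of that format as described in the Arrow columnar specification: a `0xFFFFFFFF` continuation marker, a little-endian `i32` metadata length, the metadata padded to an 8-byte boundary, then the message body. It is an assumption-laden illustration, not this PR's code: the metadata bytes are opaque placeholders for the FlatBuffers-encoded `Message` a real writer produces, and `write_message` and `write_eos` are hypothetical helper names.

```rust
use std::io::{self, Write};

/// Continuation marker that precedes every encapsulated IPC message.
const CONTINUATION: u32 = 0xFFFF_FFFF;

/// Frames one message as:
/// <continuation> <metadata_size: i32 LE> <metadata> <padding> <body>
/// `metadata` is a placeholder for a FlatBuffers-encoded `Message`.
fn write_message<W: Write>(out: &mut W, metadata: &[u8], body: &[u8]) -> io::Result<()> {
    // A real writer pads each body buffer itself; this sketch just requires it.
    assert!(body.len() % 8 == 0, "body must already be 8-byte aligned");
    // The 8-byte prefix plus padded metadata must end on an 8-byte boundary,
    // so pad the metadata itself to a multiple of 8.
    let padding = (8 - metadata.len() % 8) % 8;
    // The length prefix counts the metadata plus its padding.
    let metadata_size = (metadata.len() + padding) as i32;
    out.write_all(&CONTINUATION.to_le_bytes())?;
    out.write_all(&metadata_size.to_le_bytes())?;
    out.write_all(metadata)?;
    out.write_all(&vec![0u8; padding])?;
    out.write_all(body)?;
    Ok(())
}

/// End-of-stream marker: the continuation marker followed by a zero length.
fn write_eos<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(&CONTINUATION.to_le_bytes())?;
    out.write_all(&0i32.to_le_bytes())
}

fn main() -> io::Result<()> {
    let mut stream: Vec<u8> = Vec::new();
    write_message(&mut stream, b"schema", &[])?; // schema message carries no body
    write_message(&mut stream, b"batch-meta", &[0u8; 16])?; // one record batch
    write_eos(&mut stream)?;
    println!("stream is {} bytes", stream.len());
    Ok(())
}
```

Keeping every message 8-byte aligned lets a reader map record batch buffers directly without copying, which is part of why Arrow IPC suits batch-oriented sinks.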

How did you test this PR?

Tested with a ClickHouse sink implementation (not included in this PR, to keep its scope limited).

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Split from: #24075 (comment)
Related: #24074 (requires this to be implemented)
Related: #1374 (should hopefully allow this to move forward)

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook; please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector's dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@benjamin-awd requested a review from a team as a code owner October 31, 2025 16:00
@github-actions bot added the domain: sinks and domain: codecs labels Oct 31, 2025
@benjamin-awd changed the title from "Add ch arrow codec" to "Add Arrow IPC Stream batch codec" Oct 31, 2025
@benjamin-awd changed the title from "Add Arrow IPC Stream batch codec" to "feat(codecs): add arrow IPC stream batch codec" Oct 31, 2025
@sonnens
Contributor

sonnens commented Oct 31, 2025

This is great. I feel like it'd be trivial (in a different PR, I mean) to leverage it to add Parquet support, since it's basically the same.

@pront self-assigned this Nov 4, 2025
