Conversation
56f4f84 to
52065ab
Compare
mythical-fred
left a comment
There was a problem hiding this comment.
Three hard blocks, one correctness concern.
mythical-fred
left a comment
There was a problem hiding this comment.
Two code bugs (inline) plus missing docs — blocking on the docs.
threads and fatal_on_max_retries are new user-visible config options but neither appears in docs.feldera.com/docs/connectors/sinks/delta.md. The Documentation checkbox is unchecked. Please add entries to the config table for both options.
6738817 to
ac1dfbd
Compare
Use SplitCursor to split the batch and distribute it across tasks, each task retries encoding and writing to delta lake and then returns Add actions which main task retries to commit to delta lake Signed-off-by: Swanand Mulay <73115739+swanandx@users.noreply.github.com>
ac1dfbd to
adf3e2c
Compare
ryzhyk
left a comment
There was a problem hiding this comment.
Since we don't do anything meaningful with fatal errors today, let's not bother adding that option. We'll do another pass to rework connector error handling later.
| } | ||
|
|
||
| // prints timing for manual comparison, and asserts output correctness | ||
| #[test] |
There was a problem hiding this comment.
Let's turn this into a proper benchmark under adapters/bench.
What benchmark results did you get?
There was a problem hiding this comment.
$cargo bench --package dbsp_adapters --bench delta_encoder --features with-deltalake
Running benches/delta_encoder.rs (target/release/deps/delta_encoder-836871c7c7c0c9cd)
Gnuplot not found, using plotters backend
delta_indexed_encode/workers/1
time: [35.170 ms 35.201 ms 35.233 ms]
thrpt: [2.8382 Melem/s 2.8408 Melem/s 2.8433 Melem/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
delta_indexed_encode/workers/2
time: [24.756 ms 25.310 ms 25.873 ms]
thrpt: [3.8650 Melem/s 3.9511 Melem/s 4.0394 Melem/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
delta_indexed_encode/workers/4
time: [15.615 ms 15.899 ms 16.199 ms]
thrpt: [6.1734 Melem/s 6.2897 Melem/s 6.4042 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
delta_indexed_encode/workers/8
time: [9.1359 ms 9.2460 ms 9.3869 ms]
thrpt: [10.653 Melem/s 10.816 Melem/s 10.946 Melem/s]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
mythical-fred
left a comment
There was a problem hiding this comment.
My previous two blockers (schema minimum = 1 and docs for threads) are resolved — thank you. One remaining issue:
| pub max_retries: Option<u32>, | ||
|
|
||
| /// Number of parallel threads used by the connector. | ||
| /// |
There was a problem hiding this comment.
Commit message requested changes on the latest commit is a hard block. Please squash or rebase: git rebase -i and rewrite it with an imperative summary describing what changed, e.g. [adapters] Use Option<usize> for threads; remove fatal_on_max_retries. Resources: https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
ryzhyk
left a comment
There was a problem hiding this comment.
Thanks for addressing all comments! I only have a couple of questions left (inline)
| if threads > 1 && key_schema.is_none() { | ||
| return Err(ControllerError::invalid_transport_configuration( | ||
| endpoint_name, | ||
| "parallel writes (threads > 1) require an index (key_schema) to ensure \ |
There was a problem hiding this comment.
"Parallel writes (threads > 1) require the view to have a unique key to ensure correct ordering of inserts and deletes. Please specify the `index` property in the connector configuration. For more details, see: https://docs.feldera.com/connectors/unique_keys"
| // cargo bench -p dbsp_adapters --bench delta_encoder | ||
| #[test] | ||
| #[ignore] | ||
| fn bench_parallel_encoding() { |
There was a problem hiding this comment.
why keep the test and why #[ignore] it? If it improves test coverage, perhaps we can make a version with less data that can run as a test?
| let mut storage_options: serde_json::Value = serde_json::to_value(object_store_config).unwrap(); | ||
| storage_options["uri"] = json!(table_uri); | ||
| storage_options["mode"] = json!("truncate"); | ||
| if let Some(threads) = threads { |
There was a problem hiding this comment.
I don't see it used in any test.
Describe Manual Test Plan
Added unit tests and ran locally
for write retry, I changed local permissions for local path to simulate write failure
and to restore
& commit retries were tried by renaming the directory
what can be tested further:
Checklist
Breaking Changes?
we're adding new config options for delta connector but should not be a breaking change.