Skip to content

feat(table): add LsmWriteSpec for the MemWAL LSM write path#3396

Open
touch-of-grey wants to merge 1 commit into
lancedb:mainfrom
touch-of-grey:MergeInsertSpec
Open

feat(table): add LsmWriteSpec for the MemWAL LSM write path#3396
touch-of-grey wants to merge 1 commit into
lancedb:mainfrom
touch-of-grey:MergeInsertSpec

Conversation

@touch-of-grey
Copy link
Copy Markdown
Contributor

@touch-of-grey touch-of-grey commented May 17, 2026

Summary

Adds LsmWriteSpec and Table::set_lsm_write_spec / unset_lsm_write_spec to
install and clear the spec that selects Lance's MemWAL LSM-style write path:

  • LsmWriteSpec::bucket(column, num_buckets) hash-partitions writes by the
    unenforced primary key column; LsmWriteSpec::unsharded() uses a single
    MemWAL shard. with_maintained_indexes(...) lists indexes the MemWAL keeps
    up to date.
  • set_lsm_write_spec persists the spec in the MemWAL index;
    unset_lsm_write_spec removes it (dropping the MemWAL index), reverting to
    the standard merge_insert path. unset is idempotent.
  • Bindings: Python and TypeScript (setLsmWriteSpec / unsetLsmWriteSpec).
    RemoteTable returns NotSupported.

The actual merge_insert LSM dispatch and ShardWriter write path are a
follow-up PR — this PR only sets and clears the spec.

Context

Split out from #3354; the unenforced primary key half landed in #3394.
Addresses review feedback: renamed from MergeInsertSpec (the spec is reusable
beyond merge_insert); runtime-only knobs (writer tuning, per-row validation)
are kept out of the persisted spec; and an unset was added.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added enhancement New feature or request Python Python SDK Rust Rust related issues labels May 17, 2026
Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to the other PR, should add typescript support

Comment thread python/src/table.rs Outdated
}

#[pymethods]
impl MergeInsertSpec {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should call this LsmWriteSpec, since we should be able to reuse it also for inserts for tables without primary key

Comment thread python/src/table.rs Outdated
})
}

pub fn set_merge_insert_spec<'a>(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also add an unset function so that people can revert to the old behavior if necessary

Comment thread python/src/table.rs Outdated
/// row of each input batch is hashed (to pick the target writer).
/// The caller becomes responsible for pre-sharding the input.
/// No-op for the unsharded variant.
pub fn assume_pre_sharded(&self) -> Self {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the information of this API is not really stored, it should be supplied at runtime when running merge insert, not a part of the persisted sharding spec.


let mut dataset = (*table.dataset.get().await?).clone();
dataset
.initialize_mem_wal_with_shards(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examine what information exactly are persisted by the MemWAL index, we should not record other unnecessary information in the spec. Things like tunings and assume pre-sharded are runtime configs that should not be a part of the shard spec, but should be added into the specific function call.

Comment thread rust/lancedb/src/table/merge.rs Outdated
/// `num_updated_rows`, and `num_deleted_rows` are all zero and this field
/// holds the total row count written.
#[serde(default)]
pub num_rows: u64,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the merge result and merge writer related changes should be added separatedly, this PR should only concern setting and clearing the spec.

Add LsmWriteSpec (Bucket / Unsharded) and Table::set_lsm_write_spec /
unset_lsm_write_spec to install and clear the spec that selects Lance's
MemWAL LSM write path. The actual merge_insert dispatch and writer are a
follow-up. Python and TypeScript bindings included.

Split out from lancedb#3354; the unenforced primary key half landed in lancedb#3394.
@touch-of-grey touch-of-grey changed the title feat(table): add MergeInsertSpec for MemWAL ShardWriter merge_insert feat(table): add LsmWriteSpec for the MemWAL LSM write path May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Python Python SDK Rust Rust related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants