Add per-index default search parameters (default_probes, default_ef_search)#933

Open
lossyrob wants to merge 21 commits into pgvector:master from lossyrob:feature/235-search-defaults-index-option
lossyrob commented Dec 5, 2025

Search Defaults Index Option

Summary

This PR adds per-index default search parameters (default_probes for IVFFlat and default_ef_search for HNSW) that automatically configure search behavior without requiring session-level SET commands. Index defaults take effect when no explicit session setting is active, while still respecting user overrides.

Problem Solved

Before this feature, configuring search parameters such as the number of IVFFlat probes or HNSW ef_search required one of the following:

  • Setting session-level GUC variables before each query
  • Wrapping queries in transaction blocks with SET LOCAL statements
  • Using database-wide defaults that apply to all indexes

This created complexity when managing multiple tables with different accuracy/performance tradeoffs, partitioned tables with per-partition indexes, or applications sharing database connections.

Solution

Per-index defaults let administrators specify search parameters when an index is created, or later with ALTER INDEX. Queries automatically use the settings of whichever index the planner selects.

Related Issues

Artifacts

Changes Summary

Key Changes

  • New IVFFlat option: default_probes - specifies default number of probes when no session SET is active
  • New HNSW option: default_ef_search - specifies default ef_search value when no session SET is active
  • Precedence rules: Explicit SET > Index default > GUC default
  • Cost estimation: Query planner considers per-index defaults for accurate cost estimates
  • ALTER INDEX support: Modify defaults after creation without index rebuild
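
The precedence rule above can be read as a three-step fallback. The following is a minimal sketch of that rule only, not the PR's code: resolve_search_param and its parameters are hypothetical names chosen for illustration, and the "session set" flag stands in for whatever mechanism detects an explicit SET.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of pgvector): choose the value a scan
 * should use. session_value is only consulted when session_set is true.
 */
int
resolve_search_param(bool session_set, int session_value,
                     int index_default, int guc_default)
{
    if (session_set)
        return session_value;   /* 1. explicit SET or SET LOCAL wins */
    if (index_default > 0)
        return index_default;   /* 2. per-index default; 0 means unset */
    return guc_default;         /* 3. fall back to the GUC default */
}
```

With an index default of 10 and a GUC default of 1, the sketch returns 20 after SET ivfflat.probes = 20, 10 when nothing is set in the session, and 1 when the index default is 0.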

Files Modified

  • src/ivfflat.h, src/ivfflat.c, src/ivfutils.c, src/ivfscan.c - IVFFlat support
  • src/hnsw.h, src/hnsw.c, src/hnswutils.c, src/hnswscan.c - HNSW support
  • test/sql/ivfflat_vector.sql, test/expected/ivfflat_vector.out - IVFFlat tests
  • test/sql/hnsw_vector.sql, test/expected/hnsw_vector.out - HNSW tests
  • README.md - User documentation
  • CHANGELOG.md - Release notes

Usage Examples

IVFFlat

-- Create index with default probes
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) 
    WITH (lists = 100, default_probes = 10);

-- Queries automatically use 10 probes without SET command
SELECT * FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 10;

-- Override with session SET when needed
SET ivfflat.probes = 20;
SELECT * FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 10;

HNSW

-- Create index with default ef_search
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) 
    WITH (default_ef_search = 100);

-- Queries automatically use ef_search=100
SELECT * FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 10;

Modify After Creation

-- idx_items stands for the name of an existing IVFFlat index
ALTER INDEX idx_items SET (default_probes = 15);
ALTER INDEX idx_items RESET (default_probes);  -- Remove default

Testing

All 14 regression tests pass:

ok 1  - bit                                        43 ms
ok 2  - btree                                     113 ms
ok 3  - cast                                       69 ms
ok 4  - copy                                      125 ms
ok 5  - halfvec                                    92 ms
ok 6  - hnsw_bit                                   93 ms
ok 7  - hnsw_halfvec                              360 ms
ok 8  - hnsw_sparsevec                            299 ms
ok 9  - hnsw_vector                               360 ms
ok 10 - ivfflat_bit                                76 ms
ok 11 - ivfflat_halfvec                           262 ms
ok 12 - ivfflat_vector                            312 ms
ok 13 - sparsevec                                  52 ms
ok 14 - vector_type                                68 ms
# All 14 tests passed.

Test Coverage

  • Index default used when GUC not explicitly SET
  • Explicit SET overrides index default
  • RESET returns to using index default
  • ALTER INDEX can modify defaults without rebuild
  • Sentinel value (0) acts as unset
  • Invalid values rejected
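
The last two cases (0 as "unset", invalid values rejected) amount to a three-way classification of the option value. A minimal sketch, assuming an inclusive upper bound that the PR text does not state: sketch_classify_default, the enum, and max_value are all hypothetical and only illustrate the behavior the tests describe.

```c
/* Hypothetical classification of a default_probes / default_ef_search value. */
typedef enum SketchOptionState
{
    SKETCH_OPTION_REJECTED,     /* out of range, e.g. -1 */
    SKETCH_OPTION_UNSET,        /* sentinel 0: fall through to the GUC */
    SKETCH_OPTION_SET           /* a usable per-index default */
} SketchOptionState;

/* max_value is an assumed upper bound supplied by the caller. */
SketchOptionState
sketch_classify_default(int value, int max_value)
{
    if (value < 0 || value > max_value)
        return SKETCH_OPTION_REJECTED;
    if (value == 0)
        return SKETCH_OPTION_UNSET;
    return SKETCH_OPTION_SET;
}
```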

Acceptance Criteria

  • IVFFlat index with default_probes = N uses N probes during search
  • HNSW index with default_ef_search = N uses ef_search of N during search
  • Explicit SET in session overrides any index default
  • Indexes without new options behave identically to current behavior
  • Query planner cost estimates reflect per-index defaults
  • ALTER INDEX can add, modify, or remove default search parameters
  • Index default value of 0 treated as "unset"
  • SET LOCAL within a transaction overrides index defaults

Deployment Considerations

  • No SQL migration required - The new index options are registered in C (IvfflatInit() and HnswInit()) and become available once the updated library is loaded
  • Backward compatible - Existing indexes without new options behave identically to current behavior
  • No data migration needed - Option values are stored with the index's reloptions, so existing index data is untouched

Breaking Changes

None. This is a purely additive feature that preserves full backward compatibility.

- Spec.md: Feature specification for per-index default search parameters
- SpecResearch.md: Research on GUC source tracking and reloptions
- CodeResearch.md: Code analysis of existing pgvector patterns
- ImplementationPlan.md: 5-phase implementation plan

Implements RFC from pgvector#235
…-option_plan

[Search Defaults Index Option] Planning: Implementation plan for per-index search defaults
Phase 1 of implementing per-index default search parameters (Issue pgvector#235).

This commit extends the index options to support:
- default_probes for IVFFlat indexes
- default_ef_search for HNSW indexes

Changes:
- Added defaultProbes field to IvfflatOptions struct
- Added defaultEfSearch field to HnswOptions struct
- Registered new reloptions in IvfflatInit() and HnswInit()
- Added parsing entries in ivfflatoptions() and hnswoptions()
- Implemented getter functions IvfflatGetDefaultProbes() and HnswGetDefaultEfSearch()

The new options use 0 as a sentinel value meaning "unset", which allows
the existing GUC defaults to take precedence when no index default is specified.

This phase establishes the foundation; subsequent phases will add:
- GUC source detection for precedence resolution
- Scan integration to use effective values
- Cost estimation updates
- Test coverage
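
The getter pattern described in this commit can be sketched as follows. This is a simplified stand-in, not the PR's code: the struct and function names carry a Sketch suffix because the real IvfflatOptions struct and relation type have more fields, and the NULL check assumes that an index created without WITH (...) has no parsed options attached.

```c
#include <stddef.h>

/* Simplified stand-in for the parsed reloptions of an IVFFlat index. */
typedef struct IvfflatOptionsSketch
{
    int lists;          /* existing option */
    int defaultProbes;  /* new option; 0 is the "unset" sentinel */
} IvfflatOptionsSketch;

/* Simplified stand-in for an index relation. */
typedef struct IndexSketch
{
    const IvfflatOptionsSketch *options;    /* NULL when no options were given */
} IndexSketch;

/* Returns the per-index default, or 0 ("unset") when none is stored. */
int
sketch_get_default_probes(const IndexSketch *index)
{
    if (index->options != NULL)
        return index->options->defaultProbes;
    return 0;
}
```

Returning the sentinel when no options are present lets callers treat "option omitted" and "option explicitly set to 0" the same way.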
…-option_phase1

[Search Defaults Index Option] Phase 1: Index Option Infrastructure
Add IvfflatGetEffectiveProbes() and HnswGetEffectiveEfSearch() functions
that implement the precedence rules for resolving search parameters:
1. Explicit SET command takes precedence (source == PGC_S_SESSION)
2. Index default value if set (> 0)
3. GUC default value

These functions use PostgreSQL's find_option() API to detect whether a
GUC was explicitly set in the current session, enabling the per-index
default to take effect when users haven't explicitly overridden it.

Phase 2 of Search Defaults Index Option implementation.
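
To illustrate how a source check drives the resolution, here is a minimal sketch that models the GUC source as a small enum. The enum is a reduced, hypothetical stand-in for PostgreSQL's GucSource (which has more members), and sketch_effective_ef_search is an illustrative name, not the PR's HnswGetEffectiveEfSearch().

```c
/* Reduced stand-in for PostgreSQL's GucSource; names are hypothetical. */
typedef enum SketchGucSource
{
    SKETCH_S_DEFAULT,   /* built-in default */
    SKETCH_S_FILE,      /* configuration file */
    SKETCH_S_DATABASE,  /* ALTER DATABASE ... SET */
    SKETCH_S_SESSION    /* SET or SET LOCAL in the current session */
} SketchGucSource;

/*
 * guc_value is the current value of the GUC variable, whatever its source.
 * Only a session-level source counts as an explicit override here.
 */
int
sketch_effective_ef_search(SketchGucSource source, int guc_value,
                           int index_default)
{
    if (source == SKETCH_S_SESSION)
        return guc_value;       /* explicit SET takes precedence */
    if (index_default > 0)
        return index_default;   /* index default, if set */
    return guc_value;           /* otherwise the GUC's current value */
}
```

One consequence of this sketch, if the real check is exactly source == PGC_S_SESSION as stated above: a value coming from a configuration file or ALTER DATABASE is not a session source, so an index default would take priority over it.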
…-option_phase2

[Search Defaults Index Option] Phase 2: GUC Resolution Logic
Update scan functions to use the new effective value resolution functions
instead of directly reading GUC variables:

- ivfflatbeginscan(): Use IvfflatGetEffectiveProbes(index) instead of
  ivfflat_probes for determining the number of probes to use

- GetScanItems(): Use HnswGetEffectiveEfSearch(index) instead of
  hnsw_ef_search for the HNSW search layer ef parameter

- ResumeScanItems(): Use HnswGetEffectiveEfSearch(index) instead of
  hnsw_ef_search for the batch size in iterative scans

This implements the precedence rules where:
1. Explicit SET command takes precedence
2. Index default value (if set via default_probes/default_ef_search option)
3. GUC default value (ivfflat.probes=1 or hnsw.ef_search=40)

Phase 3 of 235-search-defaults-index-option implementation.
…-option_phase3

[Search Defaults Index Option] Phase 3: Scan Integration
- ivfflatcostestimate(): Use IvfflatGetEffectiveProbes(index) instead of
  directly reading ivfflat_probes GUC, enabling index-specific default
  probes to influence cost estimates

- hnswcostestimate(): Use HnswGetEffectiveEfSearch(index) instead of
  directly reading hnsw_ef_search GUC, enabling index-specific default
  ef_search to influence cost estimates

This ensures the query planner uses the same effective search parameter
values that will be used at scan time, improving plan quality when
indexes have per-index search defaults configured.

Implementation notes:
- Moved index_close() after the effective value function calls to ensure
  the index relation remains valid when accessing rd_options
- All 14 existing regression tests pass
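
Why the effective value matters for planning can be shown with a small sketch. It assumes the IVFFlat estimator scales cost by the fraction of lists visited (probes divided by lists, capped at 1), which is how I understand pgvector's existing estimator to work; sketch_ivfflat_scan_ratio is a hypothetical name and the real function computes considerably more.

```c
/*
 * Hypothetical helper: fraction of an IVFFlat index a scan is expected
 * to visit. Assumes lists > 0. Feeding it the effective probes (rather
 * than the raw GUC) keeps the estimate in line with what the scan will do.
 */
double
sketch_ivfflat_scan_ratio(int effective_probes, int lists)
{
    double ratio = (double) effective_probes / (double) lists;

    if (ratio > 1.0)
        ratio = 1.0;    /* cannot visit more than the whole index */
    return ratio;
}
```

Under this assumption, an index with lists = 100 and default_probes = 25 would be costed as visiting a quarter of the index, whereas using the GUC default of 1 probe would cost it at one hundredth.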
…-option_phase4

[Search Defaults Index Option] Implementation Phase 4: Cost Estimation
Phase 5: Comprehensive test coverage for index search parameter defaults

IVFFlat tests (test/sql/ivfflat_vector.sql):
- Test CREATE INDEX with default_probes option
- Test query using index default when GUC not explicitly SET
- Test explicit SET ivfflat.probes overrides index default
- Test RESET ivfflat.probes returns to using index default
- Test ALTER INDEX changes default_probes value
- Test ALTER INDEX RESET removes default_probes
- Test default_probes = 0 acts as unset (uses GUC default)
- Test invalid values rejected (default_probes = -1)

HNSW tests (test/sql/hnsw_vector.sql):
- Test CREATE INDEX with default_ef_search option
- Test query using index default when GUC not explicitly SET
- Test explicit SET hnsw.ef_search overrides index default
- Test RESET hnsw.ef_search returns to using index default
- Test ALTER INDEX changes default_ef_search value
- Test ALTER INDEX RESET removes default_ef_search
- Test default_ef_search = 0 acts as unset (uses GUC default)
- Test invalid values rejected (default_ef_search = -1)

All 14 regression tests pass.
…-option_phase5

[Search Defaults Index Option] Phase 5: Tests
- Create comprehensive Docs.md for the Search Defaults Index Option feature
- Update README.md with documentation for new index options
- Add CHANGELOG entry for the new feature
…-option_docs

[Search Defaults Index Option] Documentation

Development

Successfully merging this pull request may close these issues.

RFC: Setting search defaults as an index option
