
Optimize visiting neighbors #386

Open

hlinnaka wants to merge 3 commits into pgvector:master from hlinnaka:optimize-visit-neighbors

Conversation

@hlinnaka (Contributor)

Here are two more micro-optimizations of HnswSearchLayer for HNSW index builds.

1st commit: Add a bulk variant of AddToVisited

The idea is to move code around so that we collect all the 'hash' values to an array in a tight loop, before performing the hash table lookups. This codepath causes a lot of CPU cache misses, as the elements are scattered around memory, and performing all the fetches upfront allows the CPU to schedule fetching those cachelines sooner. That's my theory of why this works, anyway :-).

This gives a 5%-10% speedup on my laptop, on HNSW index build of a subset of the 'angular-100' dataset (same test I used on my previous PRs). I'd love to hear how this performs on other systems, as this could be very dependent on CPU details.

2nd & 3rd commits: Calculate 4 neighbor distances at a time in HNSW search

This is just a proof of concept at this stage, but shows promising results. The idea is to have a variant of the distance function that calculates the distance from one point 'q' to 4 other points in one call. That gives the vectorized loop in the distance function more work to do in each iteration. If you think this is worthwhile, I can spend more time polishing this, adding these array-variants as proper AM support functions, etc.

This gives another 10% speedup on the same test on my laptop. It could possibly be optimized further, by providing variants with different array sizes, or a variable-length version and let the function itself vectorize it in the optimal way. With some refactoring, I think we could also use this in CheckElementCloser(). This might also work well together with PR #311, but I haven't tested that.

@jkatz (Contributor)

jkatz commented Dec 22, 2023

🚀 I still have on my TODO to run some serious benchmarks on these patches; I'll work to get those up and running to see how it performs under different contexts on some larger machines. Given they're focused on index building, my thought process is to test:

  1. "Regression" tests using the ANN Benchmark suite to capture both performance/recall -- need to do two sets: with and without parallel build
  2. Increasing concurrent building benchmarks (discussed here)
  3. Testing with the parallel build benchmark on different large data sets (10MM + 100MM) with a focus on timing

@ankane (Member)

ankane commented Jan 8, 2024

Hey @hlinnaka, thanks for more PRs! I'm currently working on in-memory parallel index builds, but will dig into this and #388 once that's finished (as it may affect the impact on build time).

@hlinnaka force-pushed the optimize-visit-neighbors branch from 822af36 to 6e6e1c7 on January 29, 2024 at 09:14
@hlinnaka (Contributor, Author)
Rebased this.

@hlinnaka force-pushed the optimize-visit-neighbors branch from 6e6e1c7 to e40e687 on June 11, 2024 at 08:42
@ankane (Member)

ankane commented Sep 19, 2024

Hey @hlinnaka, I incorporated some of this in 8dde14a as part of work to reduce memory usage for HNSW index scans.

(still planning to visit more of it, but want to get further along in that/filtering work before introducing more complexity)

@ankane (Member)

ankane commented Sep 21, 2024

Pushed a version of bulk hashing to the bulk-hash branch, but seeing very little difference on Linux x86-64 and Mac arm64. Let me know if you're still seeing a difference (or if I messed something up with the code).

@hlinnaka force-pushed the optimize-visit-neighbors branch from e40e687 to e0f4a19 on October 7, 2024 at 14:15
@hlinnaka (Contributor, Author)

hlinnaka commented Oct 7, 2024

> Pushed a version of bulk hashing to the bulk-hash branch, but seeing very little difference on Linux x86-64 and Mac arm64. Let me know if you're still seeing a difference (or if I messed something up with the code).

Thanks! I'm still seeing a modest ~4-5% difference from the 'bulk-hash' patch. It feels smaller than what I saw before, but it's still measurable and repeatable.

(I got a new laptop since I wrote this, so I cannot compare on the same hardware anymore)

Rebased this again. The first commit is essentially the same as the bulk-hash branch.

Thanks to CPU cache effects (I think), it's faster to fetch all the hashes first. This gives a 5%-10% speedup in an HNSW build of a 100-dimension dataset on my laptop.

Introduce a helper function to calculate the distances of all candidates in an array. It doesn't change much on its own, but paves the way for further optimizations in the next commit.

In my testing, this gives a further 10% speedup in HNSW index build.
@hlinnaka force-pushed the optimize-visit-neighbors branch from e0f4a19 to 55142c4 on October 7, 2024 at 14:21
@ankane (Member)

ankane commented Oct 8, 2024

@jkatz If you have a chance to test the bulk-hash branch vs its previous commit, I'd be curious to see what you find.

@jkatz (Contributor)

jkatz commented Oct 8, 2024

@ankane By previous commit, do you mean what's currently on master? Happy to test. (Finishing up one comparing performance between PG16 and PG17.)

@ankane (Member)

ankane commented Oct 8, 2024

Cool, I meant the commit it was branched from (d5e8fc9), but master should work if you merge the latest changes into the branch (just want to make sure the single commit is the only difference).

@jkatz (Contributor)

jkatz commented Oct 11, 2024

@ankane Understood. I'm working on this test now.
