Index build speed optimizations #178
Closed
When looking at the low-level ivfflat build, I saw that two things take a significant share of time: the dot-product calculation (done in double precision) and the arccos calculation for the spherical distance.
I did some experiments and found that I could speed these up with a marginal precision decrease (considering that ivfflat is approximate overall, and the share of exact answers in a SELECT query is significantly below 100% at any reasonable probes/lists ratio).
The code is as follows:
https://github.com/pashkinelfe/pgvector/tree/index-build-speed-optimizations
There are two patches:
1. Use float instead of double in the dot-product calculation. This has the most pronounced effect, as the dot product involves a floating-point multiply-add over every vector dimension. I intentionally left double in the overall distance function and in the calculations that are done once per vector pair (not at every dimension), as changing those would improve speed only marginally (and keeping them in double helps compatibility). The low-level reason for the speed-up is that the CPU (armv8) can use a fused multiply-add instruction (fmadd) instead of a multiply + conversion to double + add (fmul + fcvt + fadd) at each vector dimension. This change can also speed up SELECTs due to faster list traversal, which includes dot-product calculations between the query vector and index vectors.
2. Calculate the arccos value for the spherical great-circle distance as a quadratic Lagrange approximation plus sign extension. The effect of this is less pronounced, so it could be merged separately.
Overall performance measurements were done in four configurations: the original state (arccos + double dot product), the first patch only (arccos + float), the second patch only (Lagrange approximation + double), and both patches (Lagrange approximation + float). The dataset is a real 900K-vector set of OpenAI embeddings (https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M), but a random 900K set gives the same results. The number of lists was chosen at the recommended value (900), at 3x the recommended value, and at 1/3 of it. The most pronounced effect (a 55% index build time decrease) occurs when the number of lists is above the recommended value, which there is good reason to do to increase SELECT speed at the cost of index build time. But at the recommended value the effect is also very pronounced: a 30% index build time decrease with both patches.
Absolute index build times:

Relative index build times as a ratio to original unpatched code:

Regarding index quality, I expect these small changes in the distance calculations to be too small to show up on a precision vs. probes/lists plot like the one in #163 (comment) or https://github.com/erikbern/ann-benchmarks. I don't expect to see any change in the precision vs. probes/lists or QPS plots for SELECT performance. Still, I'll try to publish these benchmarks for all patch variants vs. the current state.