
Conversation

@pashkinelfe
Contributor

When looking at the low-level ivfflat build, I saw that two things take a significant share of time:

  1. Calculation of dot product
  2. Calculation of the arccos value.

I did some experiments and found that I could speed up both with a marginal decrease in precision (considering that ivfflat is approximate overall, and the share of exact answers in a select query is significantly below 100% at any reasonable probes/lists ratio).

The code is as follows:
https://github.com/pashkinelfe/pgvector/tree/index-build-speed-optimizations

There are two patches

  1. Use float instead of double in the dot product calculation. This has the most pronounced effect, since the dot product involves a floating-point multiplication for every vector dimension. I intentionally left the overall distance function, and the calculations done once per vector pair (not per dimension), in double, as converting them would only speed things up marginally (and also for compatibility). The low-level reason for the speed-up is that it lets the CPU (armv8) use the vector multiply-add instruction (fmadd) instead of vector multiplication + conversion to double + addition (fmul + fcvt + fadd) at each vector dimension. This change can also speed up selects, because list traversal includes dot product calculations between the sample vector and index vectors.

  2. Calculate the arccos value for the spherical great-circle distance as a quadratic Lagrange approximation plus sign extension. The effect of this is less pronounced, so this patch could be merged separately.
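To illustrate the second patch: below is a minimal sketch of a quadratic Lagrange approximation of arccos with sign extension, assuming interpolation nodes at x = 0, 0.5, and 1 (the node choice and function name here are my illustration, not necessarily what the actual patch uses). The sign extension uses the identity acos(-x) = pi - acos(x).

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/*
 * Illustrative sketch: approximate acos(x) on [0, 1] with the quadratic
 * Lagrange polynomial through the nodes x = 0, 0.5, 1, where acos takes
 * the values pi/2, pi/3, 0.  For x in [-1, 0), apply the sign extension
 * acos(-x) = pi - acos(x).
 */
static float
acos_lagrange(float x)
{
	float		ax = fabsf(x);

	/* P(ax) = pi*(ax-0.5)(ax-1) - (4*pi/3)*ax*(ax-1), exact at the nodes */
	float		p = (float) M_PI * (ax - 0.5f) * (ax - 1.0f)
		- (4.0f * (float) M_PI / 3.0f) * ax * (ax - 1.0f);

	return (x >= 0.0f) ? p : (float) M_PI - p;
}
```

Note that any quadratic approximation of arccos is least accurate near ±1, where the true function's slope is unbounded; that is presumably acceptable here for the same reason as above (ivfflat is approximate anyway).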

Overall performance measurements were done for the original state (arccos + double dot product), the first patch alone (arccos + float), the second alone (Lagrange approximation + double), and both together (Lagrange approximation + float). The dataset is a real 900K-row set of OpenAI vectors (https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M), but the same results hold for a random 900K set. The number of lists was set to the recommended value (900), 3x the recommended value, and 1/3 of the recommended value. The most pronounced effect (a 55% decrease in index build time) occurs when the number of lists is above the recommended value, which is a reasonable configuration for increasing select speed at the cost of index build time. But for the recommended value the effect is also very pronounced: a 30% decrease in index build time with both patches.

Absolute index build times:

Relative index build times as a ratio to original unpatched code:

Regarding index quality, I expect these small changes in the distance calculations to be too small to show up on a precision vs probes/lists plot like the ones in #163 (comment) or https://github.com/erikbern/ann-benchmarks. I don't expect any change in the precision vs probes/lists or QPS plots for select performance. Still, I'll try to publish these benchmarks for all patch variants vs the current state.

Use float instead of double at dot product calculation

On ARM this makes the CPU use the vector multiply-add instruction (fmadd)
instead of vector multiplication + conversion to double + addition
(fmul + fcvt + fadd) at each vector dimension.

The output of the distance functions, and the calculations that are done
once per vector pair, are left double, as this doesn't make a speed
difference, and for compatibility.
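A minimal sketch of the shape of this change, assuming a distance helper like pgvector's inner-product loop (the function name and signature here are illustrative, not the actual patched code): the accumulator stays float for the whole loop, so each iteration can compile to a single fmadd on armv8, and the widening to double happens once at the end.

```c
/*
 * Illustrative sketch: accumulate the dot product in float so the compiler
 * can emit one fused multiply-add (fmadd) per dimension, instead of
 * fmul + fcvt (widen to double) + fadd.  Only the final result is widened
 * to double, once per vector pair, to keep the external interface double.
 */
static double
inner_product_float(const float *a, const float *b, int dim)
{
	float		sum = 0.0f;		/* float accumulator: one fmadd per element */

	for (int i = 0; i < dim; i++)
		sum += a[i] * b[i];

	return (double) sum;		/* single widening conversion at the end */
}
```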
@jkatz
Contributor

jkatz commented Jul 6, 2023

+1 for c93807e (moving to float4 for distances) at least for building; it feels odd to me to cast from float4 to float8 at the end given the loss of information from not storing the float8 values during the calculation phase, but I don't know if this negatively impacts the final results.

For 422f0a2 I'd want to see what the changes in precision are, if any, against some of the known benchmarks (e.g. ANN Benchmarks as mentioned).

@pashkinelfe
Contributor Author

pashkinelfe commented Jul 6, 2023

@jkatz thanks for your review! I agree with you for c93807e and created a separate PR for this patch alone #180.

Changing the distance functions' interface from double to float would require modifying the vector.sql interface, which may need a major pgvector upgrade (if anyone used the float8 function in current pgvector to fill something in their database, their workflow could break on pgvector upgrade if we change this function to float4).

If we multiply-add float numbers and convert the result to double at each vector component, then in principle that's the same as doing these calculations in float and converting to double at the end. I agree we'd rather use a float result, but I consider the current state legitimate as well. For now, I've left it as is for compatibility and a small expected build speed gain.

If we're ready for changing SQL, I'd easily modify the patch as requested.

@pashkinelfe pashkinelfe force-pushed the index-build-speed-optimizations branch 2 times, most recently from d50c3a2 to 17653a9 on July 7, 2023 11:38
@pashkinelfe pashkinelfe force-pushed the index-build-speed-optimizations branch from 17653a9 to 03cca39 on July 14, 2023 07:04
@ankane
Member

ankane commented Jul 18, 2023

Thanks again @pashkinelfe. Closing this out due to the findings in #180 (comment).

@ankane ankane closed this Jul 18, 2023
