Add 8-bit scalar quantization support for IVF index #231

bohanliu5 wants to merge 1 commit into pgvector:master

Conversation
It brings the following benefits on top of the existing `ivfflat` index:

* Up to 2X index query time improvement.
* ~25% faster index build time.
* 4X savings on index storage.
* Vectors with 8,000 dimensions can be supported with quantization.
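For context, here is a minimal sketch of what 8-bit scalar quantization does to each vector component; the function names and the per-index `[vmin, vmax]` training range are illustrative assumptions, not this PR's actual internals. Storing one byte per dimension instead of a 4-byte float is where the 4X storage saving comes from.

```c
#include <stdint.h>

/*
 * Illustrative SQ8 sketch (not pgvector's actual code): each float32
 * component is mapped to one unsigned byte using a [vmin, vmax] range,
 * e.g. learned from vectors sampled at index build time.
 */
static inline uint8_t
sq8_encode(float x, float vmin, float vmax)
{
    float t = (x - vmin) / (vmax - vmin);      /* normalize to [0, 1] */

    if (t < 0.0f) t = 0.0f;                    /* clamp out-of-range values */
    if (t > 1.0f) t = 1.0f;
    return (uint8_t) (t * 255.0f + 0.5f);      /* round to nearest 8-bit code */
}

static inline float
sq8_decode(uint8_t code, float vmin, float vmax)
{
    /* reconstruct an approximation; error is at most half a quantization step */
    return vmin + ((float) code / 255.0f) * (vmax - vmin);
}
```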
L2 distance

```sql
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'SQ8');
```
Why is this implemented as a new index access method instead of as an option on the existing ivfflat code? Having it as an option on the existing index would make it simpler for users to manage their indexes.
We discussed this with @ankane. Our understanding is that "flat" means no encoding; Milvus and Faiss use the name the same way, and it would be good to be consistent.
Agree that we should make it simpler for users. We think `ivf` with a `quantizer` option provides more flexibility; we could also support `ivf WITH (quantizer = 'flat')` if needed.
Or, as another example:

```sql
CREATE INDEX ON items USING ivf (lists=100); -- defaults to quantizer='flat' (or quantizer=NULL, or however one wants to represent "flat")
CREATE INDEX ON items USING ivf (lists=100, quantizer='SQ8');
```

That said, given that the ivfflat index AM is already out there, we do need to be careful about introducing new access methods. Effectively we need to treat ivfflat as if it's not going away. Maybe ivf becomes the preferred choice when creating an IVF index, and the default is to leverage the ivfflat infrastructure.
> Maybe ivf becomes the preferred choice when creating an IVF index and the default is to leverage the ivfflat infrastructure.
We discussed the same proposal with @ankane as well. I can update the PR to support `quantizer='flat'` as an alias for ivfflat. +1 on using ivf as the preferred choice.
Thanks for the proposal! I have a few general comments:

Do you have any tests that you ran against a known data set (e.g. ANN Benchmarks) that show how this implementation compares against the existing `ivfflat` index?
I have the same question as @jkatz: why a new index type? Why not express this through the "ops" part?
As for recall, at least for some ANN Benchmarks datasets there should not be any difference: for example, the SIFT-128 dataset has only integer values in the range 0...218 in its vectors, though they are still stored as FP32.
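To make that concrete, here is a quick, illustrative check (assuming an encode/decode scheme like the sketch earlier in the thread, specialized to a [0, 218] training range): every integer value that can appear in a SIFT-128 component gets its own 8-bit code, and reconstruction is off by at most about 0.43, so L2 distances are essentially unchanged.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical SQ8 encode/decode specialized to a [0, 218] range. */
static uint8_t enc(float x)   { return (uint8_t) (x / 218.0f * 255.0f + 0.5f); }
static float   dec(uint8_t c) { return (float) c / 255.0f * 218.0f; }

int
main(void)
{
    int     prev_code = -1;
    float   max_err = 0.0f;

    for (int v = 0; v <= 218; v++)
    {
        uint8_t code = enc((float) v);
        float   err = fabsf(dec(code) - (float) v);

        assert((int) code > prev_code);     /* each integer keeps a distinct code */
        prev_code = (int) code;
        if (err > max_err)
            max_err = err;
    }
    printf("max reconstruction error: %.3f\n", max_err);   /* ~0.43 */
    return 0;
}
```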
Thanks for your comments! And yes, we ran the ANN Benchmarks with various datasets; an example gain from Deep1B on …
Hi @bohanliu5, thanks for the PR! I really appreciate all the work. It looks like this introduces a lot of complexity to the code. I think there's some that can be removed (using the existing index type, no cross-page storage at first), but I'm also concerned that there's enough that can't (which may not justify the benefits right now). I'd like to see how the performance and complexity compare to product quantization, which I'm planning to focus on after HNSW, so I think it makes sense to wait on this.
Does scalar quantization only make sense with an IVF index, or can it be used with HNSW too?
@hlinnaka SQ is a general technique to reduce the number of bytes required to store a vector, so the short answer is yes. However, compared to product quantization (PQ), SQ can only reduce the storage so much, as you can ultimately only shave off so many bits before you lose too much information. I agree with Andrew's analysis above (in many ways), in particular around focusing on PQ first, which should have a more dramatic effect on reducing memory consumption, though we'd have to test for the impact on recall.
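To put rough numbers on that (illustrative arithmetic, not figures from the PR): a 128-dimension FP32 vector takes 128 × 4 = 512 bytes. SQ8 cuts that to 128 × 1 = 128 bytes, the 4X from the PR description, and 8 bits per dimension is about the floor for scalar quantization. PQ with, say, 16 subquantizers of 256 centroids each would store only 16 × 1 = 16 bytes per vector (plus a shared codebook), a 32X reduction, which is why its recall impact is worth measuring first.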
@bohanliu5 do you plan to work on this one? Would be awesome to have this integration. |