draft: competition 2 BaseANN changes #111

Draft
sourcesync wants to merge 6 commits into main from gw/competition_2_base_class

@sourcesync (Collaborator) commented May 11, 2023

This is a draft PR to get ideas/feedback around possible changes to BaseANN for the second competition.

Assumptions:

  • each object in a dataset could have a dense vector or a sparse vector (or both)
  • each object could also have a set of scalars associated with it
  • need to add new delete() and insert() virtual methods for the competition streaming task
  • range_query() won't change for the second competition
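
As a rough illustration of the data model these assumptions imply, here is a minimal sketch. The `HybridItem` name, its field names, and the dict-based sparse representation are all hypothetical and do not come from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HybridItem:
    """Hypothetical container for one dataset object (not part of the PR)."""

    dense: Optional[List[float]] = None        # dense vector, if present
    sparse: Optional[Dict[int, float]] = None  # sparse vector as {dimension: value}
    scalars: Dict[str, float] = field(default_factory=dict)  # associated scalars

    def __post_init__(self):
        # Per the assumptions: a dense vector, a sparse vector, or both.
        if self.dense is None and self.sparse is None:
            raise ValueError("an item needs a dense vector, a sparse vector, or both")


item = HybridItem(dense=[0.1, 0.2], sparse={7: 1.5}, scalars={"year": 2023})
```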

Possible Approaches:

  • augment query() with additional parameters called meta_filter and sparse_vector, which default to the empty set and None respectively. This assumes the query() X parameter is still a dense vector
  • create a new virtual method called hybrid_query() which exposes three parameters: dense_vector, sparse_vector, and meta_filter.
  • add delete() call with an I parameter which contains a list of dataset indices to delete
  • add insert() with an X parameter which contains a batch of vectors to add. Each element of X could be a hybrid of dense vector, sparse vector, and scalars.
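
To make the two query options concrete, here is a minimal sketch. It is not the code from the Files Changed tab: `BaseANNSketch` and `ToyIndex` are hypothetical names, the `k` parameter is assumed, `X` is a single query vector for brevity, and "an item must carry every tag in meta_filter" is an assumed filter semantics.

```python
from abc import ABC, abstractmethod


class BaseANNSketch(ABC):
    """Hypothetical stand-in for BaseANN; signatures are illustrative only."""

    @abstractmethod
    def query(self, X, k, meta_filter=frozenset(), sparse_vector=None):
        """Approach 1: X stays dense; the two new parameters are optional."""

    def hybrid_query(self, dense_vector, sparse_vector, meta_filter, k):
        """Approach 2: a separate entry point with all three inputs explicit."""
        raise NotImplementedError


class ToyIndex(BaseANNSketch):
    """Brute-force toy: each item is (dense vector, set of metadata tags)."""

    def __init__(self, items):
        self.items = items

    def query(self, X, k, meta_filter=frozenset(), sparse_vector=None):
        # sparse_vector is accepted but unused in this toy.
        # Keep only items whose tags contain every tag in meta_filter (assumed semantics).
        allowed = [i for i, (_, tags) in enumerate(self.items)
                   if set(meta_filter) <= tags]

        # Rank the survivors by squared Euclidean distance to the dense query X.
        def dist(i):
            return sum((a - b) ** 2 for a, b in zip(self.items[i][0], X))

        return sorted(allowed, key=dist)[:k]


index = ToyIndex([
    ([0.0, 0.0], {"red"}),
    ([1.0, 1.0], {"blue"}),
    ([0.1, 0.0], {"blue"}),
])
unfiltered = index.query([0.0, 0.0], k=2)
filtered = index.query([0.0, 0.0], k=2, meta_filter={"blue"})
```

Because the defaults leave both new parameters empty, existing callers of query(X, k) would keep working under the first approach; the second approach keeps query() untouched at the cost of a second method to implement.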

( @harsha-simhadri had asked me to kick-start this, but if anything has already been done here I'm happy to close this PR )

@sourcesync changed the title from "competition 2 BaseANN changes" to "draft: competition 2 BaseANN changes" on May 11, 2023
@harsha-simhadri (Owner)

We need to add insert and delete too

@harsha-simhadri (Owner)

> We need to add insert and delete too

To be more specific, let's use batch_insert() and batch_delete().
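
A minimal sketch of what the two streaming methods could look like on a toy in-memory index. Only the method names come from the comment above; the class name, the returned index list, and the choice to ignore unknown indices are assumptions.

```python
class StreamingToyIndex:
    """Hypothetical in-memory index; only the method names follow the thread."""

    def __init__(self):
        self.vectors = {}  # dataset index -> vector
        self._next = 0     # next index to hand out

    def batch_insert(self, X):
        """Add a batch of vectors and return the indices assigned to them."""
        ids = list(range(self._next, self._next + len(X)))
        for i, x in zip(ids, X):
            self.vectors[i] = x
        self._next += len(X)
        return ids

    def batch_delete(self, I):
        """Remove the given dataset indices; unknown indices are ignored."""
        for i in I:
            self.vectors.pop(i, None)


index = StreamingToyIndex()
ids = index.batch_insert([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
index.batch_delete([ids[1]])
```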

@sourcesync (Collaborator, Author)

@mdouze @ingberam I went ahead and prototyped the new/updated candidate virtual methods in BaseANN. See the Files Changed tab for the proposals.

@maumueller (Collaborator)

@sourcesync Thanks so much for going ahead with this! As I understood, there is no hybrid query, but only a dense query, add, remove (possibly with metadata) and a sparse query. Given these different settings, I think it makes more sense to have BaseDenseANN and BaseSparseANN as subclasses of BaseANN. Also, there are probably going to be different runners for the different scenarios, since I imagine participants targeting single scenarios (and maybe they want to use their own runner and not provide a wrapper).

My plan was to merge the baselines next week and then refactor the code around them. It would be great if we could join forces @sourcesync
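
The class split suggested above might be sketched as follows. Only the three class names come from the comment; the method signatures, the `DotProductSparse` toy, and its dot-product scoring are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class BaseANN(ABC):
    """Shared parent; a stand-in for the existing BaseANN (signatures assumed)."""

    @abstractmethod
    def fit(self, dataset):
        ...


class BaseDenseANN(BaseANN):
    """Dense scenario: X is assumed to be a dense query vector."""

    @abstractmethod
    def query(self, X, k):
        ...


class BaseSparseANN(BaseANN):
    """Sparse scenario: X is assumed to be a {dimension: value} mapping."""

    @abstractmethod
    def query(self, X, k):
        ...


class DotProductSparse(BaseSparseANN):
    """Hypothetical brute-force baseline ranking by sparse dot product."""

    def fit(self, dataset):
        self.data = dataset  # list of {dimension: value} dicts

    def query(self, X, k):
        def score(v):
            return sum(w * v.get(d, 0.0) for d, w in X.items())

        # Highest dot product first.
        order = sorted(range(len(self.data)), key=lambda i: -score(self.data[i]))
        return order[:k]


algo = DotProductSparse()
algo.fit([{0: 1.0}, {1: 2.0}, {0: 0.5, 1: 0.5}])
top = algo.query({1: 1.0}, k=2)
```

With this layout a scenario-specific runner could check `isinstance(algo, BaseSparseANN)` before dispatching, and a participant targeting a single scenario only has to implement one of the two subclasses.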

@sourcesync (Collaborator, Author)

> @sourcesync Thanks so much for going ahead with this! As I understood, there is no hybrid query, but only a dense query, add, remove (possibly with metadata) and a sparse query. Given these different settings, I think it makes more sense to have BaseDenseANN and BaseSparseANN as subclasses of BaseANN. Also, there are probably going to be different runners for the different scenarios, since I imagine participants targeting single scenarios (and maybe they want to use their own runner and not provide a wrapper).
>
> My plan was to merge the baselines next week and then refactor the code around them. It would be great if we could join forces @sourcesync

Awesome @Martin Aumüller. Yeah, I think approaching this with different base classes is a great choice, especially if "hybrid" is not a part of this competition.

( Note that if you already have a branch / PR going, I'm quite happy to close this draft PR...)


3 participants