Conversation
|
Hey, thank you for your contribution! Would you be able to add a test here: integration/test_batch_v4.py? Numpy is available as a test dependency |
|
To avoid any confusion in the future about your contribution to Weaviate, we work with a Contributor License Agreement. If you agree, you can simply add a comment to this PR that you agree with the CLA so that we can merge. |
agree with the CLA |
b196d53 to
02abfbd
Compare
|
Hi @dirkkul, I think should be fine now. The two previous tests which failed (DB issue and timeout on grpc) were not connected to this PR. |
802a74d to
a2f116b
Compare
|
@dirkkul friendly reminder: can this branch be merged? |
a2f116b to
5120b1a
Compare
integration/test_collection.py
Outdated
| DATE3 = datetime.datetime.strptime("2019-06-10", "%Y-%m-%d").replace(tzinfo=datetime.timezone.utc) | ||
|
|
||
|
|
||
| def get_numpy_vector(input_list: list) -> Any: |
There was a problem hiding this comment.
we have numpy in our dev requirements, so you can assume it is present for tests
There was a problem hiding this comment.
thanks, simplified
(and rebased - hopefully that fixes the failing test)
| vector_bytes=( | ||
| pack_vector(obj.vector) | ||
| if obj.vector is not None and isinstance(obj.vector, list) | ||
| if obj.vector is not None and not isinstance(obj.vector, dict) |
There was a problem hiding this comment.
I think handling of vector input should be at the input level, eg in insert_many and batch.add_objects and not here
There was a problem hiding this comment.
Could you please elaborate what you mean by this and some hints how to accomplish?
I was staring at the code and don't see how I could push the validation upstream, since obj.vector can be both None and a dict. Couple of lines below where vectors is constructed: if obj.vector is not None and isinstance(obj.vector, dict), which means either vector_bytes or vectors will be not-None, but not both at the same time.
For readability, we could move these two next to each other - just cosmetics.
Note: the only thing I would do is to move pack_vector outside as is _pack_named_vectors, and rename the latter to _pack_named_vector. But that is again just cosmetics :)
There was a problem hiding this comment.
I think the correct place would be 'weaviate/collections/data/data.py::insert_manyand then the cleanup should be herevector=obj.vector`. But keep in mind that vectors can be lists and dictionaries (for named vectors) and both should be supported
There was a problem hiding this comment.
Hi @dirkkul, thanks for coming back. However, I still don't see how this could be achieved without a major refactoring.
There are 3 possibilities in the current program flow, starting from data/_DataCollectionAsync.insert_many():
| type(obj) | type(obj.vector) | _BatchObject.vector | vector_bytes | vectors |
|---|---|---|---|---|
| !DataObject | n/a | None | None | None |
| DataObject | !dict | obj.vector | pack_vector(...) | None |
| DataObject | dict | obj.vector | None | _pack_named_vectors(...) |
So move one needs to move some of the logic from __grpc_objects() into insert_many(), e.g. creating a list of tuples containing (_BatchObject, vector_bytes, vectors). Is this what you are suggesting or am I on the wrong path please?
Update: the last commit contains a first try at refactoring. Let me know your thoughts please.
e36d66c to
3649fc5
Compare
3649fc5 to
9f21009
Compare
c564504 to
e98b0aa
Compare
|
Hey, I think you are mixing two things up:
We would want that user supplied vectors that are numpy arrays are converted to python lists. So if you do it should take care of things. |
addbb98 to
00a60a8
Compare
00a60a8 to
27422ac
Compare
|
Hi @dirkkul, thanks for your patience, fingers crossed I got it now... |
|
Friendly ping @dirkkul |
Fixes #1002
This is the simplest fix which allows a 1d numpy array to be passed in (additionally, anything which can be converted in the function
util.get_vector). This will raise aTypeErrorif the input is incorrect, which is converted later toWeaviateInvalidInputError.Note that passing in something else then
types.VECTOR, e.g. a numpy array, will raise a mypy error.