index.find() tries to reshape and fails #1822

@nikhilmakan02

Initial Checks

  • I have read and followed the docs and still think this is a bug

Description

Apologies, the title is not the best. I have a very odd case and can't work out what is causing it, and I have not managed to reproduce it in a simpler example.

I have a DocList in which every document is built by the same process, although the data differs for each doc. I am using the hnswlib backend.

The DocList builds without any issues. When I then run .find() on its individual elements, some calls succeed and some fail. The error raised by the failing calls is shown in the traceback below.

Code Snippet:

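import logging
import time

from docarray import BaseDoc, DocList
from docarray.index import HnswDocumentIndex
from docarray.typing import NdArray

logger = logging.getLogger(__name__)

class AddressDoc(BaseDoc):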
    ELID: int
    FULL_ADDRESS: str
    EMBEDDINGS: NdArray[768]

def build_doc_list(data):
    st = time.time()
    dl = DocList[AddressDoc](
            AddressDoc(
                ELID=0000000,
                FULL_ADDRESS="",
                EMBEDDINGS=d["EMBEDDINGS"],
            )
            for d in data
    )
    logger.info(f"Doc list created... {time.time()-st}")
    return dl

doc_index = HnswDocumentIndex[AddressDoc](work_dir=db_path)
dl = build_doc_list(data)

# This works!
results = doc_index.find(dl[2], search_field="EMBEDDINGS", limit=1)

# This doesn't!
results = doc_index.find(dl[3], search_field="EMBEDDINGS", limit=1)

type(dl[2].EMBEDDINGS) == type(dl[3].EMBEDDINGS) # returns True
type(dl[2].EMBEDDINGS.shape) == type(dl[3].EMBEDDINGS.shape) # returns True

I have compared dl[2] and dl[3] left, right and center and can't see what the issue is. The embedding arrays in both documents have the same shape, which I checked with numpy (.shape, .ndim, .size). I can't work out what difference between the two causes the error below.
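One way to push the comparison beyond type and shape is to summarize each embedding's low-level properties side by side. The sketch below uses plain NumPy only; `describe_array`, `diff_properties`, and the stand-in vectors are hypothetical names for illustration, and in the real case the inputs would be `np.asarray(dl[2].EMBEDDINGS)` and `np.asarray(dl[3].EMBEDDINGS)`. It may not surface the cause, but it can rule out dtype, memory-layout, and non-finite-value differences.

```python
import numpy as np

def describe_array(arr):
    """Return low-level properties that .shape alone does not reveal."""
    a = np.asarray(arr)
    return {
        "shape": a.shape,
        "ndim": a.ndim,
        "size": a.size,
        "dtype": str(a.dtype),
        "c_contiguous": bool(a.flags["C_CONTIGUOUS"]),
        "all_finite": bool(np.isfinite(a).all()),
    }

def diff_properties(a, b):
    """Return only the properties whose values differ between a and b."""
    da, db = describe_array(a), describe_array(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}

# Stand-in data (assumption): two 768-dim vectors that differ only in dtype.
vec_a = np.zeros(768, dtype=np.float32)
vec_b = np.zeros(768, dtype=np.float64)
print(diff_properties(vec_a, vec_b))
```

An empty dict means the two arrays agree on every property checked, in which case the difference probably lies outside the query embeddings themselves.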

Traceback below:

File /usr/local/lib/python3.11/site-packages/docarray/index/abstract.py:503, in BaseDocIndex.find(self, query, search_field, limit, **kwargs)
    501     query_vec = query
    502 query_vec_np = self._to_numpy(query_vec)
--> 503 docs, scores = self._find(
    504     query_vec_np, search_field=search_field, limit=limit, **kwargs
    505 )
    507 if isinstance(docs, List) and not isinstance(docs, DocList):
    508     docs = self._dict_list_to_docarray(docs)

File /usr/local/lib/python3.11/site-packages/docarray/index/backends/hnswlib.py:328, in HnswDocumentIndex._find(self, query, limit, search_field)
    324 def _find(
...
--> 197     return cls._docarray_from_native(x.reshape(source.shape))
    198 elif len(source.shape) > 0:
    199     return cls._docarray_from_native(np.zeros(source.shape))

ValueError: cannot reshape array of size 768 into shape (768,768)
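
The error itself can be triggered with plain NumPy, which may help narrow down where to look: it is raised when an array holding 768 elements is reshaped to a target shape of `(768, 768)`. The sketch below only illustrates that condition and is not a reproduction of the DocArray code path. In the traceback the target is `source.shape`, so `source` appears to have been two-dimensional at that point while `x` held a single 768-element vector.

```python
import numpy as np

x = np.zeros(768)          # 768 elements, like a single embedding
target_shape = (768, 768)  # would need 768 * 768 elements

try:
    x.reshape(target_shape)
except ValueError as exc:
    # reshape cannot change the number of elements, so NumPy raises here
    message = str(exc)
    print(message)
```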

Example Code

No response

Python, DocArray & OS Version

0.39.0

Affected Components
