Conversation
Codecov Report
@@ Coverage Diff @@
## main #29 +/- ##
==========================================
+ Coverage 83.09% 83.27% +0.18%
==========================================
Files 67 67
Lines 3135 3151 +16
==========================================
+ Hits 2605 2624 +19
+ Misses 530 527 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
docarray/array/document.py
Outdated
| and isinstance(index[0], (slice, Sequence)) | ||
| ): | ||
| # edge case where 2 ids are passed | ||
| if isinstance(index[0], str) and index[0] in self._id2offset: |
There was a problem hiding this comment.
this logic itself is also problematic, it does not work with column selector https://docarray.jina.ai/fundamentals/documentarray/access-attributes/
There was a problem hiding this comment.
you can try da['2', 'id'] under your branch it will break
There was a problem hiding this comment.
I would suggest a complete readthrough on "access attributes" chapter before fixing edge cases like this
There was a problem hiding this comment.
Now it should be fixed. There is still the possibility that an id has exactly the same str than an element_selector or a column_selector in which case I guess the notation could be ambiguous as Nan mentions below.
I think this is a 'very stretched edge case' but it's a possibility:
from docarray import Document, DocumentArray
da = DocumentArray([Document(id='@r', text='a1'),
Document(id='2', text='a2'),
Document(id='3', text='a3')])Then da['@r'] should be a single item or a DocumentArray with all root items?
We could assume id strings do not start with @, otherwise the notation in getitem/setitem seems ambiguous. To me this is a reasonable assumption but I guess we should make a Note in the docs?
|
I think MWE: from docarray import Document, DocumentArray
da = DocumentArray([Document(id='1'),
Document(id='2'),
Document(id='3')])
da['1', '2']leads to the error, Traceback (most recent call last):
File "/Users/nanwang/Codes/jina-ai/docarray/toys/toy2.py", line 7, in <module>
print(da['1', '2'])
File "/Users/nanwang/Codes/jina-ai/docarray/docarray/array/document.py", line 154, in __getitem__
return _docs._get_attributes(*_attrs)
File "/Users/nanwang/Codes/jina-ai/docarray/docarray/document/mixins/attribute.py", line 22, in _get_attributes
value = getattr(self, k)
AttributeError: 'Document' object has no attribute '2' |
|
One principle we can stick to is to use the first arg to select ids\index\slices and the second arg to select the attributes. If we follow this rule, the grammar for selecting by integers should be refactored to - da[1, 2, 3]
+ da[[1, 2, 3]]This pattern is similar as the pandas import pandas as pd
df = pd.DataFrame([['a', 1, 'xx'], ['b', 2, 'xx'], ['c', 3, 'xx']], columns=['name', 'age', 'place'])
df.set_index('name', inplace=True)
# select by one id
print(df.loc['a'])
"""output
age 1
place xx
Name: a, dtype: object
age place
"""
# select by multiple ids
print(df.loc[['a', 'b']])
"""output
name
a 1 xx
b 2 xx
1
"""
# select by one id & one attribute
print(df.loc['a', 'age'])
"""output
1
"""
# select by multiple ids & one attribute
print(df.loc[['a', 'b'], 'age'])
"""output
name
a 1
b 2
Name: age, dtype: int64
"""
# select by multiple ids & multiple attribute
print(df.loc[['a', 'b'], ['age', 'place']])
"""output
age place
name
a 1 xx
b 2 xx
""" |
|
This error that you point out @nan-wang is precisely what I explain in the example on top of the PR and should be solved by the PR. |

The
DocumentArraymethod__getitem__is incorrect in the case two ids are passed:Minimal working example to showcase bug:
Currently selecting with two doc ids produces
This PR solves this and produces