(find-documentarray)=
You can use {meth}`~docarray.array.mixins.find.FindMixin.find` to select Documents from a DocumentArray based on the conditions specified in a `query` object. You can use `da.find(query)` to filter Documents and to get nearest neighbors from `da`:

- To filter Documents, the `query` object is a Python dictionary that defines the filtering conditions using a MongoDB-like query language.
- To find nearest neighbors, the `query` object needs to be an NdArray-like, a Document, or a DocumentArray object that defines an embedding. You can also use the `.match()` function for this purpose; there is a minor interface difference between the two functions, which is described {ref}`in the next chapter<match-documentarray>`.
```{admonition} Note
:class: note
The filter query syntax depends on which {ref}`document store <doc-store>` you use. Some may have their own query language.
```
Let's see some examples in action. First, let's prepare a DocumentArray:
```python
from jina import Document, DocumentArray

da = DocumentArray(
    [
        Document(
            text='journal',
            weight=25,
            tags={'h': 14, 'w': 21, 'uom': 'cm'},
            modality='A',
        ),
        Document(
            text='notebook',
            weight=50,
            tags={'h': 8.5, 'w': 11, 'uom': 'in'},
            modality='A',
        ),
        Document(
            text='paper',
            weight=100,
            tags={'h': 8.5, 'w': 11, 'uom': 'in'},
            modality='D',
        ),
        Document(
            text='planner',
            weight=75,
            tags={'h': 22.85, 'w': 30, 'uom': 'cm'},
            modality='D',
        ),
        Document(
            text='postcard',
            weight=45,
            tags={'h': 10, 'w': 15.25, 'uom': 'cm'},
            modality='A',
        ),
    ]
)

da.summary()
```

```text
                    Documents Summary

  Length                 5
  Homogenous Documents   True
  Common Attributes      ('id', 'text', 'tags', 'weight', 'modality')

                    Attributes Summary

  Attribute   Data type   #Unique values   Has empty value
 ──────────────────────────────────────────────────────────
  id          ('str',)    5                False
  weight      ('int',)    5                False
  modality    ('str',)    2                False
  tags        ('dict',)   5                False
  text        ('str',)    5                False
```
A query filter document uses query operators to specify conditions:

```text
{ <field1>: { <operator1>: <value1> }, ... }
```

Here `field1` is {ref}`any field name<doc-fields>` of a Document object. To access nested fields, you can use the dunder expression. For example, `tags__timestamp` accesses the `doc.tags['timestamp']` field.

`value1` can be either a user-given Python object or a substitution field written in curly brackets, `{field}`.

Finally, `operator1` can be one of the following:
| Query Operator | Description |
|---|---|
| `$eq` | Equal to (number, string) |
| `$ne` | Not equal to (number, string) |
| `$gt` | Greater than (number) |
| `$gte` | Greater than or equal to (number) |
| `$lt` | Less than (number) |
| `$lte` | Less than or equal to (number) |
| `$in` | Is in an array |
| `$nin` | Not in an array |
| `$regex` | Match the specified regular expression |
| `$size` | Match array/dict fields that have the specified size. `$size` does not accept ranges of values. |
| `$exists` | Matches Documents that have the specified field. {ref}`Predefined fields<doc-fields>` that hold a default value (for example, an empty string or `0`) are considered as not existing. If the expression specifies a field `x` in `tags` (`tags__x`), the operator tests that `x` is not `None`. |
To select all `modality='D'` Documents:

```python
r = da.find({'modality': {'$eq': 'D'}})

pprint(r.to_dict(exclude_none=True))  # just for pretty print
```

```json
[
    {
        "id": "92aee5d665d0c4dd34db10d83642aded",
        "modality": "D",
        "tags": {
            "h": 8.5,
            "uom": "in",
            "w": 11.0
        },
        "text": "paper",
        "weight": 100.0
    },
    {
        "id": "1a9d2139b02bc1c7842ecda94b347889",
        "modality": "D",
        "tags": {
            "h": 22.85,
            "uom": "cm",
            "w": 30.0
        },
        "text": "planner",
        "weight": 75.0
    }
]
```

To select all Documents whose `.tags['h'] > 10`:
```python
r = da.find({'tags__h': {'$gt': 10}})
```

```json
[
    {
        "id": "4045a9659875fd1299e482d710753de3",
        "modality": "A",
        "tags": {
            "h": 14.0,
            "uom": "cm",
            "w": 21.0
        },
        "text": "journal",
        "weight": 25.0
    },
    {
        "id": "cf7691c445220b94b88ff116911bad24",
        "modality": "D",
        "tags": {
            "h": 22.85,
            "uom": "cm",
            "w": 30.0
        },
        "text": "planner",
        "weight": 75.0
    }
]
```

Besides using a predefined value, you can also use a substitution with `{field}`. Notice the curly braces. For example:
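```python
r = da.find({'tags__h': {'$gt': '{tags__w}'}})
```

In the DocumentArray defined above, no Document has `tags['h']` greater than `tags['w']`; the result below reflects `journal` having `tags['w']` set to `10`:

```json
[
    {
        "id": "44c6a4b18eaa005c6dbe15a28a32ebce",
        "modality": "A",
        "tags": {
            "h": 14.0,
            "uom": "cm",
            "w": 10.0
        },
        "text": "journal",
        "weight": 25.0
    }
]
```

You can combine multiple conditions using the following operators: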
| Boolean Operator | Description |
|---|---|
| `$and` | Join query clauses with a logical AND |
| `$or` | Join query clauses with a logical OR |
| `$not` | Inverts the effect of a query expression |
For example, to select all Documents whose `weight` equals 45 or whose `modality` is `'D'`:

```python
r = da.find({'$or': [{'weight': {'$eq': 45}}, {'modality': {'$eq': 'D'}}]})
```

```json
[
    {
        "id": "22985b71b6d483c31cbe507ed4d02bd1",
        "modality": "D",
        "tags": {
            "h": 8.5,
            "uom": "in",
            "w": 11.0
        },
        "text": "paper",
        "weight": 100.0
    },
    {
        "id": "a071faf19feac5809642e3afcd3a5878",
        "modality": "D",
        "tags": {
            "h": 22.85,
            "uom": "cm",
            "w": 30.0
        },
        "text": "planner",
        "weight": 75.0
    },
    {
        "id": "411ecc70a71a3f00fc3259bf08c239d1",
        "modality": "A",
        "tags": {
            "h": 10.0,
            "uom": "cm",
            "w": 15.25
        },
        "text": "postcard",
        "weight": 45.0
    }
]
```
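The way `$and`, `$or`, and `$not` nest can be sketched as a small recursive evaluator. This is only an illustration of the semantics, not DocArray's code: plain dictionaries stand in for Documents, the names `evaluate` and `select` are hypothetical, only three leaf operators are supported, and `$not` is assumed to wrap a single query dictionary.

```python
import operator

# A deliberately small set of leaf operators for this sketch.
LEAF_OPS = {'$eq': operator.eq, '$gt': operator.gt, '$lt': operator.lt}


def evaluate(doc, query):
    """Recursively evaluate a query that may contain $and, $or, and $not."""
    for key, value in query.items():
        if key == '$and':
            ok = all(evaluate(doc, clause) for clause in value)
        elif key == '$or':
            ok = any(evaluate(doc, clause) for clause in value)
        elif key == '$not':
            # Assumption of this sketch: $not wraps one query dictionary.
            ok = not evaluate(doc, value)
        else:
            # Leaf condition {field: {operator: value}} on a top-level field.
            ok = all(LEAF_OPS[op](doc[key], expected)
                     for op, expected in value.items())
        if not ok:
            return False
    return True


# Plain dicts standing in for four of the Documents defined earlier.
docs = [
    {'text': 'journal', 'weight': 25, 'modality': 'A'},
    {'text': 'paper', 'weight': 100, 'modality': 'D'},
    {'text': 'planner', 'weight': 75, 'modality': 'D'},
    {'text': 'postcard', 'weight': 45, 'modality': 'A'},
]


def select(query):
    return [d['text'] for d in docs if evaluate(d, query)]


either = {'$or': [{'weight': {'$eq': 45}}, {'modality': {'$eq': 'D'}}]}

print(select(either))            # ['paper', 'planner', 'postcard']
print(select({'$not': either}))  # ['journal']
print(select({'$and': [{'weight': {'$gt': 50}}, {'modality': {'$eq': 'D'}}]}))
# ['paper', 'planner']
```

Because each boolean operator simply calls `evaluate` on its clauses, the operators can be nested to any depth, as the `$not` over `$or` query shows.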