fix: do not inspect input with cloud inference #1024
Conversation
📝 Walkthrough
This change refactors the conditional logic for embedding model inference checks within several methods of both the sync and async clients (`qdrant_client/qdrant_client.py` and `qdrant_client/async_qdrant_client.py`).
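In essence, a minimal method-body sketch (the post-change form is quoted in the review comments below; the pre-change shape with a `requires_inference` variable is inferred from those comments, not copied from the diff):

```python
# Before (inferred): the inspector ran even when cloud inference was enabled
requires_inference = self._inference_inspector.inspect(requests)
if requires_inference and not self.cloud_inference:
    requests = list(
        self._embed_models(
            requests, is_query=True, batch_size=self.local_inference_batch_size
        )
    )

# After: `and` short-circuits, so the (potentially expensive) inspection
# is skipped entirely whenever cloud inference is enabled
if not self.cloud_inference and self._inference_inspector.inspect(requests):
    requests = list(
        self._embed_models(
            requests, is_query=True, batch_size=self.local_inference_batch_size
        )
    )
```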
Actionable comments posted: 0
♻️ Duplicate comments (1)
qdrant_client/qdrant_client.py (1)
730-758: Same nested-if / generator comments apply here – please replicate the simplification / refactor in `query_points_groups` to keep the two code paths in lock-step (a sketch follows the tool output below).

🧰 Tools
🪛 Ruff (0.11.9)
730-733: Use a single `if` statement instead of nested `if` statements (SIM102)
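A sketch of what that replication could look like in `query_points_groups` (method-body excerpt; parameter names assumed to mirror `query_points`):

```python
if not self.cloud_inference and (
    self._inference_inspector.inspect(query)
    or self._inference_inspector.inspect(prefetch)
):
    # embed `query` / `prefetch` via self._embed_models,
    # exactly as suggested for query_points above
    ...
```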
🧹 Nitpick comments (2)
qdrant_client/qdrant_client.py (2)
552-560: Collapse the nested `if` to a single predicate for clarity

The static-analysis hint (SIM102) is right: the two-level `if` adds indentation without adding meaning. You can shave four lines and one indent level:

```diff
-        if not self.cloud_inference:
-            if self._inference_inspector.inspect(query) or self._inference_inspector.inspect(
-                prefetch
-            ):
+        if (
+            not self.cloud_inference
+            and (
+                self._inference_inspector.inspect(query)
+                or self._inference_inspector.inspect(prefetch)
+            )
+        ):
```

Same change applies to `query_points_groups`.

🧰 Tools
🪛 Ruff (0.11.9)
557-560: Use a single `if` statement instead of nested `if` statements (SIM102)
562-591: Minor: avoid burning a generator just to take the first element

`_embed_models` returns an iterator. Grabbing the first element with `next(iter(...))` materialises one item and then drops the generator, which is a little wasteful and makes the intent less obvious. A more idiomatic pattern is:

```diff
-        query = (
-            next(iter(self._embed_models(query, ...)))
-            if query is not None
-            else None
-        )
+        if query is not None:
+            query, *_ = self._embed_models(
+                query, is_query=True, batch_size=self.local_inference_batch_size
+            )
```

(Not performance-critical here, but improves readability.)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `qdrant_client/async_qdrant_client.py` (6 hunks)
- `qdrant_client/qdrant_client.py` (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
qdrant_client/qdrant_client.py (2)
qdrant_client/embed/type_inspector.py (1)
- `inspect` (24-50)

qdrant_client/qdrant_fastembed.py (2)
- `query` (612-696)
- `_embed_models` (867-879)
qdrant_client/async_qdrant_client.py (2)
qdrant_client/embed/type_inspector.py (1)
- `inspect` (24-50)

qdrant_client/async_qdrant_fastembed.py (2)
- `query` (573-647)
- `_embed_models` (797-808)
🪛 Ruff (0.11.9)
qdrant_client/qdrant_client.py
- 557-560: Use a single `if` statement instead of nested `if` statements (SIM102)
- 730-733: Use a single `if` statement instead of nested `if` statements (SIM102)

qdrant_client/async_qdrant_client.py
- 528-531: Use a single `if` statement instead of nested `if` statements (SIM102)
- 696-699: Use a single `if` statement instead of nested `if` statements (SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Redirect rules - poetic-froyo-8baba7
- GitHub Check: Header rules - poetic-froyo-8baba7
- GitHub Check: Pages changed - poetic-froyo-8baba7
- GitHub Check: Python 3.11.x on ubuntu-latest test
- GitHub Check: Python 3.12.x on ubuntu-latest test
- GitHub Check: Python 3.10.x on ubuntu-latest test
- GitHub Check: Python 3.13.x on ubuntu-latest test
- GitHub Check: Python 3.9.x on ubuntu-latest test
🔇 Additional comments (10)
qdrant_client/qdrant_client.py (4)
421-428: Condition now correctly skips local inference on cloud-inference runs

Swapping the two-step `requires_inference` logic for `if not self.cloud_inference and self._inference_inspector.inspect(requests):` makes the intent explicit and prevents the (expensive) inspection + embedding when `cloud_inference` is enabled. Looks good.
1617-1632: LGTM – early exit when cloud inference is enabled. The new guard prevents the local embedder from doing unnecessary work. No further issues spotted.
1682-1687: Same concise guard pattern correctly applied to `update_vectors`
2135-2140: Consistent fix in `batch_update_points` – mirrors the other write paths.

qdrant_client/async_qdrant_client.py (6)
398-403: Avoid re-allocating the generator just to wrap it in `list()`

`_embed_models()` already yields an iterable whose length matches the input. If the upstream `_client.query_batch_points` can consume any iterable (it eventually converts to Protobuf / JSON anyway) there is no need to materialise the whole thing:

```diff
-        if not self.cloud_inference and self._inference_inspector.inspect(requests):
-            requests = list(
-                self._embed_models(
-                    requests, is_query=True, batch_size=self.local_inference_batch_size
-                )
-            )
+        if not self.cloud_inference and self._inference_inspector.inspect(requests):
+            # Pass a lazy generator straight through; saves memory for big batches
+            requests = self._embed_models(
+                requests, is_query=True, batch_size=self.local_inference_batch_size
+            )
```

This keeps memory usage constant and avoids one extra traversal.
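For intuition on why the eager `list()` costs extra work, a generic Python illustration (toy names, not `qdrant_client` code): a generator pipeline walks the input once as it is consumed, whereas `list()` materialises everything up front and forces a second traversal later.

```python
def embed_lazily(items):
    """Stand-in for per-item embedding work; yields results one by one."""
    for item in items:
        yield f"embedded:{item}"

batch = (f"doc-{i}" for i in range(3))
for request in embed_lazily(batch):  # single pass, constant memory
    print(request)
```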
528-562: Combine the two `if` levels to satisfy Ruff SIM102 and skip pointless embeddings

You run `_embed_models` on `prefetch` even when only `query` required inference, which can double the work on large lists. Merging the conditions lets you decide per-argument which one really needs embedding:

```diff
-        if not self.cloud_inference:
-            if self._inference_inspector.inspect(query) or self._inference_inspector.inspect(
-                prefetch
-            ):
+        if (
+            not self.cloud_inference
+            and (
+                self._inference_inspector.inspect(query)
+                or self._inference_inspector.inspect(prefetch)
+            )
+        ):
```

Then, before embedding, re-check each object:

```python
if query is not None and self._inference_inspector.inspect(query):
    query = next(
        self._embed_models(
            query, is_query=True, batch_size=self.local_inference_batch_size
        )
    )
if isinstance(prefetch, list) and any(
    self._inference_inspector.inspect(p) for p in prefetch
):
    prefetch = list(
        self._embed_models(
            prefetch, is_query=True, batch_size=self.local_inference_batch_size
        )
    )
elif prefetch is not None and self._inference_inspector.inspect(prefetch):
    prefetch = next(
        self._embed_models(
            prefetch, is_query=True, batch_size=self.local_inference_batch_size
        )
    )
```

This removes the nested `if`, fulfils the static-analysis hint, and avoids unnecessary work.

🧰 Tools
🪛 Ruff (0.11.9)
528-531: Use a single `if` statement instead of nested `if` statements (SIM102)
696-724: Same SIM102 / redundant-embedding issue as in `query_points`

The logic duplicates the pattern fixed above. Applying the same consolidation keeps the two public APIs in sync and prevents divergent behaviour in the future.

🧰 Tools
🪛 Ruff (0.11.9)
696-699: Use a single `if` statement instead of nested `if` statements (SIM102)
1564-1579: Minor readability – inline condition is 👍

Nice simplification compared to the previous `requires_inference` variable. No functional concerns here.
1627-1633: Consider early-returning when `inspect` is `False` to save one allocation

```diff
-        if not self.cloud_inference and self._inference_inspector.inspect(points):
-            points = list(
-                self._embed_models(
-                    points, is_query=False, batch_size=self.local_inference_batch_size
-                )
-            )
+        if self.cloud_inference or not self._inference_inspector.inspect(points):
+            return await self._client.update_vectors(
+                collection_name=collection_name,
+                points=points,
+                wait=wait,
+                ordering=ordering,
+                shard_key_selector=shard_key_selector,
+            )
+
+        points = list(
+            self._embed_models(
+                points, is_query=False, batch_size=self.local_inference_batch_size
+            )
+        )
```

This fast-path avoids the second pass over `points` when inference is not needed.
2070-2075: Generator vs. list again

Same remark as for `query_batch_points`: if the lower layer accepts any `Iterable`, do not eagerly `list()` the result.
The previous implementation assumed that we'd need to send a header with cloud inference in order to do inference remotely. However, in the current state we don't need to do this anymore, since input requiring inference is selected on Qdrant's side.
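For illustration, a hedged end-to-end sketch of what this means for callers. The `cloud_inference` constructor flag is assumed from the `self.cloud_inference` attribute touched in this PR, and the URL, collection, and model names are placeholders:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(
    url="https://example.cloud.qdrant.io",
    api_key="<api-key>",
    cloud_inference=True,  # assumed flag backing `self.cloud_inference`
)

# The raw Document is sent as-is: no header is attached and the local
# inference inspector never runs; Qdrant itself selects the inputs
# that require inference.
result = client.query_points(
    collection_name="my-collection",
    query=models.Document(
        text="what changed in this PR?", model="BAAI/bge-small-en"
    ),
)
```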