feat(language_model): add LiteLLM provider for 100+ backings#1386
feat(language_model): add LiteLLM provider for 100+ backings#1386RheagalFire wants to merge 6 commits into
Conversation
|
cc @sscargal |
edwinyyyu
left a comment
There was a problem hiding this comment.
Is there a reason not to add litellm as an optional dependency in server pyproject.toml?
| def get_litellm_language_model_conf(self, name: str) -> "LiteLLMLanguageModelConf": | ||
| """Get LiteLLM language model configuration by name.""" | ||
| return self.litellm_language_model_confs[name] | ||
|
|
There was a problem hiding this comment.
The parse and to_yaml_dict needs update too
| max_attempts: int, | ||
| generate_response_call_uuid: object, | ||
| ) -> ChatCompletion | AsyncIterator[object] | object: | ||
| import litellm |
There was a problem hiding this comment.
update the project.toml file to install the module
There was a problem hiding this comment.
Pull request overview
Adds a new server-side LiteLLMLanguageModel provider intended to expand MemMachine’s supported LLM backends by delegating requests to the LiteLLM SDK while reusing the existing OpenAI chat-completions parsing/streaming/tool-call logic.
Changes:
- Introduces
LiteLLMLanguageModel(subclassingOpenAIChatCompletionsLanguageModel) that routes requests vialitellm.acompletion. - Extends language model configuration and the
LanguageModelManagerto register/build/remove LiteLLM-backed models. - Adds unit tests for the LiteLLM adapter behavior (dispatch, kwarg forwarding, retries, and parent parsing).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/server/src/memmachine_server/common/resource_manager/language_model_manager.py | Wires LiteLLM configs into manager lookups and adds a builder for LiteLLMLanguageModel. |
| packages/server/src/memmachine_server/common/language_model/litellm_language_model.py | New LiteLLM adapter that swaps the request implementation to litellm.acompletion and adds retry logic. |
| packages/server/src/memmachine_server/common/configuration/language_model_conf.py | Adds LiteLLMLanguageModelConf and a litellm_language_model_confs collection plus accessors. |
| packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py | New unit tests validating LiteLLM dispatch/forwarding/retry behavior and inherited parsing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from typing import Any | ||
| from unittest.mock import AsyncMock, MagicMock, patch | ||
|
|
| # Inject a fake litellm module so tests don't require the real package. | ||
| if "litellm" not in sys.modules: | ||
| _fake_litellm = types.ModuleType("litellm") | ||
| _fake_litellm.acompletion = AsyncMock() | ||
| sys.modules["litellm"] = _fake_litellm | ||
|
|
||
| from memmachine_server.common.data_types import ExternalServiceAPIError | ||
| from memmachine_server.common.language_model.litellm_language_model import ( | ||
| LiteLLMLanguageModel, | ||
| LiteLLMLanguageModelParams, | ||
| _is_retryable_litellm_error, | ||
| ) |
| ret: LanguageModel | None = None | ||
| if name in self.conf.openai_responses_language_model_confs: | ||
| ret = self._build_openai_responses_language_model(name) | ||
| if name in self.conf.openai_chat_completions_language_model_confs: | ||
| ret = self._build_openai_chat_completions_language_model(name) | ||
| if name in self.conf.amazon_bedrock_language_model_confs: | ||
| ret = self._build_amazon_bedrock_language_model(name) | ||
| if name in self.conf.litellm_language_model_confs: | ||
| ret = self._build_litellm_language_model(name) |
| litellm = [ | ||
| "litellm>=1.63.0", | ||
| ] |
| try: | ||
| import litellm | ||
| except ImportError as e: | ||
| raise ImportError( | ||
| "litellm is required for LiteLLMLanguageModel. " | ||
| "Install it with: pip install memmachine-server[litellm]" | ||
| ) from e |
|
Please ensure CI passes. |
sscargal
left a comment
There was a problem hiding this comment.
@RheagalFire, in addition to @edwinyyyu and CoPilot feedback, please sign your commits. Thanks.
7fe5012 to
951bbf6
Compare
Signed-off-by: RheagalFire <arishalam121@gmail.com>
951bbf6 to
5859715
Compare
…test Signed-off-by: Aarish Alam <arishalam121@gmail.com>
Signed-off-by: Aarish Alam <arishalam121@gmail.com>
711c275 to
6b5e6ee
Compare
…nguage-model Signed-off-by: Aarish Alam <arishalam121@gmail.com> # Conflicts: # uv.lock
Purpose of the change
Today every new LLM backing in MemMachine (Cohere, Mistral, Groq, Together, ...) requires writing another
LanguageModelfrom scratch alongsideOpenAIChatCompletionsLanguageModel/OpenAIResponsesLanguageModel/AmazonBedrockLanguageModel. This PR adds a singleLiteLLMLanguageModelthat delegates the actual provider call to the LiteLLM SDK, givingMemMachine coverage of the 100+ providers LiteLLM supports (OpenAI, Anthropic, AWS Bedrock, Vertex AI, Cohere, Mistral, Groq, Perplexity, Together, Fireworks, Cerebras, Databricks, IBM Watsonx, AI21,
Replicate, DeepInfra, NVIDIA NIM, xAI, Sambanova, ...) by changing only the
modelspec.It also adds a third deployment mode (LiteLLM proxy server) useful for centralized credential management and audit logging.
Description
LiteLLMLanguageModelsubclassesOpenAIChatCompletionsLanguageModeland overrides only_request_chat_completionto calllitellm.acompletion(**args)instead ofclient.chat.completions.create(**args).LiteLLM normalizes every backing's response to OpenAI's
ChatCompletionshape, so the parent's parsing, streaming, tool-call accumulation, structured-output handling, and metrics paths inherit unchanged.Configuration mirrors the existing OpenAI shape:
In embedded mode (no
api_base), LiteLLM resolves credentials from each backing's standard env var (ANTHROPIC_API_KEY,OPENAI_API_KEY,AWS_ACCESS_KEY_ID, ...) at call time. In proxy mode, calls routethrough a LiteLLM proxy server that holds the credentials.
Dependency:
litellm>=1.60,<1.85. Imported lazily inside the request function so users who don't configure alitellm_language_model_confsentry don't need it. Happy to make this an optional extra(
memmachine-server[litellm]) instead if preferred.Files added:
packages/server/src/memmachine_server/common/language_model/litellm_language_model.py(new, 224 LOC)packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py(new, 270 LOC)Files modified:
packages/server/src/memmachine_server/common/configuration/language_model_conf.py(+71): newLiteLLMLanguageModelConf,litellm_language_model_confsdict onLanguageModelsConf, helper accessors.packages/server/src/memmachine_server/common/resource_manager/language_model_manager.py(+50): wiredlitellminto_is_configured,get_all_names,_build_language_model,add_language_model_config,remove_language_model; new_build_litellm_language_modelbuilder.Fixes/Closes
N/A (no related issue; happy to open one and link if preferred).
Type of change
How Has This Been Tested?
Unit tests: 10 new tests in
test_litellm_language_model.pycovering init, dispatch, kwarg forwarding, retry behavior, and inherited parsing.Coverage:
LiteLLMLanguageModel.__init__does not require anAsyncOpenAIclient (parent's hard requirement is bypassed cleanly)._request_chat_completiondispatches tolitellm.acompletionwith the right model spec.api_key/api_base/api_versionare forwarded only when set on params; caller-supplied kwargs win over shim defaults (no silent override).extra_kwargsfrom params (metadata,tags, ...) reachlitellm.acompletion.RateLimitError,APITimeoutError,APIConnectionError,InternalServerError,ServiceUnavailableError,Timeout) trigger exponential backoff; non-retryable ones raise immediately.max_attempts, retryable errors surface asExternalServiceAPIError.generate_responseinherits the parent's OpenAI-shape parsing unchanged when_request_chat_completionreturns aChatCompletion.Lint + type-check (CI parity):
End-to-end test (Anthropic via Azure AI Foundry):
Output:
'pong.'. The wrapped call routed throughlitellm.acompletionto Anthropic and the response was parsed by the inherited OpenAI parser. The sameLiteLLMLanguageModelwould route via OpenAI / Bedrock/ Cohere / Mistral / ... by changing only the
modelspec.Test Results: All 10 unit tests pass; lint, format, and ty are clean; live E2E returns the expected reply.
Checklist
-sSper CONTRIBUTING.md; will rebase before merge if needed)Maintainer Checklist
Screenshots/Gifs
N/A (backend-only change; live E2E output included in "How Has This Been Tested?").
Further comments
Out of scope (happy to follow up):
LiteLLMEmbedder. LiteLLM also exposeslitellm.aembedding(Cohere, Voyage, Mistral, Bedrock Titan, Vertex, ...). Glad to ship this in a separate PR if you'd like the same single-implementationcoverage on the embedder side.
litellmtodependenciesdirectly. Currently lazy-imported inside the request function; can promote to an optional extramemmachine-server[litellm]if you'd prefer it gated.