feat(language_model): add LiteLLM provider for 100+ backings by RheagalFire · Pull Request #1386 · MemMachine/MemMachine

RheagalFire · 2026-05-01T19:24:51Z

Purpose of the change

Today every new LLM backing in MemMachine (Cohere, Mistral, Groq, Together, ...) requires writing another LanguageModel from scratch alongside OpenAIChatCompletionsLanguageModel /
OpenAIResponsesLanguageModel / AmazonBedrockLanguageModel. This PR adds a single LiteLLMLanguageModel that delegates the actual provider call to the LiteLLM SDK, giving
MemMachine coverage of the 100+ providers LiteLLM supports (OpenAI, Anthropic, AWS Bedrock, Vertex AI, Cohere, Mistral, Groq, Perplexity, Together, Fireworks, Cerebras, Databricks, IBM Watsonx, AI21,
Replicate, DeepInfra, NVIDIA NIM, xAI, Sambanova, ...) by changing only the model spec.

It also adds a third deployment mode (LiteLLM proxy server) useful for centralized credential management and audit logging.

Description

LiteLLMLanguageModel subclasses OpenAIChatCompletionsLanguageModel and overrides only _request_chat_completion to call litellm.acompletion(**args) instead of client.chat.completions.create(**args).
LiteLLM normalizes every backing's response to OpenAI's ChatCompletion shape, so the parent's parsing, streaming, tool-call accumulation, structured-output handling, and metrics paths inherit unchanged.

Configuration mirrors the existing OpenAI shape:

language_models:                                                                                                                                                                                                   
  litellm_language_model_confs:            
    sonnet:                                                                                                                                                                                                        
      model: anthropic/claude-sonnet-4-6                     
    gpt4o:                                                                                                                                                                                                         
      model: openai/gpt-4o                                                                                                                                                                                         
    cohere:                                                  
      model: cohere/command-r-plus-08-2024                                                                                                                                                                         
                                                  
    # Proxy mode (centralized credentials)                   
    proxied:                                                                                                                                                                                                       
      model: anthropic/claude-sonnet-4-6                     
      api_base: http://localhost:4000                                                                                                                                                                              
      api_key: sk-fastagent-proxy-1234

In embedded mode (no api_base), LiteLLM resolves credentials from each backing's standard env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_ACCESS_KEY_ID, ...) at call time. In proxy mode, calls route
through a LiteLLM proxy server that holds the credentials.

Dependency: litellm>=1.60,<1.85. Imported lazily inside the request function so users who don't configure a litellm_language_model_confs entry don't need it. Happy to make this an optional extra
(memmachine-server[litellm]) instead if preferred.

Files added:

packages/server/src/memmachine_server/common/language_model/litellm_language_model.py (new, 224 LOC)
packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py (new, 270 LOC)

Files modified:

packages/server/src/memmachine_server/common/configuration/language_model_conf.py (+71): new LiteLLMLanguageModelConf, litellm_language_model_confs dict on LanguageModelsConf, helper accessors.
packages/server/src/memmachine_server/common/resource_manager/language_model_manager.py (+50): wired litellm into _is_configured, get_all_names, _build_language_model, add_language_model_config,
remove_language_model; new _build_litellm_language_model builder.

Fixes/Closes

N/A (no related issue; happy to open one and link if preferred).

Type of change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Unit Test
End-to-end Test (manual against a real provider)

Unit tests: 10 new tests in test_litellm_language_model.py covering init, dispatch, kwarg forwarding, retry behavior, and inherited parsing.

$ pytest packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py -v                                                                                                    
                                                  
test_init_does_not_require_openai_client                                       PASSED                                                                                                                              
test_request_chat_completion_dispatches_to_litellm                             PASSED
test_request_chat_completion_forwards_api_base_and_key                         PASSED                                                                                                                              
test_request_chat_completion_does_not_overwrite_explicit_kwargs                PASSED                                                                                                                              
test_request_chat_completion_extra_kwargs_forwarded                            PASSED                                                                                                                              
test_request_chat_completion_retries_on_retryable_error                        PASSED                                                                                                                              
test_request_chat_completion_raises_external_service_error_after_max_attempts  PASSED
test_request_chat_completion_non_retryable_error_raises_immediately            PASSED                                                                                                                              
test_is_retryable_litellm_error_recognizes_known_classes                       PASSED
test_generate_response_uses_parent_parsing                                     PASSED                                                                                                                              
                                                                                                                                                                                                                   
10 passed in 5.24s

Coverage:

LiteLLMLanguageModel.__init__ does not require an AsyncOpenAI client (parent's hard requirement is bypassed cleanly).
_request_chat_completion dispatches to litellm.acompletion with the right model spec.
api_key / api_base / api_version are forwarded only when set on params; caller-supplied kwargs win over shim defaults (no silent override).
extra_kwargs from params (metadata, tags, ...) reach litellm.acompletion.
Retryable errors (RateLimitError, APITimeoutError, APIConnectionError, InternalServerError, ServiceUnavailableError, Timeout) trigger exponential backoff; non-retryable ones raise immediately.
After max_attempts, retryable errors surface as ExternalServiceAPIError.
generate_response inherits the parent's OpenAI-shape parsing unchanged when _request_chat_completion returns a ChatCompletion.

Lint + type-check (CI parity):

$ ruff check <touched files>           ─►  All checks passed!                                                                                                                                                      
$ ruff format --check <touched files>  ─►  3 files already formatted                                                                                                                                               
$ ty check --project packages/server --python-version 3.12 <touched files>  ─►  All checks passed!

End-to-end test (Anthropic via Azure AI Foundry):

import asyncio                                                                                                                                                                                                     
from memmachine_server.common.language_model.litellm_language_model import (                                                                                                                                       
    LiteLLMLanguageModel, LiteLLMLanguageModelParams,                                                                                                                                                              
)                                                                                                                                                                                                                  
                                                                                                                                                                                                                   
async def main():                                                                                                                                                                                                  
    lm = LiteLLMLanguageModel(LiteLLMLanguageModelParams(                                                                                                                                                          
        model="anthropic/claude-sonnet-4-6",                                                                                                                                                                       
    ))                                            
    text, tools = await lm.generate_response(                                                                                                                                                                      
        system_prompt="You answer with a single word.",                                                                                                                                                          
        user_prompt="Reply with: pong.",                                                                                                                                                                           
    )                                                                                                                                                                                                            
    print(text)        # 'pong.'                                                                                                                                                                                   
    print(tools)       # []                                   
                                                                                                                                                                                                                   
asyncio.run(main())

Output: 'pong.'. The wrapped call routed through litellm.acompletion to Anthropic and the response was parsed by the inherited OpenAI parser. The same LiteLLMLanguageModel would route via OpenAI / Bedrock
/ Cohere / Mistral / ... by changing only the model spec.

Test Results: All 10 unit tests pass; lint, format, and ty are clean; live E2E returns the expected reply.

Checklist

Maintainer Checklist

Confirmed all checks passed
Contributor has signed the commit(s)
Reviewed the code
Run, Tested, and Verified the change(s) work as expected

Screenshots/Gifs

N/A (backend-only change; live E2E output included in "How Has This Been Tested?").

Further comments

Out of scope (happy to follow up):

A parallel LiteLLMEmbedder. LiteLLM also exposes litellm.aembedding (Cohere, Voyage, Mistral, Bedrock Titan, Vertex, ...). Glad to ship this in a separate PR if you'd like the same single-implementation
coverage on the embedder side.
Adding litellm to dependencies directly. Currently lazy-imported inside the request function; can promote to an optional extra memmachine-server[litellm] if you'd prefer it gated.

RheagalFire · 2026-05-01T19:25:45Z

cc @sscargal

edwinyyyu

Is there a reason not to add litellm as an optional dependency in server pyproject.toml?

malatewang · 2026-05-01T22:08:55Z

+    def get_litellm_language_model_conf(self, name: str) -> "LiteLLMLanguageModelConf":
+        """Get LiteLLM language model configuration by name."""
+        return self.litellm_language_model_confs[name]
+


The parse and to_yaml_dict needs update too

malatewang · 2026-05-01T22:10:11Z

+        max_attempts: int,
+        generate_response_call_uuid: object,
+    ) -> ChatCompletion | AsyncIterator[object] | object:
+        import litellm


update the project.toml file to install the module

Copilot

Pull request overview

Adds a new server-side LiteLLMLanguageModel provider intended to expand MemMachine’s supported LLM backends by delegating requests to the LiteLLM SDK while reusing the existing OpenAI chat-completions parsing/streaming/tool-call logic.

Changes:

Introduces LiteLLMLanguageModel (subclassing OpenAIChatCompletionsLanguageModel) that routes requests via litellm.acompletion.
Extends language model configuration and the LanguageModelManager to register/build/remove LiteLLM-backed models.
Adds unit tests for the LiteLLM adapter behavior (dispatch, kwarg forwarding, retries, and parent parsing).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
packages/server/src/memmachine_server/common/resource_manager/language_model_manager.py	Wires LiteLLM configs into manager lookups and adds a builder for `LiteLLMLanguageModel`.
packages/server/src/memmachine_server/common/language_model/litellm_language_model.py	New LiteLLM adapter that swaps the request implementation to `litellm.acompletion` and adds retry logic.
packages/server/src/memmachine_server/common/configuration/language_model_conf.py	Adds `LiteLLMLanguageModelConf` and a `litellm_language_model_confs` collection plus accessors.
packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py	New unit tests validating LiteLLM dispatch/forwarding/retry behavior and inherited parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+from typing import Any
+from unittest.mock import AsyncMock, MagicMock, patch
+


Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

+# Inject a fake litellm module so tests don't require the real package.
+if "litellm" not in sys.modules:
+    _fake_litellm = types.ModuleType("litellm")
+    _fake_litellm.acompletion = AsyncMock()
+    sys.modules["litellm"] = _fake_litellm
+
+from memmachine_server.common.data_types import ExternalServiceAPIError
+from memmachine_server.common.language_model.litellm_language_model import (
+    LiteLLMLanguageModel,
+    LiteLLMLanguageModelParams,
+    _is_retryable_litellm_error,
+)


        ret: LanguageModel | None = None
        if name in self.conf.openai_responses_language_model_confs:
            ret = self._build_openai_responses_language_model(name)
        if name in self.conf.openai_chat_completions_language_model_confs:
            ret = self._build_openai_chat_completions_language_model(name)
        if name in self.conf.amazon_bedrock_language_model_confs:
            ret = self._build_amazon_bedrock_language_model(name)
+        if name in self.conf.litellm_language_model_confs:
+            ret = self._build_litellm_language_model(name)


+litellm = [
+    "litellm>=1.63.0",
+]


+        try:
+            import litellm
+        except ImportError as e:
+            raise ImportError(
+                "litellm is required for LiteLLMLanguageModel. "
+                "Install it with: pip install memmachine-server[litellm]"
+            ) from e


edwinyyyu · 2026-05-06T23:22:34Z

Please ensure CI passes.

sscargal

@RheagalFire, in addition to @edwinyyyu and CoPilot feedback, please sign your commits. Thanks.

Signed-off-by: RheagalFire <arishalam121@gmail.com>

…mport

…test Signed-off-by: Aarish Alam <arishalam121@gmail.com>

Signed-off-by: Aarish Alam <arishalam121@gmail.com>

…nguage-model Signed-off-by: Aarish Alam <arishalam121@gmail.com> # Conflicts: # uv.lock

malatewang requested review from edwinyyyu, jealous and malatewang and removed request for edwinyyyu and jealous May 1, 2026 21:40

edwinyyyu approved these changes May 1, 2026

View reviewed changes

edwinyyyu reviewed May 1, 2026

View reviewed changes

malatewang requested changes May 1, 2026

View reviewed changes

sscargal requested review from Copilot and sscargal May 1, 2026 23:53

Copilot started reviewing on behalf of sscargal May 1, 2026 23:53 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

sscargal requested a review from Copilot May 6, 2026 22:46

Copilot started reviewing on behalf of sscargal May 6, 2026 22:47 View session

Copilot AI reviewed May 6, 2026

View reviewed changes

sscargal requested changes May 6, 2026

View reviewed changes

RheagalFire force-pushed the feat/add-litellm-language-model branch from 7fe5012 to 951bbf6 Compare May 7, 2026 08:08

RheagalFire added 3 commits May 7, 2026 13:53

feat(language_model): add LiteLLM provider for 100+ backings

a30d813

Signed-off-by: RheagalFire <arishalam121@gmail.com>

fix: add LiteLLM to config parse/serialize, add optional dep, guard i…

36f7636

…mport

fix: resolve CI failures — lint, type check, uv.lock, elif chain

5859715

RheagalFire force-pushed the feat/add-litellm-language-model branch from 951bbf6 to 5859715 Compare May 7, 2026 08:23

RheagalFire added 2 commits May 7, 2026 13:58

fix: address Copilot feedback — restore sys.modules, add ImportError …

40a08c6

…test Signed-off-by: Aarish Alam <arishalam121@gmail.com>

fix: resolve CI lint and type check failures

6b5e6ee

Signed-off-by: Aarish Alam <arishalam121@gmail.com>

RheagalFire force-pushed the feat/add-litellm-language-model branch from 711c275 to 6b5e6ee Compare May 11, 2026 22:21

Merge remote-tracking branch 'upstream/main' into feat/add-litellm-la…

c34dccb

…nguage-model Signed-off-by: Aarish Alam <arishalam121@gmail.com> # Conflicts: # uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(language_model): add LiteLLM provider for 100+ backings#1386

feat(language_model): add LiteLLM provider for 100+ backings#1386
RheagalFire wants to merge 6 commits into
MemMachine:mainfrom
RheagalFire:feat/add-litellm-language-model

RheagalFire commented May 1, 2026

Uh oh!

RheagalFire commented May 1, 2026

Uh oh!

edwinyyyu left a comment

Uh oh!

malatewang May 1, 2026

Uh oh!

malatewang May 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

edwinyyyu commented May 6, 2026

Uh oh!

sscargal left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		from typing import Any
		from unittest.mock import AsyncMock, MagicMock, patch

Conversation

RheagalFire commented May 1, 2026

Purpose of the change

Description

Fixes/Closes

Type of change

How Has This Been Tested?

Checklist

Maintainer Checklist

Screenshots/Gifs

Further comments

Uh oh!

RheagalFire commented May 1, 2026

Uh oh!

edwinyyyu left a comment

Choose a reason for hiding this comment

Uh oh!

malatewang May 1, 2026

Choose a reason for hiding this comment

Uh oh!

malatewang May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

edwinyyyu commented May 6, 2026

Uh oh!

sscargal left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants