Skip to content

feat(language_model): add LiteLLM provider for 100+ backings#1386

Open
RheagalFire wants to merge 6 commits into
MemMachine:mainfrom
RheagalFire:feat/add-litellm-language-model
Open

feat(language_model): add LiteLLM provider for 100+ backings#1386
RheagalFire wants to merge 6 commits into
MemMachine:mainfrom
RheagalFire:feat/add-litellm-language-model

Conversation

@RheagalFire
Copy link
Copy Markdown

Purpose of the change

Today every new LLM backing in MemMachine (Cohere, Mistral, Groq, Together, ...) requires writing another LanguageModel from scratch alongside OpenAIChatCompletionsLanguageModel /
OpenAIResponsesLanguageModel / AmazonBedrockLanguageModel. This PR adds a single LiteLLMLanguageModel that delegates the actual provider call to the LiteLLM SDK, giving
MemMachine coverage of the 100+ providers LiteLLM supports (OpenAI, Anthropic, AWS Bedrock, Vertex AI, Cohere, Mistral, Groq, Perplexity, Together, Fireworks, Cerebras, Databricks, IBM Watsonx, AI21,
Replicate, DeepInfra, NVIDIA NIM, xAI, Sambanova, ...) by changing only the model spec.

It also adds a third deployment mode (LiteLLM proxy server) useful for centralized credential management and audit logging.

Description

LiteLLMLanguageModel subclasses OpenAIChatCompletionsLanguageModel and overrides only _request_chat_completion to call litellm.acompletion(**args) instead of client.chat.completions.create(**args).
LiteLLM normalizes every backing's response to OpenAI's ChatCompletion shape, so the parent's parsing, streaming, tool-call accumulation, structured-output handling, and metrics paths inherit unchanged.

Configuration mirrors the existing OpenAI shape:

language_models:                                                                                                                                                                                                   
  litellm_language_model_confs:            
    sonnet:                                                                                                                                                                                                        
      model: anthropic/claude-sonnet-4-6                     
    gpt4o:                                                                                                                                                                                                         
      model: openai/gpt-4o                                                                                                                                                                                         
    cohere:                                                  
      model: cohere/command-r-plus-08-2024                                                                                                                                                                         
                                                  
    # Proxy mode (centralized credentials)                   
    proxied:                                                                                                                                                                                                       
      model: anthropic/claude-sonnet-4-6                     
      api_base: http://localhost:4000                                                                                                                                                                              
      api_key: sk-fastagent-proxy-1234                                                                                                                                                                             

In embedded mode (no api_base), LiteLLM resolves credentials from each backing's standard env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_ACCESS_KEY_ID, ...) at call time. In proxy mode, calls route
through a LiteLLM proxy server that holds the credentials.

Dependency: litellm>=1.60,<1.85. Imported lazily inside the request function so users who don't configure a litellm_language_model_confs entry don't need it. Happy to make this an optional extra
(memmachine-server[litellm]) instead if preferred.

Files added:

  • packages/server/src/memmachine_server/common/language_model/litellm_language_model.py (new, 224 LOC)
  • packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py (new, 270 LOC)

Files modified:

  • packages/server/src/memmachine_server/common/configuration/language_model_conf.py (+71): new LiteLLMLanguageModelConf, litellm_language_model_confs dict on LanguageModelsConf, helper accessors.
  • packages/server/src/memmachine_server/common/resource_manager/language_model_manager.py (+50): wired litellm into _is_configured, get_all_names, _build_language_model, add_language_model_config,
    remove_language_model; new _build_litellm_language_model builder.

Fixes/Closes

N/A (no related issue; happy to open one and link if preferred).

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Unit Test
  • End-to-end Test (manual against a real provider)

Unit tests: 10 new tests in test_litellm_language_model.py covering init, dispatch, kwarg forwarding, retry behavior, and inherited parsing.

$ pytest packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py -v                                                                                                    
                                                  
test_init_does_not_require_openai_client                                       PASSED                                                                                                                              
test_request_chat_completion_dispatches_to_litellm                             PASSED
test_request_chat_completion_forwards_api_base_and_key                         PASSED                                                                                                                              
test_request_chat_completion_does_not_overwrite_explicit_kwargs                PASSED                                                                                                                              
test_request_chat_completion_extra_kwargs_forwarded                            PASSED                                                                                                                              
test_request_chat_completion_retries_on_retryable_error                        PASSED                                                                                                                              
test_request_chat_completion_raises_external_service_error_after_max_attempts  PASSED
test_request_chat_completion_non_retryable_error_raises_immediately            PASSED                                                                                                                              
test_is_retryable_litellm_error_recognizes_known_classes                       PASSED
test_generate_response_uses_parent_parsing                                     PASSED                                                                                                                              
                                                                                                                                                                                                                   
10 passed in 5.24s                                                                                                                                                                                                 

Coverage:

  • LiteLLMLanguageModel.__init__ does not require an AsyncOpenAI client (parent's hard requirement is bypassed cleanly).
  • _request_chat_completion dispatches to litellm.acompletion with the right model spec.
  • api_key / api_base / api_version are forwarded only when set on params; caller-supplied kwargs win over shim defaults (no silent override).
  • extra_kwargs from params (metadata, tags, ...) reach litellm.acompletion.
  • Retryable errors (RateLimitError, APITimeoutError, APIConnectionError, InternalServerError, ServiceUnavailableError, Timeout) trigger exponential backoff; non-retryable ones raise immediately.
  • After max_attempts, retryable errors surface as ExternalServiceAPIError.
  • generate_response inherits the parent's OpenAI-shape parsing unchanged when _request_chat_completion returns a ChatCompletion.

Lint + type-check (CI parity):

$ ruff check <touched files>           ─►  All checks passed!                                                                                                                                                      
$ ruff format --check <touched files>  ─►  3 files already formatted                                                                                                                                               
$ ty check --project packages/server --python-version 3.12 <touched files>  ─►  All checks passed!                                                                                                                 

End-to-end test (Anthropic via Azure AI Foundry):

import asyncio                                                                                                                                                                                                     
from memmachine_server.common.language_model.litellm_language_model import (                                                                                                                                       
    LiteLLMLanguageModel, LiteLLMLanguageModelParams,                                                                                                                                                              
)                                                                                                                                                                                                                  
                                                                                                                                                                                                                   
async def main():                                                                                                                                                                                                  
    lm = LiteLLMLanguageModel(LiteLLMLanguageModelParams(                                                                                                                                                          
        model="anthropic/claude-sonnet-4-6",                                                                                                                                                                       
    ))                                            
    text, tools = await lm.generate_response(                                                                                                                                                                      
        system_prompt="You answer with a single word.",                                                                                                                                                          
        user_prompt="Reply with: pong.",                                                                                                                                                                           
    )                                                                                                                                                                                                            
    print(text)        # 'pong.'                                                                                                                                                                                   
    print(tools)       # []                                   
                                                                                                                                                                                                                   
asyncio.run(main())                               

Output: 'pong.'. The wrapped call routed through litellm.acompletion to Anthropic and the response was parsed by the inherited OpenAI parser. The same LiteLLMLanguageModel would route via OpenAI / Bedrock
/ Cohere / Mistral / ... by changing only the model spec.

Test Results: All 10 unit tests pass; lint, format, and ty are clean; live E2E returns the expected reply.

Checklist

  • I have signed the commit(s) within this pull request (needs -sS per CONTRIBUTING.md; will rebase before merge if needed)
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation (N/A; new optional provider; happy to add docs in a follow-up)
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected

Screenshots/Gifs

N/A (backend-only change; live E2E output included in "How Has This Been Tested?").

Further comments

Out of scope (happy to follow up):

  • A parallel LiteLLMEmbedder. LiteLLM also exposes litellm.aembedding (Cohere, Voyage, Mistral, Bedrock Titan, Vertex, ...). Glad to ship this in a separate PR if you'd like the same single-implementation
    coverage on the embedder side.
  • Adding litellm to dependencies directly. Currently lazy-imported inside the request function; can promote to an optional extra memmachine-server[litellm] if you'd prefer it gated.

@RheagalFire
Copy link
Copy Markdown
Author

cc @sscargal

@malatewang malatewang requested review from edwinyyyu, jealous and malatewang and removed request for edwinyyyu and jealous May 1, 2026 21:40
Copy link
Copy Markdown
Contributor

@edwinyyyu edwinyyyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to add litellm as an optional dependency in server pyproject.toml?

def get_litellm_language_model_conf(self, name: str) -> "LiteLLMLanguageModelConf":
"""Get LiteLLM language model configuration by name."""
return self.litellm_language_model_confs[name]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parse and to_yaml_dict needs update too

max_attempts: int,
generate_response_call_uuid: object,
) -> ChatCompletion | AsyncIterator[object] | object:
import litellm
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the project.toml file to install the module

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new server-side LiteLLMLanguageModel provider intended to expand MemMachine’s supported LLM backends by delegating requests to the LiteLLM SDK while reusing the existing OpenAI chat-completions parsing/streaming/tool-call logic.

Changes:

  • Introduces LiteLLMLanguageModel (subclassing OpenAIChatCompletionsLanguageModel) that routes requests via litellm.acompletion.
  • Extends language model configuration and the LanguageModelManager to register/build/remove LiteLLM-backed models.
  • Adds unit tests for the LiteLLM adapter behavior (dispatch, kwarg forwarding, retries, and parent parsing).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
packages/server/src/memmachine_server/common/resource_manager/language_model_manager.py Wires LiteLLM configs into manager lookups and adds a builder for LiteLLMLanguageModel.
packages/server/src/memmachine_server/common/language_model/litellm_language_model.py New LiteLLM adapter that swaps the request implementation to litellm.acompletion and adds retry logic.
packages/server/src/memmachine_server/common/configuration/language_model_conf.py Adds LiteLLMLanguageModelConf and a litellm_language_model_confs collection plus accessors.
packages/server/server_tests/memmachine_server/common/language_model/test_litellm_language_model.py New unit tests validating LiteLLM dispatch/forwarding/retry behavior and inherited parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +10 to +12
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment on lines +18 to +29
# Inject a fake litellm module so tests don't require the real package.
if "litellm" not in sys.modules:
_fake_litellm = types.ModuleType("litellm")
_fake_litellm.acompletion = AsyncMock()
sys.modules["litellm"] = _fake_litellm

from memmachine_server.common.data_types import ExternalServiceAPIError
from memmachine_server.common.language_model.litellm_language_model import (
LiteLLMLanguageModel,
LiteLLMLanguageModelParams,
_is_retryable_litellm_error,
)
Comment on lines 180 to +188
ret: LanguageModel | None = None
if name in self.conf.openai_responses_language_model_confs:
ret = self._build_openai_responses_language_model(name)
if name in self.conf.openai_chat_completions_language_model_confs:
ret = self._build_openai_chat_completions_language_model(name)
if name in self.conf.amazon_bedrock_language_model_confs:
ret = self._build_amazon_bedrock_language_model(name)
if name in self.conf.litellm_language_model_confs:
ret = self._build_litellm_language_model(name)
Comment on lines +63 to +65
litellm = [
"litellm>=1.63.0",
]
Comment on lines +159 to +165
try:
import litellm
except ImportError as e:
raise ImportError(
"litellm is required for LiteLLMLanguageModel. "
"Install it with: pip install memmachine-server[litellm]"
) from e
@edwinyyyu
Copy link
Copy Markdown
Contributor

Please ensure CI passes.

Copy link
Copy Markdown
Contributor

@sscargal sscargal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RheagalFire, in addition to @edwinyyyu and CoPilot feedback, please sign your commits. Thanks.

@RheagalFire RheagalFire force-pushed the feat/add-litellm-language-model branch from 7fe5012 to 951bbf6 Compare May 7, 2026 08:08
@RheagalFire RheagalFire force-pushed the feat/add-litellm-language-model branch from 951bbf6 to 5859715 Compare May 7, 2026 08:23
…test

Signed-off-by: Aarish Alam <arishalam121@gmail.com>
Signed-off-by: Aarish Alam <arishalam121@gmail.com>
@RheagalFire RheagalFire force-pushed the feat/add-litellm-language-model branch from 711c275 to 6b5e6ee Compare May 11, 2026 22:21
…nguage-model

Signed-off-by: Aarish Alam <arishalam121@gmail.com>

# Conflicts:
#	uv.lock
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants