feat(proxy): nested model access groups with cycle-safe resolution (#28032) by wtfashwin · Pull Request #28075 · BerriAI/litellm

wtfashwin · 2026-05-16T18:59:49Z

What

Adds hierarchical model access groups so a group can include other groups as members. Permission changes on a child group propagate to every parent that includes it, with no manual fan-out.

Example:

restricted-image     -> [dall-e-3, stable-diffusion-xl]
restricted-reasoning -> [o1, o3-mini]
project-x            -> [restricted-image, restricted-reasoning]

A key/team assigned project-x now resolves to all four real models.

Design

Storage

New LiteLLM_AccessGroupMembership(parent_group, child_group) table holds parent->child edges. Existing tag-based groups in model_info.access_groups are untouched - additive, fully backwards compatible.

model LiteLLM_AccessGroupMembership {
  id           String   @id @default(uuid())
  parent_group String
  child_group  String
  created_at   DateTime @default(now())

  @@unique([parent_group, child_group])
  @@index([parent_group])
}

Migration: litellm-proxy-extras/litellm_proxy_extras/migrations/20260516210000_add_access_group_membership/migration.sql with IF NOT EXISTS everywhere. Synced across all three schema.prisma files per the CLAUDE.md rule.

Resolution

resolve_nested_groups() does DFS expansion with on-path cycle detection. Cyclic edges are logged and skipped, not raised - this runs on the auth path and a malformed row must not 500 the proxy. On-path tracking (vs. ever-visited) means legitimate DAG-shared subtrees do NOT produce spurious cycle warnings.

_get_models_from_access_groups() gets a new optional group_memberships kwarg, defaulted to None. When absent/empty, behavior is bit-for-bit identical to today.

Write path

validate_models_exist() now accepts optional known_access_groups so a group's members can themselves be groups
_classify_member_names() splits the caller's list into real models / child groups / unknown
Real models -> existing update_deployments_with_access_group() path (tags deployments)
Child groups -> new upsert_group_memberships() path (writes edges, create_many(skip_duplicates=True))
Self-references (parent == child) are rejected at write time with 400
Multi-hop cycles are caught by the read-path guard

Both create_model_group and update_access_group go through a shared _dual_write_group_membership() helper. update clears existing parent-edges before re-inserting. delete cleans up edges where this group appears on either side via delete_group_membership_edges().

Request-path activation

get_available_models_for_user() (utils.py) and model_info_v1() (proxy_server.py) now fetch the membership map once via get_group_memberships_from_db() (single query, no N+1) and pass it through to get_key_models / get_team_models. When prisma_client is None the map stays empty and behavior is identical to today.

`AccessGroupInfo` API

Two new fields, always present (empty lists when no edges), additive only:

class AccessGroupInfo(BaseModel):
    access_group: str
    model_names: List[str]
    deployment_count: int
    parent_groups: List[str] = Field(default_factory=list)
    child_groups: List[str] = Field(default_factory=list)

get_all_access_groups_from_db() expands model_names transitively, surfaces pure-composition groups (groups that exist only in the membership table, no deployment tags), and populates parent/child fields.

Tests

35 tests in tests/test_litellm/proxy/auth/test_nested_access_groups.py, all pure functions, no DB / no fake Prisma client needed:

resolution: flat compat, single/multi-hop nesting, propagation, pure-composition, diamond
cycles: self-ref, indirect, 3-hop, self-ref nested deep in chain (each fires the warning exactly once at the correct node)
DAG correctness: shared subtree produces no false warnings, per-top-level visited isolation
scale: 50-level linear chain (recursion depth proof), 100-child fanout (no quadratic blowup)
classify: router-model precedence over group name, mixed input, input-order preservation
validate: accepts known groups, reports unknown, backwards compatible without groups, fail-closed when router is None
defensive: empty inputs, unknown seed, missing child reference, empty child list, unicode/special-char names
equivalence: group_memberships=None vs {} behave identically

Local checks

$ uv run --no-sync black --check <changed files>
All done!  6 files would be left unchanged.

$ uv run --no-sync ruff check .
All checks passed!

$ uv run pytest tests/test_litellm/proxy/auth/test_nested_access_groups.py -x -vv
============== 35 passed in 1.52s ==============

Backwards compatibility

Surface	Guarantee
`_get_models_from_access_groups` without `group_memberships`	Bit-for-bit identical
`get_key_models`, `get_team_models`, `get_complete_model_list` without new kwarg	Identical
`validate_models_exist` without `known_access_groups`	Identical
`AccessGroupInfo` response	New fields are additive with `[]` defaults
`model_info["access_groups"]` storage	Untouched

Out of scope (deliberate)

UI: dashboard tree-view / drag-and-drop editor for nested groups (separate PR)
New LiteLLM_AccessGroup parent table: would require migrating every existing tag into rows + rewriting every read path. The membership-table approach gets the feature in without that disruption
Per-request hot-path activation in user_api_key_auth: current activation is on the listing/info endpoints. Hot-path would need router-level caching of the membership map; can follow up if maintainers want it

Checklist

CLA signed
Scope isolated to one feature
Tests added (tests/test_litellm/proxy/auth/test_nested_access_groups.py)
make lint passes for changed files (ruff clean, black clean on touched files)
Schema synced across all 3 schema.prisma files + migration added
Prisma client methods only (no raw SQL); single query for membership read; create_many(skip_duplicates=True) for batch writes
Duplicate-PR check: no overlapping open or closed PR for [Feature]: Nested model access groups #28032 / nested access groups / AccessGroupMembership

[Infra] Promote internal staging to main

Holds parent/child edges between access groups so a group can include another group's models. Schema + migration only; resolution next. Refs BerriAI#28032

_get_models_from_access_groups takes a new optional group_memberships map (parent -> [child_groups]) and expands recursively via _resolve_nested_groups. Cycles are logged and skipped, not raised - this runs on the auth path and a malformed row must not 500 the proxy. Backwards compatible: callers that don't pass group_memberships get identical behavior to today. get_key_models, get_team_models thread the new kwarg through. Refs BerriAI#28032

validate_models_exist accepts known access group names so a group's members can themselves be groups. _classify_member_names splits the caller's list into real models vs child groups, and each is routed to its own write path. create_model_group / update_access_group: classify -> dual-write (deployments tagged for real models, parent->child edges for child groups). update_access_group also clears prior edges where this group is parent before re-inserting. delete_access_group: also deletes membership edges where this group appears on either side, to avoid dangling references. AccessGroupInfo gains parent_groups + child_groups (always present, empty by default). get_all_access_groups_from_db expands model_names transitively via _resolve_nested_groups, surfaces pure-composition groups that exist only in the membership table, and reports parent/ child relationships. Self-references (parent == child) are rejected at write time with 400. Multi-hop cycles are caught by the read-path DFS guard from the prior commit. Refs BerriAI#28032

Makes the feature actually active end-to-end. get_available_models_for_user (utils.py) and model_info_v1 (proxy_server.py) now fetch group_memberships once per call and pass them to get_key_models / get_team_models. When prisma_client is None (SDK-only mode) the map stays empty and behavior is identical to today. Also: - Rename _resolve_nested_groups -> resolve_nested_groups since it's used across modules; leading underscore was misleading. - Update NewModelGroupResponse.models_updated field comment to reflect the new semantics (deployment tags + membership edges, not just models). Refs BerriAI#28032

11 unit tests for the nested-groups feature: - resolve_nested_groups: flat compat, single/multi-hop nesting, self- and indirect cycles (warning logged + safe degradation), propagation, pure-composition groups, diamond inheritance - _classify_member_names: router model takes precedence over group name when both match; mixed input splits into the three buckets - validate_models_exist: accepts known group names; reports truly unknown; backwards compatible when no groups passed All pure-function tests - no DB / no fake Prisma client needed. Refs BerriAI#28032

Previously the visited set tracked every node ever seen during a top-level resolution. That meant DAG-shared subtrees (A -> [B, C], B -> D, C -> D) triggered a spurious 'cycle detected' warning on D's second visit. Switch to on-path tracking: discard the node from visited when its frame unwinds. Real cycles still fire the warning; DAG re-traversal is silent. Caller already dedups the final list. Adds a regression test (test_dag_shared_group_subtree_no_false_cycle_warning) that asserts no warning is logged for the legitimate DAG case. Refs BerriAI#28032

Adds 22 tests covering pathological/edge inputs to lock the algorithm against silent regressions: - empty inputs: all_models=[], group_memberships={}, dangling groups - defensive: unknown seed, membership pointing to missing child, null router, empty child list - cycles: 3-hop (A->B->C->A), self-ref nested deep in chain, message identifies the right node - scale: 50-level linear chain (no stack overflow), 100-child fanout - DAG / dedup: shared subtrees produce no false warnings, duplicate top-level groups expand independently, visited isolation between top-level calls - format: unicode/special-char group names, classify preserves input order for error legibility - equivalence: group_memberships=None vs {} behave identically Total: 35 tests against pure functions, no DB needed. Refs BerriAI#28032

ruff was unhappy with update_access_group at 52 statements. extracted two helpers that create/update/delete all needed anyway: - _clear_access_group_from_all_deployments (the deployment-tag loop) - _dual_write_group_membership (classify then route to deployments or memberships) endpoints are shorter now. PLR0915 happy. refs BerriAI#28032

CLAassistant · 2026-05-16T18:59:56Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ wtfashwin
❌ shin-berri
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

CI's recursive_detector forbids new recursive functions (it has caused CPU spikes in the past). Switching to an explicit stack with ENTER/EXIT frame markers preserves on-path cycle tracking: visited grows on ENTER, shrinks on EXIT. Real cycles still log+skip; DAG-shared subtrees re-traverse silently as before. Same algorithm, same O(V + E) complexity, no Python recursion overhead. All 35 tests pass unchanged. Refs BerriAI#28032

codecov · 2026-05-16T19:02:58Z

Codecov Report

❌ Patch coverage is 89.80583% with 21 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/proxy/_experimental/mcp_server/server.py	75.47%	13 Missing ⚠️
litellm/integrations/prometheus.py	88.23%	4 Missing ⚠️
litellm/llms/gemini/google_genai/transformation.py	90.90%	2 Missing ⚠️
...erimental/mcp_server/auth/user_api_key_auth_mcp.py	95.55%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

15 tests using a hand-rolled fake Prisma client (no Postgres required): - get_group_memberships_from_db: bucketing, empty table - upsert_group_memberships: empty-list noop, self-reference rejection, batch create_many with skip_duplicates=True - delete_group_membership_edges: OR-clause covers parent and child - _clear_access_group_from_all_deployments: only touches deployments carrying the tag, no-op when nothing matches - _dual_write_group_membership: real-models path, child-groups path, mixed input, null router - get_all_access_groups_from_db: flat-only (backwards compat), nested expansion of model_names + parent/child fields, pure-composition groups (in membership table only, no deployment tags) Total nested-group test count: 50 (35 pure-function + 15 async DB). Refs BerriAI#28032

8 existing tests broke because they pass a plain MagicMock as the prisma_client and the new awaited find_many trips on await. same can hit prod - the proxy can boot before litellm-proxy-extras finishes prisma migrate deploy, so the model is briefly missing from the prisma client. wrap find_many in a narrow try/except (AttributeError, TypeError), return {} on miss. auth path falls back to today's flat-group behavior instead of 500'ing. two regression tests so we don't forget. refs BerriAI#28032

greptile-apps · 2026-05-16T19:37:46Z

Greptile Summary

This PR adds hierarchical model access groups to the LiteLLM proxy, allowing a group to include other groups as members with transitive DFS resolution and cycle-safe skipping. All five issues raised in the previous review round have been addressed: validate_models_exist now consults known_access_groups even when llm_router is None; get_group_memberships_from_db catches broad Exception; a second migration adds the missing child_group index; membership lookups are now served from a 60-second in-process TTL cache with explicit write-side invalidation; and update_access_group Step 1b no longer clears parent→child edges when the model_ids path is used.

Resolution logic (model_checks.py): Iterative DFS with on-path cycle detection replaces the previous flat lookup; callers already copy lists so in-place mutation is safe; deduplication via dict.fromkeys preserves order.
Storage & migration: New LiteLLM_AccessGroupMembership table with (parent_group, child_group) unique constraint, indexes on both columns across all three schema.prisma files, and IF NOT EXISTS guards in both migration files.
Test coverage: 35 pure-function tests plus dedicated files for cache semantics, DB helpers, and validate/classify logic — all mock-only, no real network calls.

Confidence Score: 5/5

All five previously flagged defects are remediated; the change is additive, backwards-compatible, and confined to listing endpoints rather than the per-request auth hot path.

The iterative DFS is correct (on-path visited tracking, safe for DAG-shared subtrees), the TTL cache invalidates synchronously on every write within the same process, the broad exception catch in the DB helper ensures the auth path cannot 500, the child_group index is present in all three schema files and both migration files, and the model_ids update path no longer silently destroys nested-group edges. No new defects were found during this review pass.

No files require special attention. The migration and schema changes are consistent across all three Prisma schema copies.

Important Files Changed

Filename	Overview
litellm/proxy/auth/model_checks.py	Adds iterative DFS `resolve_nested_groups` (cycle-safe, on-path visited tracking) and wires `group_memberships` kwarg through `_get_models_from_access_groups`, `get_key_models`, and `get_team_models`; callers already copy lists before passing, so in-place mutation is safe.
litellm/proxy/management_endpoints/model_access_group_management_endpoints.py	Adds TTL-cached membership map (60s, explicit invalidation on writes), `validate_models_exist` now checks `known_access_groups` even when `llm_router is None`, `get_group_memberships_from_db` catches broad `Exception`, and `update_access_group` Step 1b only clears edges when not in `model_ids` path — all five previously flagged issues addressed.
litellm/proxy/utils.py	Wires `get_cached_group_memberships` into `get_available_models_for_user`, gated on `prisma_client is not None`; falls back to empty dict preserving flat behavior when no DB is configured.
litellm/proxy/proxy_server.py	Adds the same TTL-cached membership fetch to `model_info_v1` and passes `group_memberships` to `get_key_models`/`get_team_models`; change is confined to the listing endpoint, not the per-request auth path.
litellm-proxy-extras/litellm_proxy_extras/migrations/20260517100000_index_child_group_on_access_group_membership/migration.sql	Adds `IF NOT EXISTS` index on `child_group` column, fixing the previously flagged full-table-scan on `delete_group_membership_edges`; all three schema.prisma files include `@@index([child_group])`.
tests/test_litellm/proxy/auth/test_nested_access_groups_cache.py	Six tests cover cache hit/miss, explicit invalidation via upsert/delete, TTL expiry, within-TTL no-refetch, and error-path empty-dict caching; uses `autouse` fixture to reset module-level state between tests.
tests/test_litellm/proxy/auth/test_nested_access_groups.py	35 pure-function tests covering flat compat, multi-hop nesting, diamond DAGs, cycle detection (self-ref, indirect, 3-hop), scale (50-level chain, 100-child fanout), and `group_memberships=None` vs `{}` equivalence; no real network calls.
litellm/types/proxy/management_endpoints/model_management_endpoints.py	Adds `parent_groups` and `child_groups` fields to `AccessGroupInfo` with `default_factory=list`; additive-only, fully backwards-compatible.

_{Reviews (3): Last reviewed commit: "test(proxy): split nested-access-groups ..." | Re-trigger Greptile}

four things from the bot review: 1. validate_models_exist bailed early with (False, all_names) when llm_router was None - never even checked known_access_groups. db-only setups couldn't create a group of groups at all. now we check known_groups even without a router. 2. get_group_memberships_from_db only caught AttributeError + TypeError, so a transient DB error would 500 every /v1/models call. broadened to Exception - the fallback now actually catches what the docstring says it does. 3. delete_group_membership_edges does WHERE parent OR child but only parent had an index. added @@index([child_group]) + a tiny migration. 4. get_group_memberships_from_db was being called on every model-listing request. added a 60s in-process TTL cache via get_cached_group_memberships, with explicit invalidation on every membership write. same shape as the existing llm_router.get_model_access_groups() cache. 9 tests for the above. refs BerriAI#28032

wtfashwin · 2026-05-16T20:10:36Z

@greptileai please re-review — addressed all four items from the previous review:

validate_models_exist now consults known_access_groups even when llm_router is None, so DB-only deployments can create groups of groups
get_group_memberships_from_db now catches Exception so transient DB errors degrade to flat-group fallback instead of 500-ing model-listing requests
added @@index([child_group]) on LiteLLM_AccessGroupMembership + a migration
added a 60s per-process TTL cache (get_cached_group_memberships) for the membership map with explicit invalidation on every write — hot-path callers in utils.py and proxy_server.py now go through it

61 unit tests total covering pure-function, async-DB-helper, and cache (hit/miss/invalidation/TTL/error-cache) behavior. All green locally + on CI.

oss-pr-review-agent-shin · 2026-05-16T20:16:21Z

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 3/5 ❌

Why blocked:

1 PR-related CI failure (Size gate: 1 file(s) over 500 added LOC — split first (tests/test_litellm/proxy/auth/test_nested_access_groups.py (+577)). Add the oversized-ok label or a Big-PR-Approved: <handle> trailer if a maintainer has signed off.) (pr_related_failures, -2 pts)

Details: Score docked for: 1 PR-related CI failure (Size gate: 1 file(s) over 500 added LOC — split first (tests/test_litellm/proxy/auth/test_nested_access_groups.py (+577)). Add the oversized-ok label or a Big-PR-Approved: <handle> trailer if a maintainer has signed off.).

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.

update_access_group cleared parent->child edges unconditionally but only re-added them on the model_names path. update via model_ids would wipe the membership table for the group and never refill it - so a project-x -> [image, reasoning] composition vanished after a single model_ids update. gate the edge-clear behind not use_model_ids. adds two regression tests covering both branches.

577-LOC test file tripped the +500 added-LOC size gate. classify + validate coverage moves to a sibling file; the original keeps the DFS and edge-case suite. each file is now under 500 LOC.

oss-pr-review-agent-shin · 2026-05-16T20:51:39Z

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 0/5 ❌

Why blocked:

all Phase B agent checks non-approving (phase_b_none_approved, -5 pts)

Details: Score docked for: all Phase B agent checks non-approving (karpathy + security + coverage gap).

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.

wtfashwin · 2026-05-16T20:56:56Z

@krrishdholakia @ishaan-jaff — ready for a look when you have a moment.

Where it stands:

All 43 CI checks green on d0951cb
Greptile round 2 cleared at 4/5; the single P1 it raised (update_access_group destroying nested edges on the model_ids path) is fixed in ac807e2 with two regression tests
63 unit tests across the 4 new test files, all green
Size gate cleared by splitting the test file (d0951cb) — every file now under 500 LOC

litellm-agent is sitting at 0/5 after the size gate cleared, with no per-agent detail on what Phase B caught (karpathy / security / coverage). For what it's worth, the codecov "missing lines" all sit in files this PR doesn't touch (mcp_server/server.py, prometheus.py, gemini transformation, user_api_key_auth_mcp.py) — pre-existing gaps in main that get attributed to anyone whose diff lands at the same time. Happy to address concrete feedback if the bot can surface specifics, but flagging in case it's a beta hiccup.

PR scope recap: hierarchical access groups via new LiteLLM_AccessGroupMembership join table + iterative DFS resolver with on-path cycle tracking + 60s TTL cache on the hot read path. Backwards compatible — callers without the new group_memberships kwarg behave identically to today.

shin-berri and others added 9 commits May 13, 2026 22:37

Merge pull request BerriAI#27906 from BerriAI/litellm_internal_staging

e58a561

[Infra] Promote internal staging to main

feat(proxy): add LiteLLM_AccessGroupMembership table

92729d8

Holds parent/child edges between access groups so a group can include another group's models. Schema + migration only; resolution next. Refs BerriAI#28032

wtfashwin added 2 commits May 17, 2026 00:37

wtfashwin marked this pull request as ready for review May 16, 2026 19:31

greptile-apps Bot reviewed May 16, 2026

View reviewed changes

wtfashwin force-pushed the feat/nested-access-groups branch from 4f1252c to c5e2c6c Compare May 16, 2026 19:58

greptile-apps Bot reviewed May 16, 2026

View reviewed changes

Comment thread litellm/proxy/management_endpoints/model_access_group_management_endpoints.py Outdated

wtfashwin added 2 commits May 17, 2026 01:50

test(proxy): split nested-access-groups test file under size gate

d0951cb

577-LOC test file tripped the +500 added-LOC size gate. classify + validate coverage moves to a sibling file; the original keeps the DFS and edge-case suite. each file is now under 500 LOC.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(proxy): nested model access groups with cycle-safe resolution (#28032)#28075

feat(proxy): nested model access groups with cycle-safe resolution (#28032)#28075
wtfashwin wants to merge 15 commits into
BerriAI:litellm_internal_stagingfrom
wtfashwin:feat/nested-access-groups

wtfashwin commented May 16, 2026

Uh oh!

CLAassistant commented May 16, 2026

Uh oh!

codecov Bot commented May 16, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 16, 2026 •

edited

Loading

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wtfashwin commented May 16, 2026

Uh oh!

Uh oh!

oss-pr-review-agent-shin Bot commented May 16, 2026

Uh oh!

oss-pr-review-agent-shin Bot commented May 16, 2026

Uh oh!

wtfashwin commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

wtfashwin commented May 16, 2026

What

Design

Storage

Resolution

Write path

Request-path activation

AccessGroupInfo API

Tests

Local checks

Backwards compatibility

Out of scope (deliberate)

Checklist

Uh oh!

CLAassistant commented May 16, 2026

Uh oh!

codecov Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

greptile-apps Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wtfashwin commented May 16, 2026

Uh oh!

Uh oh!

oss-pr-review-agent-shin Bot commented May 16, 2026

Uh oh!

oss-pr-review-agent-shin Bot commented May 16, 2026

Uh oh!

wtfashwin commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`AccessGroupInfo` API

codecov Bot commented May 16, 2026 •

edited

Loading

greptile-apps Bot commented May 16, 2026 •

edited

Loading