Atlas is a system for continual learning from agent workflows. This repository is the runtime component—it wraps existing agents, captures execution traces with reward signals, and exports structured data for training. Atlas Core is the training component—it runs GRPO, GKD, and SFT on those exports to produce improved teacher checkpoints. Together they form a closed loop: the runtime generates training data from agent execution, Core trains better models from that data, and you deploy updated checkpoints back into the runtime.
Atlas separates runtime orchestration from offline training. This repository handles data collection, Core handles model improvement.
Runtime (Atlas SDK):
- Wraps your agent (OpenAI, Claude, Gemini, local models, custom implementations) in a dual-agent loop where Student executes and Teacher supervises
- Routes tasks to auto/paired/coach supervision lanes based on a capability probe that assesses difficulty and confidence
- Captures execution traces: plans, attempts, interventions, rewards at step and session granularity
- Stores telemetry in Postgres with review gates for approved sessions
Training (Atlas Core):
- Reads runtime data directly from Postgres via `atlas/training_data/`
- Trains teacher models using GRPO (RL from rewards), GKD (distillation), or SFT (supervised fine-tuning)
- Shares reward infrastructure with the runtime so scoring is consistent across data collection and training
- Produces checkpoints that deploy back into the SDK
The training algorithm itself—GRPO is a single equation over logprobs—is straightforward. The challenge is infrastructure: collecting clean training data from multi-turn agent execution with proper reward attribution, adaptive supervision, and export guardrails. That's what this SDK does.
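For reference, that equation amounts to a group-relative policy gradient. A minimal sketch (illustrative only, not Atlas Core's implementation) looks like this:

```python
# Minimal sketch of a GRPO-style loss (illustrative, not Atlas Core's code).
import torch

def grpo_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """logprobs: (group, seq_len) per-token log-probs for a group of sampled
    completions of the same prompt; rewards: (group,) scalar rewards."""
    # Each sample's advantage is its reward normalized against the group.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    # Weight each sequence's total log-prob by its advantage (REINFORCE-style).
    weighted = logprobs.sum(dim=-1) * advantages
    return -weighted.mean()
```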
If you're experimenting with RL for LLM agents, you need training data that captures more than prompt/completion pairs. You need execution traces showing where reasoning failed, how supervision corrected it, and which strategies worked. You need rewards attributed to specific steps so GRPO can learn what actions improve outcomes. You need this data exportable with review workflows so bad episodes don't poison training datasets.
Building that infrastructure means solving:
- Multi-turn orchestration with tool calls and state management
- Adaptive supervision routing (when to guide vs. when to let the agent run)
- Reward attribution across process quality, outcome correctness, and efficiency
- Telemetry persistence with plan/step/trajectory granularity
- Export guardrails with approval gates and drift detection
The SDK implements that infrastructure so you can focus on training experiments. See examples/mcp_tool_learning/ for a working integration with LangGraph agents demonstrating progressive learning across 25 file operation tasks.
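To make the data requirement concrete, a step-attributed session record captures roughly this shape (a hypothetical sketch; the field names below are assumptions, not the SDK's actual schema):

```python
# Hypothetical shape of a step-attributed trace; the real schema is AtlasSessionTrace.
trace = {
    "session_id": "sess-001",
    "plan": ["locate failing test", "patch the bug", "re-run the suite"],
    "steps": [
        {"action": "locate failing test", "reward": 0.9, "intervention": None},
        {"action": "patch the bug", "reward": 0.4, "intervention": "teacher hint"},
        {"action": "re-run the suite", "reward": 1.0, "intervention": None},
    ],
    "session_reward": 0.77,        # aggregate of step rewards
    "review_status": "approved",   # export guardrails gate on this
}
```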
- Automated Configuration Discovery – `atlas env init` scans your codebase for agent classes and tool schemas, generates runtime config, and synthesizes adapter factories when needed. See the Configuration Guide for details.
- Adaptive Supervision Routing – Capability probe routes tasks to auto/paired/coach lanes based on difficulty and confidence, reducing supervision overhead as models improve on specific task types.
- Reward Attribution – Small/large judge pairs score process quality, outcome correctness, and efficiency at step and session granularity. Reward infrastructure is shared with Atlas Core for scoring consistency (see the aggregation sketch after this list).
- Observability and Telemetry – Runtime sessions stream to Postgres with plan structures, execution traces, and reward payloads. Learning reports (`scripts/report_learning.py`) filter by project/task/tags and break down performance metrics.
- Export Guardrails – Session exports default to approved-only with CLI review workflow and drift alerts. Prevents problematic episodes from entering training datasets.
- Direct Training Integration – `atlas train` exports sessions and launches Atlas Core training with Hydra config overrides, closing the runtime→training loop in one command.
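How per-dimension judge scores fold into a single reward can be pictured with a small sketch; the dimension names and weights below are illustrative assumptions, not the SDK's actual `rim` defaults:

```python
# Illustrative weighted aggregation of judge scores (not the SDK's rim implementation).
def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension judge scores into one reward via a weighted mean."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total

# Hypothetical dimensions mirroring process quality, outcome correctness, efficiency.
reward = aggregate(
    scores={"process": 0.9, "outcome": 1.0, "efficiency": 0.7},
    weights={"process": 0.3, "outcome": 0.5, "efficiency": 0.2},
)
print(round(reward, 3))  # 0.91
```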
Note: Use Python 3.10 or newer before installing. On older interpreters (e.g., 3.9), pip resolves the outdated `arc-atlas` 0.1.0 release and the runtime crashes at import time.
```bash
pip install arc-atlas
export ANTHROPIC_API_KEY=sk-ant-...  # Or your preferred provider
atlas env init
atlas run --config .atlas/generated_config.yaml --task "Your task here"
```

What happens:
- Install – Install the SDK from PyPI.
- Autodiscovery – `atlas env init` intelligently discovers your agent, configures Anthropic models (Claude 4.5 Haiku + Sonnet), enables learning features (few-shot + playbook), and optionally sets up PostgreSQL storage via Docker—all automatically with LLM-driven inference.
- Run – `atlas run` executes your agent in the dual-agent loop (Student/Teacher), tracks rewards, generates learning playbooks, and saves traces to PostgreSQL.
The generated config (.atlas/generated_config.yaml) uses production-ready defaults based on runtime evaluation benchmarks:
- Student: Claude Haiku 4.5 (claude-haiku-4-5-20251001) - fast, cost-effective
- Teacher: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) - powerful, accurate
- Learning: Few-shot prompting + playbook injection enabled by default
- Performance: 0.989 reward score, 20.08s average latency
See Autodiscovery Guide and Configuration Guide for customization.
- Python 3.10+ (3.13 recommended)
- `ANTHROPIC_API_KEY` exported or in `.env` (for default config)
- Docker installed (optional, for automated PostgreSQL setup)
Custom Providers:
While the default configuration uses Anthropic models for optimal performance, you can customize to use any supported provider (OpenAI, Google, Gemini, xAI, Bedrock) by editing .atlas/generated_config.yaml after initialization.
For a hands-on demonstration of Atlas learning capabilities:
```bash
atlas quickstart
```

This runs 3 security review tasks showing learning progression. See the Quickstart Guide for detailed usage.
- `examples/mcp_tool_learning/` - MCP tool learning with LangGraph agents, demonstrating progressive learning across 25 file operation tasks
- `atlas quickstart` - Runs 3 security review tasks showing learning progression (Quickstart Guide)
Configuration:
- Configuration Guide - Student/teacher/reward system configuration, learning tuning, adaptive teaching
- docs.arc.computer - Full reference including orchestration details and training recipes
Evaluation:
- Learning Evaluation - Transfer learning metrics, baseline comparison, evaluation harness
- Runtime Evaluation - Dual-agent runtime benchmarking and performance analysis
- Reward Evaluation - Judge scoring matrices and reward model validation
- Probe Evaluation - Capability probe accuracy and supervision routing analysis
Operations:
- Export Guardrails - Session review, approval workflow, drift detection
Video: Installation and Configuration Walkthrough
```text
1. core.run()                                  # load config, adapter, execution context
2. Student planner creates plan                # Bring-Your-Own-Agent bridge composes dependency-aware steps
3. Teacher validator reviews                   # ensures tooling, dependencies, and risks are handled
4. Capability probe selects supervision lane   # routes to auto, paired, or coach based on confidence
5. Orchestrator.arun()                         # executes steps, applies guidance, records telemetry
6. Evaluator.ajudge()                          # aggregates reward signals (process/helpfulness/custom)
7. Database.log_*()                            # stores plans, attempts, trajectory events in Postgres
8. Review + export guards                      # reward stats + drift alerts gate training exports until approved
```
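Driving the same flow from Python starts at `core.run()` (step 1). The keyword arguments in this sketch are assumptions, so check `atlas.core` for the actual signature:

```python
# Hedged sketch: core.run() is the entry point from step 1 above; the argument
# names here are assumptions rather than the verified signature.
from atlas import core

result = core.run(
    task="Triage the flaky integration test",
    config_path=".atlas/generated_config.yaml",
)
```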
Configuration files live in configs/examples/. Each YAML document is validated against atlas.config.models.AtlasConfig.
Quick reference of configuration sections:
| Section | Purpose |
|---|---|
| `agent` | Adapter settings (endpoint, Python import path, model) and tool schemas |
| `student` | Prompts and limits for Student persona (planner, executor, synthesizer) |
| `teacher` | Teacher persona settings (LLM config, cache behavior, prompts) |
| `orchestration` | Retry policy, per-step timeout, trajectory emission |
| `rim` | Judge models, weights, aggregation strategy, thresholds |
| `adaptive_teaching` | Capability probe, supervision lane thresholds, learning history |
| `storage` | PostgreSQL connection info for persistence |
See the Configuration Guide for detailed tuning options including learning synthesis, reward system configuration, and adaptive teaching parameters.
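Because every config is validated against `atlas.config.models.AtlasConfig`, you can sanity-check a YAML file before a run. This sketch assumes `AtlasConfig` validates on keyword construction, Pydantic-style:

```python
# Hedged sketch: assumes AtlasConfig raises on invalid fields at construction.
import yaml
from atlas.config.models import AtlasConfig

with open(".atlas/generated_config.yaml") as fh:
    raw = yaml.safe_load(fh)

config = AtlasConfig(**raw)  # raises a validation error if a section is malformed
print(type(config.storage))
```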
Training workflows require persistent storage to capture reward signals and execution traces. The runtime uses PostgreSQL for persistence.
Setup:
```bash
# Option 1: Local Postgres via Docker
atlas init  # Starts bundled Docker + Postgres on localhost:5433
```

```yaml
# Option 2: Use your own Postgres instance (add to config.yaml)
storage:
  database_url: postgresql://user:pass@host:port/database
```

Once storage is configured, runtime sessions stream to the database automatically. Atlas Core accesses this data directly:
```python
from atlas.training_data import get_training_sessions

sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    min_reward=0.7,
    review_status_filters=["approved"],
    limit=100,
)
```

Optional: JSONL export
For offline workflows or external tools, export sessions to JSONL:
```bash
arc-atlas \
  --database-url postgresql://atlas:atlas@localhost:5433/atlas \
  --include-status approved \
  --output traces.jsonl \
  --limit 100
```

Each line is an `AtlasSessionTrace` with plans, steps, rewards, and metadata. See docs/examples/export_runtime_traces.md for details.
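For a quick sanity check of an export, each line parses as standalone JSON. The field names accessed below are illustrative, since `AtlasSessionTrace` defines the authoritative schema:

```python
# Hedged sketch: field names are assumptions; AtlasSessionTrace is the real schema.
import json

with open("traces.jsonl") as fh:
    for line in fh:
        trace = json.loads(line)
        # Inspect session-level reward and step count for each exported session.
        print(trace.get("session_reward"), len(trace.get("steps", [])))
```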
Once you've collected runtime traces, use Atlas Core to train updated teacher models.
Training Methods:
- GRPO - Reinforcement learning from reward signals (Guide)
- GKD - 9-30x faster distillation for production models (Guide)
- SFT - Supervised fine-tuning on approved traces
Quick Start:
```bash
# Option 1: Direct database access (recommended)
export STORAGE__DATABASE_URL=postgresql://atlas:atlas@localhost:5433/atlas
export ATLAS_CORE_PATH=~/src/ATLAS
atlas train \
  --data-config runtime_pg \
  --trainer-config grpo \
  --wandb-project atlas-runtime \
  --override trainer.max_steps=250

# Option 2: Export to JSONL first
arc-atlas --database-url postgresql://... --output traces.jsonl
cd $ATLAS_CORE_PATH
pip install -e .
atlas-core offline-pipeline --export-path traces.jsonl
```

Deploying Trained Models:
After training, update your SDK config to use the improved teacher:
```yaml
# config.yaml - HuggingFace Inference Endpoint
teacher:
  llm:
    provider: openai  # HF inference is OpenAI-compatible
    model: your-org/atlas-teacher-v1
    api_base: https://api-inference.huggingface.co/models/your-org/atlas-teacher-v1
    api_key_env: HUGGING_FACE_HUB_TOKEN
    temperature: 0.05
```

```yaml
# config.yaml - Local inference server (vLLM/TGI)
teacher:
  llm:
    provider: openai  # Most local servers are OpenAI-compatible
    model: your-org/atlas-teacher-v1
    api_base: http://localhost:8000/v1
    api_key_env: VLLM_API_KEY  # Dummy key if server doesn't require auth
    temperature: 0.05
```

Run agents with the improved teacher to collect better training data, creating a continual learning loop.
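Before routing live traffic at a fresh checkpoint, a quick smoke test of the OpenAI-compatible endpoint can save a debugging session. This sketch uses the `openai` client with the local-server values from the example config above; substitute your deployment's model name and URL:

```python
# Smoke-test an OpenAI-compatible teacher endpoint (values mirror the example above).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
resp = client.chat.completions.create(
    model="your-org/atlas-teacher-v1",
    messages=[{"role": "user", "content": "Reply with OK if you are serving."}],
)
print(resp.choices[0].message.content)
```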
Comprehensive Guides:
- Complete Training Pipeline - Step-by-step SFT → GRPO workflow
- Training Configuration - Hydra parameters reference
- Training Data Pipeline - Direct database access API
```bash
PYTHONPATH=. pytest tests --disable-warnings
```

The test suite covers dependency parsing, prompt rewriting, student/teacher orchestration, reward system aggregation, adapter bridges, and database logging. Most tests rely on locally mocked adapters, so no external network calls occur.
For evaluation harnesses (runtime, reward, learning, probe), see the Evaluation documentation above.
- Python 3.10+ (project is developed and validated with 3.13).
- Development extras (`pip install -e .[dev]`) install pytest tooling for local validation; core telemetry streams rely solely on the standard library.
- Reactive stream helpers live under `atlas/utils/reactive/`; SPDX headers are retained and must remain intact.
- Aim for descriptive naming and concise docstrings so the intent is evident without extra commentary.
```bash
# Install with dev dependencies
pip install -e .[dev]

# Run tests
PYTHONPATH=. pytest tests --disable-warnings

# Format and lint
ruff check .
ruff format .

# Type checking (if pyright is installed)
pyright
```
- Fork and clone the repository.
- Use the provided `pyproject.toml` extras to install development dependencies.
- Review existing modules before coding and keep commits focused and incremental to match the current style.
- Add or update unit tests alongside feature changes.
Pull requests should include updated documentation or examples when behaviour changes.
Atlas SDK is released under the Apache 2.0 license. See LICENSE for full details. Vendored NeMo components retain their original licensing notices.
Need more depth or end-to-end walkthroughs? Everything in this README is covered—and expanded—at docs.arc.computer.

