Atlas is a system for continual learning from agent workflows. This repository is the runtime component—it wraps existing agents, captures execution traces with reward signals, and exports structured data for training. Atlas Core is the training component—it runs GRPO, GKD, and SFT on those exports to produce improved teacher checkpoints. Together they form a closed loop: the runtime generates training data from agent execution, Core trains better models from that data, and you deploy updated checkpoints back into the runtime.
Atlas separates runtime orchestration from offline training. This repository handles data collection, Core handles model improvement.
Runtime (Atlas SDK):
- Wraps your agent (OpenAI, Claude, Gemini, local models, custom implementations) in a dual-agent loop where Student executes and Teacher supervises
- Routes tasks to auto/paired/coach supervision lanes based on a capability probe that assesses difficulty and confidence
- Captures execution traces: plans, attempts, interventions, rewards at step and session granularity
- Stores telemetry in Postgres with review gates for approved sessions
Training (Atlas Core):
- Reads runtime data directly from Postgres via `atlas/training_data/`
- Trains teacher models using GRPO (RL from rewards), GKD (distillation), or SFT (supervised fine-tuning)
- Shares reward infrastructure with the runtime so scoring is consistent across data collection and training
- Produces checkpoints that deploy back into the SDK
The training algorithm itself—GRPO is a single equation over logprobs—is straightforward. The challenge is infrastructure: collecting clean training data from multi-turn agent execution with proper reward attribution, adaptive supervision, and export guardrails. That's what this SDK does.
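For reference, that equation amounts to a group-relative policy gradient. A minimal sketch (illustrative only, not Atlas Core's implementation) looks like this:

```python
# Minimal sketch of a GRPO-style loss (illustrative, not Atlas Core's code).
import torch

def grpo_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """logprobs: (group, seq_len) per-token log-probs for a group of sampled
    completions of the same prompt; rewards: (group,) scalar rewards."""
    # Each sample's advantage is its reward normalized against the group.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    # Weight each sequence's total log-prob by its advantage (REINFORCE-style).
    weighted = logprobs.sum(dim=-1) * advantages
    return -weighted.mean()
```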
If you're experimenting with RL for LLM agents, you need training data that captures more than prompt/completion pairs. You need execution traces showing where reasoning failed, how supervision corrected it, and which strategies worked. You need rewards attributed to specific steps so GRPO can learn what actions improve outcomes. You need this data exportable with review workflows so bad episodes don't poison training datasets.
Building that infrastructure means solving:
- Multi-turn orchestration with tool calls and state management
- Adaptive supervision routing (when to guide vs. when to let the agent run)
- Reward attribution across process quality, outcome correctness, and efficiency
- Telemetry persistence with plan/step/trajectory granularity
- Export guardrails with approval gates and drift detection
The SDK implements that infrastructure so you can focus on training experiments. See examples/mcp_tool_learning/ for a working integration with LangGraph agents demonstrating progressive learning across 25 file operation tasks.
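To make the data requirement concrete, a step-attributed session record captures roughly this shape (a hypothetical sketch; the field names below are assumptions, not the SDK's actual schema):

```python
# Hypothetical shape of a step-attributed trace; the real schema is AtlasSessionTrace.
trace = {
    "session_id": "sess-001",
    "plan": ["locate failing test", "patch the bug", "re-run the suite"],
    "steps": [
        {"action": "locate failing test", "reward": 0.9, "intervention": None},
        {"action": "patch the bug", "reward": 0.4, "intervention": "teacher hint"},
        {"action": "re-run the suite", "reward": 1.0, "intervention": None},
    ],
    "session_reward": 0.77,        # aggregate of step rewards
    "review_status": "approved",   # export guardrails gate on this
}
```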
- Automated Configuration Discovery – `atlas env init` scans your codebase for agent classes and tool schemas, generates runtime config, and synthesizes adapter factories when needed. See the Configuration Guide for details.
- Adaptive Supervision Routing – Capability probe routes tasks to auto/paired/coach lanes based on difficulty and confidence, reducing supervision overhead as models improve on specific task types.
- Reward Attribution – Small/large judge pairs score process quality, outcome correctness, and efficiency at step and session granularity. Reward infrastructure is shared with Atlas Core for scoring consistency (see the aggregation sketch after this list).
- Observability and Telemetry – Runtime sessions stream to Postgres with plan structures, execution traces, and reward payloads. Learning reports (`scripts/report_learning.py`) filter by project/task/tags and break down performance metrics.
- Export Guardrails – Session exports default to approved-only with CLI review workflow and drift alerts. Prevents problematic episodes from entering training datasets.
- Direct Training Integration – `atlas train` exports sessions and launches Atlas Core training with Hydra config overrides, closing the runtime→training loop in one command.
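How per-dimension judge scores fold into a single reward can be pictured with a small sketch; the dimension names and weights below are illustrative assumptions, not the SDK's actual `rim` defaults:

```python
# Illustrative weighted aggregation of judge scores (not the SDK's rim implementation).
def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension judge scores into one reward via a weighted mean."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total

# Hypothetical dimensions mirroring process quality, outcome correctness, efficiency.
reward = aggregate(
    scores={"process": 0.9, "outcome": 1.0, "efficiency": 0.7},
    weights={"process": 0.3, "outcome": 0.5, "efficiency": 0.2},
)
print(round(reward, 3))  # 0.91
```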
Note: Use Python 3.10 or newer before installing. On older interpreters (e.g., 3.9), pip resolves the outdated `arc-atlas` 0.1.0 release and the runtime crashes at import time.
```bash
pip install arc-atlas
export ANTHROPIC_API_KEY=sk-ant-...  # Or your preferred provider
atlas env init
atlas run --config .atlas/generated_config.yaml --task "Your task here"
```

What happens:
- Install – Install the SDK from PyPI.
- Autodiscovery – `atlas env init` intelligently discovers your agent, configures Anthropic models (Claude 4.5 Haiku + Sonnet), enables learning features (few-shot + playbook), and optionally sets up PostgreSQL storage via Docker—all automatically with LLM-driven inference.
- Run – `atlas run` executes your agent in the dual-agent loop (Student/Teacher), tracks rewards, generates learning playbooks, and saves traces to PostgreSQL.
The generated config (.atlas/generated_config.yaml) uses production-ready defaults based on runtime evaluation benchmarks:
- Student: Claude Haiku 4.5 (claude-haiku-4-5-20251001) - fast, cost-effective
- Teacher: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) - powerful, accurate
- Learning: Few-shot prompting + playbook injection enabled by default
- Performance: 0.989 reward score, 20.08s average latency
See Autodiscovery Guide and Configuration Guide for customization.
- Python 3.10+ (3.13 recommended)
- `ANTHROPIC_API_KEY` exported or in `.env` (for default config)
- Docker installed (optional, for automated PostgreSQL setup)
Custom Providers:
While the default configuration uses Anthropic models for optimal performance, you can customize to use any supported provider (OpenAI, Google, Gemini, xAI, Bedrock) by editing .atlas/generated_config.yaml after initialization.
For a hands-on demonstration of Atlas learning capabilities:
```bash
atlas quickstart
```

This runs 3 security review tasks showing learning progression. See the Quickstart Guide for detailed usage.
- `examples/mcp_tool_learning/` - MCP tool learning with LangGraph agents, demonstrating progressive learning across 25 file operation tasks
- `atlas quickstart` - Runs 3 security review tasks showing learning progression (Quickstart Guide)
Configuration:
- Configuration Guide - Student/teacher/reward system configuration, learning tuning, adaptive teaching
- docs.arc.computer - Full reference including orchestration details and training recipes
Evaluation:
- Learning Evaluation - Transfer learning metrics, baseline comparison, evaluation harness
- Runtime Evaluation - Dual-agent runtime benchmarking and performance analysis
- Reward Evaluation - Judge scoring matrices and reward model validation
- Probe Evaluation - Capability probe accuracy and supervision routing analysis
Operations:
- Export Guardrails - Session review, approval workflow, drift detection
Video: Installation and Configuration Walkthrough
```text
1. core.run()                                  # load config, adapter, execution context
2. Student planner creates plan                # Bring-Your-Own-Agent bridge composes dependency-aware steps
3. Teacher validator reviews                   # ensures tooling, dependencies, and risks are handled
4. Capability probe selects supervision lane   # routes to auto, paired, or coach based on confidence
5. Orchestrator.arun()                         # executes steps, applies guidance, records telemetry
6. Evaluator.ajudge()                          # aggregates reward signals (process/helpfulness/custom)
7. Database.log_*()                            # stores plans, attempts, trajectory events in Postgres
8. Review + export guards                      # reward stats + drift alerts gate training exports until approved
```
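Driving the same flow from Python starts at `core.run()` (step 1). The keyword arguments in this sketch are assumptions, so check `atlas.core` for the actual signature:

```python
# Hedged sketch: core.run() is the entry point from step 1 above; the argument
# names here are assumptions rather than the verified signature.
from atlas import core

result = core.run(
    task="Triage the flaky integration test",
    config_path=".atlas/generated_config.yaml",
)
```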
Configuration files live in configs/examples/. Each YAML document is validated against atlas.config.models.AtlasConfig.
Quick reference of configuration sections:
| Section | Purpose |
|---|---|
| `agent` | Adapter settings (endpoint, Python import path, model) and tool schemas |
| `student` | Prompts and limits for Student persona (planner, executor, synthesizer) |
| `teacher` | Teacher persona settings (LLM config, cache behavior, prompts) |
| `orchestration` | Retry policy, per-step timeout, trajectory emission |
| `rim` | Judge models, weights, aggregation strategy, thresholds |
| `adaptive_teaching` | Capability probe, supervision lane thresholds, learning history |
| `storage` | PostgreSQL connection info for persistence |
See the Configuration Guide for detailed tuning options including learning synthesis, reward system configuration, and adaptive teaching parameters.
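Because every config is validated against `atlas.config.models.AtlasConfig`, you can sanity-check a YAML file before a run. This sketch assumes `AtlasConfig` validates on keyword construction, Pydantic-style:

```python
# Hedged sketch: assumes AtlasConfig raises on invalid fields at construction.
import yaml
from atlas.config.models import AtlasConfig

with open(".atlas/generated_config.yaml") as fh:
    raw = yaml.safe_load(fh)

config = AtlasConfig(**raw)  # raises a validation error if a section is malformed
print(type(config.storage))
```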
Training workflows require persistent storage to capture reward signals and execution traces. The runtime uses PostgreSQL for persistence.
Setup:
```bash
# Option 1: Local Postgres via Docker
atlas init  # Starts bundled Docker + Postgres on localhost:5433
```

```yaml
# Option 2: Use your own Postgres instance (add to config.yaml)
storage:
  database_url: postgresql://user:pass@host:port/database
```

Once storage is configured, runtime sessions stream to the database automatically. Atlas Core accesses this data directly:
```python
from atlas.training_data import get_training_sessions

sessions = get_training_sessions(
    db_url="postgresql://atlas:atlas@localhost:5433/atlas",
    min_reward=0.7,
    review_status_filters=["approved"],
    limit=100,
)
```

Optional: JSONL export
For offline workflows or external tools, export sessions to JSONL:
```bash
arc-atlas \
  --database-url postgresql://atlas:atlas@localhost:5433/atlas \
  --include-status approved \
  --output traces.jsonl \
  --limit 100
```

Each line is an `AtlasSessionTrace` with plans, steps, rewards, and metadata. See docs/examples/export_runtime_traces.md for details.
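For a quick sanity check of an export, each line parses as standalone JSON. The field names accessed below are illustrative, since `AtlasSessionTrace` defines the authoritative schema:

```python
# Hedged sketch: field names are assumptions; AtlasSessionTrace is the real schema.
import json

with open("traces.jsonl") as fh:
    for line in fh:
        trace = json.loads(line)
        # Inspect session-level reward and step count for each exported session.
        print(trace.get("session_reward"), len(trace.get("steps", [])))
```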
Once you've collected runtime traces, use Atlas Core to train updated teacher models.
Training Methods:
- GRPO - Reinforcement learning from reward signals (Guide)
- GKD - 9-30x faster distillation for production models (Guide)
- SFT - Supervised fine-tuning on approved traces
Quick Start:
```bash
# Option 1: Direct database access (recommended)
export STORAGE__DATABASE_URL=postgresql://atlas:atlas@localhost:5433/atlas
export ATLAS_CORE_PATH=~/src/ATLAS
atlas train \
  --data-config runtime_pg \
  --trainer-config grpo \
  --wandb-project atlas-runtime \
  --override trainer.max_steps=250

# Option 2: Export to JSONL first
arc-atlas --database-url postgresql://... --output traces.jsonl
cd $ATLAS_CORE_PATH
pip install -e .
atlas-core offline-pipeline --export-path traces.jsonl
```

Deploying Trained Models:
After training, update your SDK config to use the improved teacher:
```yaml
# config.yaml - HuggingFace Inference Endpoint
teacher:
  llm:
    provider: openai  # HF inference is OpenAI-compatible
    model: your-org/atlas-teacher-v1
    api_base: https://api-inference.huggingface.co/models/your-org/atlas-teacher-v1
    api_key_env: HUGGING_FACE_HUB_TOKEN
    temperature: 0.05
```

```yaml
# config.yaml - Local inference server (vLLM/TGI)
teacher:
  llm:
    provider: openai  # Most local servers are OpenAI-compatible
    model: your-org/atlas-teacher-v1
    api_base: http://localhost:8000/v1
    api_key_env: VLLM_API_KEY  # Dummy key if server doesn't require auth
    temperature: 0.05
```

Run agents with the improved teacher to collect better training data, creating a continual learning loop.
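Before routing live traffic at a fresh checkpoint, a quick smoke test of the OpenAI-compatible endpoint can save a debugging session. This sketch uses the `openai` client with the local-server values from the example config above; substitute your deployment's model name and URL:

```python
# Smoke-test an OpenAI-compatible teacher endpoint (values mirror the example above).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
resp = client.chat.completions.create(
    model="your-org/atlas-teacher-v1",
    messages=[{"role": "user", "content": "Reply with OK if you are serving."}],
)
print(resp.choices[0].message.content)
```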
Comprehensive Guides:
- Complete Training Pipeline - Step-by-step SFT → GRPO workflow
- Training Configuration - Hydra parameters reference
- Training Data Pipeline - Direct database access API
```bash
PYTHONPATH=. pytest tests --disable-warnings
```

The test suite covers dependency parsing, prompt rewriting, student/teacher orchestration, reward system aggregation, adapter bridges, and database logging. Most tests rely on locally mocked adapters, so no external network calls occur.
For evaluation harnesses (runtime, reward, learning, probe), see the Evaluation documentation above.
- Python 3.10+ (project is developed and validated with 3.13).
- Development extras (`pip install -e .[dev]`) install pytest tooling for local validation; core telemetry streams rely solely on the standard library.
- Reactive stream helpers live under `atlas/utils/reactive/`; SPDX headers are retained and must remain intact.
- Aim for descriptive naming and concise docstrings so the intent is evident without extra commentary.
```bash
# Install with dev dependencies
pip install -e .[dev]

# Run tests
PYTHONPATH=. pytest tests --disable-warnings

# Format and lint
ruff check .
ruff format .

# Type checking (if pyright is installed)
pyright
```
- Fork and clone the repository.
- Use the provided `pyproject.toml` extras to install development dependencies.
- Review existing modules before coding and keep commits focused and incremental to match the current style.
- Add or update unit tests alongside feature changes.
Pull requests should include updated documentation or examples when behaviour changes.
Atlas SDK is released under the Apache 2.0 license. See LICENSE for full details. Vendored NeMo components retain their original licensing notices.
Need more depth or end-to-end walkthroughs? Everything in this README is covered—and expanded—at docs.arc.computer.

