
Add vLLM‑style Runtime Metrics (Inference + Training) with Opt‑In Telemetry#3897

Open
hnxnq7 wants to merge 16 commits into unslothai:main from hnxnq7:metrics-collection-clean

Conversation

@hnxnq7 (Contributor) commented Jan 16, 2026

Add vLLM-style Runtime Metrics for Inference & Training (with Optional Telemetry)

This PR adds an opt-in runtime metrics system to Unsloth, inspired by vLLM’s metrics architecture, with optional Prometheus export and optional server-side telemetry forwarding.

What this enables

  • Inference metrics: request counts, token counts, throughput, latency histograms (E2E, prefill, decode)
  • Training metrics: steps, samples/sec, loss, LR, gradient norm, forward/backward timing
  • Prometheus support (optional) with /metrics HTTP endpoint
  • Programmatic access to metrics (no server required)
  • Optional telemetry forwarding of aggregated metrics to Unsloth servers
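The latency buckets above (E2E, prefill, decode) can be pictured with a small per-request timing record. This is a hedged sketch: the field names mirror the metric list but are assumptions, not the PR's actual schema in `unsloth/metrics/stats.py`.

```python
import time
from dataclasses import dataclass, field

@dataclass
class RequestMetrics:
    """Per-request timing sketch (field names are illustrative)."""
    prompt_tokens: int = 0
    generated_tokens: int = 0
    start: float = field(default_factory=time.perf_counter)
    first_token_at: float = 0.0   # prefill ends at the first generated token
    end: float = 0.0

    @property
    def prefill_latency(self) -> float:
        return self.first_token_at - self.start

    @property
    def decode_latency(self) -> float:
        return self.end - self.first_token_at

    @property
    def e2e_latency(self) -> float:
        return self.end - self.start
```

By construction, E2E latency is the sum of the prefill and decode components, which is what makes the three histograms directly comparable.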

How it works

  • Metrics are disabled by default
  • Calling enable_prometheus_metrics() automatically instruments:
    • unsloth_base_fast_generate() (inference)
    • Trainer.training_step() via a patch hook (training)
  • Telemetry forwarding is opt-in and non-blocking
    • Enabled via UNSLOTH_ENABLE_METRICS_TELEMETRY=1 or enable_telemetry()
    • Can be disabled via UNSLOTH_DISABLE_METRICS_TELEMETRY=1
  • No user code changes required beyond enabling metrics
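The opt-in/opt-out precedence implied by the two environment variables above could look like the following sketch. The helper name `telemetry_enabled` is an assumption for illustration, not the PR's actual API.

```python
import os

os.environ["UNSLOTH_ENABLE_METRICS_TELEMETRY"] = "1"       # opt in
os.environ.pop("UNSLOTH_DISABLE_METRICS_TELEMETRY", None)  # ensure no opt-out is set

def telemetry_enabled() -> bool:
    """Opt-out takes precedence over opt-in."""
    if os.environ.get("UNSLOTH_DISABLE_METRICS_TELEMETRY") == "1":
        return False
    return os.environ.get("UNSLOTH_ENABLE_METRICS_TELEMETRY") == "1"

print(telemetry_enabled())  # → True
```

Giving the disable flag priority means a deployment-wide opt-out cannot be accidentally overridden by a library call to `enable_telemetry()`.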

Key design points

  • Fully opt-in, no breaking changes
  • Graceful degradation if prometheus_client is not installed
  • Lightweight + low overhead
  • Inspired by vLLM’s metrics model, adapted to Transformers-based pipelines
  • Thread-safe singleton pattern
  • Handles ModelOutput objects when return_dict_in_generate=True
  • Telemetry sends aggregated stats only (counts / averages, no raw prompts or user data)
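The thread-safe singleton mentioned above is commonly implemented with double-checked locking; here is a minimal sketch. It is illustrative only — the PR's real collector lives in `unsloth/metrics/stats.py`.

```python
import threading

class StatsCollectorSketch:
    """Thread-safe singleton via double-checked locking (illustrative)."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:        # fast path avoids lock contention
            with cls._lock:              # double-checked: re-test under the lock
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

assert StatsCollectorSketch() is StatsCollectorSketch()
```

The unlocked first check keeps the hot path cheap, which matters for a collector touched on every request and training step.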

Files changed (13 files)

  • New module: unsloth/metrics/ (6 files)
    • stats.py – Core statistics tracking (InferenceStats, TrainingStats, StatsCollector)
    • prometheus.py – Prometheus export with Counter / Gauge / Histogram metrics
    • server.py – Optional HTTP server for metrics scraping
    • telemetry.py – Optional background telemetry sender (aggregated stats only)
    • README.md – Documentation
  • Training hook: _patch_training_metrics() in unsloth/models/_utils.py
  • Inference hook: unsloth_base_fast_generate() in unsloth/models/vision.py
  • Public API exports: via unsloth/__init__.py
  • Tests: tests/metrics/test_metrics_standalone.py (all passing)
  • Dependencies: pyproject.toml (optional prometheus_client)
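Since `prometheus_client` is an optional dependency, the graceful-degradation path presumably follows the standard guarded-import pattern. A sketch under that assumption (the flag and function names here are illustrative, not the PR's code):

```python
# Degrade gracefully when prometheus_client is absent: programmatic
# stats keep working, only the Prometheus export is skipped.
try:
    from prometheus_client import Counter, Gauge, Histogram
    PROMETHEUS_AVAILABLE = True
except ImportError:
    Counter = Gauge = Histogram = None
    PROMETHEUS_AVAILABLE = False

def enable_prometheus_export_sketch() -> bool:
    """Return whether Prometheus export could actually be enabled."""
    if not PROMETHEUS_AVAILABLE:
        return False  # stats remain available via get_all_stats()
    return True

print(enable_prometheus_export_sketch())
```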

Quick usage

from unsloth import enable_prometheus_metrics, get_stats_collector

enable_prometheus_metrics()

# run inference / training as usual

stats = get_stats_collector().get_all_stats()
print(stats["inference"])  # request counts, latencies, tokens/sec
print(stats["training"])   # steps, loss, samples/sec


Notes

  • Telemetry is opt-in by default (can be flipped easily if preferred)
  • Uses a background sender (non-blocking, silent failures)
  • Endpoint is configurable via UNSLOTH_METRICS_TELEMETRY_ENDPOINT
  • Current default endpoint is a placeholder pending server-side confirmation
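A non-blocking sender with silent failures, as the notes describe, typically pairs a bounded queue with a daemon worker thread. The sketch below assumes that shape; the queue size, timeout, and class name are illustrative, not the PR's actual values.

```python
import json
import os
import queue
import threading
import urllib.request

class TelemetrySenderSketch:
    """Background sender: never blocks the caller, never raises."""

    def __init__(self):
        self.endpoint = os.environ.get("UNSLOTH_METRICS_TELEMETRY_ENDPOINT", "")
        self._queue = queue.Queue(maxsize=100)
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def send(self, aggregated_stats: dict) -> None:
        try:
            self._queue.put_nowait(aggregated_stats)  # never block the caller
        except queue.Full:
            pass  # drop silently rather than stall inference/training

    def _worker(self):
        while True:
            payload = json.dumps(self._queue.get()).encode()
            try:
                req = urllib.request.Request(
                    self.endpoint, data=payload,
                    headers={"Content-Type": "application/json"},
                )
                urllib.request.urlopen(req, timeout=2)
            except Exception:
                pass  # silent failure by design
```

The daemon flag matters: a hung endpoint can never keep the training process from exiting.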

Testing

All standalone metrics tests pass (tests/metrics/test_metrics_standalone.py). No breaking changes; purely additive.

Rachel Li and others added 14 commits January 15, 2026 17:07
- Comprehensive metrics collection system (inference + training)
- Prometheus-compatible export with optional HTTP server
- Programmatic access to metrics
- Automatic instrumentation of inference and training loops
- All tests passing
- Resolved merge conflict in vision.py (kept ModelOutput handling)
- Added documentation notes about estimated vs measured metrics
- Improved finish_reason detection
- All syntax errors resolved
@gemini-code-assist (Contributor)

Summary of Changes

Hello @hnxnq7, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a sophisticated, opt-in runtime metrics system into Unsloth, drawing inspiration from vLLM's architecture. It enables users to gain deep insights into the performance of their inference and training workloads by automatically collecting a wide array of statistics, including latencies, token counts, throughput, loss, and learning rates. The system offers flexible access to these metrics through programmatic APIs, optional Prometheus export with a dedicated HTTP server, and a privacy-conscious, opt-in telemetry mechanism for aggregated data. This enhancement empowers users with better monitoring and optimization capabilities for their Unsloth-powered applications.

Highlights

  • Comprehensive Runtime Metrics: Introduces an opt-in system for collecting detailed inference and training performance metrics, inspired by vLLM.
  • Inference & Training Data: Tracks key metrics such as request counts, token throughput, E2E/prefill/decode latencies for inference, and steps, samples/sec, loss, LR, and gradient norm for training.
  • Prometheus Integration: Provides optional Prometheus-compatible metrics export via standard Counter, Gauge, and Histogram types, including an optional HTTP server with a '/metrics' endpoint.
  • Opt-in Telemetry: Implements an optional, non-blocking background telemetry sender for aggregated statistics to Unsloth servers, configurable via environment variables.
  • Automatic Instrumentation: Automatically hooks into 'unsloth_base_fast_generate()' for inference and 'Trainer.training_step()' for training, requiring minimal user code changes to enable.
  • Robust Design: Features a lightweight, low-overhead, thread-safe singleton pattern, graceful degradation if 'prometheus_client' is absent, and no breaking changes.



@gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive, opt-in runtime metrics system for both inference and training, inspired by vLLM's architecture. The changes are well-structured into a new unsloth/metrics module, providing features like Prometheus export, an optional HTTP server, and non-blocking telemetry. The implementation is robust, with graceful degradation for optional dependencies and careful patching to instrument the training and inference pipelines.

My review focuses on improving maintainability and fixing a potential bug in the metrics collection logic. I've identified a dependency on a private API in the Prometheus integration and a confusing and potentially buggy section for determining the finish_reason in inference metrics. Overall, this is an excellent and well-documented feature addition.
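The `finish_reason` logic the review flags could, for instance, be structured as a single classification per sequence. This is purely an illustrative take; the names below are not the PR's actual code.

```python
def detect_finish_reason(new_token_ids, eos_token_id, max_new_tokens):
    """Classify why generation stopped for one sequence."""
    if new_token_ids and new_token_ids[-1] == eos_token_id:
        return "stop"       # model emitted EOS
    if len(new_token_ids) >= max_new_tokens:
        return "length"     # hit the generation budget
    return "unknown"        # e.g. external abort or stop strings

print(detect_finish_reason([5, 6, 2], eos_token_id=2, max_new_tokens=10))  # → stop
```

Checking EOS before the length cap resolves the ambiguous case where the model emits EOS exactly at the token budget.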

Comment on lines +80 to +83

    def _get_existing_collector(metric_name: str):
        if REGISTRY is None:
            return None
        return REGISTRY._names_to_collectors.get(metric_name)  # type: ignore[attr-defined]
Severity: medium

Accessing the internal _names_to_collectors attribute of the Prometheus registry is a bit fragile, as it's not part of the public API and could change in future versions of prometheus-client, potentially breaking this code. While this is a common and pragmatic workaround to handle metric re-registration issues in environments like Jupyter notebooks, it's worth being aware of the maintainability risk for future library updates.
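One public-API-only alternative to reading `_names_to_collectors` is to keep the module's own name-to-collector map and never ask the registry at all. A sketch (using a stand-in factory so it runs without `prometheus_client` installed):

```python
# Cache our own handles so re-registration in notebooks is a no-op
# and no private registry attribute is ever touched.
_collectors = {}

def get_or_create(name, factory):
    """Create the collector once; later calls return the cached handle."""
    if name not in _collectors:
        _collectors[name] = factory()
    return _collectors[name]

c1 = get_or_create("unsloth_requests_total", dict)
c2 = get_or_create("unsloth_requests_total", dict)
assert c1 is c2  # same object, no registry introspection needed
```

The trade-off is a second source of truth alongside the registry, but it only ever references objects this module created itself.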

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c41a8713a

