Add vLLM-style Runtime Metrics (Inference + Training) with Opt-In Telemetry #3897

hnxnq7 wants to merge 16 commits into unslothai:main
Conversation
- Comprehensive metrics collection system (inference + training)
- Prometheus-compatible export with optional HTTP server
- Programmatic access to metrics
- Automatic instrumentation of inference and training loops
- All tests passing
- Resolved merge conflict in vision.py (kept ModelOutput handling)
- Added documentation notes about estimated vs measured metrics
- Improved finish_reason detection
- All syntax errors resolved
for more information, see https://pre-commit.ci
Summary of Changes

This pull request integrates a sophisticated, opt-in runtime metrics system into Unsloth, drawing inspiration from vLLM's architecture. It enables users to gain deep insight into the performance of their inference and training workloads by automatically collecting a wide array of statistics, including latencies, token counts, throughput, loss, and learning rates. The system offers flexible access to these metrics through programmatic APIs, optional Prometheus export with a dedicated HTTP server, and a privacy-conscious, opt-in telemetry mechanism for aggregated data. This gives users better monitoring and optimization capabilities for their Unsloth-powered applications.

Highlights
Code Review
This pull request introduces a comprehensive, opt-in runtime metrics system for both inference and training, inspired by vLLM's architecture. The changes are well-structured into a new unsloth/metrics module, providing features like Prometheus export, an optional HTTP server, and non-blocking telemetry. The implementation is robust, with graceful degradation for optional dependencies and careful patching to instrument the training and inference pipelines.
My review focuses on improving maintainability and fixing a potential bug in the metrics collection logic. I've identified a dependency on a private API in the Prometheus integration and a confusing and potentially buggy section for determining the finish_reason in inference metrics. Overall, this is an excellent and well-documented feature addition.
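On the finish_reason point raised above: a minimal sketch of one unambiguous way to classify why generation stopped. This is a hypothetical helper, not the PR's actual code; the function name and parameters are illustrative assumptions.

```python
def detect_finish_reason(new_token_ids, max_new_tokens, eos_token_id):
    # Hypothetical helper (not the PR's implementation):
    # "stop"   -> the model emitted the EOS token
    # "length" -> the max_new_tokens budget was exhausted
    if eos_token_id is not None and new_token_ids and new_token_ids[-1] == eos_token_id:
        return "stop"
    if len(new_token_ids) >= max_new_tokens:
        return "length"
    return "unknown"
```

Checking EOS before the length cap avoids the ambiguous case where the final token is EOS and the budget is simultaneously exhausted.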
```python
def _get_existing_collector(metric_name: str):
    if REGISTRY is None:
        return None
    return REGISTRY._names_to_collectors.get(metric_name)  # type: ignore[attr-defined]
```
Accessing the internal _names_to_collectors attribute of the Prometheus registry is a bit fragile, as it's not part of the public API and could change in future versions of prometheus-client, potentially breaking this code. While this is a common and pragmatic workaround to handle metric re-registration issues in environments like Jupyter notebooks, it's worth being aware of the maintainability risk for future library updates.
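One way to avoid the private `_names_to_collectors` lookup is to keep a module-level cache of created collectors and consult only that. The sketch below uses a stand-in registry class instead of `prometheus_client` so it runs without the dependency; like the real `CollectorRegistry`, the stand-in raises `ValueError` on duplicate names.

```python
class MiniRegistry:
    """Stand-in for prometheus_client's CollectorRegistry; like the real one,
    it raises ValueError when the same metric name is registered twice."""
    def __init__(self):
        self._names = set()

    def register(self, name):
        if name in self._names:
            raise ValueError(f"Duplicated timeseries in CollectorRegistry: {name}")
        self._names.add(name)

REGISTRY = MiniRegistry()
_COLLECTORS = {}  # our own cache: metric name -> collector object

def get_or_create(name, factory):
    # Re-running this in a notebook cell returns the cached collector
    # instead of reading the registry's private internals.
    if name not in _COLLECTORS:
        REGISTRY.register(name)
        _COLLECTORS[name] = factory()
    return _COLLECTORS[name]
```

The trade-off: a separate cache can drift from the registry if the module is fully reloaded, which is why the private-attribute lookup is a common pragmatic choice; the cache approach stays within the public API.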
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c41a8713a
Add vLLM-style Runtime Metrics for Inference & Training (with Optional Telemetry)
This PR adds an opt-in runtime metrics system to Unsloth, inspired by vLLM’s metrics architecture, with optional Prometheus export and optional server-side telemetry forwarding.
What this enables

- `/metrics` HTTP endpoint

How it works
`enable_prometheus_metrics()` automatically instruments:

- `unsloth_base_fast_generate()` (inference)
- `Trainer.training_step()` via a patch hook (training)

Telemetry is opt-in via `UNSLOTH_ENABLE_METRICS_TELEMETRY=1` or `enable_telemetry()`, and can be disabled with `UNSLOTH_DISABLE_METRICS_TELEMETRY=1`.

Key design points
- Graceful degradation when `prometheus_client` is not installed
- Correct handling of `ModelOutput` objects when `return_dict_in_generate=True`

Files changed (13 files)
- `unsloth/metrics/` (6 files)
  - `stats.py` – Core statistics tracking (`InferenceStats`, `TrainingStats`, `StatsCollector`)
  - `prometheus.py` – Prometheus export with Counter / Gauge / Histogram metrics
  - `server.py` – Optional HTTP server for metrics scraping
  - `telemetry.py` – Optional background telemetry sender (aggregated stats only)
  - `README.md` – Documentation
- `_patch_training_metrics()` in `unsloth/models/_utils.py`
- `unsloth_base_fast_generate()` in `unsloth/models/vision.py`
- `unsloth/__init__.py`
- `tests/metrics/test_metrics_standalone.py` (all passing)
- `pyproject.toml` (optional `prometheus_client`)

Quick usage
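The usage code itself was lost in extraction; the sketch below is a hedged reconstruction based only on the API names mentioned in this PR (`enable_prometheus_metrics`, `enable_telemetry`). These names may differ from the merged code, so the import is guarded rather than assumed.

```python
# Guarded sketch: the imported names are assumptions taken from the PR
# description, so an ImportError is tolerated rather than treated as a bug.
try:
    from unsloth.metrics import enable_prometheus_metrics, enable_telemetry
    enable_prometheus_metrics()  # instrument inference + training, export metrics
    enable_telemetry()           # explicitly opt in to aggregated telemetry
    metrics_enabled = True
except ImportError:
    metrics_enabled = False      # unsloth (or this branch) is not installed
print("metrics enabled:", metrics_enabled)
```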
Notes
Testing
https://www.kaggle.com/code/hnxnq07/metrics-telemetry-smoketest
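As a standalone illustration of the patch-hook instrumentation idea (no unsloth required; `Trainer` here is a stand-in class, not `transformers.Trainer`, and all names are illustrative):

```python
import time

class Trainer:
    # Stand-in for the real trainer class; only the method shape matters here.
    def training_step(self, batch):
        return sum(batch) / len(batch)  # pretend per-step loss

STEP_SECONDS = []  # collected per-step wall-clock latencies

def patch_training_metrics(trainer_cls):
    # Wrap training_step so every call records its latency, mirroring the
    # patch-hook approach described in this PR (names are illustrative).
    original = trainer_cls.training_step
    def wrapped(self, batch):
        start = time.perf_counter()
        loss = original(self, batch)
        STEP_SECONDS.append(time.perf_counter() - start)
        return loss
    trainer_cls.training_step = wrapped

patch_training_metrics(Trainer)
loss = Trainer().training_step([1.0, 2.0, 3.0])
```

Because only the class attribute is rebound, unpatched code paths and return values are unchanged, which is what keeps this kind of instrumentation purely additive.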
No breaking changes. Purely additive.