KServe
KServe is a highly scalable, standards-based model inference platform on Kubernetes for trusted AI. It provides a performant, standardized inference protocol across ML frameworks and supports modern serverless inference workloads with autoscaling, including scale to zero on GPU. It offers simple, pluggable production ML serving that covers prediction, pre- and post-processing, monitoring, and explainability, along with advanced deployments using canary rollouts, experiments, ensembles, and transformers. High scalability, density packing, and intelligent routing come from ModelMesh, which is designed for high-scale, high-density, and frequently changing model use cases. ModelMesh loads and unloads AI models to and from memory to balance responsiveness to users against computational footprint.
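To make the "standardized inference protocol" concrete, here is a minimal sketch of how a client might build a REST request in the V2 (Open Inference Protocol) format that KServe supports. The host, model name (`my-model`), input tensor name (`input-0`), and the helper function itself are hypothetical and chosen only for illustration; the sketch builds the URL and JSON body without sending anything over the network.

```python
import json


def build_v2_infer_request(base_url, model_name, input_name, rows):
    """Build the URL and JSON body for a V2-style REST inference call.

    `rows` is a list of equal-length lists of numbers (one row per instance).
    This helper is illustrative only; it is not part of any KServe SDK.
    """
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")

    # The protocol carries tensor data with an explicit shape and datatype;
    # here the rows are flattened in row-major order.
    flat = [float(value) for row in rows for value in row]

    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), n_cols],
                "datatype": "FP32",
                "data": flat,
            }
        ]
    }
    return url, json.dumps(body)


# Hypothetical endpoint and names, for illustration only.
url, body = build_v2_infer_request(
    "http://localhost:8080",
    "my-model",
    "input-0",
    [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
)
print(url)
print(body)
```

Because the request shape does not depend on the framework behind the model, the same client code can in principle target models served from different runtimes.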
Feast
Feast makes your offline data available for real-time predictions without requiring you to build custom pipelines. It ensures data consistency between offline training and online inference, eliminating train-serve skew, and standardizes data engineering workflows under one consistent framework. Teams use Feast as the foundation of their internal ML platforms. Feast doesn't require the deployment and management of dedicated infrastructure; instead, it reuses existing infrastructure and spins up new resources when needed. Feast may be a good fit if you are not looking for a managed solution and are willing to manage and maintain your own implementation; you have engineers who can support the implementation and management of Feast; you want to run pipelines that transform raw data into features in a separate system and integrate with it; and you have unique requirements and want to build on top of an open source solution.
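The idea of avoiding train-serve skew can be illustrated with a small toy, shown below. This is not Feast's API: the class `ToyFeatureStore`, its methods, and the `trips_last_7d` feature are all hypothetical names invented for this sketch. The point it tries to show is that a single feature definition feeds both the offline (training) path and the online (serving) path, so both see the same value for the same entity and timestamp.

```python
from datetime import datetime, timedelta


def trips_last_7d(events, driver_id, as_of):
    """Single feature definition shared by the training and serving paths."""
    window_start = as_of - timedelta(days=7)
    return sum(
        1
        for event in events
        if event["driver_id"] == driver_id and window_start < event["ts"] <= as_of
    )


class ToyFeatureStore:
    """Toy in-memory store; a stand-in for the concept, not for Feast itself."""

    def __init__(self, events):
        self._events = events
        self._online = {}  # driver_id -> precomputed feature value

    def training_rows(self, entity_rows):
        """Offline path: point-in-time values for (driver_id, timestamp) pairs."""
        return [
            {
                "driver_id": driver_id,
                "ts": ts,
                "trips_last_7d": trips_last_7d(self._events, driver_id, ts),
            }
            for driver_id, ts in entity_rows
        ]

    def load_online(self, driver_ids, as_of):
        """Precompute the latest values into the online lookup table."""
        for driver_id in driver_ids:
            self._online[driver_id] = trips_last_7d(self._events, driver_id, as_of)

    def online_lookup(self, driver_id):
        """Online path: a dictionary lookup at request time."""
        return {"driver_id": driver_id, "trips_last_7d": self._online[driver_id]}


events = [
    {"driver_id": 1, "ts": datetime(2024, 1, 1)},
    {"driver_id": 1, "ts": datetime(2024, 1, 5)},
    {"driver_id": 1, "ts": datetime(2024, 1, 9)},
    {"driver_id": 2, "ts": datetime(2024, 1, 8)},
]
store = ToyFeatureStore(events)
now = datetime(2024, 1, 10)

offline = store.training_rows([(1, now), (1, datetime(2024, 1, 2))])
store.load_online([1, 2], as_of=now)
online = store.online_lookup(1)
print(offline[0]["trips_last_7d"], online["trips_last_7d"])
```

In this toy, the training row for a past timestamp only counts events up to that timestamp, which is the point-in-time behavior that keeps future data from leaking into training sets.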
JFrog ML
JFrog ML (formerly Qwak) offers an MLOps platform designed to accelerate the development, deployment, and monitoring of machine learning and AI applications at scale. The platform enables organizations to manage the entire lifecycle of machine learning models, from training to deployment, with tools for model versioning, monitoring, and performance tracking. It supports a wide variety of AI models, including generative AI and large language models (LLMs), and provides an intuitive interface for managing prompts, workflows, and feature engineering. JFrog ML helps businesses streamline their ML operations and scale AI applications efficiently, with integrated support for cloud environments.
NVIDIA Triton Inference Server
NVIDIA Triton™ Inference Server delivers fast and scalable AI in production. As open-source inference serving software, Triton streamlines AI inference by enabling teams to deploy trained AI models from any framework (TensorFlow, NVIDIA TensorRT®, PyTorch, ONNX, XGBoost, Python, custom, and more) on any GPU- or CPU-based infrastructure (cloud, data center, or edge). Triton runs models concurrently on GPUs to maximize throughput and utilization, supports x86 and ARM CPU-based inferencing, and offers features like dynamic batching, model analyzer, model ensemble, and audio streaming. Triton helps developers deliver high-performance inference. It integrates with Kubernetes for orchestration and scaling, exports Prometheus metrics for monitoring, supports live model updates, and can be used in all major public cloud machine learning (ML) and managed Kubernetes platforms. Triton helps standardize model deployment in production.
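As a rough illustration of what dynamic batching means in general, the toy below groups individually arriving requests into batches. It is a simplified model and not Triton's scheduler: the function `form_batches` and its parameters are hypothetical, and it assumes the model is always free to run a batch immediately. A batch is dispatched either when it reaches the maximum size or when its oldest request has waited for the maximum queue delay.

```python
def form_batches(arrivals, max_batch_size, max_queue_delay_us):
    """Group requests into batches by size limit or queue-delay deadline.

    arrivals: list of (arrival_time_us, request_id), sorted by arrival time.
    Returns a list of (dispatch_time_us, [request_ids]).
    Toy model: ignores model execution time and instance availability.
    """
    batches = []
    pending = []
    deadline = None

    for arrival_time, request_id in arrivals:
        # The pending batch timed out before this request arrived: flush it.
        if pending and arrival_time > deadline:
            batches.append((deadline, pending))
            pending = []

        # The first request in a new batch starts the delay timer.
        if not pending:
            deadline = arrival_time + max_queue_delay_us

        pending.append(request_id)

        # The batch is full: dispatch immediately.
        if len(pending) == max_batch_size:
            batches.append((arrival_time, pending))
            pending = []

    # Whatever is left is dispatched when its deadline expires.
    if pending:
        batches.append((deadline, pending))
    return batches


arrivals = [(0, "a"), (100, "b"), (700, "c"), (750, "d"), (800, "e")]
batches = form_batches(arrivals, max_batch_size=3, max_queue_delay_us=500)
print(batches)
# [(500, ['a', 'b']), (800, ['c', 'd', 'e'])]
```

The trade-off the sketch exposes is the usual one: a longer queue delay tends to produce fuller batches and better throughput, at the cost of extra latency for the requests that wait.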