| Badge | Workflow | Cluster | Always runs |
|---|---|---|---|
| CI | ci.yml | none | yes: lint, build, security on every push/PR |
| Tier 1 Tests | ci-unit-tests.yaml | Kind (ephemeral) | yes: unit + integration tests |
| E2E Kind | e2e-kind.yaml | Kind (ephemeral) | yes: Tier 1 E2E on every push/PR |
| E2E OpenShift | e2e-openshift.yaml | live OCP cluster | no: see note below |
A grey or skipped E2E OpenShift badge is expected and is not a failure. OpenShift E2E tests (Tiers 2-5) run only when the OPENSHIFT_SERVER and OPENSHIFT_TOKEN repository secrets are set and either a push to main/release-* occurs or a PR has the e2e-test label applied. See docs/CI_CLUSTER_SETUP.md for cluster registration instructions.
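The gating rule in the note above can be read as a small predicate. The following is a minimal Python sketch of that rule, not the workflow's actual implementation: the function name and its parameters are hypothetical, and the real condition lives in e2e-openshift.yaml.

```python
from fnmatch import fnmatch

# Repository secrets that must both be set, per the note above.
REQUIRED_SECRETS = ("OPENSHIFT_SERVER", "OPENSHIFT_TOKEN")


def should_run_openshift_e2e(event, branch, labels, secrets):
    """Return True when the OpenShift E2E tiers would be expected to run.

    event   -- "push" or "pull_request" (hypothetical event names)
    branch  -- branch that was pushed to, or the PR's branch
    labels  -- set of label names on the PR (empty for pushes)
    secrets -- mapping of configured repository secrets
    """
    # Without both secrets there is no cluster to talk to, so the job is skipped.
    if not all(secrets.get(name) for name in REQUIRED_SECRETS):
        return False
    if event == "push":
        return branch == "main" or fnmatch(branch, "release-*")
    if event == "pull_request":
        return "e2e-test" in labels
    return False
```

Under this reading, a PR without the e2e-test label yields False, which corresponds to the grey or skipped badge described above.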
A Kubernetes-native operator that automates Jupyter Notebook validation in MLOps workflows. Built with Operator SDK and Go, it provides Git integration, pod orchestration for notebook execution, golden notebook comparison for regression testing, and model-aware validation for ML/AI workloads.
The Jupyter Notebook Validator Operator enables automated testing and validation of Jupyter notebooks in Kubernetes and OpenShift environments. It's designed for data science teams, ML engineers, and platform teams who need to ensure notebook reliability, reproducibility, and integration with deployed ML models.
High-level flow from a NotebookValidationJob to execution, comparison, and status:
flowchart LR
User["User / CI"] -->|"kubectl apply"| CR["NotebookValidationJob CR"]
CR --> Controller["Operator Controller"]
Controller -->|"clone"| Git["Git Repository"]
Controller -->|"create"| Pod["Validation Pod"]
Pod -->|"execute via Papermill"| Notebook["Jupyter Notebook"]
Controller -->|"compare"| Golden["Golden Notebook"]
Controller -->|"validate"| Model["ML Model Endpoint"]
Controller -->|"update"| Status["Job Status"]
For more detail, see Architecture Overview.
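To make the "execute via Papermill" step in the flow more concrete, here is a small Python sketch that assembles a Papermill command line. How the operator actually invokes Papermill inside the validation pod is not described here, so the helper name, the output path, and the `epochs` parameter are illustrative assumptions; only the `papermill INPUT OUTPUT -p NAME VALUE` CLI form comes from Papermill itself.

```python
import shlex


def build_papermill_command(input_path, output_path, parameters=None):
    """Build an argv list for the Papermill CLI (hypothetical helper)."""
    cmd = ["papermill", input_path, output_path]
    # Each notebook parameter is passed as: -p NAME VALUE
    for name, value in (parameters or {}).items():
        cmd += ["-p", name, str(value)]
    return cmd


cmd = build_papermill_command(
    "notebooks/tier1-simple/01-hello-world.ipynb",
    "/tmp/output.ipynb",      # assumed scratch location for the executed copy
    {"epochs": 3},            # hypothetical notebook parameter
)
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` in an environment where Papermill is installed.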
- Automated Notebook Execution - Execute notebooks in isolated Kubernetes pods with Papermill
- Golden Notebook Comparison - Regression testing with cell-by-cell output comparison
- Credential Management - Secure injection of credentials (AWS, databases, APIs) via Secrets, ESO, or Vault
- Model-Aware Validation - Validate notebooks against deployed models (KServe, OpenShift AI, vLLM, etc.)
- Git Integration - Clone notebooks from Git repositories (HTTPS and SSH authentication)
- Observability - Prometheus metrics and structured logging with credential sanitization
- Platform Detection - Auto-detect model serving platforms (9 platforms supported)
- Security - RBAC, Pod Security Standards, secret rotation, and audit logging
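As an illustration of what cell-by-cell output comparison against a golden notebook can look like, the sketch below compares the outputs of code cells in two notebooks loaded as nbformat v4 JSON dictionaries. It is a simplified stand-in, not the operator's comparison logic: the function names are hypothetical, and only stream, `text/plain`, and error outputs are considered.

```python
def _text(value):
    """nbformat stores multi-line text as either a string or a list of strings."""
    return "".join(value) if isinstance(value, list) else str(value)


def cell_outputs(cell):
    """Reduce a code cell's outputs to a list of comparable strings."""
    texts = []
    for out in cell.get("outputs", []):
        kind = out.get("output_type")
        if kind == "stream":
            texts.append(_text(out.get("text", "")))
        elif kind in ("execute_result", "display_data"):
            texts.append(_text(out.get("data", {}).get("text/plain", "")))
        elif kind == "error":
            texts.append(f"{out.get('ename')}: {out.get('evalue')}")
    return texts


def compare_notebooks(executed, golden):
    """Return the indices of code cells whose outputs differ."""
    run = [c for c in executed.get("cells", []) if c.get("cell_type") == "code"]
    ref = [c for c in golden.get("cells", []) if c.get("cell_type") == "code"]
    mismatches = [
        i for i, (a, b) in enumerate(zip(run, ref))
        if cell_outputs(a) != cell_outputs(b)
    ]
    # A cell present in only one notebook has nothing to match against.
    mismatches.extend(range(min(len(run), len(ref)), max(len(run), len(ref))))
    return mismatches


def code_cell(text):
    """Build a minimal code cell with one stdout stream output (example helper)."""
    return {"cell_type": "code",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": text}]}


golden = {"cells": [code_cell("hello\n"), code_cell("42\n")]}
executed = {"cells": [code_cell("hello\n"), code_cell("41\n")]}
print(compare_notebooks(executed, golden))  # [1]
```

Real notebooks can be loaded with `json.load` before being passed to `compare_notebooks`. Outputs that legitimately vary between runs (timestamps, memory addresses) would need normalising first, which this sketch does not attempt.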
- Kubernetes/OpenShift Cluster: OpenShift 4.18+ (recommended) or Kubernetes 1.31+
- Command-line Tools: kubectl or oc CLI, make (for building from source)
- Optional: External Secrets Operator (ESO), KServe or OpenShift AI, Tekton Pipelines (for build integration)
# Install CRDs
make install
# Build and push image
make docker-build docker-push IMG=quay.io/tosin2013/jupyter-notebook-validator-operator:v0.1.0
# Deploy operator
make deploy IMG=quay.io/tosin2013/jupyter-notebook-validator-operator:v0.1.0

# Verify the installation
kubectl get pods -n jupyter-notebook-validator-operator-system
kubectl get crd notebookvalidationjobs.mlops.mlops.dev

See config/samples/ for complete examples.
apiVersion: mlops.mlops.dev/v1alpha1
kind: NotebookValidationJob
metadata:
  name: simple-validation
spec:
  notebook:
    git:
      url: https://github.com/tosin2013/jupyter-notebook-validator-test-notebooks.git
      ref: main
    path: notebooks/tier1-simple/01-hello-world.ipynb
  podConfig:
    containerImage: quay.io/jupyter/scipy-notebook:latest

Schedule validation pods on GPU nodes, high-memory nodes, or spot instances using Kubernetes-native scheduling features:
apiVersion: mlops.mlops.dev/v1alpha1
kind: NotebookValidationJob
metadata:
  name: gpu-training-validation
spec:
  notebook:
    git:
      url: https://github.com/example/ml-notebooks.git
      ref: main
    path: notebooks/gpu-training.ipynb
  podConfig:
    containerImage: quay.io/jupyter/pytorch-notebook:cuda-latest
    resources:
      limits:
        nvidia.com/gpu: "1"
        memory: "16Gi"
    # Tolerate GPU node taints
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
    # Target GPU nodes
    nodeSelector:
      nvidia.com/gpu.present: "true"
    # Advanced affinity rules
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: nvidia.com/gpu.present
                  operator: In
                  values: ["true"]
  timeout: "2h"

See config/samples/mlops_v1alpha1_notebookvalidationjob_gpu_scheduling.yaml for more examples including:
- GPU node scheduling with NVIDIA tolerations
- High-memory node scheduling
- Spot/preemptible instance scheduling
- Multi-tenant cluster node pools with pod anti-affinity
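When many notebooks need validating, manifests like the two above can also be generated programmatically. The Python sketch below builds a NotebookValidationJob as a dictionary and prints it as JSON. The helper name and its `gpu` switch are hypothetical, and the field nesting follows the structure implied by the examples above, so check it against the CRD and config/samples/ before relying on it.

```python
import json


def notebook_validation_job(name, repo_url, path, image, ref="main", gpu=False):
    """Build a NotebookValidationJob manifest as a plain dict (hypothetical helper)."""
    pod_config = {"containerImage": image}
    if gpu:
        # Mirrors the GPU example above: request one GPU and tolerate the GPU taint.
        pod_config["resources"] = {"limits": {"nvidia.com/gpu": "1"}}
        pod_config["tolerations"] = [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ]
    return {
        "apiVersion": "mlops.mlops.dev/v1alpha1",
        "kind": "NotebookValidationJob",
        "metadata": {"name": name},
        "spec": {
            "notebook": {"git": {"url": repo_url, "ref": ref}, "path": path},
            "podConfig": pod_config,
        },
    }


manifest = notebook_validation_job(
    name="simple-validation",
    repo_url="https://github.com/tosin2013/jupyter-notebook-validator-test-notebooks.git",
    path="notebooks/tier1-simple/01-hello-world.ipynb",
    image="quay.io/jupyter/scipy-notebook:latest",
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest can be piped to `kubectl apply -f -`.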
- Architecture Overview - System design
- Testing Guide - Testing procedures
- Notebook Credentials Guide - Credential injection
- Model Discovery Guide - Model validation
- Community Platforms - Supported platforms
- ADRs - Architectural decisions
- Model Serving: KServe, OpenShift AI, vLLM, TorchServe, TensorFlow Serving, Triton, Ray Serve, Seldon, BentoML
- Credential Management: Kubernetes Secrets, External Secrets Operator (ESO), HashiCorp Vault
We welcome contributions! Please read CONTRIBUTING.md for setup, coding standards, and the pull request process.
This project follows the Contributor Covenant. By participating, you agree to uphold this code.
- Issues & feature requests: Use GitHub Issues.
- Discussions: GitHub Discussions for Q&A and sharing usage patterns.
- Distribution: OperatorHub.io (OLM) and Artifact Hub.
Copyright 2025 Tosin Akinosho. Licensed under the Apache License, Version 2.0.