tosin2013/jupyter-notebook-validator-operator


Jupyter Notebook Validator Operator


CI Badge Guide

| Badge | Workflow | Cluster | Always runs |
|---|---|---|---|
| CI | ci.yml | none | yes: lint, build, security on every push/PR |
| Tier 1 Tests | ci-unit-tests.yaml | Kind (ephemeral) | yes: unit + integration tests |
| E2E Kind | e2e-kind.yaml | Kind (ephemeral) | yes: Tier 1 E2E on every push/PR |
| E2E OpenShift | e2e-openshift.yaml | live OCP cluster | no: see note below |

A grey or skipped E2E OpenShift badge is expected and does not indicate a failure. The OpenShift E2E tests (Tiers 2-5) run only when the OPENSHIFT_SERVER and OPENSHIFT_TOKEN repository secrets are set and either a push to main or release-* occurs or a PR has the e2e-test label applied. See docs/CI_CLUSTER_SETUP.md for cluster registration instructions.
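
The trigger rule in the note above can be read as a single boolean condition. The following is a minimal Python sketch of that rule only; the function name, the argument shapes, and the event names are assumptions made for illustration, and the real gating lives in e2e-openshift.yaml, which is not reproduced here.

```python
import fnmatch


def should_run_openshift_e2e(secrets, event, branch="", pr_labels=()):
    """Return True when the documented OpenShift E2E trigger rule holds.

    Hypothetical helper for illustration; not part of the operator or its workflows.
    """
    # Both repository secrets must be set (non-empty).
    has_secrets = bool(secrets.get("OPENSHIFT_SERVER")) and bool(secrets.get("OPENSHIFT_TOKEN"))
    if not has_secrets:
        return False
    if event == "push":
        # Pushes count only on main or on a release-* branch.
        return branch == "main" or fnmatch.fnmatchcase(branch, "release-*")
    if event == "pull_request":
        # Pull requests count only when the e2e-test label is applied.
        return "e2e-test" in pr_labels
    return False
```

Under this reading, a pull request without the e2e-test label, or any run in a fork that lacks the two secrets, leaves the badge grey rather than red.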

A Kubernetes-native operator that automates Jupyter Notebook validation in MLOps workflows. Built with Operator SDK and Go, it provides Git integration, pod orchestration for notebook execution, golden notebook comparison for regression testing, and model-aware validation for ML/AI workloads.

Overview

The Jupyter Notebook Validator Operator enables automated testing and validation of Jupyter notebooks in Kubernetes and OpenShift environments. It's designed for data science teams, ML engineers, and platform teams who need to ensure notebook reliability, reproducibility, and integration with deployed ML models.

Architecture

High-level flow from a NotebookValidationJob to execution, comparison, and status:

flowchart LR
    User["User / CI"] -->|"kubectl apply"| CR["NotebookValidationJob CR"]
    CR --> Controller["Operator Controller"]
    Controller -->|"clone"| Git["Git Repository"]
    Controller -->|"create"| Pod["Validation Pod"]
    Pod -->|"execute via Papermill"| Notebook["Jupyter Notebook"]
    Controller -->|"compare"| Golden["Golden Notebook"]
    Controller -->|"validate"| Model["ML Model Endpoint"]
    Controller -->|"update"| Status["Job Status"]

For more detail, see Architecture Overview.

Key Features

  • Automated Notebook Execution - Execute notebooks in isolated Kubernetes pods with Papermill
  • Golden Notebook Comparison - Regression testing with cell-by-cell output comparison
  • Credential Management - Secure injection of credentials (AWS, databases, APIs) via Secrets, ESO, or Vault
  • Model-Aware Validation - Validate notebooks against deployed models (KServe, OpenShift AI, vLLM, etc.)
  • Git Integration - Clone notebooks from Git repositories (HTTPS and SSH authentication)
  • Observability - Prometheus metrics and structured logging with credential sanitization
  • Platform Detection - Auto-detect model serving platforms (9 platforms supported)
  • Security - RBAC, Pod Security Standards, secret rotation, and audit logging
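
To make "cell-by-cell output comparison" concrete, here is a minimal Python sketch that compares the code-cell outputs of an executed notebook against a golden notebook. It assumes the standard nbformat v4 JSON layout and an exact text match; the operator's actual comparison strategy (tolerances, ignored fields, and so on) is not described here, and the function names are hypothetical.

```python
def cell_output_text(cell):
    """Flatten a code cell's outputs (nbformat v4 layout) into one string."""
    parts = []
    for out in cell.get("outputs", []):
        kind = out.get("output_type")
        if kind == "stream":
            text = out.get("text", "")
        elif kind in ("execute_result", "display_data"):
            text = out.get("data", {}).get("text/plain", "")
        else:
            # For example "error" outputs: fall back to exception name and value.
            text = f'{out.get("ename", "")}: {out.get("evalue", "")}'
        # nbformat may store multi-line strings as a list of lines.
        parts.append("".join(text) if isinstance(text, list) else text)
    return "".join(parts)


def diff_code_cells(executed, golden):
    """Return indices (among code cells) whose output text differs from the golden notebook."""
    exec_cells = [c for c in executed["cells"] if c.get("cell_type") == "code"]
    gold_cells = [c for c in golden["cells"] if c.get("cell_type") == "code"]
    mismatches = []
    for i in range(max(len(exec_cells), len(gold_cells))):
        got = cell_output_text(exec_cells[i]) if i < len(exec_cells) else None
        want = cell_output_text(gold_cells[i]) if i < len(gold_cells) else None
        if got != want:
            mismatches.append(i)
    return mismatches
```

An empty result means every code cell reproduced the golden output. A missing or extra code cell is reported as a mismatch at its index.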

Quick Start

Prerequisites

  • Kubernetes/OpenShift Cluster: OpenShift 4.18+ (recommended) or Kubernetes 1.31+
  • Command-line Tools: kubectl or oc CLI, make (for building from source)
  • Optional: External Secrets Operator (ESO), KServe or OpenShift AI, Tekton Pipelines (for build integration)

Installation

# Install CRDs
make install

# Build and push image
make docker-build docker-push IMG=quay.io/tosin2013/jupyter-notebook-validator-operator:v0.1.0

# Deploy operator
make deploy IMG=quay.io/tosin2013/jupyter-notebook-validator-operator:v0.1.0

Verify Installation

kubectl get pods -n jupyter-notebook-validator-operator-system
kubectl get crd notebookvalidationjobs.mlops.mlops.dev

Usage Examples

See config/samples/ for complete examples.

Basic Validation

apiVersion: mlops.mlops.dev/v1alpha1
kind: NotebookValidationJob
metadata:
  name: simple-validation
spec:
  notebook:
    git:
      url: https://github.com/tosin2013/jupyter-notebook-validator-test-notebooks.git
      ref: main
    path: notebooks/tier1-simple/01-hello-world.ipynb
  podConfig:
    containerImage: quay.io/jupyter/scipy-notebook:latest
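
When many notebooks need the same kind of job, the manifest above can be generated instead of written by hand. The sketch below builds the same structure as a Python dictionary and serializes it to JSON, which kubectl also accepts. It uses only the fields shown in the example above; the helper name is hypothetical and not part of the project.

```python
import json


def notebook_validation_job(name, git_url, ref, path, image):
    """Build a NotebookValidationJob manifest using only the fields shown above."""
    return {
        "apiVersion": "mlops.mlops.dev/v1alpha1",
        "kind": "NotebookValidationJob",
        "metadata": {"name": name},
        "spec": {
            "notebook": {
                "git": {"url": git_url, "ref": ref},
                "path": path,
            },
            "podConfig": {"containerImage": image},
        },
    }


if __name__ == "__main__":
    job = notebook_validation_job(
        name="simple-validation",
        git_url="https://github.com/tosin2013/jupyter-notebook-validator-test-notebooks.git",
        ref="main",
        path="notebooks/tier1-simple/01-hello-world.ipynb",
        image="quay.io/jupyter/scipy-notebook:latest",
    )
    # Emit JSON on stdout so it can be piped into kubectl.
    print(json.dumps(job, indent=2))
```

Assuming the script is saved as a file such as make_job.py (a hypothetical name), its output can be piped to `kubectl apply -f -`.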

GPU and Specialized Node Scheduling

Schedule validation pods on GPU nodes, high-memory nodes, or spot instances using Kubernetes-native scheduling features:

apiVersion: mlops.mlops.dev/v1alpha1
kind: NotebookValidationJob
metadata:
  name: gpu-training-validation
spec:
  notebook:
    git:
      url: https://github.com/example/ml-notebooks.git
      ref: main
    path: notebooks/gpu-training.ipynb
  podConfig:
    containerImage: quay.io/jupyter/pytorch-notebook:cuda-latest
    resources:
      limits:
        nvidia.com/gpu: "1"
        memory: "16Gi"
    # Tolerate GPU node taints
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
    # Target GPU nodes
    nodeSelector:
      nvidia.com/gpu.present: "true"
    # Advanced affinity rules
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: nvidia.com/gpu.present
                  operator: In
                  values: ["true"]
  timeout: "2h"

See config/samples/mlops_v1alpha1_notebookvalidationjob_gpu_scheduling.yaml for more examples, including:

  • GPU node scheduling with NVIDIA tolerations
  • High-memory node scheduling
  • Spot/preemptible instance scheduling
  • Multi-tenant cluster node pools with pod anti-affinity
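
The toleration in the GPU example uses operator: Exists, which matches a taint with the given key and effect regardless of its value. The Python sketch below restates the Kubernetes toleration-matching rules to show why that is enough for the pod to be admitted to tainted GPU nodes. The function is an illustrative re-implementation, not code from the operator or from Kubernetes, and the taint shown is a hypothetical example of what a cluster admin might apply.

```python
def tolerates(toleration, taint):
    """Return True if a toleration matches a taint under Kubernetes' matching rules."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    # An empty key (only valid with operator Exists) matches every key.
    if toleration.get("key") and toleration["key"] != taint.get("key"):
        return False
    operator = toleration.get("operator") or "Equal"  # Equal is the default
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


# Toleration copied from the GPU example above.
gpu_toleration = {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
# Hypothetical taint on a GPU node; the value is arbitrary because Exists ignores it.
gpu_taint = {"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}

print(tolerates(gpu_toleration, gpu_taint))
```

A toleration only permits scheduling onto tainted nodes; it does not attract the pod to them. That is why the example also sets nodeSelector and nodeAffinity on nvidia.com/gpu.present.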

Documentation

Supported Platforms

  • Model Serving: KServe, OpenShift AI, vLLM, TorchServe, TensorFlow Serving, Triton, Ray Serve, Seldon, BentoML
  • Credential Management: Kubernetes Secrets, External Secrets Operator (ESO), HashiCorp Vault

Contributing

We welcome contributions! Please read CONTRIBUTING.md for setup, coding standards, and the pull request process.

Code of Conduct

This project follows the Contributor Covenant. By participating, you agree to uphold this code.

Community

License

Copyright 2025 Tosin Akinosho. Licensed under the Apache License, Version 2.0.