Description
Summary
Allow users to specify a pre-existing Kubernetes Secret for authentication when using authOptions.mode: token, instead of having the operator auto-generate a new secret for each RayCluster.
Problem Statement
When using RayService with authentication enabled (authOptions.mode: token), the KubeRay operator automatically generates a new auth secret each time a new RayCluster is created. The secret name is derived from the RayCluster name (e.g., <cluster-name>-auth-secret).
Current behavior:
- RayService creates a RayCluster with a generated name (e.g., myservice-raycluster-abc123)
- Operator creates auth secret: myservice-raycluster-abc123-auth-secret
- On RayService update (new model deployment, config change, etc.), a new RayCluster is created (e.g., myservice-raycluster-def456)
- Operator creates a new auth secret: myservice-raycluster-def456-auth-secret
- Old cluster and secret are eventually cleaned up
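The naming behavior above can be illustrated with a small sketch. The helper `generatedAuthSecretName` is hypothetical and only mirrors the `<cluster-name>-auth-secret` pattern described in this issue; it is not the operator's actual code.

```go
package main

import "fmt"

// generatedAuthSecretName is a hypothetical helper that mirrors the
// "<cluster-name>-auth-secret" naming pattern described above.
func generatedAuthSecretName(clusterName string) string {
	return clusterName + "-auth-secret"
}

func main() {
	// Two RayCluster generations created by the same RayService.
	for _, cluster := range []string{
		"myservice-raycluster-abc123",
		"myservice-raycluster-def456",
	} {
		fmt.Println(generatedAuthSecretName(cluster))
	}
}
```

Because the RayCluster name changes on every update, the derived secret name changes with it, so a fixed secret reference cannot follow it.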
Use case
The problem:
External services that submit Ray jobs to the cluster (e.g., Apache Airflow DAGs, indexing microservices, CI/CD pipelines) need to know the auth token. Since the secret name and token value change with every RayService update, these external services face challenges:
- They cannot use a static secret reference in their configuration
- They must dynamically discover the current secret name (requires K8s API access and logic to find the active RayCluster)
- During blue-green transitions, there's ambiguity about which token to use
- Credential rotation in external systems becomes operationally complex
Proposed Solution
Add an optional secretName field to AuthConfig that allows users to reference a pre-existing Kubernetes Secret:
API Change
// In ray-operator/apis/ray/v1/raycluster_types.go
type AuthConfig struct {
	// Mode specifies the authentication mode (none, token)
	Mode AuthMode `json:"mode,omitempty"`

	// SecretName specifies the name of an existing Secret containing the auth token.
	// If provided, the operator will use this secret instead of generating a new one.
	// The secret must exist in the same namespace and contain a 'token' key.
	// If not specified and mode is "token", a secret will be auto-generated.
	// +optional
	SecretName string `json:"secretName,omitempty"`
}
The reconciliation logic would then check:
func (r *RayClusterReconciler) reconcileAuth(ctx context.Context, instance *rayv1.RayCluster) error {
	if instance.Spec.HeadGroupSpec.AuthConfig.Mode == rayv1.AuthModeToken {
		var secretName string
		if instance.Spec.HeadGroupSpec.AuthConfig.SecretName != "" {
			// Use the user-provided secret.
			secretName = instance.Spec.HeadGroupSpec.AuthConfig.SecretName
			// Validate that the secret exists and contains the 'token' key.
		} else {
			// Generate the secret as before.
			secretName = utils.GenerateAuthSecretName(instance.Name)
		}
		// ... mount logic (uses secretName)
	}
	return nil
}
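The selection and validation steps in that reconciliation logic could look roughly like the following. This is a minimal sketch under assumptions: `resolveAuthSecretName` and `validateAuthSecretData` are hypothetical helper names, and the secret's contents are modeled as a plain `map[string][]byte` (the shape of a Kubernetes Secret's `data` field) so the example runs without the Kubernetes client libraries.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenKey is the key this proposal requires in a user-provided secret.
const tokenKey = "token"

// resolveAuthSecretName (hypothetical) prefers the user-provided secret name
// and falls back to the generated "<cluster-name>-auth-secret" name.
func resolveAuthSecretName(clusterName, userSecretName string) string {
	if userSecretName != "" {
		return userSecretName
	}
	return clusterName + "-auth-secret"
}

// validateAuthSecretData (hypothetical) checks that the secret data contains
// a non-empty "token" entry.
func validateAuthSecretData(data map[string][]byte) error {
	token, ok := data[tokenKey]
	if !ok {
		return fmt.Errorf("auth secret is missing the %q key", tokenKey)
	}
	if len(token) == 0 {
		return errors.New("auth secret has an empty token")
	}
	return nil
}

func main() {
	name := resolveAuthSecretName("myservice-raycluster-abc123", "ray-static-auth-token")
	fmt.Println("using secret:", name)

	// A secret without the required key should be rejected.
	err := validateAuthSecretData(map[string][]byte{"other": []byte("x")})
	fmt.Println("validation error:", err)
}
```

Failing validation early, before any mount logic runs, would let the operator report a clear error on the RayCluster instead of starting pods with an unusable token.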
Example Usage
- Create a static secret (one-time setup):
apiVersion: v1
kind: Secret
metadata:
  name: ray-static-auth-token
  namespace: ray-workloads
type: Opaque
stringData:
  token: "my-secure-static-token-value"
- Reference it in RayService:
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: ml-inference-service
  namespace: ray-workloads
spec:
  rayClusterConfig:
    authOptions:
      mode: token
      secretName: ray-static-auth-token # <-- New field
    headGroupSpec:
      # ...
    workerGroupSpecs:
      # ...
External services can now reference the same static secret across RayService updates, and users can plan token rotation on their own schedule using external tooling if required.
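On the consuming side, an external service could read the token from the static secret once it is mounted into its pod. The sketch below assumes the secret is mounted as a volume, in which case Kubernetes exposes each key as a file; the function names `parseAuthToken` and `loadAuthToken` and the path `/etc/ray-auth/token` are hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// parseAuthToken (hypothetical) trims surrounding whitespace from the raw
// file contents and rejects an empty token.
func parseAuthToken(raw []byte) (string, error) {
	token := strings.TrimSpace(string(raw))
	if token == "" {
		return "", errors.New("auth token is empty")
	}
	return token, nil
}

// loadAuthToken (hypothetical) reads the token from a file, such as the
// "token" key of a Secret mounted as a volume.
func loadAuthToken(path string) (string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("reading auth token: %w", err)
	}
	return parseAuthToken(raw)
}

func main() {
	// Assumed mount path; adjust to wherever the secret volume is mounted.
	token, err := loadAuthToken("/etc/ray-auth/token")
	if err != nil {
		fmt.Println("could not load token:", err)
		return
	}
	fmt.Println("loaded token of length", len(token))
}
```

Since the secret name never changes, this path stays valid across RayService updates, and rotating the token only requires updating the one secret.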
Alternatives considered:
I considered the alternative example here, but I am not sure we want to continue with that approach: the doc here says it is for KubeRay versions older than v1.5.1, and it involves many more steps that the operator could handle instead.