Conversation
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Fix: Feast configmap failure due to missing notebook CRD timing
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
- Add label-filtered informer cache (app.kubernetes.io/managed-by=feast-operator) for ConfigMaps to prevent cluster-wide cache flooding OOM - Add GOMEMLIMIT=230MiB (90% of 256Mi limit) as GC safety net - Add JSON Patch test ops to validate env indices before replace - Strip managedFields from cached objects via DefaultTransform - Use shared constants for label keys across operator and tests
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
feat: Auto-Discovery of feast client configmaps to the users/groups defined in feast permissions policies
… apply Signed-off-by: Ani Lynch <anixlynch@gmail.com>
Signed-off-by: Vanshika Vanshika <vvanshik@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
…ast-dev#6126) * feat: Add ServiceMonitor auto-generation for Prometheus discovery Signed-off-by: ntkathole <nikhilkathole2683@gmail.com> * fix: Server-side apply Signed-off-by: ntkathole <nikhilkathole2683@gmail.com> --------- Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
feat: Manually Add commits from upstream as auto-merge failed
fix: Harden informer cache with label selectors and memory optimizations
fix: Failure of AutoMerge Remote tracking branch upstream master
Fixed the "already exists" error when multiple FeatureStore instances try to create the same "feast-configs-registry-reader" Role. Changed from CreateOrUpdate to a check-then-create pattern to avoid conflicts when the shared Role already exists. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
| func (r *FeatureStoreReconciler) countOtherFeatureStoresInNamespace(ctx context.Context, namespace, excludeName string) int { | ||
| var list feastdevv1.FeatureStoreList | ||
| if err := r.List(ctx, &list, client.InNamespace(namespace)); err != nil { | ||
| return -1 | ||
| } | ||
| count := 0 | ||
| for i := range list.Items { | ||
| if list.Items[i].Name != excludeName && list.Items[i].DeletionTimestamp == nil { | ||
| count++ | ||
| } | ||
| } | ||
| return count | ||
| } | ||
|
|
||
| func (r *FeatureStoreReconciler) countOtherFeatureStoresInCluster(ctx context.Context, namespace, excludeName string) int { | ||
| var list feastdevv1.FeatureStoreList | ||
| if err := r.List(ctx, &list); err != nil { | ||
| return -1 | ||
| } | ||
| count := 0 | ||
| for i := range list.Items { | ||
| item := &list.Items[i] | ||
| if (item.Name != excludeName || item.Namespace != namespace) && item.DeletionTimestamp == nil { | ||
| count++ | ||
| } | ||
| } | ||
| return count | ||
| } |
There was a problem hiding this comment.
🔴 countOtherFeatureStores returning -1 on error causes premature cleanup of shared resources
When the Kubernetes API List call fails (e.g., transient API server error), countOtherFeatureStoresInNamespace and countOtherFeatureStoresInCluster return -1. This value is passed to RemoveNamespaceLabelIfLast and CleanupDiscoverClusterRoleIfLast at infra/feast-operator/internal/controller/access/access.go:61 and infra/feast-operator/internal/controller/access/rbac.go:222, both of which guard cleanup with otherFeatureStoreCount > 0. Since -1 > 0 is false, the guard is bypassed and the namespace label / shared ClusterRole are deleted even though other FeatureStores may still depend on them. This breaks namespace discovery for dashboard applications and removes RBAC for all remaining FeatureStores in the cluster.
Downstream impact in access.go and rbac.go
RemoveNamespaceLabelIfLast (access.go:61) and CleanupDiscoverClusterRoleIfLast (rbac.go:222) both use if otherFeatureStoreCount > 0 { return nil } — a sentinel value of -1 passes this guard and triggers deletion.
Prompt for agents
In infra/feast-operator/internal/controller/featurestore_controller.go, the functions countOtherFeatureStoresInNamespace (lines 239-251) and countOtherFeatureStoresInCluster (lines 253-266) return -1 on List errors. The callers in cleanupOnDeletion (lines 203-222) pass this value directly to RemoveNamespaceLabelIfLast and CleanupDiscoverClusterRoleIfLast, which interpret any value <= 0 as 'no other stores exist' and proceed with deletion.
Fix: Either (a) change the return type to (int, error) so callers can skip cleanup on error, or (b) return a large positive sentinel value (e.g., math.MaxInt32) on error so the > 0 guard in the downstream functions prevents accidental cleanup. For approach (a), update cleanupOnDeletion at lines 205 and 212 to check the error and skip the corresponding cleanup call when the count cannot be determined.
Also update the callers:
- Line 206: access.RemoveNamespaceLabelIfLast should only be called when otherCount was successfully retrieved
- Line 213: access.CleanupDiscoverClusterRoleIfLast should only be called when clusterCount was successfully retrieved
Was this helpful? React with 👍 or 👎 to provide feedback.
What this PR does / why we need it:
Fixes issue in LTS nightly run :
e2e tests failing with latest changes as 'FeatureStore' is not in 'Ready' state.
Which issue(s) this PR fixes:
Checks
git commit -s)Testing Strategy
Misc