fix(operator): Set appProtocol: grpc on registry gRPC Service#6367
Conversation
15e640c to
1124abd
Compare
1124abd to
49aeda2
Compare
|
@ntkathole and @shuchu — would appreciate a review when you have a moment. This is a one-liner fix (missing appProtocol: grpc on the registry Service port) that breaks Istio users. Happy to add tests if there's a pattern to follow for operator Service tests. |
|
@tmvfb can you please add unit test for this? Also, fix the linting |
@ntkathole, thanks for the prompt answer! The linting failure is not caused by this branch and comes from main, this PR only touches the go code. I added 7 unit tests, first 5 are pretty generic but considering the other unit tests are rather verbose, I think it is alright. The unit test failures also seem unrelated to this branch: |
|
@tmvfb ignore, unit tests - those will get fixed by #6427 you can either run |
e68542d to
5199cfe
Compare
Signed-off-by: Igor Kvachenok <igor.kvachenok@prokube.ai>
Signed-off-by: Igor Kvachenok <igor.kvachenok@prokube.ai>
Signed-off-by: Igor Kvachenok <igor.kvachenok@prokube.ai>
fb8b086 to
3023c9f
Compare
# [0.64.0](v0.63.0...v0.64.0) (2026-06-13) ### Bug Fixes * Add async_supported property to RedisOnlineStore ([9b088fe](9b088fe)) * Add missing feast init templates to operator CRD and enhance persistence documentation ([1941d4d](1941d4d)) * Allow to publish from reference branch ([5458ec8](5458ec8)) * API calls list ([4203eb7](4203eb7)) * **bigquery:** Enable list inference for parquet loads in offline_write_batch ([9243497](9243497)), closes [#5845](#5845) * Bump grpcio dependencies ([07b4782](07b4782)) * **compute-engine/local:** Honor field_mapping on join keys in dedup + join nodes ([#6395](#6395)) ([bd01824](bd01824)) * **dynamodb:** Avoid tag race condition by using diff-based tag updates ([#6479](#6479)) ([bad2b7d](bad2b7d)), closes [#6418](#6418) * **dynamodb:** Fix mypy type for _build_projection_expression return ([217b4da](217b4da)) * Fix intermittent async test failures for DynamoDB and Redis ([63c5eb1](63c5eb1)) * Fix mongodb blog title ([57d28d4](57d28d4)) * Fix shared SQL registry crash - avoid unnecessary UDF deserialization in proto cache building ([ac588d7](ac588d7)) * Fix SparkRetrievalJob.persist() failing for SparkSource ([209d7cd](209d7cd)) * Fixed formatting and image for mongo blog ([#6377](#6377)) ([f8389fb](f8389fb)) * Fixes for ray source ([7f592a4](7f592a4)) * **go:** skip registry refresh when cache_ttl_seconds <= 0 ([97ed40c](97ed40c)) * Handle array of strings columns in Athena materialization ([#6324](#6324)) ([4ed0278](4ed0278)) * make milvus VARCHAR max_length configurable, remove hardcoded 512 limit ([3b98c22](3b98c22)) * **operator:** Set appProtocol: grpc on registry gRPC Service ([#6367](#6367)) ([c9ae2b4](c9ae2b4)) * PyJWT 2.10+ added validation that rejects empty HMAC keys ([e756ffe](e756ffe)) * RemoteOnlineStore sends all features in a single HTTP request ([8f187dd](8f187dd)) * Remove registry proto dump to enforce RBAC and add permission checks to Commit/Refresh RPCs ([328431f](328431f)) * Remove selector migration job - no longer needed ([51c325e](51c325e)) * replace broken .claude skill symlink with correct relative path ([4541690](4541690)) * Replace selector label strip patch with migration Job for upgrade-safe selector uniqueness ([00dea50](00dea50)) * Scope feature view name conflict check to current project in file-based registry ([#6369](#6369)) ([a4fde83](a4fde83)), closes [#6209](#6209) * **snowflake:** Stop double-quoting connection identifiers ([#6462](#6462)) ([e914d59](e914d59)) * **spark:** S3/GCS PyArrow filesystem resolution for staging paths ([#6442](#6442)) ([ae50414](ae50414)) * **trino:** Clean up temporary entity tables after retrieval ([#6381](#6381)) ([d86b13d](d86b13d)), closes [#6306](#6306) * Update go-feature-server base image to Go 1.25 and fix operator Dockerfile COPY permissions ([86ef0bc](86ef0bc)) ### Features * [Backend] Data Quality Monitoring with native compute, multi-backend support, REST API, CLI ([#6202](#6202)) ([5458c37](5458c37)) * Add apache flink compute engine ([#6476](#6476)) ([9636d6a](9636d6a)) * Add demo noteboooks for users ([e362173](e362173)) * Add enabled/disabled toggle for feature views ([#6401](#6401)) ([5f1fa0d](5f1fa0d)), closes [#6395](#6395) * Add Label View to init template ([ec272d5](ec272d5)) * Add mTLS support to remote registry gRPC client ([#6474](#6474)) ([c9602d8](c9602d8)) * Add Prometheus gauges for FeatureStore installation telemetry ([#6354](#6354)) ([1b681b7](1b681b7)) * Adds registry REST API endpoints for managing entities, data sources, and feature views ([#6413](#6413)) ([f77bd1d](f77bd1d)) * Allow CRUD on entities, data sources, and feature views from UI ([#6412](#6412)) ([2321c07](2321c07)) * Allow default openlineage configuration ([#6467](#6467)) ([276b6df](276b6df)) * **bigquery:** Support DATE-type event timestamp columns ([#6362](#6362)) ([753dee5](753dee5)), closes [#2530](#2530) * **cli:** Add `feast projects delete` command (closes [#5095](#5095)) ([#6318](#6318)) ([1a4b96c](1a4b96c)) * Data Quality Monitoring added in feast UI ([#6422](#6422)) ([fa271be](fa271be)) * **dynamodb:** Use ProjectionExpression when requested_features is set ([0adc906](0adc906)), closes [#6058](#6058) * Enhance DataSource and FeatureView modals with error handling and submission states ([96d7169](96d7169)) * Expose registry endpoints on feature server for MCP access ([f77981c](f77981c)) * Feast First-Class LabelView Implementation ([#6292](#6292)) ([c0e7e5d](c0e7e5d)) * Feast-MLflow Integration ([#6235](#6235)) ([7279c75](7279c75)) * Operational metrics for offline store and SOX metrics for both ([#6340](#6340)) ([65b1b80](65b1b80)) * Pre-compute feature service ([8011550](8011550)) * REST API-backed UI for RBAC compatibility and per-page lazy loading ([#6414](#6414)) ([6ae80af](6ae80af)) * Support non-string map key types ([#6382](#6382)) ([#6383](#6383)) ([728aa2e](728aa2e)) * Update FeatureStore CRD with DRA Fields ([01241e4](01241e4)) ### Performance Improvements * Cache feature view resolution in get_online_features to reduce per-request overhead ([55c2f18](55c2f18)) * Optimize feature serving latency with batched async Redis, cached checks fix ([103809a](103809a)) * Replace MessageToDict with optimized custom dict builder ([#6015](#6015)) ([9902064](9902064))
Summary
The registry
Service(the use case is local registry +server.grpcenabled, i.e. SQLite) is always created withname: httpand noappProtocol. This causes service meshes (notably Istio) to misclassify the registry gRPC traffic as HTTP/1.1, breaking connections in two places simultaneously:name: http→ downgrades to HTTP/1.1. gRPC requires HTTP/2.Both failures stem from the same missing field. The fix is minimal: set
appProtocol: grpcon the registry gRPCServiceport.Changes
getServiceAppProtocolhelper that returnsptr.To("grpc")for the registry gRPC Service andnilfor all other services.AppProtocolon theServicePortinsetServicevia this helper.k8s.io/utils/ptrimport toservices.go(already a transitive dependency, used in tests).Effect
With
appProtocol: grpcset, Istio and other service meshes correctly identify the port as gRPC and handle HTTP/2 and mTLS normally, no workarounds required.Testing
go vet ./internal/...passes. Existing operator unit tests are unaffected (they assert on port number and name, notappProtocol). Manual verification on a cluster with Istio sidecar injection confirmed gRPC connectivity works end-to-end after this change with no additional mesh configuration.Workaround
appProtocol: grpc.traffic.sidecar.istio.io/excludeInboundPorts: "6570"on the feast-server pod to bypass it entirely.DestinationRulewithtls: DISABLEis also required to stop the client from attempting it.