Skip to content

fix(operator): Set appProtocol: grpc on registry gRPC Service#6367

Merged
ntkathole merged 3 commits into
feast-dev:masterfrom
tmvfb:fix/registry-service-grpc-app-protocol
May 22, 2026
Merged

fix(operator): Set appProtocol: grpc on registry gRPC Service#6367
ntkathole merged 3 commits into
feast-dev:masterfrom
tmvfb:fix/registry-service-grpc-app-protocol

Conversation

@tmvfb

@tmvfb tmvfb commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

The registry Service (the use case is local registry + server.grpc enabled, i.e. SQLite) is always created with name: http and no appProtocol. This causes service meshes (notably Istio) to misclassify the registry gRPC traffic as HTTP/1.1, breaking connections in two places simultaneously:

  • Client-side envoy: sees name: http → downgrades to HTTP/1.1. gRPC requires HTTP/2.
  • Server-side envoy: builds its inbound listener for the registry port as HTTP/1.1 → receives gRPC (HTTP/2) frames → sends RST with "protocol error".

Both failures stem from the same missing field. The fix is minimal: set appProtocol: grpc on the registry gRPC Service port.

Changes

  • Added getServiceAppProtocol helper that returns ptr.To("grpc") for the registry gRPC Service and nil for all other services.
  • Set AppProtocol on the ServicePort in setService via this helper.
  • Added k8s.io/utils/ptr import to services.go (already a transitive dependency, used in tests).

Effect

With appProtocol: grpc set, Istio and other service meshes correctly identify the port as gRPC and handle HTTP/2 and mTLS normally, no workarounds required.

Testing

go vet ./internal/... passes. Existing operator unit tests are unaffected (they assert on port number and name, not appProtocol). Manual verification on a cluster with Istio sidecar injection confirmed gRPC connectivity works end-to-end after this change with no additional mesh configuration.

Workaround

  • Client-side envoy downgrades to HTTP/1.1. Fixed by creating an alt-Service with appProtocol: grpc.
  • Server-side envoy builds its inbound listener for port 6570 as HTTP/1.1 and rejects incoming HTTP/2 with a protocol error. The only workaround is traffic.sidecar.istio.io/excludeInboundPorts: "6570" on the feast-server pod to bypass it entirely.
  • With the server-side envoy bypassed there is nothing to terminate mTLS, so a DestinationRule with tls: DISABLE is also required to stop the client from attempting it.

@tmvfb tmvfb force-pushed the fix/registry-service-grpc-app-protocol branch from 15e640c to 1124abd Compare May 5, 2026 11:20
@tmvfb tmvfb marked this pull request as ready for review May 5, 2026 11:34
@tmvfb tmvfb requested a review from a team as a code owner May 5, 2026 11:34
@tmvfb tmvfb force-pushed the fix/registry-service-grpc-app-protocol branch from 1124abd to 49aeda2 Compare May 22, 2026 09:45
@tmvfb

tmvfb commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

@ntkathole and @shuchu — would appreciate a review when you have a moment. This is a one-liner fix (missing appProtocol: grpc on the registry Service port) that breaks Istio users. Happy to add tests if there's a pattern to follow for operator Service tests.

@ntkathole ntkathole changed the title fix(operator): set appProtocol: grpc on registry gRPC Service fix(operator): Set appProtocol: grpc on registry gRPC Service May 22, 2026
@ntkathole

ntkathole commented May 22, 2026

Copy link
Copy Markdown
Member

@tmvfb can you please add unit test for this?

Also, fix the linting

@tmvfb

tmvfb commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

@tmvfb can you please add unit test for this?

Also, fix the linting

@ntkathole, thanks for the prompt answer! The linting failure is not caused by this branch and comes from main, this PR only touches the go code.

I added 7 unit tests, first 5 are pretty generic but considering the other unit tests are rather verbose, I think it is alright. The unit test failures also seem unrelated to this branch:

=========================== short test summary info ============================

FAILED
  sdk/python/tests/unit/permissions/auth/client/
  test_authentication_client_manager_factory.py::
  test_authentication_client_manager_factory[
    auth:
      type: kubernetes
  ]

  jwt.exceptions.InvalidKeyError:
    HMAC key must not be empty.


FAILED
  sdk/python/tests/unit/permissions/auth/client/
  test_authentication_client_manager_factory.py::
  test_authentication_client_manager_factory[
    auth:
      type: oidc
      client_id: feast-integration-client
      client_secret: feast-integration-client-secret
      username: reader_writer
      password: password
      auth_discovery_url:
        KEYCLOAK_URL_PLACE_HOLDER/realms/master/.well-known/openid-configuration
  ]

  jwt.exceptions.InvalidKeyError:
    HMAC key must not be empty.


==== 2 failed, 1730 passed, 134 skipped, 227 warnings in 489.99s (0:08:09) ====

make: *** [Makefile:170: test-python-unit] Error 1

@ntkathole

Copy link
Copy Markdown
Member

@tmvfb ignore, unit tests - those will get fixed by #6427
But linting issue is related to this branch, since it touches services.go, secrets baseline failed due to code line number changed

 "filename": "infra/feast-operator/internal/controller/services/services.go",
         "hashed_secret": "36dc326eb15c7bdd8d91a6b87905bcea20b637d1",
         "is_verified": false,
-        "line_number": 179
+        "line_number": 180

you can either run detect-secrets scan --baseline .secrets.baseline or pre-commit run detect-secrets --all-files to fix

@ntkathole ntkathole force-pushed the fix/registry-service-grpc-app-protocol branch from e68542d to 5199cfe Compare May 22, 2026 11:31
tmvfb added 3 commits May 22, 2026 17:43
Signed-off-by: Igor Kvachenok <igor.kvachenok@prokube.ai>
Signed-off-by: Igor Kvachenok <igor.kvachenok@prokube.ai>
Signed-off-by: Igor Kvachenok <igor.kvachenok@prokube.ai>
@ntkathole ntkathole force-pushed the fix/registry-service-grpc-app-protocol branch from fb8b086 to 3023c9f Compare May 22, 2026 12:13
@ntkathole ntkathole merged commit c9ae2b4 into feast-dev:master May 22, 2026
19 of 25 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Jun 13, 2026
# [0.64.0](v0.63.0...v0.64.0) (2026-06-13)

### Bug Fixes

* Add async_supported property to RedisOnlineStore ([9b088fe](9b088fe))
* Add missing feast init templates to operator CRD and enhance persistence documentation ([1941d4d](1941d4d))
* Allow to publish from reference branch ([5458ec8](5458ec8))
* API calls list ([4203eb7](4203eb7))
* **bigquery:** Enable list inference for parquet loads in offline_write_batch ([9243497](9243497)), closes [#5845](#5845)
* Bump grpcio dependencies ([07b4782](07b4782))
* **compute-engine/local:** Honor field_mapping on join keys in dedup + join nodes ([#6395](#6395)) ([bd01824](bd01824))
* **dynamodb:** Avoid tag race condition by using diff-based tag updates ([#6479](#6479)) ([bad2b7d](bad2b7d)), closes [#6418](#6418)
* **dynamodb:** Fix mypy type for _build_projection_expression return ([217b4da](217b4da))
* Fix intermittent async test failures for DynamoDB and Redis ([63c5eb1](63c5eb1))
* Fix mongodb blog title ([57d28d4](57d28d4))
* Fix shared SQL registry crash - avoid unnecessary UDF deserialization in proto cache building ([ac588d7](ac588d7))
* Fix SparkRetrievalJob.persist() failing for SparkSource ([209d7cd](209d7cd))
* Fixed formatting and image for mongo blog ([#6377](#6377)) ([f8389fb](f8389fb))
* Fixes for ray source ([7f592a4](7f592a4))
* **go:** skip registry refresh when cache_ttl_seconds <= 0 ([97ed40c](97ed40c))
* Handle array of strings columns in Athena materialization ([#6324](#6324)) ([4ed0278](4ed0278))
* make milvus VARCHAR max_length configurable, remove hardcoded 512 limit ([3b98c22](3b98c22))
* **operator:** Set appProtocol: grpc on registry gRPC Service ([#6367](#6367)) ([c9ae2b4](c9ae2b4))
* PyJWT 2.10+ added validation that rejects empty HMAC keys ([e756ffe](e756ffe))
* RemoteOnlineStore sends all features in a single HTTP request ([8f187dd](8f187dd))
* Remove registry proto dump to enforce RBAC and add permission checks to Commit/Refresh RPCs ([328431f](328431f))
* Remove selector migration job - no longer needed ([51c325e](51c325e))
* replace broken .claude skill symlink with correct relative path ([4541690](4541690))
* Replace selector label strip patch with migration Job for upgrade-safe selector uniqueness ([00dea50](00dea50))
* Scope feature view name conflict check to current project in file-based registry ([#6369](#6369)) ([a4fde83](a4fde83)), closes [#6209](#6209)
* **snowflake:** Stop double-quoting connection identifiers ([#6462](#6462)) ([e914d59](e914d59))
* **spark:** S3/GCS PyArrow filesystem resolution for staging paths ([#6442](#6442)) ([ae50414](ae50414))
* **trino:** Clean up temporary entity tables after retrieval ([#6381](#6381)) ([d86b13d](d86b13d)), closes [#6306](#6306)
* Update go-feature-server base image to Go 1.25 and fix operator Dockerfile COPY permissions ([86ef0bc](86ef0bc))

### Features

* [Backend] Data Quality Monitoring with native compute, multi-backend support, REST API, CLI ([#6202](#6202)) ([5458c37](5458c37))
* Add apache flink compute engine ([#6476](#6476)) ([9636d6a](9636d6a))
* Add demo noteboooks for users ([e362173](e362173))
* Add enabled/disabled toggle for feature views ([#6401](#6401)) ([5f1fa0d](5f1fa0d)), closes [#6395](#6395)
* Add Label View to init template ([ec272d5](ec272d5))
* Add mTLS support to remote registry gRPC client ([#6474](#6474)) ([c9602d8](c9602d8))
* Add Prometheus gauges for FeatureStore installation telemetry ([#6354](#6354)) ([1b681b7](1b681b7))
* Adds registry REST API endpoints for managing entities, data sources, and feature views ([#6413](#6413)) ([f77bd1d](f77bd1d))
* Allow CRUD on entities, data sources, and feature views from UI ([#6412](#6412)) ([2321c07](2321c07))
* Allow default openlineage configuration ([#6467](#6467)) ([276b6df](276b6df))
* **bigquery:** Support DATE-type event timestamp columns ([#6362](#6362)) ([753dee5](753dee5)), closes [#2530](#2530)
* **cli:** Add `feast projects delete` command (closes [#5095](#5095)) ([#6318](#6318)) ([1a4b96c](1a4b96c))
* Data Quality Monitoring added in feast UI ([#6422](#6422)) ([fa271be](fa271be))
* **dynamodb:** Use ProjectionExpression when requested_features is set ([0adc906](0adc906)), closes [#6058](#6058)
* Enhance DataSource and FeatureView modals with error handling and submission states ([96d7169](96d7169))
* Expose registry endpoints on feature server for MCP access ([f77981c](f77981c))
* Feast First-Class LabelView Implementation ([#6292](#6292)) ([c0e7e5d](c0e7e5d))
* Feast-MLflow Integration ([#6235](#6235)) ([7279c75](7279c75))
* Operational metrics for offline store and SOX metrics for both ([#6340](#6340)) ([65b1b80](65b1b80))
* Pre-compute feature service ([8011550](8011550))
* REST API-backed UI for RBAC compatibility and per-page lazy loading ([#6414](#6414)) ([6ae80af](6ae80af))
* Support non-string map key types ([#6382](#6382)) ([#6383](#6383)) ([728aa2e](728aa2e))
* Update FeatureStore CRD with DRA Fields ([01241e4](01241e4))

### Performance Improvements

* Cache feature view resolution in get_online_features to reduce per-request overhead ([55c2f18](55c2f18))
* Optimize feature serving latency with batched async Redis, cached checks fix ([103809a](103809a))
* Replace MessageToDict with optimized custom dict builder ([#6015](#6015)) ([9902064](9902064))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants