Releases: kubernetes-sigs/gateway-api-inference-extension
v1.1.0
New and noteworthy
This release focuses on sharing the experimental features we are developing so that users can try them:
- Flow Control is available as an experimental feature! To enable it, set the ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER env var to true (this can be done from the helm chart). Docs are WIP and coming soon!
- Multi-port support is available with GW implementations that also support it. This enables sophisticated features like Wide EP. GW provider support is forthcoming.
- Multi-Cluster support: the API surface has been extended to experimentally support multi-cluster deployments. Docs are WIP and coming soon!
What's Changed
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.24.0 to 2.25.1 by @dependabot[bot] in #1467
- adding warning labels to site while we update docs by @kfswain in #1466
- removed datastore dependency from saturation detector by @nirrozenbaum in #1293
- Add initial troubleshooting guide by @nicolexin in #1430
- fix: Pin to vllm v0.8.5 by @capri-xiyue in #1453
- chore(deps): bump google.golang.org/protobuf from 1.36.7 to 1.36.8 by @dependabot[bot] in #1471
- Updating proposal statuses by @kfswain in #1472
- chore(deps): bump google.golang.org/grpc from 1.74.2 to 1.75.0 by @dependabot[bot] in #1468
- fix(conformance): remove the inferenceObjective dependency compeletely. by @zetxqx in #1477
- chore(deps): bump github.com/onsi/gomega from 1.38.0 to 1.38.1 by @dependabot[bot] in #1469
- chore(deps): bump github.com/stretchr/testify from 1.10.0 to 1.11.0 by @dependabot[bot] in #1470
- Add Alibaba Cloud ack-gie conformance report for v0.5.1 by @delavet in #1478
- pin vllm gpu image by @nirrozenbaum in #1479
- [docs] Updating the FAQ by @kfswain in #1474
- Fix troubleshooting guide by @nicolexin in #1485
- Update provider name in helm chart for GKE to be not case sensitive by @rahulgurnani in #1486
- refactor(registry): Replace event-driven GC with a lease-based lifecycle by @LukeAVanDrie in #1476
- use context in indexer go routine instead of context.TODO by @nirrozenbaum in #1491
- cleanup: make port definitions symmetric and clean endpointPickerRef.port by @capri-xiyue in #1484
- Adding conformance report for Kubvernor by @dawid-nowak in #1316
- fix(conformance) add endpointConfig port in conformance test for v1 as it's required for service kind now. by @zetxqx in #1499
- feat(conformance): Use CRD annotation to populate the ConformanceReport GatewayAPIInferenceExtensionVersion by @zetxqx in #1214
- if request id was not supplied in header, generate uuid by @nirrozenbaum in #1490
- prefix state temp fix - write state to both plugin state and cycle state by @nirrozenbaum in #1509
- Fix(datastore): Correct inverted log messages in podResyncAll by @fyuan1316 in #1511
- Add WeightedRandomPicker by @Jooho in #1412
- allow setting custom plugins file through helm by @nirrozenbaum in #1508
- feat(helm): add affinity and tolerations to epp-deployment by @hhk7734 in #1504
- typo: add vLLM Prefix Cache & LoRA Adapters links by @zhengkezhou1 in #1280
- docs: update BBR guide by @chewong in #1517
- Docs: updated docs to include network service api enable for gke by @capri-xiyue in #1435
- cleanup modelName from inferenceObjective. by @zetxqx in #1521
- fix: fixed helm by @capri-xiyue in #1522
- Perf updates by @kfswain in #1523
- fix serve multiple genai models md file by @learner0810 in #1527
- update guides docs to fix miss guide by @Frapschen in #1532
- minor updates and godoc to weighted random picker by @nirrozenbaum in #1514
- follow up - improving logging perf issues in few more places by @nirrozenbaum in #1528
- Fixing test flake by @kfswain in #1534
- Update vllm image version for CPU deployment by @rahulgurnani in #1526
- adding elevran as code reviewer instead of shaneutt by @nirrozenbaum in #1533
- Update helm chart Readme with custom plugin config by @rahulgurnani in #1516
- Bumps k8s.io Deps to v0.34.0 by @danehans in #1537
- Helm fix by @nirrozenbaum in #1540
- Make apiVersion configurable for inferencePool in the helm charts by @rahulgurnani in #1542
- Update guide for better clarity and to avoid errors by @rlakhtakia in #1475
- fix helm chart support for gke v1alpha2. by @zetxqx in #1551
- chore: bump sim model server version by @nirrozenbaum in #1555
- remove duplicated section in quickstart guide by @nirrozenbaum in #1553
- Merge shuffle score pods logic by @learner0810 in #1552
- Update getting started guide with 1.0 release by @rahulgurnani in #1557
- Added envoy proxy ai-gateway by @learner0810 in #1554
- fix-import-groups by @learner0810 in #1560
- chore(deps): bump golang.org/x/sync from 0.16.0 to 0.17.0 by @dependabot[bot] in #1549
- chore(deps): bump sigs.k8s.io/controller-tools from 0.18.0 to 0.19.0 by @dependabot[bot] in #1548
- fix flake in weighted random picker by @nirrozenbaum in #1561
- Main uniquely name crbac by @Gregory-Pereira in #1564
- Update priority in EPP flow control from uint to int by @rahulgurnani in #1518
- rename inference_model metrics to inference_objective by @JeffLuoo in #1567
- add a hold label when PRs are pushed to branch other than main by @nirrozenbaum in #1570
- Refactor LLMRequest: Structured RequestData for Completions & Chat-Completions by @vMaroon in #1446
- epp servicemonitor by @sallyom in #1425
- remove scheduler epp flowchart by @kaushikmitr in #1573
- Fixes to overview.md and inferencemodel.md by @DamianSawicki in https://github.com/...
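The WeightedRandomPicker entry (#1412) refers to a general technique: choosing an endpoint at random with probability proportional to its score. The Go sketch below illustrates that general idea only. The `scoredEndpoint` type and `pickWeighted` function are hypothetical names invented for this example and are not taken from the project's code; scores are assumed to be non-negative.

```go
package main

import (
	"fmt"
	"math/rand"
)

// scoredEndpoint is a hypothetical type for this sketch.
type scoredEndpoint struct {
	Name  string
	Score float64 // assumed non-negative
}

// pickWeighted maps a uniform draw u in [0,1) onto the endpoints so that each
// one is chosen with probability proportional to its score. It returns
// ok=false when the slice is empty or the total score is not positive.
func pickWeighted(endpoints []scoredEndpoint, u float64) (name string, ok bool) {
	total := 0.0
	for _, e := range endpoints {
		total += e.Score
	}
	if total <= 0 {
		return "", false
	}
	target := u * total
	cumulative := 0.0
	for _, e := range endpoints {
		cumulative += e.Score
		if target < cumulative {
			return e.Name, true
		}
	}
	// Floating-point rounding can leave target at the upper bound;
	// fall back to the last endpoint in that case.
	return endpoints[len(endpoints)-1].Name, true
}

func main() {
	endpoints := []scoredEndpoint{{"pod-a", 3}, {"pod-b", 1}}
	name, _ := pickWeighted(endpoints, rand.Float64())
	fmt.Println("picked:", name)
}
```

Passing the random draw as a parameter keeps the selection logic deterministic and easy to unit test, which may help avoid the kind of test flake that random selection can introduce.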
v1.1.0-rc.1
New and noteworthy
This release focuses on sharing the experimental features we are developing so that users can try them:
- Flow Control is available as an experimental feature! To enable it, set the ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER env var to true (this can be done from the helm chart). Docs are WIP and coming soon!
- Multi-port support is available with GW implementations that also support it. This enables sophisticated features like Wide EP. GW provider support is forthcoming.
- Multi-Cluster support: the API surface has been extended to experimentally support multi-cluster deployments. Docs are WIP and coming soon!
What's Changed
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.24.0 to 2.25.1 by @dependabot[bot] in #1467
- adding warning labels to site while we update docs by @kfswain in #1466
- removed datastore dependency from saturation detector by @nirrozenbaum in #1293
- Add initial troubleshooting guide by @nicolexin in #1430
- fix: Pin to vllm v0.8.5 by @capri-xiyue in #1453
- chore(deps): bump google.golang.org/protobuf from 1.36.7 to 1.36.8 by @dependabot[bot] in #1471
- Updating proposal statuses by @kfswain in #1472
- chore(deps): bump google.golang.org/grpc from 1.74.2 to 1.75.0 by @dependabot[bot] in #1468
- fix(conformance): remove the inferenceObjective dependency compeletely. by @zetxqx in #1477
- chore(deps): bump github.com/onsi/gomega from 1.38.0 to 1.38.1 by @dependabot[bot] in #1469
- chore(deps): bump github.com/stretchr/testify from 1.10.0 to 1.11.0 by @dependabot[bot] in #1470
- Add Alibaba Cloud ack-gie conformance report for v0.5.1 by @delavet in #1478
- pin vllm gpu image by @nirrozenbaum in #1479
- [docs] Updating the FAQ by @kfswain in #1474
- Fix troubleshooting guide by @nicolexin in #1485
- Update provider name in helm chart for GKE to be not case sensitive by @rahulgurnani in #1486
- refactor(registry): Replace event-driven GC with a lease-based lifecycle by @LukeAVanDrie in #1476
- use context in indexer go routine instead of context.TODO by @nirrozenbaum in #1491
- cleanup: make port definitions symmetric and clean endpointPickerRef.port by @capri-xiyue in #1484
- Adding conformance report for Kubvernor by @dawid-nowak in #1316
- fix(conformance) add endpointConfig port in conformance test for v1 as it's required for service kind now. by @zetxqx in #1499
- feat(conformance): Use CRD annotation to populate the ConformanceReport GatewayAPIInferenceExtensionVersion by @zetxqx in #1214
- if request id was not supplied in header, generate uuid by @nirrozenbaum in #1490
- prefix state temp fix - write state to both plugin state and cycle state by @nirrozenbaum in #1509
- Fix(datastore): Correct inverted log messages in podResyncAll by @fyuan1316 in #1511
- Add WeightedRandomPicker by @Jooho in #1412
- allow setting custom plugins file through helm by @nirrozenbaum in #1508
- feat(helm): add affinity and tolerations to epp-deployment by @hhk7734 in #1504
- typo: add vLLM Prefix Cache & LoRA Adapters links by @zhengkezhou1 in #1280
- docs: update BBR guide by @chewong in #1517
- Docs: updated docs to include network service api enable for gke by @capri-xiyue in #1435
- cleanup modelName from inferenceObjective. by @zetxqx in #1521
- fix: fixed helm by @capri-xiyue in #1522
- Perf updates by @kfswain in #1523
- fix serve multiple genai models md file by @learner0810 in #1527
- update guides docs to fix miss guide by @Frapschen in #1532
- minor updates and godoc to weighted random picker by @nirrozenbaum in #1514
- follow up - improving logging perf issues in few more places by @nirrozenbaum in #1528
- Fixing test flake by @kfswain in #1534
- Update vllm image version for CPU deployment by @rahulgurnani in #1526
- adding elevran as code reviewer instead of shaneutt by @nirrozenbaum in #1533
- Update helm chart Readme with custom plugin config by @rahulgurnani in #1516
- Bumps k8s.io Deps to v0.34.0 by @danehans in #1537
- Helm fix by @nirrozenbaum in #1540
- Make apiVersion configurable for inferencePool in the helm charts by @rahulgurnani in #1542
- Update guide for better clarity and to avoid errors by @rlakhtakia in #1475
- fix helm chart support for gke v1alpha2. by @zetxqx in #1551
- chore: bump sim model server version by @nirrozenbaum in #1555
- remove duplicated section in quickstart guide by @nirrozenbaum in #1553
- Merge shuffle score pods logic by @learner0810 in #1552
- Update getting started guide with 1.0 release by @rahulgurnani in #1557
- Added envoy proxy ai-gateway by @learner0810 in #1554
- fix-import-groups by @learner0810 in #1560
- chore(deps): bump golang.org/x/sync from 0.16.0 to 0.17.0 by @dependabot[bot] in #1549
- chore(deps): bump sigs.k8s.io/controller-tools from 0.18.0 to 0.19.0 by @dependabot[bot] in #1548
- fix flake in weighted random picker by @nirrozenbaum in #1561
- Main uniquely name crbac by @Gregory-Pereira in #1564
- Update priority in EPP flow control from uint to int by @rahulgurnani in #1518
- rename inference_model metrics to inference_objective by @JeffLuoo in #1567
- add a hold label when PRs are pushed to branch other than main by @nirrozenbaum in #1570
- Refactor LLMRequest: Structured RequestData for Completions & Chat-Completions by @vMaroon in #1446
- epp servicemonitor by @sallyom in #1425
- remove scheduler epp flowchart by @kaushikmitr in #1573
- Fixes to overview.md and inferencemodel.md by @DamianSawicki in h...
v1.0.2
What's Changed
- Dep: Bumps Gateway API to v1.4.0 by @danehans in #1707
- Fix LabelSelector validation markers for map field by @danehans in #1717
- Fix Minor Issues in Release Tooling by @danehans in #1724
- Backport: PR #1723 by @danehans in #1725
- Updates gateway manifests for v1 InferencePool (#1603) by @danehans in #1732
- Updates artifacts for v1.0.2 release by @danehans in #1735
Full Changelog: v1.0.1...v1.0.2
v1.0.1
What's Changed
Bug fixes to helm charts; no changes to the EPP image or IGW APIs.
Full Changelog: v1.0.0...v1.0.1
v1.0.1-rc.1
This is a small patch release to fix helm issues.
Context: #1616
v1.0.0
Inference Gateway v1
This release marks the v1 of Inference Gateway, and with it the promotion of the InferencePool CRD to v1.
We're excited to announce our v1 release of Inference Gateway! A huge thank you to our contributors, gateway implementers, and downstream community for helping to shape IGW into something we are proud of.
If you're new: Please take a look at our guide to get started! Or learn more about IGW here: https://gateway-api-inference-extension.sigs.k8s.io/
There is still much to do and more enhancements to come. Namely:
- SLO-based predictive scheduling
- Flow Control for multi-tenancy support
- An improved pluggable Data Layer system
- Multi-modal support
- APIs to support meeting multiple different SLOs in a single InferencePool
We look forward to what's next in the Inference space and to continuing to grow with it.
Onwards!
Cheers,
The IGW maintainer team
What's Changed
- chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #1160
- feat: Introduce pluggable queue framework by @LukeAVanDrie in #1138
- removed USE_STREAMING env var from conformance + tests by @nirrozenbaum in #1157
- Conformance: Fixes the EPP ConfigMap Namespace by @danehans in #1166
- feat: Introduce pluggable intra-flow dispatch policy framework by @LukeAVanDrie in #1139
- Add support for plugin configuration in the InferencePool helm chart by @ahg-g in #1168
- feat(epp): use kebab-cased flags for epp by @Xunzhuo in #1177
- chore: remove duplicated import for code polish by @Xunzhuo in #1179
- Add documentation for the new Configuration via text feature by @shmuelk in #1110
- fix: set epp image tag when releasing by @Xunzhuo in #1182
- feat: Introduce pluggable inter-flow dispatch policy framework by @LukeAVanDrie in #1167
- Update istio release by @LiorLieberman in #1186
- test: kubectl-validate manifests in presubmit by @chewong in #1083
- Delete the unnecessary Marshal of processRequestBody by @whzghb in #1127
- feat(flowcontrol): Introduce ManagedQueue and Service Contracts by @LukeAVanDrie in #1174
- (feat) initial types and interfaces for pluggable data layer by @elevran in #1154
- Fix a regression in prefix plugin which can cause data race by @liu-cong in #1188
- feat: generate crd with version annotation. by @zetxqx in #1134
- chore: update vllm deployment tag to latest by @Xunzhuo in #1184
- moved build details to version package by @nirrozenbaum in #1185
- Add an "Implementing a Compatible Data Plane" section to the implementers guide by @AndresGuedez in #1143
- feat(flowcontrol): Implement registry shard by @LukeAVanDrie in #1187
- feat(flowcontrol): refine types and consolidate docs by @LukeAVanDrie in #1191
- docs: update to use kebab-cased flags changed at #1177 by @nekomeowww in #1193
- added graceful shutdown when scheduler config is not initialized by @nirrozenbaum in #1198
- feat: move x-k8s to apix and add v1 InferencePool to api/v1 by @capri-xiyue in #1116
- feat: Change epp and conformance to use v1 type InferencePool by @capri-xiyue in #1118
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1200
- Enhanced InferencePool Chart Configurability by @vMaroon in #1211
- refactor(flowcontrol): Enable behavioral mocking by @LukeAVanDrie in #1202
- random endpoint pick on tie break in max score picker by @nirrozenbaum in #1205
- removed cmd/registry file by @nirrozenbaum in #1206
- Support scraping metrics from target running with TLS by @pierDipi in #1190
- gke-gateway v0.5.0 conformance test report 9/9 by @zetxqx in #1005
- added join slack badge to readme by @nirrozenbaum in #1218
- chore: 🔨 Use the v0.3.0 llm-d-inference-sim image tag. by @yafengio in #1140
- style: ✨ optimize import order and more readable. by @yafengio in #1220
- Remove TODO stubs from website by @sats-23 in #1221
- docs: update whole repo to v1 inferencepool by @capri-xiyue in #1213
- release issue template: updated the tag command to include the -s for signing the tag by @nirrozenbaum in #1196
- fix try it out section in quickstart by @nirrozenbaum in #1197
- Do not log potentially sensitive data below DEBUG log level by @pierDipi in #1192
- Update index.md with gateway-inference-extension slack by @LiorLieberman in #1225
- Add fallback logic to support multiple endpoints by @rlakhtakia in #1122
- chore: 🔨 add fmt-imports tool for import order. by @yafengio in #1228
- fix: missing permission to list inference.networking.k8s.io/v1/inferencepool by @nekomeowww in #1230
- fix: Make test iter deterministic to fix flake by @LukeAVanDrie in #1231
- feat(flowcontrol): Implement ShardProcessor engine by @LukeAVanDrie in #1203
- Add a set of configuration defaults by @shmuelk in #1223
- Proposing the successor to the InferenceModel API by @kfswain in #1199
- cleanup of unused fields and functions by @nirrozenbaum in #1233
- chore: update CRD BundleVersion to main-dev by @zetxqx in #1216
- Change String() to accept a value reciever. by @elevran in #1239
- renamed kvcache-scorer to kvcache-utilization-scorer by @nirrozenbaum in #1238
- Add unit tests by @elevran in #1195
- test-report: istio 1.28-alpha v0.4.0 & v0.5.0 report 9/9 by @aslakknutsen in #1102
- added scheduler config logging on bootstrap by @nirrozenbaum in #1247
- fix: updated to v1 inferencepool in manifests by @capri-xiyue in #1248
- chore(deps): bump github.com/onsi/gomega from 1.37.0 to 1.38.0 by @dependabot[bot] in #1253
- chore(deps): bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 by @dependabot[bot] in #1251
- chore(deps): bump google.golang.org/grpc from 1.73.0...
v1.0.0-rc.4
A list of PRs that were cherry-picked into RC4:
CRD updates:
Performance issues fixed in pickers:
Helm chart fix:
Bug fix in prefix when no request ID header is supplied by the gateway:
#1490 (was on the original list but somehow missed; without this, prefix cache won't work in bursty workloads)
Test flake fix, required for llm-d to use the formal image of IGW:
All the items in this list have been cherry-picked successfully into the release branch.
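The fix in #1490 is described as generating a UUID when the gateway supplies no request ID header. A minimal Go sketch of that fallback pattern follows. The header name `x-request-id`, the `requestIDOrNew` function, and the hand-rolled version 4 UUID helper are all assumptions made for illustration; the actual change may use a different header constant and a UUID library.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// requestIDHeader is an assumed header name for this sketch.
const requestIDHeader = "x-request-id"

// newUUIDv4 builds a random (version 4) UUID string from 16 random bytes.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// requestIDOrNew (hypothetical) returns the request ID from the headers when
// present and non-empty, and otherwise generates a fresh UUID so that
// per-request state can still be keyed uniquely.
func requestIDOrNew(headers map[string]string) (string, error) {
	if id := headers[requestIDHeader]; id != "" {
		return id, nil
	}
	return newUUIDv4()
}

func main() {
	id, err := requestIDOrNew(map[string]string{})
	if err != nil {
		panic(err)
	}
	fmt.Println("request id:", id)
}
```

Generating a distinct ID per request means concurrent requests without the header do not collide on an empty key, which is consistent with the note that prefix cache would not work in bursty workloads without the fix.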
v1.0.0-rc.3
v1.0.0-rc.2
This release is primarily updating the InferencePool API and Conformance tests after the completion of the API review conducted in this PR: #1173
NOTE: Barring any breaking change after this RC, the APIs are considered frozen for the remainder of the v1.0 release cycle.
v1.0.0-rc.1
What's Changed
- chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by @dependabot[bot] in #1160
- feat: Introduce pluggable queue framework by @LukeAVanDrie in #1138
- removed USE_STREAMING env var from conformance + tests by @nirrozenbaum in #1157
- Conformance: Fixes the EPP ConfigMap Namespace by @danehans in #1166
- feat: Introduce pluggable intra-flow dispatch policy framework by @LukeAVanDrie in #1139
- Add support for plugin configuration in the InferencePool helm chart by @ahg-g in #1168
- feat(epp): use kebab-cased flags for epp by @Xunzhuo in #1177
- chore: remove duplicated import for code polish by @Xunzhuo in #1179
- Add documentation for the new Configuration via text feature by @shmuelk in #1110
- fix: set epp image tag when releasing by @Xunzhuo in #1182
- feat: Introduce pluggable inter-flow dispatch policy framework by @LukeAVanDrie in #1167
- Update istio release by @LiorLieberman in #1186
- test: kubectl-validate manifests in presubmit by @chewong in #1083
- Delete the unnecessary Marshal of processRequestBody by @whzghb in #1127
- feat(flowcontrol): Introduce ManagedQueue and Service Contracts by @LukeAVanDrie in #1174
- (feat) initial types and interfaces for pluggable data layer by @elevran in #1154
- Fix a regression in prefix plugin which can cause data race by @liu-cong in #1188
- feat: generate crd with version annotation. by @zetxqx in #1134
- chore: update vllm deployment tag to latest by @Xunzhuo in #1184
- moved build details to version package by @nirrozenbaum in #1185
- Add an "Implementing a Compatible Data Plane" section to the implementers guide by @AndresGuedez in #1143
- feat(flowcontrol): Implement registry shard by @LukeAVanDrie in #1187
- feat(flowcontrol): refine types and consolidate docs by @LukeAVanDrie in #1191
- docs: update to use kebab-cased flags changed at #1177 by @nekomeowww in #1193
- added graceful shutdown when scheduler config is not initialized by @nirrozenbaum in #1198
- feat: move x-k8s to apix and add v1 InferencePool to api/v1 by @capri-xiyue in #1116
- feat: Change epp and conformance to use v1 type InferencePool by @capri-xiyue in #1118
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1200
- Enhanced InferencePool Chart Configurability by @vMaroon in #1211
- refactor(flowcontrol): Enable behavioral mocking by @LukeAVanDrie in #1202
- random endpoint pick on tie break in max score picker by @nirrozenbaum in #1205
- removed cmd/registry file by @nirrozenbaum in #1206
- Support scraping metrics from target running with TLS by @pierDipi in #1190
- gke-gateway v0.5.0 conformance test report 9/9 by @zetxqx in #1005
- added join slack badge to readme by @nirrozenbaum in #1218
- chore: 🔨 Use the v0.3.0 llm-d-inference-sim image tag. by @yafengio in #1140
- style: ✨ optimize import order and more readable. by @yafengio in #1220
- Remove TODO stubs from website by @sats-23 in #1221
- docs: update whole repo to v1 inferencepool by @capri-xiyue in #1213
- release issue template: updated the tag command to include the -s for signing the tag by @nirrozenbaum in #1196
- fix try it out section in quickstart by @nirrozenbaum in #1197
- Do not log potentially sensitive data below DEBUG log level by @pierDipi in #1192
- Update index.md with gateway-inference-extension slack by @LiorLieberman in #1225
- Add fallback logic to support multiple endpoints by @rlakhtakia in #1122
- chore: 🔨 add fmt-imports tool for import order. by @yafengio in #1228
- fix: missing permission to list inference.networking.k8s.io/v1/inferencepool by @nekomeowww in #1230
- fix: Make test iter deterministic to fix flake by @LukeAVanDrie in #1231
- feat(flowcontrol): Implement ShardProcessor engine by @LukeAVanDrie in #1203
- Add a set of configuration defaults by @shmuelk in #1223
- Proposing the successor to the InferenceModel API by @kfswain in #1199
- cleanup of unused fields and functions by @nirrozenbaum in #1233
- chore: update CRD BundleVersion to main-dev by @zetxqx in #1216
- Change String() to accept a value reciever. by @elevran in #1239
- renamed kvcache-scorer to kvcache-utilization-scorer by @nirrozenbaum in #1238
- Add unit tests by @elevran in #1195
- test-report: istio 1.28-alpha v0.4.0 & v0.5.0 report 9/9 by @aslakknutsen in #1102
- added scheduler config logging on bootstrap by @nirrozenbaum in #1247
- fix: updated to v1 inferencepool in manifests by @capri-xiyue in #1248
- chore(deps): bump github.com/onsi/gomega from 1.37.0 to 1.38.0 by @dependabot[bot] in #1253
- chore(deps): bump sigs.k8s.io/yaml from 1.5.0 to 1.6.0 by @dependabot[bot] in #1251
- chore(deps): bump google.golang.org/grpc from 1.73.0 to 1.74.2 by @dependabot[bot] in #1252
- Update the Endpoint Picker Protocol with a new metadata field that communicates status associated with picked endpoints by @AndresGuedez in #1226
- chore(deps): bump sigs.k8s.io/controller-tools from 0.17.3 to 0.18.0 by @dependabot[bot] in #1254
- Update golangci lint to v2.x by @elevran in #1256
- Add nightly benchmarking documentation by @kaushikmitr in #1234
- normalize score to make sure it is always in the range of [0,1] by @nirrozenbaum in #1236
- updated metrics and logging for plugins by @nirrozenbaum in https://github....