@astronautas
Contributor

What this PR does / why we need it:

For some of our larger get_historical_features jobs, we started getting:

[2025-12-19, 14:16:36 UTC] {pod_manager.py:454} INFO - [base] OperationalError: Error
[2025-12-19, 14:16:36 UTC] {pod_manager.py:454} INFO - [base] HTTPConnectionPool(host='cluster-clickhouse-ml-main.ml.svc.cluster.local',
[2025-12-19, 14:16:36 UTC] {pod_manager.py:454} INFO - [base] port=8123): Read timed out. (read timeout=300) executing HTTP request attempt 1

In essence, we need to be able to control some of the client-side timeouts in the ClickHouse client used by the feature store. This PR adds that capability.
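
For illustration only, here is a minimal sketch of the idea rather than the code in this PR. It assumes a hypothetical helper, `build_client_kwargs`, that merges a free-form `additional_args` mapping from the store config into the keyword arguments used to create the client. The host name is made up, and `send_receive_timeout` and `connect_timeout` are, to my knowledge, the timeout parameters accepted by `clickhouse_connect.get_client`.

```python
from typing import Any, Dict, Optional


def build_client_kwargs(
    host: str,
    port: int,
    additional_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge user-supplied client options into the base kwargs."""
    kwargs: Dict[str, Any] = {"host": host, "port": port}
    # User-supplied options are applied last, so they override the base values.
    kwargs.update(additional_args or {})
    return kwargs


# Raise the read timeout above the 300 s seen in the log (values are examples).
kwargs = build_client_kwargs(
    "clickhouse.example.internal",  # made-up host
    8123,
    {"send_receive_timeout": 1800, "connect_timeout": 30},
)
print(kwargs)
# Assumption: the result would then be passed on as
# clickhouse_connect.get_client(**kwargs)
```

Keeping the extra options as a free-form mapping means new client settings can be passed through without adding a dedicated config field for each one.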

Which issue(s) this PR fixes:

(inline issue within this PR)

@astronautas astronautas requested a review from a team as a code owner December 22, 2025 10:32
@astronautas
Contributor Author

astronautas commented Dec 22, 2025

@franciscojavierarceo @ntkathole A small improvement to the ClickHouse offline store.

@astronautas astronautas force-pushed the fix/control-clickhouse-offline-store-client-timeouts branch 2 times, most recently from a9c9902 to 508bf0f Compare December 22, 2025 10:44
@astronautas
Contributor Author

astronautas commented Dec 22, 2025

I'll also test how this more generic additional_args field works with the feature_store.yaml config 🕐. Will let you know in this thread, @ntkathole.

@astronautas
Contributor Author

astronautas commented Dec 22, 2025

@ntkathole All ready from my end!

@astronautas
Contributor Author

astronautas commented Dec 22, 2025

@ntkathole Any ideas?

Run make test-python-integration
python -m pytest --tb=short -v -n 8 --integration --color=yes --durations=10 --timeout=1200 --timeout_method=thread --dist loadgroup \
	-k "(not snowflake or not test_historical_features_main)" \
	-m "not rbac_remote_integration_test" \
	--log-cli-level=INFO -s \
	sdk/python/tests
/opt/hostedtoolcache/Python/3.11.14/x64/bin/python: No module named pytest
make: *** [Makefile:157: test-python-integration] Error 1
Error: Process completed with exit code 2.

Can we retry? It seems unrelated to the PR :/ Maybe a caching issue?

@ntkathole
Member

It seems related to AWS creds, looking into it:
Error: The security token included in the request is invalid.

Signed-off-by: lukas.valatka <lukas.valatka@cast.ai>
@ntkathole ntkathole force-pushed the fix/control-clickhouse-offline-store-client-timeouts branch from 15c6d7a to a3b9977 Compare December 23, 2025 07:26
Member

@ntkathole ntkathole left a comment

looks good

@ntkathole ntkathole merged commit 59dbb33 into feast-dev:master Dec 23, 2025
17 checks passed