@james-crabtree-sp

Prior to this fix, I observed that each pod was materializing every file in the provided list:

Defaulted container "process" out of: process, init-hostfile (init)
/usr/local/lib/python3.9/site-packages/snowflake/connector/vendored/requests/__init__.py:119: DeprecationWarning: 'urllib3.contrib.pyopenssl' module is deprecated and will be removed in a future release of urllib3 2.x. Read more in this issue: https://github.com/urllib3/urllib3/issues/2680
  from ..urllib3.contrib import pyopenssl
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/411ccb09-c2f0-402d-beab-a1abc4b34a6f/temporary_527093994d934016ae816b5d90df4f73_0_3_0.snappy.parquet
100%|██████████| 131072/131072 [01:05<00:00, 1998.94it/s]
100%|██████████| 131072/131072 [01:05<00:00, 2001.73it/s]
100%|██████████| 131072/131072 [01:07<00:00, 1954.31it/s]
100%|██████████| 94674/94674 [00:45<00:00, 2061.51it/s]
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/411ccb09-c2f0-402d-beab-a1abc4b34a6f/temporary_527093994d934016ae816b5d90df4f73_0_0_0.snappy.parquet
100%|██████████| 131072/131072 [01:06<00:00, 1980.39it/s]
100%|██████████| 131072/131072 [01:05<00:00, 1998.63it/s]
100%|██████████| 131072/131072 [01:07<00:00, 1948.74it/s]
100%|██████████| 95234/95234 [00:47<00:00, 2005.39it/s]
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/411ccb09-c2f0-402d-beab-a1abc4b34a6f/temporary_527093994d934016ae816b5d90df4f73_0_1_0.snappy.parquet
100%|██████████| 131072/131072 [01:06<00:00, 1980.13it/s]
100%|██████████| 131072/131072 [01:09<00:00, 1894.30it/s]
100%|██████████| 131072/131072 [01:06<00:00, 1980.35it/s]
100%|██████████| 95008/95008 [00:47<00:00, 2017.95it/s]
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/411ccb09-c2f0-402d-beab-a1abc4b34a6f/temporary_527093994d934016ae816b5d90df4f73_0_2_0.snappy.parquet
100%|██████████| 131072/131072 [01:05<00:00, 1998.41it/s]
100%|██████████| 131072/131072 [01:05<00:00, 1986.44it/s]
100%|██████████| 131072/131072 [01:04<00:00, 2035.47it/s]
100%|██████████| 95709/95709 [00:47<00:00, 2009.30it/s]

After this change, each pod materializes only a single file:

09/19/2023 03:48:49 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: Logging output for entitlement_attributes pod 0
09/19/2023 03:48:50 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: /usr/local/lib/python3.9/site-packages/snowflake/connector/vendored/requests/__init__.py:119: DeprecationWarning: 'urllib3.contrib.pyopenssl' module is deprecated and will be removed in a future release of urllib3 2.x. Read more in this issue: https://github.com/urllib3/urllib3/issues/2680
  from ..urllib3.contrib import pyopenssl
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/81820a3a-ee53-4b85-97bf-c194e6ea6e2c/temporary_913009fdf1d54d31a02ace0a6bf167aa_0_2_0.snappy.parquet
100%|██████████| 131072/131072 [01:19<00:00, 1647.29it/s]
100%|██████████| 131072/131072 [01:16<00:00, 1703.57it/s]
100%|██████████| 131072/131072 [01:15<00:00, 1742.86it/s]
100%|██████████| 95008/95008 [00:51<00:00, 1849.29it/s]

09/19/2023 03:48:50 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: Logging output for entitlement_attributes pod 1
09/19/2023 03:48:50 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: /usr/local/lib/python3.9/site-packages/snowflake/connector/vendored/requests/__init__.py:119: DeprecationWarning: 'urllib3.contrib.pyopenssl' module is deprecated and will be removed in a future release of urllib3 2.x. Read more in this issue: https://github.com/urllib3/urllib3/issues/2680
  from ..urllib3.contrib import pyopenssl
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/81820a3a-ee53-4b85-97bf-c194e6ea6e2c/temporary_913009fdf1d54d31a02ace0a6bf167aa_0_3_0.snappy.parquet
100%|██████████| 131072/131072 [01:15<00:00, 1734.00it/s]
100%|██████████| 131072/131072 [01:14<00:00, 1761.92it/s]
100%|██████████| 131072/131072 [01:13<00:00, 1777.00it/s]
100%|██████████| 95234/95234 [00:49<00:00, 1914.79it/s]

09/19/2023 03:48:50 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: Logging output for entitlement_attributes pod 2
09/19/2023 03:48:50 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: /usr/local/lib/python3.9/site-packages/snowflake/connector/vendored/requests/__init__.py:119: DeprecationWarning: 'urllib3.contrib.pyopenssl' module is deprecated and will be removed in a future release of urllib3 2.x. Read more in this issue: https://github.com/urllib3/urllib3/issues/2680
  from ..urllib3.contrib import pyopenssl
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/81820a3a-ee53-4b85-97bf-c194e6ea6e2c/temporary_913009fdf1d54d31a02ace0a6bf167aa_0_1_0.snappy.parquet
100%|██████████| 131072/131072 [01:27<00:00, 1500.80it/s]
100%|██████████| 131072/131072 [01:22<00:00, 1582.73it/s]
100%|██████████| 131072/131072 [01:21<00:00, 1599.95it/s]
100%|██████████| 94674/94674 [00:55<00:00, 1706.64it/s]

09/19/2023 03:48:50 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: Logging output for entitlement_attributes pod 3
09/19/2023 03:48:50 PM feast.infra.materialization.contrib.bytewax.bytewax_materialization_engine INFO: /usr/local/lib/python3.9/site-packages/snowflake/connector/vendored/requests/__init__.py:119: DeprecationWarning: 'urllib3.contrib.pyopenssl' module is deprecated and will be removed in a future release of urllib3 2.x. Read more in this issue: https://github.com/urllib3/urllib3/issues/2680
  from ..urllib3.contrib import pyopenssl
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Processing path s3://odin-dev-intermediate/pipeline/features/feast_odin_dev/81820a3a-ee53-4b85-97bf-c194e6ea6e2c/temporary_913009fdf1d54d31a02ace0a6bf167aa_0_0_0.snappy.parquet
100%|██████████| 131072/131072 [01:14<00:00, 1769.42it/s]
100%|██████████| 131072/131072 [01:11<00:00, 1842.88it/s]
100%|██████████| 131072/131072 [01:10<00:00, 1847.69it/s]
100%|██████████| 95709/95709 [00:50<00:00, 1886.36it/s]
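
For readers skimming the logs above, the behavior change can be sketched roughly as follows. This is only an illustration, not the code in this PR: it assumes the pods run as a Kubernetes Indexed Job (which sets `JOB_COMPLETION_INDEX` on each pod), and the helper name `select_paths_for_pod` and the file names are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def select_paths_for_pod(
    paths: List[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the single file this pod should materialize.

    Hypothetical helper: assumes one pod per file and that the pod's
    position in the job is exposed through JOB_COMPLETION_INDEX.
    """
    env = os.environ if env is None else env
    # Kubernetes sets JOB_COMPLETION_INDEX on pods of an Indexed Job.
    pod_index = int(env["JOB_COMPLETION_INDEX"])
    if not 0 <= pod_index < len(paths):
        raise IndexError(
            f"pod index {pod_index} has no file among {len(paths)} paths"
        )
    # Before the fix, the equivalent of `return paths` meant every pod
    # walked the whole list; here each pod keeps only its own entry.
    return [paths[pod_index]]


if __name__ == "__main__":
    # Hypothetical file names standing in for the parquet paths in the logs.
    files = [f"part_{i}.snappy.parquet" for i in range(4)]
    for i in range(len(files)):
        print(i, select_paths_for_pod(files, {"JOB_COMPLETION_INDEX": str(i)}))
```

The mapping follows the order of the path list rather than anything in the file names, which would be consistent with pod 0 picking up the `_0_2_0` file in the logs above.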

james-crabtree-sp requested a review from a team September 19, 2023 21:14
james-crabtree-sp changed the title from "James.crabtree/saasmlops 809" to "SAASMLOPS-809 fix bytewax workers so they only process a single file" Sep 19, 2023
james-crabtree-sp merged commit 3c22d90 into develop Sep 20, 2023
james-crabtree-sp deleted the james.crabtree/SAASMLOPS-809 branch September 20, 2023 15:14
james-crabtree-sp added a commit that referenced this pull request Oct 6, 2023
)

* SAASMLOPS-809 fix bytewax workers so they only process a single file

* SAASMLOPS-809 fix newlines
james-crabtree-sp added a commit that referenced this pull request Oct 6, 2023
)

* SAASMLOPS-809 fix bytewax workers so they only process a single file

* SAASMLOPS-809 fix newlines

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
james-crabtree-sp added a commit that referenced this pull request Oct 9, 2023
)

* SAASMLOPS-809 fix bytewax workers so they only process a single file

* SAASMLOPS-809 fix newlines

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
james-crabtree-sp added a commit that referenced this pull request Oct 23, 2023
)

* SAASMLOPS-809 fix bytewax workers so they only process a single file

* SAASMLOPS-809 fix newlines

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
alex-vinnik-sp pushed a commit that referenced this pull request Jan 17, 2024
…rialization timestamp updates (feast-dev#3789)

* SAASMLOPS-767 wait for jobs to complete

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* SAASMLOPS-805 Stopgap change to fix duplicate materialization of data

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* SAASMLOPS-805 save BYTEWAX_REPLICAS=1

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* SAASMLOPS-809 fix bytewax workers so they only process a single file (#6)

* SAASMLOPS-809 fix bytewax workers so they only process a single file

* SAASMLOPS-809 fix newlines

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* SAASMLOPS-833 add configurable job timeout (#7)

* SAASMLOPS-833 add configurable job timeout

* SAASMLOPS-833 fix whitespace

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* develop Run large materializations in batches of pods

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* master Set job_batch_size at least equal to max_parallelism

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* master clarity max_parallelism description

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* master resolve bug that causes materialization to continue after job error

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

* master resolve bug causing pod logs to not be printed

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>

---------

Signed-off-by: James Crabtree <james.crabtree@sailpoint.com>
alex-vinnik-sp pushed a commit that referenced this pull request Jan 17, 2024
# [0.35.0](feast-dev/feast@v0.34.0...v0.35.0) (2024-01-13)

### Bug Fixes

* Add async refresh to prevent synchronous refresh in main thread ([feast-dev#3812](feast-dev#3812)) ([9583ed6](feast-dev@9583ed6))
* Adopt connection pooling for HBase ([feast-dev#3793](feast-dev#3793)) ([b3852bf](feast-dev@b3852bf))
* Bytewax engine create configmap from object ([feast-dev#3821](feast-dev#3821)) ([25e9775](feast-dev@25e9775))
* Fix warnings from deprecated paths and update default log level ([feast-dev#3757](feast-dev#3757)) ([68a8737](feast-dev@68a8737))
* improve parsing bytewax job status ([5983f40](feast-dev@5983f40))
* make bytewax settings unexposed ([ae1bb8b](feast-dev@ae1bb8b))
* Make generated temp table name escaped ([feast-dev#3797](feast-dev#3797)) ([175d796](feast-dev@175d796))
* Pin numpy version to avoid spammy deprecation messages ([774ed33](feast-dev@774ed33))
* Redundant feature materialization and premature incremental materialization timestamp updates ([feast-dev#3789](feast-dev#3789)) ([417b16b](feast-dev@417b16b)), closes [#6](feast-dev#6) [#7](feast-dev#7)
* Resolve hbase hotspot issue when materializing ([feast-dev#3790](feast-dev#3790)) ([7376db8](feast-dev@7376db8))
* Set keepalives_idle None by default ([feast-dev#3756](feast-dev#3756)) ([8717e9b](feast-dev@8717e9b))
* Set upper bound for bigquery client due to its breaking changes ([2151c39](feast-dev@2151c39))
* UI project cannot handle fallback routes ([feast-dev#3766](feast-dev#3766)) ([96ece0f](feast-dev@96ece0f))
* update dependencies versions due to conflicts ([5dc0b24](feast-dev@5dc0b24))
* Update jackson and remove unnecessary logging ([feast-dev#3809](feast-dev#3809)) ([018d0ea](feast-dev@018d0ea))
* upgrade the pyarrow to latest v14.0.1 for CVE-2023-47248. ([052182b](feast-dev@052182b))

### Features

* Add get online feature rpc to gprc server ([feast-dev#3815](feast-dev#3815)) ([01db8cc](feast-dev@01db8cc))
* Add materialize and materialize-incremental rest endpoints ([feast-dev#3761](feast-dev#3761)) ([fa600fe](feast-dev@fa600fe)), closes [feast-dev#3760](feast-dev#3760)
* add redis sentinel support ([3387a15](feast-dev@3387a15))
* add redis sentinel support ([4337c89](feast-dev@4337c89))
* add redis sentinel support format lint ([aad8718](feast-dev@aad8718))
* Add support for `table_create_disposition` in bigquery job for offline store ([feast-dev#3762](feast-dev#3762)) ([6a728fe](feast-dev@6a728fe))
* Add support for in_cluster config and additional labels for bytewax materialization ([feast-dev#3754](feast-dev#3754)) ([2192e65](feast-dev@2192e65))
* Apply cache to load proto registry for performance ([feast-dev#3702](feast-dev#3702)) ([709c709](feast-dev@709c709))
* Make bytewax job write as mini-batches ([feast-dev#3777](feast-dev#3777)) ([9b0e5ce](feast-dev@9b0e5ce))
* Optimize bytewax pod resource with zero-copy ([9cf9d96](feast-dev@9cf9d96))
* Support GCS filesystem for bytewax engine ([feast-dev#3774](feast-dev#3774)) ([fb6b807](feast-dev@fb6b807))