Skip to content

Conversation

@nvcastet
Copy link
Collaborator

@nvcastet nvcastet commented Jan 23, 2023

Signed-off-by: Nicolas Castet ncastet@nvidia.com

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

Horovod was only checking for tf.keras.optimizers.Optimizer in DistributedOptimizer but supported optimizers are under tf.keras.optimizers.legacy.Optimizer for Keras 2.11+.

@nvcastet nvcastet requested a review from chongxiaoc January 23, 2023 16:54
@chongxiaoc
Copy link
Collaborator

There are some tests failing. For example:

=================================== FAILURES ===================================
_______________ Tf2KerasTests.test_partial_distributed_optimizer _______________

self = <test_tensorflow2_keras.Tf2KerasTests testMethod=test_partial_distributed_optimizer>

@chongxiaoc
Copy link
Collaborator

But I like this fix. Should land as soon as all tests pass.

@nvcastet
Copy link
Collaborator Author

Thanks @chongxiaoc Yes I need to update the different tests/examples. Since with Keras >= 2.11 we were not always using the legacy optimizers.

@github-actions
Copy link

github-actions bot commented Jan 23, 2023

Unit Test Results

0 files   -      977  0 suites   - 977   0s ⏱️ - 11h 47m 43s
0 tests  -      865  0 ✔️  -      754  0 💤  -    111  0 ±0 
0 runs   - 21 131  0 ✔️  - 14 810  0 💤  - 6 321  0 ±0 

Results for commit 7230379. ± Comparison against base commit 5266a29.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Jan 23, 2023

Unit Test Results (with flaky tests)

0 files   -   1 164  0 suites   - 1 164   0s ⏱️ - 13h 10m 40s
0 tests  -      865  0 ✔️  -      753  0 💤  -    111  0  - 1 
0 runs   - 25 179  0 ✔️  - 17 397  0 💤  - 7 779  0  - 3 

Results for commit 7230379. ± Comparison against base commit 5266a29.

♻️ This comment has been updated with latest results.

@chongxiaoc
Copy link
Collaborator

some missing examples and tests:

tensorflow2_mnist_data_service_train_fn_compute_side_dispatcher.py
test/parallel/test_tensorflow2_keras.py
/horovod/examples/tensorflow2/tensorflow2_keras_mnist.py

@pjannaty
Copy link

Thank you both, @chongxiaoc and @nvcastet. This is an important fix for TensorFlow!

Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
@nvcastet
Copy link
Collaborator Author

@chongxiaoc Finally, we should be good.

Copy link
Collaborator

@chongxiaoc chongxiaoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for fixing it.

@chongxiaoc chongxiaoc merged commit 305280d into horovod:master Jan 30, 2023
@nvcastet nvcastet deleted the fix_new_keras_opts branch January 30, 2023 22:13
nvcastet added a commit to nvcastet/horovod that referenced this pull request Mar 10, 2023
…3822)"

This reverts commit 305280d.

Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
nvcastet added a commit to nvcastet/horovod that referenced this pull request Mar 24, 2023
…3822)"

This reverts commit 305280d.

Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
nvcastet added a commit to nvcastet/horovod that referenced this pull request Apr 4, 2023
…3822)"

This reverts commit 305280d.

Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
nvcastet added a commit to nvcastet/horovod that referenced this pull request Apr 17, 2023
…3822)"

This reverts commit 305280d.

Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants