@sayakpaul Please help me review this. I fixed the per-step time calculations, which were previously wrong. Thanks!
where we shard the input batches over the TPU devices.

As of 9-11-2024, these are some expected step times.

| accelerator | global batch size | step time (seconds) |
| ----------- | ----------------- | ------------------- |
| v5p-128 | 1024 | 0.245 |
| v5p-256 | 2048 | 0.234 |
| v5p-512 | 4096 | 0.2498 |
Can we update the corrected numbers instead of deleting the table?
It will take some time to collect these. Can we push these changes for now with an action item to add this table in the future?
I wouldn't prefer that. Since we already had these values and communicated them, I would prefer that the PR reflect the changes. Cc: @tengomucho as well.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@sayakpaul I'm still working on this.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
Fixes the step time calculations in the ptxla training script.
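Because this PR is about how per-step time is computed, here is a minimal, hypothetical sketch of one common approach (not the script's actual code): record a timestamp once each step has finished on the device, take differences between consecutive timestamps, and drop the first few steps, which on XLA devices usually include graph compilation. The helper name `mean_step_time` and its `warmup_steps` parameter are illustrative assumptions. With PyTorch/XLA's lazy execution, a timestamp is only meaningful after pending device work has completed (for example after `torch_xla.core.xla_model.wait_device_ops()`); the sketch assumes the caller has already ensured that.

```python
import statistics
from typing import Sequence


def mean_step_time(
    step_end_times: Sequence[float], start_time: float, warmup_steps: int = 1
) -> float:
    """Return the mean wall-clock time per training step, in seconds.

    Hypothetical helper. step_end_times[i] is a timestamp (e.g. from
    time.perf_counter()) taken after step i has fully finished on the device.
    The first `warmup_steps` steps are excluded, since on XLA devices they
    typically include graph compilation.
    """
    if warmup_steps < 0:
        raise ValueError("warmup_steps must be non-negative")
    # Each step's duration is the gap between consecutive timestamps;
    # the first step is measured from start_time.
    boundaries = [start_time, *step_end_times]
    durations = [end - begin for begin, end in zip(boundaries, boundaries[1:])]
    measured = durations[warmup_steps:]
    if not measured:
        raise ValueError("not enough steps after warm-up to measure")
    return statistics.mean(measured)


# Example: a slow first (compilation) step followed by three 0.25 s steps.
print(mean_step_time([5.0, 5.25, 5.5, 5.75], start_time=0.0, warmup_steps=1))
```

Averaging per-step durations after warm-up avoids letting a single slow compilation step dominate the reported figure, which is one common way step-time numbers end up inflated.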
Before submitting
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.