
Fix error during loss reduce in callback#469

Merged: SamitHuang merged 3 commits into mindspore-lab:main from zhtmike:reduce_fix on Jul 1, 2023


@zhtmike commented Jun 30, 2023

Recent changes to callback.py cause "RuntimeError: Couldn't get correct hccl hcom with group hccl_world_group" with MindSpore 1.10 on OpenI. It seems that ops.ReduceSum must be compiled first. This PR fixes the error.

This change also improves the reported loss value when the loss output is fp16.
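The PR does not show how the loss is now reduced or reported. As a hedged illustration of why fp16 losses can produce poor reports, the pure-Python sketch below assumes the loss is accumulated over many steps (for example, for a running average). The helpers to_fp16, sum_in_fp16 and sum_upcast are hypothetical. They emulate half precision with the standard library's "e" struct format rather than with MindSpore. Accumulating in fp16 stalls once the total reaches 256, where the fp16 spacing (0.25) is more than twice a 0.1 loss. Upcasting before accumulating keeps the total exact.

```python
import struct
from typing import Iterable


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_in_fp16(values: Iterable[float]) -> float:
    """Hypothetical fp16 accumulator: the running total is rounded to fp16 after every add."""
    total = 0.0
    for v in values:
        # total and to_fp16(v) are both fp16 values, so their float64 sum is exact;
        # rounding it once gives a correctly rounded fp16 addition.
        total = to_fp16(total + to_fp16(v))
    return total


def sum_upcast(values: Iterable[float]) -> float:
    """Hypothetical upcast accumulator: each loss is fp16, but the sum is kept in a Python float (fp64)."""
    return sum(to_fp16(v) for v in values)


if __name__ == "__main__":
    losses = [0.1] * 10_000  # hypothetical per-step losses, stored as fp16
    print("fp16 accumulator mean:  ", sum_in_fp16(losses) / len(losses))
    print("upcast accumulator mean:", sum_upcast(losses) / len(losses))
```

Running it, the fp16 accumulator reports a mean of 0.0256 instead of roughly 0.1. Casting the loss to a wider type such as fp32 before reducing or averaging it is the usual remedy.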


Review thread on mindocr/utils/misc.py (outdated)
Co-authored-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
@SamitHuang SamitHuang merged commit 0ef002f into mindspore-lab:main Jul 1, 2023
@zhtmike zhtmike deleted the reduce_fix branch July 3, 2023 06:48

3 participants