@CaoE
Collaborator

@CaoE CaoE commented Aug 12, 2021

Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish and softplus on CPU, and optimize the performance of softshrink.
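For readers unfamiliar with the ops named above, here is a minimal pure-Python sketch of what each one computes, together with an emulation of bfloat16 rounding. It is not the PR's kernel code. The formulas and default parameters follow the documented PyTorch definitions. The helper names `to_bfloat16` and `bf16_apply` are hypothetical, and the idea of rounding inputs and outputs to bfloat16 while doing the arithmetic in higher precision is an assumption of this sketch rather than something stated in this thread.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 keeps the top 16 bits of the float32 bit pattern.
    NaN and out-of-float32-range inputs are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def logsigmoid(x: float) -> float:
    # log(1 / (1 + exp(-x))), written in a numerically stable form.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def hardsigmoid(x: float) -> float:
    # clamp(x + 3, 0, 6) / 6
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def hardshrink(x: float, lambd: float = 0.5) -> float:
    # Pass x through only when |x| > lambd.
    return x if abs(x) > lambd else 0.0


def softshrink(x: float, lambd: float = 0.5) -> float:
    # Shrink x toward zero by lambd; values within [-lambd, lambd] become 0.
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0


def hardswish(x: float) -> float:
    # x * clamp(x + 3, 0, 6) / 6
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    # log(1 + exp(beta * x)) / beta, linear above the threshold.
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta


def bf16_apply(fn, x: float) -> float:
    """Hypothetical helper: bfloat16 in, higher-precision math, bfloat16 out."""
    return to_bfloat16(fn(to_bfloat16(x)))
```

For example, `bf16_apply(softplus, 0.0)` agrees with `math.log(2)` only to about two or three decimal digits, which is the precision loss to expect from an 8-bit significand.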

@facebook-github-bot
Contributor

facebook-github-bot commented Aug 12, 2021


💊 CI failures summary and remediations

As of commit 2b830af (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-rocm4.5-py3.7 / test (distributed, 1, 1, linux.rocm.gpu) (1/1)

Step: "Unknown"

2022-03-02T13:06:04.3329514Z -- Process 0 terminated with the following error:
2022-03-02T13:06:04.3330263Z Traceback (most recent call last):
2022-03-02T13:06:04.3331744Z   File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
2022-03-02T13:06:04.3332610Z     fn(i, *args)
2022-03-02T13:06:04.3333939Z   File "/opt/conda/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 369, in _wrap
2022-03-02T13:06:04.3334910Z     ret = record(fn)(*args_)
2022-03-02T13:06:04.3336334Z   File "/opt/conda/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
2022-03-02T13:06:04.3337342Z     return f(*args, **kwargs)
2022-03-02T13:06:04.3338273Z   File "/var/lib/jenkins/pytorch/test/distributed/elastic/multiprocessing/api_test.py", line 138, in echo2
2022-03-02T13:06:04.3339204Z     raise RuntimeError(msg)
2022-03-02T13:06:04.3339840Z RuntimeError: hello
2022-03-02T13:06:04.3340195Z 
2022-03-02T13:06:04.3647809Z ok (3.199s)
2022-03-02T13:06:04.3670267Z   test_function_with_tensor (__main__.StartProcessesTest) ... ok (0.002s)
2022-03-02T13:06:04.3684477Z   test_invalid_log_dir (__main__.StartProcessesTest) ... ok (0.002s)
2022-03-02T13:06:04.3747779Z   test_multiprocess_context_close (__main__.StartProcessesTest) ... Closing process 2449 via signal SIGTERM
2022-03-02T13:06:04.3802488Z ok (0.011s)
2022-03-02T13:06:04.3841647Z   test_multiprocessing_context_poll_raises_exception (__main__.StartProcessesTest) ... failed (exitcode: -1) local_rank: 0 (pid: 123) of fn: echo0 (start_method: spawn)
2022-03-02T13:06:04.3842896Z Traceback (most recent call last):
2022-03-02T13:06:04.3844315Z   File "/opt/conda/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 453, in _poll
2022-03-02T13:06:04.3845564Z     self._pc.join(-1)

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@CaoE CaoE force-pushed the bf16_nn4 branch 3 times, most recently from 1a97e8f to 06774a8 Compare August 13, 2021 05:27
@CaoE CaoE changed the title from "Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, and hardswish on CPU" to "Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish, and softplus on CPU" Aug 13, 2021
@VitalyFedyunin
Contributor

Please add benchmarks comparing the float and bfloat16 kernels for various input sizes.
Please rebase.
The Windows build error might be relevant (check after the rebase).

@VitalyFedyunin VitalyFedyunin self-requested a review August 15, 2021 16:44
@astaff astaff added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 16, 2021
@CaoE
Collaborator Author

CaoE commented Aug 18, 2021

Rebased @VitalyFedyunin.
Single-core performance was tested on a Xeon(R) Platinum 8180 @ 2.5 GHz.

[Image: Screenshot (209), single-core benchmark results]

Single-socket (28 cores) performance was tested on a Xeon(R) Platinum 8180 @ 2.5 GHz.

[Image: Screenshot (210), single-socket benchmark results]

@CaoE CaoE force-pushed the bf16_nn4 branch 3 times, most recently from 343e903 to 55c7313 Compare August 27, 2021 02:39
@CaoE CaoE changed the title from "Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish, and softplus on CPU" to "Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish, softplus and elu on CPU" Aug 27, 2021
@codecov

codecov bot commented Aug 27, 2021

Codecov Report

Merging #63134 (12cf17d) into master (feefc94) will increase coverage by 0.26%.
The diff coverage is n/a.

❗ Current head 12cf17d differs from pull request most recent head 1f7f3ab. Consider uploading reports for the commit 1f7f3ab to get more accurate results

@@            Coverage Diff             @@
##           master   #63134      +/-   ##
==========================================
+ Coverage   66.37%   66.64%   +0.26%     
==========================================
  Files         739      707      -32     
  Lines       94299    92346    -1953     
==========================================
- Hits        62595    61540    -1055     
+ Misses      31704    30806     -898     

@CaoE CaoE closed this Aug 31, 2021
@CaoE CaoE reopened this Sep 2, 2021
@CaoE CaoE force-pushed the bf16_nn4 branch 3 times, most recently from 423a004 to 12cf17d Compare September 7, 2021 07:05
@CaoE CaoE changed the title from "Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish, softplus and elu on CPU" to "Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish and softplus on CPU" Sep 7, 2021
@CaoE CaoE force-pushed the bf16_nn4 branch 2 times, most recently from 9c7be87 to 52285d1 Compare September 27, 2021 08:13
@pytorch-probot

pytorch-probot bot commented Oct 8, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/CaoE/pytorch/blob/1f7f3ab2577a13216c04a105f5790291f12c0158/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-binary-conda ciflow/binaries, ciflow/binaries/conda 🚫 skipped
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-manywheel ciflow/binaries, ciflow/binaries/wheel 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-bionic-py3.6-clang9 ciflow/xla 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@CaoE
Collaborator Author

CaoE commented Oct 19, 2021

Hi @VitalyFedyunin
Thanks for your suggestions and explanation. The PR has been rebased; could you please review it?

@CaoE CaoE force-pushed the bf16_nn4 branch 2 times, most recently from 3bec7d6 to 4468b97 Compare November 15, 2021 06:18
@CaoE CaoE force-pushed the bf16_nn4 branch 2 times, most recently from 7985dde to daba39a Compare December 17, 2021 06:51
@CaoE CaoE force-pushed the bf16_nn4 branch 2 times, most recently from f1267f0 to 6698e91 Compare December 21, 2021 01:32
@CaoE
Collaborator Author

CaoE commented Dec 28, 2021

Hi @VitalyFedyunin, could you please review it? Thank you.

@pytorchbot
Collaborator

Build started for merge commit.

@facebook-github-bot
Contributor

@frank-wei has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@frank-wei frank-wei self-requested a review March 15, 2022 17:01
facebook-github-bot pushed a commit that referenced this pull request Apr 4, 2022
…ink, hardswish and softplus on CPU (#63134)

Summary:
Add BFloat16 support for logsigmoid, hardsigmoid, hardshrink, softshrink, hardswish and softplus on CPU, and optimize the performance of softshrink.

Pull Request resolved: #63134

Reviewed By: yinghai

Differential Revision: D34897992

Pulled By: frank-wei

fbshipit-source-id: 4c778f5271d6fa54dd78158258941def8d9252f5
@frank-wei frank-wei added the intel This tag is for PR from Intel label Jun 3, 2022

Labels

cla signed intel This tag is for PR from Intel open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

7 participants