Commit 0053ee5

Update on "Fallback to CPU when remote end does not have CUDA for profiling"
Stack from ghstack:

* #44664 [RPC profiling] Extend RPC profiling to support async function execution over RPC.
* #44655 [RPC profiling] Don't wrap toHere() calls with profiling
* #44653 [RPC profiling] Allow disableProfiler() to be called from another thread.
* #44646 Remove thread_local RecordFunctionGuard from profiler.

A comment from @mrshenli on #44664 raised the following concern: when the profiler is enabled on the server, the server may be a different machine that does not have CUDA even though the caller does. Previously this would crash; now we fall back to CPU profiling and log a warning. For testing, I forced the callee to return a CUDA profiler state and validated that the fallback occurs. I'm not sure how to add a unit test, since our tests run on a single machine, which either has CUDA or doesn't.

Differential Revision: [D23790729](https://our.internmc.facebook.com/intern/diff/D23790729/)

[ghstack-poisoned]
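The fallback described above can be sketched in a few lines. This is a minimal standalone illustration, not the actual PyTorch internals; `ProfilerState` and `resolve_profiler_state` are hypothetical names standing in for the real profiler configuration types:

```python
import warnings
from enum import Enum


class ProfilerState(Enum):
    """Stand-in for the profiler state the caller requests over RPC."""
    CPU = "cpu"
    CUDA = "cuda"


def resolve_profiler_state(requested: ProfilerState,
                           cuda_available: bool) -> ProfilerState:
    """If the caller requested CUDA profiling but this machine has no CUDA,
    fall back to CPU profiling with a warning instead of crashing."""
    if requested is ProfilerState.CUDA and not cuda_available:
        warnings.warn(
            "CUDA profiling was requested but CUDA is not available on the "
            "callee; falling back to CPU profiling.")
        return ProfilerState.CPU
    return requested
```

The key design point is that the mismatch is resolved on the callee side, where CUDA availability is actually known, rather than trusting the caller's profiler configuration.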
2 parents 1e9d91e + f26efa6 commit 0053ee5

File tree

128 files changed: +4349 −1772 lines


.circleci/config.yml

Lines changed: 4 additions & 4 deletions
@@ -924,7 +924,7 @@ jobs:
   smoke_mac_test:
     <<: *binary_linux_test_upload_params
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       - checkout
       - run:
@@ -949,7 +949,7 @@ jobs:
   binary_mac_build:
     <<: *binary_mac_params
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
       - checkout
@@ -1253,7 +1253,7 @@ jobs:
     environment:
       BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       - checkout
       - run_brew_for_macos_build
@@ -1287,7 +1287,7 @@ jobs:
     environment:
       BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       - checkout
       - attach_workspace:

.circleci/verbatim-sources/job-specs/binary-job-specs.yml

Lines changed: 2 additions & 2 deletions
@@ -135,7 +135,7 @@
   smoke_mac_test:
     <<: *binary_linux_test_upload_params
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       - checkout
       - run:
@@ -160,7 +160,7 @@
   binary_mac_build:
     <<: *binary_mac_params
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       # See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
       - checkout

.circleci/verbatim-sources/job-specs/job-specs-custom.yml

Lines changed: 2 additions & 2 deletions
@@ -109,7 +109,7 @@
     environment:
       BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-build
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       - checkout
       - run_brew_for_macos_build
@@ -143,7 +143,7 @@
     environment:
       BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
     macos:
-      xcode: "9.4.1"
+      xcode: "11.2.1"
     steps:
       - checkout
       - attach_workspace:

.jenkins/caffe2/test.sh

Lines changed: 1 addition & 1 deletion
@@ -171,7 +171,7 @@ if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
   # default pip version is too old (9.0.2), unable to support tag `manylinux2010`.
   # Fix the pip error: Couldn't find a version that satisfies the requirement
   pip install --upgrade pip
-  pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==1.4.0.dev202008122
+  pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==1.5.0.dev202009182
 fi
 "$ROOT_DIR/scripts/onnx/test.sh"
 fi

aten/src/ATen/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -78,6 +78,7 @@ file(GLOB native_cuda_cu "native/cuda/*.cu")
 exclude(native_cuda_cu "${native_cuda_cu}" ${native_cuda_cu_sp})
 file(GLOB native_cuda_cpp "native/cuda/*.cpp")
 file(GLOB native_cuda_h "native/cuda/*.h" "native/cuda/*.cuh")
+file(GLOB native_hip_h "native/hip/*.h" "native/hip/*.cuh")
 file(GLOB native_cudnn_cpp "native/cudnn/*.cpp")
 file(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu")
 file(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp")
@@ -372,7 +373,7 @@ install(FILES "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake"

 set(INSTALL_HEADERS ${base_h} ${ATen_CORE_HEADERS})
 if(NOT INTERN_BUILD_MOBILE)
-  list(APPEND INSTALL_HEADERS ${native_h} ${native_cpu_h} ${native_quantized_h} ${cuda_h} ${native_cuda_h} ${cudnn_h} ${hip_h} ${miopen_h})
+  list(APPEND INSTALL_HEADERS ${native_h} ${native_cpu_h} ${native_quantized_h} ${cuda_h} ${native_cuda_h} ${native_hip_h} ${cudnn_h} ${hip_h} ${miopen_h})
 endif()

 # https://stackoverflow.com/questions/11096471/how-can-i-install-a-hierarchy-of-files-using-cmake

aten/src/ATen/core/aten_interned_strings.h

Lines changed: 1 addition & 0 deletions
@@ -611,6 +611,7 @@ _(aten, sigmoid) \
 _(aten, sign) \
 _(aten, signbit) \
 _(aten, silu) \
+_(aten, sgn) \
 _(aten, sin) \
 _(aten, sinh) \
 _(aten, size) \

aten/src/ATen/core/jit_type.h

Lines changed: 6 additions & 1 deletion
@@ -263,7 +263,12 @@ struct SingleElementType : public Type {
   }

  protected:
-  SingleElementType(TypePtr elem) : Type(Kind), elem(std::move(elem)) {}
+  SingleElementType(TypePtr elem) : Type(Kind), elem(std::move(elem)) {
+    if (!this->elem) {
+      throw std::runtime_error(c10::str(
+          "Can not create ", typeKindToString(Kind), " with None type"));
+    }
+  }

  private:
   TypePtr elem;

aten/src/ATen/core/type.cpp

Lines changed: 3 additions & 0 deletions
@@ -716,6 +716,9 @@ TupleType::TupleType(
       schema_(std::move(schema)) {
   has_free_variables_ =
       std::any_of(elements_.begin(), elements_.end(), [](TypePtr v) {
+        if (!v) {
+          throw std::runtime_error("Can not create tuple with None type");
+        }
         return v->hasFreeVariables();
       });
   if (schema_) {
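Both checks above follow the same pattern: reject a null element type at construction time, with a clear message, rather than dereferencing it and crashing later. A minimal Python analogue of this fail-fast validation (`make_tuple_type` is an illustrative name, not a PyTorch API):

```python
def make_tuple_type(elements):
    """Fail fast if any element type is missing, mirroring the None-type
    checks added to SingleElementType and TupleType in the C++ diff."""
    for elem in elements:
        if elem is None:
            raise ValueError("Can not create tuple with None type")
    return tuple(elements)
```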

aten/src/ATen/cpu/vec256/vec256_base.h

Lines changed: 7 additions & 0 deletions
@@ -239,6 +239,13 @@ struct Vec256 {
     // Specifically map() does not perform the type conversion needed by abs.
     return map([](T x) { return static_cast<T>(std::abs(x)); });
   }
+
+  template <typename other_t_sgn = T,
+            typename std::enable_if<c10::is_complex<other_t_sgn>::value, int>::type = 0>
+  Vec256<T> sgn() const {
+    return map(at::native::sgn_impl);
+  }
+
   template <typename other_t_angle = T,
             typename std::enable_if<!c10::is_complex<other_t_angle>::value, int>::type = 0>
   Vec256<T> angle() const {

aten/src/ATen/cpu/vec256/vec256_complex_double.h

Lines changed: 10 additions & 0 deletions
@@ -134,6 +134,16 @@ template <> class Vec256<c10::complex<double>> {
     auto angle = _mm256_permute_pd(angle_(), 0x05);  // angle    90-angle
     return _mm256_and_pd(angle, real_mask);          // angle    0
   }
+  Vec256<c10::complex<double>> sgn() const {
+    auto abs = abs_();
+    auto zero = _mm256_setzero_pd();
+    auto mask = _mm256_cmp_pd(abs, zero, _CMP_EQ_OQ);
+    auto abs_val = Vec256(abs);
+
+    auto div = values / abs_val.values;  // x / abs(x)
+
+    return blendv(div, zero, mask);
+  }
   __m256d real_() const {
     const __m256d real_mask = _mm256_castsi256_pd(_mm256_setr_epi64x(0xFFFFFFFFFFFFFFFF, 0x0000000000000000,
                                                                      0xFFFFFFFFFFFFFFFF, 0x0000000000000000));
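The vectorized kernel above computes the complex sign function: divide each element by its magnitude, then use the mask/blend to substitute 0 wherever the magnitude is 0 (avoiding 0/0). In scalar form, sgn(z) = z/|z| for z ≠ 0 and 0 at z = 0. A minimal scalar sketch of the same math (illustrative only, not the ATen implementation):

```python
def sgn(z: complex) -> complex:
    """Complex sign: z / |z| for nonzero z, and 0 at z == 0,
    mirroring the blendv(div, zero, mask) logic in the AVX kernel."""
    mag = abs(z)
    if mag == 0.0:
        return 0j  # the branch the _CMP_EQ_OQ mask selects in the kernel
    return z / mag
```

Note that the result always lies on the unit circle (|sgn(z)| = 1) except at the origin, which is why the zero case needs explicit handling.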

0 commit comments