[shard] use gather_object for gather API #71624
Conversation
Now that we have gather available in the NCCL process group, we can switch our `sharded_tensor.gather` to use gather_object instead of all_gather_object, which will reduce the communication overhead. Differential Revision: [D33688907](https://our.internmc.facebook.com/intern/diff/D33688907/)
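For context, a minimal sketch of what the switch buys us (assuming an initialized process group; `local_shards` and `dst` are illustrative names, not the PR's exact variables):

```python
import torch.distributed as dist

def gather_shards(local_shards, dst=0):
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Before: all_gather_object materializes every rank's shards on every rank.
    # all_shards = [None] * world_size
    # dist.all_gather_object(all_shards, local_shards)

    # After: gather_object only materializes the full list on `dst`;
    # every other rank just sends its local shards.
    gathered = [None] * world_size if rank == dst else None
    dist.gather_object(local_shards, gathered, dst=dst)
    return gathered
```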
```python
tensor = shard.tensor
...
out_narrow_view = out
assert out_narrow_view is not None
```
Doesn't _validate_output_tensor_for_gather validate this? Why do we need another assert here?
This is added purely for the mypy linter; it seems mypy couldn't understand the `_validate_output_tensor_for_gather` check, so we have to do this assert here.
Usually I add a mypy ignore for something like this, but it's up to you.
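For reference, the two alternatives being discussed look roughly like this (hypothetical snippet, not the PR's exact code; only the assignment and assert appear in the diff above):

```python
from typing import Optional
import torch

def write_shard(out: Optional[torch.Tensor], tensor: torch.Tensor) -> None:
    # Hypothetical helper, only to contrast the two ways of satisfying mypy.

    # Option 1 (what the assert in the diff does): a runtime assert narrows
    # Optional[torch.Tensor] down to torch.Tensor for the type checker.
    out_narrow_view = out
    assert out_narrow_view is not None
    out_narrow_view.copy_(tensor)

    # Option 2 (the suggestion in this comment): use `out` directly and
    # silence the checker on that line instead.
    # out.copy_(tensor)  # type: ignore[union-attr]
```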
```diff
- # https://github.com/pytorch/pytorch/issues/66187
- dist.all_gather_object(
+ gathered_shards: List[Optional[List[Shard]]] = [None] * world_size if rank == dst else []
+ dist.gather_object(
```
(No need to update the current PR, this can be done in follow-up ones.)

Not sure how performance critical this is, but as we discussed in today's meeting, this indeed looks more expensive than necessary, as there will be additional H2D + D2H copies. I'd assume handling the tensor and non-tensor parts separately would be faster for large ShardedTensors.
pytorch/torch/distributed/distributed_c10d.py, lines 1551 to 1565 in 03f1f0c:
```python
def _object_to_tensor(obj):
    f = io.BytesIO()
    _pickler(f).dump(obj)
    byte_storage = torch.ByteStorage.from_buffer(f.getvalue())  # type: ignore[attr-defined]
    # Do not replace `torch.ByteTensor` or `torch.LongTensor` with torch.tensor and specifying dtype.
    # Otherwise, it will cause 100X slowdown.
    # See: https://github.com/pytorch/pytorch/issues/65696
    byte_tensor = torch.ByteTensor(byte_storage)
    local_size = torch.LongTensor([byte_tensor.numel()])
    return byte_tensor, local_size


def _tensor_to_object(tensor, tensor_size):
    buf = tensor.numpy().tobytes()[:tensor_size]
    return _unpickler(io.BytesIO(buf)).load()
```
BTW, is the non-tensor meta info static? If so, can we cache it?
Yeah, thanks for the suggestion, I will try using two separate gathers to improve the perf. The non-tensor meta info might not be static, I think (i.e. if we do resharding on a ShardedTensor, the Shard.metadata might get changed, to different ranks, or the shard offset/size changes).
I am thinking of a new way that possibly requires only one gather call. It requires us to make Shard a subclass of torch.Tensor (with the metadata as a Python field), so that we can do the gather alone. But I am not sure if our c10d collectives support custom tensor objects? (Maybe not, as we eventually lower the collective to C++, and we might only have the at::Tensor do the real communication, not the metadata.)
> But I am not sure if our c10d collectives support custom tensor objects?

It should. At least it worked for SparseTensor. But I am not sure if that's sufficient for ShardedTensor:
pytorch/torch/csrc/distributed/c10d/ProcessGroupGloo.cpp, lines 1050 to 1134 in 7beb030:
```cpp
class AsyncSparseAllreduceWork : public ProcessGroupGloo::AsyncWork {
 public:
  AsyncSparseAllreduceWork(
      const std::shared_ptr<gloo::Context>& context,
      std::vector<at::Tensor>& inputs,
      uint32_t tag)
      : ProcessGroupGloo::AsyncWork({inputs}, "gloo:sparse_all_reduce", inputs),
        context(context),
        inputs(inputs),
        tag(tag) {}

  std::shared_ptr<gloo::Context> context;
  std::vector<at::Tensor> inputs;
  const uint32_t tag;

  // We share dimensionality about the sparse tensors before collecting
  // their contents. We assume here that the maximum number of sparse
  // and dense dimensions is 4. This is stored in a contiguous piece of
  // memory so that we can easily run allgather on it.
  //
  // The layout of this memory is as follows:
  //
  //   - [0:4]: sparse dims
  //   - [4:8]: dense dims
  //   -   [8]: nnz
  //
  class SparseTensorMetadata {
   public:
    static constexpr auto dim = 9;

    // Construct from an existing metadata tensor to facilitate structured
    // access to metadata from peers, after gathering it.
    explicit SparseTensorMetadata(at::Tensor metadata)
        : metadata_(metadata), data_(metadata_.data_ptr<int64_t>()) {
      AT_ASSERT(metadata.scalar_type() == at::kLong);
      AT_ASSERT(metadata.dim() == 1);
      AT_ASSERT(metadata.size(0) == dim);
    }

    // Populate the metadata.
    void populate_from_sparse_tensor(const at::Tensor& tensor) {
      const auto sparse_dim = tensor.sparse_dim();
      AT_ASSERT(sparse_dim <= 4);
      for (const auto i : c10::irange(4)) {
        if (i < sparse_dim) {
          data_[i] = tensor.size(i);
        }
      }
      const auto dense_dim = tensor.dense_dim();
      AT_ASSERT(dense_dim <= 4);
      for (const auto i : c10::irange(4)) {
        if (i < dense_dim) {
          data_[i + 4] = tensor.size(sparse_dim + i);
        }
      }
      data_[8] = tensor._nnz();
    }

    std::vector<int64_t> sizes() const {
      std::vector<int64_t> sizes;
      // Sparse sizes
      for (const auto i : c10::irange(4)) {
        if (data_[i] <= 0) {
          break;
        }
        sizes.push_back(data_[i]);
      }
      // Dense sizes
      for (const auto i : c10::irange(4, 8)) {
        if (data_[i] <= 0) {
          break;
        }
        sizes.push_back(data_[i]);
      }
      return sizes;
    }

    int64_t nnz() const {
      return data_[8];
    }

   protected:
    at::Tensor metadata_;
    int64_t* data_;
  };
```
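For what it's worth, a rough sketch of the tensor-subclass idea in Python (purely illustrative; this is not how `Shard` is defined today, and whether a c10d collective would carry the extra field is exactly the open question):

```python
import torch

class ShardTensor(torch.Tensor):
    """Hypothetical Shard-as-a-tensor: the shard data is the tensor itself
    and the shard metadata rides along as a plain Python attribute."""

    @staticmethod
    def __new__(cls, data: torch.Tensor, metadata: dict):
        # _make_subclass reuses `data`'s storage but gives it this subclass type.
        instance = torch.Tensor._make_subclass(cls, data)
        instance.metadata = metadata  # e.g. shard offsets/sizes and placement rank
        return instance

# The open question from the discussion: a c10d collective would move the
# tensor bytes (the at::Tensor), but nothing obviously carries the
# Python-level `metadata` attribute over to the receiving rank.
shard = ShardTensor(torch.randn(4, 4), {"shard_offsets": [0, 0], "rank": 0})
```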
I gave using two separate gathers (one for metadata, one for tensors) a try today, to avoid the pickling copies of gather_object. It's trickier than I thought, mainly because:

- We can use `gather_object` for the metadata, but we couldn't simply use `gather` for the local shard tensors: `local_shards()` is a list of tensors, while the input of `gather` is a single tensor, not a list of tensors, so `gather` won't work here.
- We can try `torch.cat` locally on each rank to form a single tensor before the `gather` collective, but we need to split the result afterwards, as the shards might not be adjacent to each other (the local shards on a rank might contain two tensors that are logically far apart in the global position). So we would need to carry additional split points in the first `gather_object` to split the combined tensors from the second `gather` call. This is pretty tricky from my understanding; a rough sketch of the approach is below.

Any suggestions are appreciated :)
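For concreteness, a sketch of that two-collective idea (illustrative only: `local_shards`, `dst`, and the metadata layout are assumptions, and it glosses over `gather`'s requirement that all inputs have the same size):

```python
import torch
import torch.distributed as dist

def gather_sharded(local_shards, dst=0):
    rank, world_size = dist.get_rank(), dist.get_world_size()

    # 1) Small gather_object for the metadata, including each local tensor's
    #    numel so the destination knows where to split later.
    local_meta = [(s.metadata, s.tensor.numel()) for s in local_shards]
    metas = [None] * world_size if rank == dst else None
    dist.gather_object(local_meta, metas, dst=dst)

    # 2) One flat tensor per rank for the tensor gather. This only works if
    #    every rank's flattened payload has the same size (or is padded),
    #    which is one of the complications mentioned above.
    flat = torch.cat([s.tensor.reshape(-1) for s in local_shards])
    gather_list = [torch.empty_like(flat) for _ in range(world_size)] if rank == dst else None
    dist.gather(flat, gather_list, dst=dst)

    if rank != dst:
        return None
    # 3) On dst, split each flat tensor back into shards using the numels
    #    carried by the metadata gather. Reassembling them into the global
    #    tensor layout would be a separate step.
    out = []
    for per_rank_flat, per_rank_meta in zip(gather_list, metas):
        sizes = [numel for _, numel in per_rank_meta]
        pieces = per_rank_flat.split(sizes)
        out.append(list(zip([m for m, _ in per_rank_meta], pieces)))
    return out
```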
For this PR itself, I think we can land it as is, since the code before this PR used all_gather_object anyway. I will think more about how to solve this pickling issue and make a follow-up PR to improve the perf.
@wanchaol Can we create a GH issue to track this follow-up improvement?
fduwjj left a comment:
I have a rookie question here: if this is for perf purposes, do we measure the perf change before and after for a change like this? If so, I am curious how we do the experiment here.
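Not speaking for the author, but one common way to measure this kind of change is a small multi-process micro-benchmark around the two collectives (illustrative sketch; payload sizes and iteration counts are made up, and it assumes an already-initialized process group):

```python
import time
import torch
import torch.distributed as dist

def bench(fn, iters=20, warmup=5):
    # Time an average iteration of `fn`, synchronizing across ranks/devices.
    for _ in range(warmup):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    dist.barrier()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    dist.barrier()
    return (time.perf_counter() - start) / iters

# Made-up payload standing in for a rank's local shards.
payload = [torch.randn(1024, 1024) for _ in range(4)]
rank, world_size = dist.get_rank(), dist.get_world_size()

t_all = bench(lambda: dist.all_gather_object([None] * world_size, payload))
t_one = bench(lambda: dist.gather_object(
    payload, [None] * world_size if rank == 0 else None, dst=0))
if rank == 0:
    print(f"all_gather_object: {t_all * 1e3:.2f} ms  gather_object: {t_one * 1e3:.2f} ms")
```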
Summary: Pull Request resolved: #71624

Now that we have gather available in the NCCL PG, we can switch our `sharded_tensor.gather` to use gather_object instead of all_gather_object, which will reduce the communication overhead.

TODO: to further reduce the comm overhead, we need to figure out a way to avoid using `gather_object`, as `gather_object` and `all_gather_object` incur pickling copies between devices.

ghstack-source-id: 151007578
Test Plan: wait for CI
Reviewed By: pritamdamania87
Differential Revision: D33688907
fbshipit-source-id: 2073c5a46c33a7a2640a9e3599dc795d9e4c0a1e
Stack from ghstack (oldest at bottom):
Now that we have gather available in the NCCL process group, we can switch our `sharded_tensor.gather` to use gather_object instead of all_gather_object, which will reduce the communication overhead.

fixes #66187
Differential Revision: D33688907