
Conversation

@colesbury (Member)

Perf numbers:
https://gist.github.com/colesbury/9e28dd7b0f27b0b019f68adbd4bd4b88

I've changed the dispatch stub so that it no longer requires every kernel
to be compiled for every instruction set. Kernel implementations are
registered in the stub's table with the REGISTER_DISPATCH macro.

I've also moved vec256 into its own folder and split up the
specializations before they get too unwieldy.
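For readers skimming the diff, here's a minimal sketch of the idea, assuming the shape described above; everything except the general REGISTER_DISPATCH concept is illustrative rather than the actual ATen code:

```cpp
// Sketch of per-instruction-set dispatch: the stub owns a table of kernel
// pointers indexed by CPU capability, and each kernel translation unit
// (compiled with its own -mavx/-mavx2 flags) registers itself at
// static-initialization time.
#include <cstdlib>
#include <cstring>
#include <utility>

enum class CPUCapability { DEFAULT = 0, AVX = 1, AVX2 = 2, NUM_OPTIONS = 3 };

// Illustrative capability probe; real code would use cpuid, plus an
// environment override for testing.
inline CPUCapability get_cpu_capability() {
  if (const char* env = std::getenv("ATEN_CPU_CAPABILITY")) {
    if (std::strcmp(env, "avx2") == 0) return CPUCapability::AVX2;
    if (std::strcmp(env, "avx") == 0) return CPUCapability::AVX;
    return CPUCapability::DEFAULT;
  }
  return CPUCapability::AVX2;  // pretend the host supports AVX2
}

template <typename FnPtr>
struct DispatchStub {
  FnPtr table[static_cast<int>(CPUCapability::NUM_OPTIONS)] = {};

  template <typename... Args>
  void operator()(Args&&... args) {
    // Fall back to a less specialized kernel when the best one was not
    // compiled into this build; a DEFAULT kernel must always be registered.
    int i = static_cast<int>(get_cpu_capability());
    while (i > 0 && table[i] == nullptr) --i;
    table[i](std::forward<Args>(args)...);
  }
};

// Static-init registrar standing in for what a REGISTER_DISPATCH-style
// macro would expand to.
template <typename FnPtr>
struct DispatchRegistrar {
  DispatchRegistrar(DispatchStub<FnPtr>& stub, CPUCapability cap, FnPtr fn) {
    stub.table[static_cast<int>(cap)] = fn;
  }
};
```

A kernel file compiled with `-mavx2` would then register itself with something like `static DispatchRegistrar<decltype(&sum_kernel)> reg(sum_stub, CPUCapability::AVX2, &sum_kernel);` (names hypothetical).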

@cpuhrsch (Contributor) commented Mar 27, 2018

Generally the dispatch code looks more compact, which is nice, but I think we might need to revisit it in the future. Having the ability to pass a table or function for non-standard cases is definitely good, but we might need to add support for a wider range of objects, such as templated functions (I'm not sure this supports that).

All calls to TBB will need to be preceded by a call to init_tbb_num_threads; otherwise the scheduler will be initialized with the wrong number of threads. Maybe we should set up TBB so that this is enforced, by making it unavailable anywhere outside of Parallel.h/.cpp and thereby forcing all devs to wrap TBB calls in an ATen API call. Alternatively, it would be great if this weren't necessary to begin with, but I don't think that will happen any time soon. There are preview features that appear to attempt to support this, but I didn't get them to work correctly for our case.
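To make the wrapper idea concrete, here's a minimal sketch assuming the init_tbb_num_threads helper described above; the at::parallel_for name and signature are illustrative, not the actual ATen API:

```cpp
#include <cstdint>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

namespace at {

// Helper described above: ensures the TBB scheduler is initialized with
// the intended number of threads (declaration only in this sketch).
void init_tbb_num_threads();

// If raw TBB is only reachable through Parallel.h entry points like this
// one, callers can't forget the initialization step.
template <typename F>
void parallel_for(int64_t begin, int64_t end, int64_t grain_size, const F& f) {
  init_tbb_num_threads();
  tbb::parallel_for(
      tbb::blocked_range<int64_t>(begin, end, grain_size),
      [&](const tbb::blocked_range<int64_t>& r) { f(r.begin(), r.end()); });
}

}  // namespace at
```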

Everything else looks good to me.

EDIT: I'm wondering if we could delay merging the dispatch and vec256 changes until after the Unary ops have been merged?


@colesbury force-pushed the vectorized_sum branch 2 times, most recently from 6eb1995 to 81ddaa6 on March 28, 2018 at 22:05.
Changes since the last push:
 - Prefer signed integers. Mixing signed and unsigned integers is a
   pain, and ATen mostly uses signed integers (int64_t).
 - Use an inline lambda instead of a struct for the UnaryOps (see the
   sketch after this list).
 - Rename the partial-load overload to "load_partial".
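For illustration, the lambda change looks roughly like this (a scalar sketch with assumed names; the real kernels operate on vec256 vectors):

```cpp
#include <cmath>
#include <cstdint>

template <typename Op>
static void unary_kernel(float* out, const float* in, int64_t n, const Op& op) {
  for (int64_t i = 0; i < n; ++i) {
    out[i] = op(in[i]);
  }
}

static void sqrt_kernel(float* out, const float* in, int64_t n) {
  // Previously a one-off functor struct with operator(); now an inline lambda.
  unary_kernel(out, in, n, [](float x) { return std::sqrt(x); });
}
```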

@colesbury merged commit e4c0bb1 into pytorch:master on Mar 29, 2018.
@colesbury deleted the vectorized_sum branch on March 29, 2018 at 22:13.