-
-
Notifications
You must be signed in to change notification settings - Fork 11.9k
BUG: define "uint-alignment", fixes complex64 alignment #6377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsUintAligned needs a declaration.
|
Looks like |
|
Right now I'm looking into the python3 assertion failures which seem more serious. |
|
I'll have to finish later. But this is what fails: My current guess is that there is a bug in I think this only turned up now because I added |
|
is there anything different to my branch besides adding raw_* |
|
@juliantaylor I renamed all the uintaligned for now, but I'd like to keep the change in the final version, since I think it will be confusing that 'aligned' does not actually mean aligned. |
87186c2 to
3284052
Compare
|
OK, I think I fixed the bug in @juliantaylor It's largely the same, but I reorganizes the The last 4 (unnumbered) commits are additional bugfixes/cleanups. |
3c4661f to
7869879
Compare
|
I added a few more changes I missed before (last commit). Nditer code needed "uint" alignment too. Also, I got rid of |
doc/source/dev/alignment.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it also sets dtype.isalignedstruct)
|
☔ The latest upstream changes (presumably #7134) made this pull request unmergeable. Please resolve the merge conflicts. |
|
☔ The latest upstream changes (presumably #7481) made this pull request unmergeable. Please resolve the merge conflicts. |
7500aa9 to
475e4ef
Compare
|
Rebased and simplified. In summary:
I've removed the massive variable renaming I used to have of "aligned" -> "uintaligned" in the copy code paths. We just need to remember that in those files "aligned" refers to uint alignment, not true alignment. |
7e69574 to
48c389c
Compare
|
Does this affect pickles, npz files, etc.? I'm mostly concerned about data stored with the old alignment becoming invalid. Don't know if that is possible or not, I'm guessing that the stored offsets will take care of that and that if it were a problem cross platform compatibility would already be broken. |
|
That's an important thought! But I don't think it's a problem: I don't think it affects npz file loading because those files store the dtype using the pep3118 data-syntax ( As a sidenote, |
numpy/core/src/common/array_assign.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe npy_uintp. I'm pretty sure both work here for twos complement integers, but I needed to think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it is later cast to npy_uintp in npy_is_aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only bitwise operations are involved, so it should be exactly the same.
numpy/core/src/common/array_assign.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an assumption of multiples of powers of two, together with twos complement arithmetic, built in here. AFAICT, there are no false positive, but perhaps some false negatives. @juliantaylor @seberg Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any arithmetic here, only bitwise OR operations. Whether the integer is considered signed or unsigned doesn't come into play.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the bitwise or is used together with arrhythmic stuff later since it is used in npy_is_aligned. Did not think about it long, but I think you are right Chuck. I guess npy_is_aligned might explain it a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if twos complement has anything to do with it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"arrythmic"? :-)
More seriously, I don't know what you are bothering about. First data is cast to npy_intp(this is an exact bitcast), some bitwise operations are done on the signed integer (they are agnostic wrt signedness, they just operate on individual bits), then the result is cast back to void* (this is an exact bitcast), then it is finally cast to npy_uintp (again an exact bitcast).
There is zero reason to believe that the intermediate cast to npy_intp is any potential cause for trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if this errs too much on the safe side (no idea if it does). I guess placing the npy_is_aligned check into the loop here instead should work and probably not be a lot of overhead. (Plus this is the flags calculation, right? so it doesn't happen very often anyway). Or am I missing something? Plus it might be a bit less mind boggling.
EDIT: Frankly, nvm, I bet for all practical purposes, this just always works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it has been there a long time. I think it basically gives the largest power of two that divides the GCD. For non-powers of two I'm not sure how well it works. But we have lived with it for a long time. Might be worth a note, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some googling suggests that the C standard says that alignments must be powers of two. (I wasn't able to find the exact quote though). In that case, the special casing in npy_is_aligned is unnecessary, and we can just assume it is a power of two.
I'll add a note in any case, I think this is the 3rd time now that I've puzzled out what this code does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to sleep on it :) I think when all is done, it will be OK for its intended application, but I want to understand it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that this assumes a twos-complement representation of signed integers, even though the C standard explicitly allows others like ones-complement. I'm not sure we care, because in practice all modern computers use two's complement, and because all this pointer bit-twiddling is C-implementation-dependent anyway.
Here's my understanding: We are trying to compute whether (data + n*stride)%alignment == 0 for all integers n. First, assuming alignment is a power of two, and assuming twos-complement representation, simplify the %alignment == 0 to checking if the lowest log2(alignment) bits are 0. Next, use the fact that in twos complement representation, if x low order bits of data and stride are 0, then the x lower bits of data + n*stride must also be 0 for all n because of how binary addition works. Neither simplification is valid for negative values in ones-complement representation.
Casting the intp stride to uintp` and using uintp would avoid the signed twos-complement assumption, and the algorithm still works.
However, the representation/cast of data from pointer to uint is implementation defined too, so we are violating the standard anyway anytime we use bitwise ops on pointer values. (only arithmetic is allowed, I think). I don't see any alternative though, as far as I see there is no totally standard-compliant way to test alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do PyArray_ISALIGNED and IsUintAligned differ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The former checks "true" alignment, the latter "uint" alignment.
We need uint alignment for the copy operations about 100 lines down.
doc/source/dev/alignment.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are some structures that have no uint alignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's one way to see it. Or, in pratice, those structs get assigned a uint alignment of 0.
The "uint" alignments are for use in the strided copy code, which only special-cases the 1,2,4,8, and 16 byte sizes. All other cases return a "uint" alignment of 0, which triggers memmove to be used instead.
I'll correct/update the docs to better describe what happens for non-power-of-two sized types.
doc/source/dev/alignment.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it's a power of two greater than sizeof(uintmax_t)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to rephrase, that's an inaccurate sentence. It also doesn't describe the fact that 16 bytes types have a uint alignment of 8, because that's what the copy code wants. Maybe:
"Uint" alignment depends on the size of a datatype. It is defined to be the
"True alignment" of the uint used in numpy's copy-code to copy the datatype,
or undefined/unaligned if there is no equivalent uint. Currently numpy uses
uint8, uint16, uint32, uint64 and uint64 to copy data of size 1,2,4,8,16 bytes
respectively, and all other sized datatypes cannot be uint aligned.
doc/source/dev/alignment.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter-example:
>>> np.dtype('U1').alignment
4
>>> np.issubdtype('U1', np.flexible)
True
doc/source/dev/alignment.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to specify which meaning of "align" you are using here?
80c5960 to
8ee2c91
Compare
Implements IsAligned, IsUintAligned, npy_uint_alignment
|
Thanks Allan. |
This PR fixes alignment checks along the lines of #5365 and this comment.
This fixes the bug that
complex64(andcomplex128) has incorrect alignment, and should be 4 instead of 8 on most systems. This has caused further bugs such as unnecessary buffer allocation, (previously) bus errors, and more work for people adding new architectures.The problem is that numpy really needs two different types of alignment check:
memcpy(dst, src, N)by doing*((uintN*)dst) = *((uintN*)src).See discussion in #5365, #5316, #3816, #5656, in the docs committed here, plus the bit at the end of the original notes (which the docs are based on).
This PR splits up the two types of alignment checks into two sets of functions, the first for uint alignment, the second for "true" alignment.