Support unicode array type. #2896
5f660a4 to 4ddcda1
youknowone
left a comment
wow, finally we will get unicode array! great!
4ddcda1 to c4dcf8a

One thing that's tricky about the One code point could be split over 2

From CPython's code, I am a little confused. Should
My understanding is that
match (SIZEOF_WCHAR_T, sizeof(wchar_t)) {
    (2, 2) | (4, 4) => /* case [1] */,
    (4, 2) => /* case [2] */,
    (2, 4) => /* case [3] */,
}
In RustPython, how can we deal with cases that
#if USE_UNICODE_WCHAR_CACHE
    const wchar_t *wstr = _PyUnicode_WSTR(unicode);
    if (wstr != NULL) {
        memcpy(w, wstr, size * sizeof(wchar_t));
        return;
    }
#else /* USE_UNICODE_WCHAR_CACHE */
    if (PyUnicode_KIND(unicode) == sizeof(wchar_t)) {
        memcpy(w, PyUnicode_DATA(unicode), size * sizeof(wchar_t));
        return;
    }
#endif /* USE_UNICODE_WCHAR_CACHE */

assert(PyUnicode_KIND(unicode) == PyUnicode_2BYTE_KIND);
const Py_UCS2 *s = PyUnicode_2BYTE_DATA(unicode);
for (; size--; ++s, ++w) {
    *w = *s;
}

assert(PyUnicode_KIND(unicode) == PyUnicode_4BYTE_KIND);
const Py_UCS4 *s = PyUnicode_4BYTE_DATA(unicode);
for (; size--; ++s, ++w) {
    Py_UCS4 ch = *s;
    if (ch > 0xFFFF) {
        assert(ch <= MAX_UNICODE);
        /* encode surrogate pair in this case */
        *w++ = Py_UNICODE_HIGH_SURROGATE(ch);
        if (!size--)
            break;
        *w = Py_UNICODE_LOW_SURROGATE(ch);
    }
    else {
        *w = ch;
    }
}
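
The branches quoted above can be sketched in Rust roughly as follows. This is only an illustration under stated assumptions, not RustPython's implementation: `WcharWidth` and `to_wchar_units` are hypothetical names, the input is a plain `&str` rather than CPython's kind-tagged buffer, and units are widened to `u32` so both widths share one return type.

```rust
/// Hypothetical stand-in for the platform's `wchar_t` width.
#[derive(Clone, Copy, Debug, PartialEq)]
enum WcharWidth {
    Two,
    Four,
}

/// Convert a string into wchar-sized units (hypothetical helper).
/// Units are widened to `u32` so both widths share one return type.
fn to_wchar_units(s: &str, width: WcharWidth) -> Vec<u32> {
    match width {
        // 4-byte wchar_t: every code point fits in a single unit.
        WcharWidth::Four => s.chars().map(|c| c as u32).collect(),
        // 2-byte wchar_t: code points above 0xFFFF become a surrogate pair,
        // which is what the standard library's UTF-16 encoder produces.
        WcharWidth::Two => s.encode_utf16().map(u32::from).collect(),
    }
}
```

With a 2-byte width, a string holding one code point above 0xFFFF yields two units, so the unit count can exceed the number of code points.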

I don't think it does check CPython's string repr is kinda weird, so it's just checking whether the string is internally a

Oh hey it'd actually be not too bad to do UTF-16 -

@coolreader18 Thanks for your explanation. This is what I've learnt from CPython's code: in CPython, a string can be encoded in UTF-8, UCS-1, UCS-2 or UCS-4, depending on its content. However, in RustPython, a string is always encoded in UTF-8. (as the
I have another question: If I call
I have no such environment that
Note that CPython splits a UCS-4 unit into two UCS-2 units, which is different from the behavior of
Maybe we can add a function like this?
fn to_usc16(ch: char) -> impl Iterator<Item = u16> {
    let ch = ch as u32;
    // Plain split into upper and lower 16-bit halves (not a surrogate pair).
    let (usc16, len) = if ch > u16::MAX as u32 {
        ([(ch >> 16) as u16, ch as u16], 2)
    } else {
        ([ch as u16, 0], 1)
    };
    // Both branches now yield the same iterator type, so this compiles.
    IntoIterator::into_iter(usc16).take(len)
}
Update: my mistake,
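
For comparison, here is a small sketch of how a plain 16-bit split differs from a UTF-16 surrogate pair. `naive_split` and `surrogate_pair` are hypothetical helper names used only for illustration; the arithmetic follows the standard UTF-16 definition, and the standard library's `encode_utf16` yields the same pair.

```rust
/// Plain split of a code point into its upper and lower 16 bits.
fn naive_split(ch: char) -> (u16, u16) {
    let cp = ch as u32;
    ((cp >> 16) as u16, cp as u16)
}

/// UTF-16 surrogate pair for a code point above 0xFFFF,
/// or `None` when the code point fits in a single 16-bit unit.
fn surrogate_pair(ch: char) -> Option<(u16, u16)> {
    let cp = ch as u32;
    if cp <= 0xFFFF {
        return None;
    }
    // Subtract the BMP offset, then spread the 20 remaining bits
    // over the high (top 10 bits) and low (bottom 10 bits) surrogates.
    let v = cp - 0x1_0000;
    let high = 0xD800 + (v >> 10) as u16;
    let low = 0xDC00 + (v & 0x3FF) as u16;
    Some((high, low))
}
```

For U+1F600 the plain split gives `(0x0001, 0xF600)`, while the surrogate pair is `(0xD83D, 0xDE00)`; only the latter round-trips through a UTF-16 decoder.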
c4dcf8a to 4d96582

Now all the test cases in

893075a to 6c8a6e8

Fixed as clippy suggested.

youknowone
left a comment
Looks great in general. I left a few comments.
6c8a6e8 to f283a2b
f283a2b to 1db04df
        self.assertEqual(fp.geturl(), redirected_url.strip())

    # TODO: RUSTPYTHON
    @unittest.expectedFailure
Two tests failed on my computer, so I marked them as expectedFailure:
======================================================================
FAIL: test_redirect_encoding (test.test_urllib2.HandlerTests) [b'/spaced path/']
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../vm/pylib-crate/Lib/test/test_urllib2.py", line 1358, in test_redirect_encoding
self.assertTrue(request.startswith(expected), repr(request))
AssertionError: False is not true : b'GET http://example.com/spaced%20path/ HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: example.com\r\nUser-Agent: Python-urllib/3.9\r\nConnection: close\r\n\r\n'
======================================================================
FAIL: test_redirect_encoding (test.test_urllib2.HandlerTests) [b'/?p\xc3\xa5-dansk']
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../vm/pylib-crate/Lib/test/test_urllib2.py", line 1358, in test_redirect_encoding
self.assertTrue(request.startswith(expected), repr(request))
AssertionError: False is not true : b'GET http://example.com/?p%C3%A5-dansk HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: example.com\r\nUser-Agent: Python-urllib/3.9\r\nConnection: close\r\n\r\n'
But they seem to pass on the GitHub CI.
My OS is Ubuntu 20.04, and the CPU is 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz.
My rustc version is 1.53.0, and the test command is
cargo run --release -- -m test -j8 -v
Can anyone reproduce this failure?
1db04df to c7a193e
c7a193e to e652ae8

Thank you for contributing!
fix #2895