Raise ValueError if \x00 character exists for eval argument#4052
Raise ValueError if \x00 character exists for eval argument#4052youknowone merged 10 commits intoRustPython:mainfrom
ValueError if \x00 character exists for eval argument#4052Conversation
| # TODO: RUSTPYTHON, RustPython raises a SyntaxError here, but cpython raise a ValueError | ||
| error = SyntaxError if platform.python_implementation() == 'RustPython' else ValueError | ||
| error = ValueError | ||
| with assert_raises(error): |
There was a problem hiding this comment.
| with assert_raises(error): | |
| with assert_raises(ValueError): |
There was a problem hiding this comment.
I applied your suggestion in 7e88c8f. Thank you! 🙏🏻
|
About the |
vm/src/stdlib/builtins.rs
Outdated
| source: Either< | ||
| Either<PyStrRef, crate::builtins::PyBytesRef>, | ||
| PyRef<crate::builtins::PyCode>, | ||
| >, |
There was a problem hiding this comment.
is this exactly (str|bytes) or ArgStrOrBytesLike?
ArgStrOrBytesLike accepts more types like bytearray. Some functions take exactly bytes, while other functions take also bytearray
There was a problem hiding this comment.
When I run rustpython with ArgStrOrBytesLike, it seems very suit for this case. string, bytes or code is just error message and the eval also receives bytearray Buffer 🤔 . So I'll push a commit apply ArgStrOrBytesLike. Thanks! 🙏🏻
vm/src/stdlib/builtins.rs
Outdated
| } | ||
|
|
||
| Ok(Either::A( | ||
| vm.ctx.new_str(std::str::from_utf8(source).unwrap()), |
There was a problem hiding this comment.
what happens if bytes include non-ascii bytes?
what happens if bytes include invalid utf8 character?
There was a problem hiding this comment.
what happens if bytes include invalid utf8 character?
It causes panic. I tested with the below python code:
eval(b'\xff')And it prints:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', vm/src/stdlib/builtins.rs:268:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Because the C language const char * type for string and also byte array, it can be represented as a single type with a compiler option. So I'll see more the compiler, parser section. If you have useful documentations to recommend, please leave them as comments 🙇🏻♂️
// _PyPegen_run_parser_from_string
if (flags != NULL && flags->cf_flags & PyCF_IGNORE_COOKIE) {
tok = PyTokenizer_FromUTF8(str, exec_input);
} else {
tok = PyTokenizer_FromString(str, exec_input);
}There was a problem hiding this comment.
What I thought was .unwrap() need to be avoided because CPython would not panic but raise a SyntaxError.
The major difference between FromUTF8 and FromString is using decoder or not. The decode error seems to be wrapped by SyntaxError somewhere.
There was a problem hiding this comment.
Ah, I agree with your thought. 🙏🏻 I'll work for wrapping the decode error.
There was a problem hiding this comment.
When you execute eval(b'\xff') in CPython implementation, you will see the below output:
>>> eval(b"\xff")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
�
^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I set SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte message as format and I tried to apply it in 714ce4d.
After the commit, in RustPython:
>>>>> eval(b'\xff')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
youknowone
left a comment
There was a problem hiding this comment.
looks great! I left a few minor change requests
vm/src/stdlib/builtins.rs
Outdated
| match std::str::from_utf8(source) { | ||
| Ok(s) => Ok(Either::A(vm.ctx.new_str(s))), | ||
| Err(err) => { | ||
| let msg = format!( | ||
| "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | ||
| source[err.valid_up_to()], | ||
| err.valid_up_to() | ||
| ); | ||
| Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | ||
| } |
There was a problem hiding this comment.
I prefer to handle error first not to disturb to read main control flow
| match std::str::from_utf8(source) { | |
| Ok(s) => Ok(Either::A(vm.ctx.new_str(s))), | |
| Err(err) => { | |
| let msg = format!( | |
| "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | |
| source[err.valid_up_to()], | |
| err.valid_up_to() | |
| ); | |
| Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | |
| } | |
| let source = std::str::from_utf8(source).map_err(|err| { | |
| let msg = format!( | |
| "(unicode error) 'utf-8' codec can't decode byte 0x{:x?} in position {}: invalid start byte", | |
| source[err.valid_up_to()], | |
| err.valid_up_to() | |
| ); | |
| Err(vm.new_exception_msg(vm.ctx.exceptions.syntax_error.to_owned(), msg)) | |
| })?; | |
| Either::A(vm.ctx.new_str(s))), | |
vm/src/stdlib/builtins.rs
Outdated
| let code = match source { | ||
| Either::A(either) => { | ||
| let source: &[u8] = &either.borrow_bytes(); | ||
| if source.iter().any(|&b| b == 0) { |
Co-authored-by: Jeong YunWon <69878+youknowone@users.noreply.github.com>
It isn't a full perfect solution for
evalimplementation but it fixes:evalbecame to receivebytesalso.evalraisesValueErrorwhen there isnull(\x00) character before compiling.You can see also:
_Py_SourceAsStringp.s. The message says like it should receive
stringorbytesorcodebut it also receivesbytearraywell. 🤔