Bugfix: Fix broken: UnicodeDecodeError: 'utf-8' codec can't decode #55
riverzhou wants to merge 2 commits into abetlen:main from
Conversation
Doesn't this just remove the error? Emojis still don't work.
Tests passed for Chinese; I haven't tested emoji.
Great to see the fix for Chinese.
Very funny indeed. |
Was this resolved upstream? ggml-org/llama.cpp@aaf3b23
@MillionthOdin16 I don't think so, because I've had this issue on Linux as well. I believe the issue is that UTF-8 encoding is variable-length, and certain tokens are not valid UTF-8 on their own: they're returned as raw bytes which may contain partial UTF-8 code points. I think this needs some tests to ensure we're properly keeping track of the number of returned bytes.
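The failure mode described above can be reproduced in a few lines: a multi-byte UTF-8 character split across two token byte strings cannot be decoded piece by piece, while an incremental decoder (or buffering bytes until they decode cleanly) handles the split correctly. This is a minimal sketch of the problem, not the project's actual detokenization code.

```python
import codecs

# The llama emoji encodes to 4 bytes; pretend the model returned it as two
# separate token byte strings, as llama.cpp detokenization can.
data = "🦙".encode("utf-8")
chunks = [data[:2], data[2:]]

# Naive per-chunk decoding fails: a partial code point is invalid UTF-8.
try:
    chunks[0].decode("utf-8")
except UnicodeDecodeError:
    pass  # this is the error reported in the issue

# An incremental decoder buffers incomplete sequences between calls,
# so text is only emitted once a full code point has arrived.
decoder = codecs.getincrementaldecoder("utf-8")()
text = "".join(decoder.decode(chunk) for chunk in chunks)
assert text == "🦙"
```

In a streaming server, the same idea means the byte count consumed so far must be tracked across chunks rather than per token, which is why the comment above calls for tests around the number of returned bytes.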
Fixes #57 |
@Niek can you confirm that this fixes the bug and gives the same result in streaming vs. regular mode? For example, compare streaming and regular mode for a completion that breaks in streaming mode, with a fixed seed and temperature=0.
I just tested, for my own reference:

```shell
docker run --rm -it -v /path/to/models:/models -p 8000:8000 python:3-buster bash
git clone -b river https://github.com/riverzhou/llama-cpp-python.git /app
cd /app
sed -i -e 's/git@github.com:/https:\/\/github.com\//' -e 's/.git$//' .gitmodules
git submodule update --init --recursive
python -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi sse_starlette uvicorn
python setup.py develop
HOST=0.0.0.0 MODEL=/models/ggml-vicuna-7b-4bit.bin python3 -m llama_cpp.server
```

With a prompt like
@Niek Can you try changing
I changed the code a little bit, and it works:
@riverzhou can you check if this bug still occurs since #118 was merged?
@riverzhou update? |
Fine. It's OK now. Thanks. |