fix(image): 修复openai图片data URI的MIME识别与编码链路#7017
fix(image): 修复openai图片data URI的MIME识别与编码链路#7017idiotsj wants to merge 9 commits intoAstrBotDevs:masterfrom
Conversation
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Now that both
_encode_image_bs64andencode_image_bs64always delegate toimage_source_to_data_uri(which never returns an empty string), theif not image_databranches in the context assembly logic have effectively become dead code and can be removed or adjusted to only handle exception cases. - In
image_source_to_data_uri, you explicitly rejecthttp(s)URLs while the callers still use a simplestartswith("http")check, which can be brittle (e.g. uppercase schemes or otherhttp-prefixed schemes); consider aligning the scheme detection logic between the helper and its callers, or centralizing the scheme handling in one place to avoid edge-case mismatches.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Now that both `_encode_image_bs64` and `encode_image_bs64` always delegate to `image_source_to_data_uri` (which never returns an empty string), the `if not image_data` branches in the context assembly logic have effectively become dead code and can be removed or adjusted to only handle exception cases.
- In `image_source_to_data_uri`, you explicitly reject `http(s)` URLs while the callers still use a simple `startswith("http")` check, which can be brittle (e.g. uppercase schemes or other `http`-prefixed schemes); consider aligning the scheme detection logic between the helper and its callers, or centralizing the scheme handling in one place to avoid edge-case mismatches.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request centralizes image-to-data-URI conversion logic by introducing the image_source_to_data_uri utility, which includes automatic MIME type detection using magic bytes. Redundant encoding logic in ProviderRequest and OpenAI provider sources has been refactored to use this utility, improving support for various image formats like PNG, GIF, and WEBP. New tests were added to ensure correct URI generation and MIME detection. A review comment suggests improving exception handling by catching a more specific error type during base64 decoding.
astrbot/core/utils/io.py
Outdated
| except Exception: | ||
| pass |
There was a problem hiding this comment.
在处理 base64.b64decode 异常时,捕获更具体的 base64.binascii.Error 而不是通用的 Exception 会更好。这有助于区分不同类型的错误,并使代码更具可读性和可维护性。虽然当前逻辑在功能上没有问题,但更具体的异常处理是良好的编程实践。
| except Exception: | |
| pass | |
| image_bytes = base64.b64decode(raw_base64) | |
| mime_type = detect_image_mime_type(image_bytes) | |
| except base64.binascii.Error: | |
| pass |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="astrbot/core/utils/io.py" line_range="255-261" />
<code_context>
+ "Remote image URL is not supported in image_source_to_data_uri; download the file before calling this helper.",
+ )
+
+ if image_source.startswith("base64://"):
+ raw_base64 = image_source.removeprefix("base64://")
+ mime_type = DEFAULT_IMAGE_MIME_TYPE
+ try:
+ image_bytes = base64.b64decode(raw_base64)
+ mime_type = detect_image_mime_type(image_bytes)
+ except Exception:
+ pass
+ return f"data:{mime_type};base64,{raw_base64}", mime_type
</code_context>
<issue_to_address>
**suggestion (bug_risk):** base64:// handling swallows all decode errors and still returns the original base64 without any signal.
Decoding errors for `base64://` are caught with a bare `Exception`, and you then return the original `raw_base64` with a default MIME. This makes invalid base64 indistinguishable from a valid non-JPEG image and may hide unrelated failures. Please narrow the exception to the specific decode errors (e.g., `binascii.Error`/`ValueError`), and consider logging at debug level or skipping `detect_image_mime_type` entirely when decode fails so the behavior is explicit.
Suggested implementation:
```python
if image_source.startswith("base64://"):
raw_base64 = image_source.removeprefix("base64://")
mime_type = DEFAULT_IMAGE_MIME_TYPE
try:
image_bytes = base64.b64decode(raw_base64)
except (binascii.Error, ValueError) as exc:
# Invalid base64 payload: keep the default MIME type and skip detection.
# This keeps the behavior explicit and does not hide non-base64 related errors.
try:
logger.debug("Failed to decode base64 image_source: %s", exc)
except NameError:
# logger may not be defined in all usages; fail silently if absent.
pass
return f"data:{mime_type};base64,{raw_base64}", mime_type
try:
mime_type = detect_image_mime_type(image_bytes)
except Exception:
# If MIME detection fails, fall back to the default type but still
# return the original base64 content.
try:
logger.debug("Failed to detect MIME type for base64 image_source", exc_info=True)
except NameError:
pass
return f"data:{mime_type};base64,{raw_base64}", mime_type
```
To fully integrate this change with the rest of the module, you should:
1. Add the missing imports at the top of `astrbot/core/utils/io.py` (if they are not already present):
```python
import binascii
import logging
logger = logging.getLogger(__name__)
```
2. If your project uses a different logging convention (e.g., a shared logger utility), replace the `logger` usage here with that project-standard logger to keep logging consistent across the codebase.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
image_source_to_data_uri,base64://andfile://handling is case-sensitive while HTTP(S) detection is case-insensitive; consider usinglower_sourcefor these checks as well soBASE64://andFILE://variants behave consistently. - When decoding
base64://sources, you already catchbinascii.Error/ValueError; usingbase64.b64decode(raw_base64, validate=True)would make invalid base64 detection stricter and avoid silently accepting malformed input.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `image_source_to_data_uri`, `base64://` and `file://` handling is case-sensitive while HTTP(S) detection is case-insensitive; consider using `lower_source` for these checks as well so `BASE64://` and `FILE://` variants behave consistently.
- When decoding `base64://` sources, you already catch `binascii.Error`/`ValueError`; using `base64.b64decode(raw_base64, validate=True)` would make invalid base64 detection stricter and avoid silently accepting malformed input.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
变更说明
修复 OpenAI 兼容链路中图片 data URI MIME 被固定为
image/jpeg的问题,并统一图片来源到 data URI 的转换逻辑。
主要改动
io.py新增detect_image_mime_type(data: bytes)。io.py新增image_source_to_data_uri(image_source: str)。ProviderRequest._encode_image_bs64改为复用统一 helper。ProviderOpenAIOfficial.encode_image_bs64改为复用统一 helper。DEFAULT_IMAGE_MIME_TYPE。http(s),调用方需先下载。行为说明
image/jpeg。data:image/...;base64,...透传并返回原 MIME。base64://...自动识别 MIME,非法 base64 回退image/jpeg。file://URI。data:URI 与不支持 scheme 显式报错。测试覆盖
tests/test_openai_source.pytests/unit/test_astr_main_agent.py验证命令
python -m pytest tests/test_openai_source.py -qpython -m pytest tests/unit/test_astr_main_agent.py -q -k imagepython -m ruff check astrbot/core/utils/io.py astrbot/core/provider/entities.py astrbot/core/provider/sources/openai_source.py tests/test_openai_source.py tests/unit/test_astr_main_agent.py tests/fixtures/image_samples.py测试截图
close #6991
当前文件改动:
涉及的功能模块:
Summary by Sourcery
Unify image source handling by converting local paths, file URIs, base64 sources, and data URIs into data:image/...;base64,... while preserving or inferring the correct MIME type, and update OpenAI-compatible providers to use this shared logic.
New Features:
Bug Fixes:
Enhancements:
Tests: