Fix MIME type inference logic for speech synthesis instrumentation#115
Cirilla-zmh wants to merge 2 commits into alibaba:main
Change-Id: Iaafae7e5f7ad99f0df0cec5652415aa0231b9e25 Co-developed-by: Cursor <noreply@cursor.com>
Pull request overview
This PR fixes MIME type inference logic for speech synthesis instrumentation by dynamically determining the correct MIME type based on the audio format parameter, rather than using hardcoded values. Additionally, it removes support for the V2 streaming_call functionality and improves time-to-first-token calculation safety.
Changes:
- Added `_convert_speech_format_to_mime_type()` helper function to dynamically map DashScope audio format strings (wav, mp3, pcm, opus) to their corresponding MIME types
- Changed from Base64Blob to Blob type for audio content storage, now using raw bytes instead of base64-encoded strings
- Removed instrumentation for `SpeechSynthesizerV2.streaming_call` method (no longer supported)
- Added guard condition to prevent negative time-to-first-token calculations in span_utils.py
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/span_utils.py | Added guard condition to prevent negative TTFT calculations when first token time precedes start time |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_speech_synthesis.py | Disabled streaming_call test as functionality is no longer supported |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/multimodal.py | Added MIME type converter, changed from Base64Blob to Blob, added mime_type parameters to response handlers |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/__init__.py | Exported new _convert_speech_format_to_mime_type function |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/patch/speech_synthesis.py | Integrated MIME type inference, removed streaming_call wrapper |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/patch/__init__.py | Removed streaming_call wrapper export |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/__init__.py | Removed streaming_call instrumentation and uninstrumentation |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/README.rst | Added Speech Synthesis V1 and V2 to supported APIs documentation |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/CHANGELOG.md | Added changelog entry for MIME type fix |
    def _convert_speech_format_to_mime_type(speech_format: str) -> Optional[str]:
        """Convert from speech format to mime type.

        Args:
            speech_format: speech format of DashScope

        Returns:
            the mime type of speech
        """
        if speech_format == "wav":
            return "audio/wav"
        elif speech_format == "mp3":
            return "audio/mp3"
        elif speech_format == "pcm":
            return "audio/pcm"
        elif speech_format == "opus":
            return "audio/opus"
        else:
            return None

The new function _convert_speech_format_to_mime_type lacks test coverage. Since the repository has comprehensive test coverage for other utility functions and this function contains logic that could fail silently if formats are not mapped correctly, it should have unit tests to verify all supported format conversions including edge cases (unknown formats).
        if speech_format == "wav":
            return "audio/wav"
        elif speech_format == "mp3":
            return "audio/mp3"

Consider using "audio/mpeg" instead of "audio/mp3" as the MIME type for MP3 format. While "audio/mp3" is widely recognized, "audio/mpeg" is the official IANA-registered MIME type for MP3 files. Other parts of the codebase (see util/opentelemetry-util-genai/src/opentelemetry/util/genai/_multimodal_upload/pre_uploader.py:331) use "audio/mpeg" as the standard mapping for mp3 extensions.
Suggested change:

    -        return "audio/mp3"
    +        return "audio/mpeg"

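One way to apply this suggestion is a table-driven lookup. The sketch below is an assumption about how the helper could be restructured, not the code in the PR; the names `_SPEECH_FORMAT_TO_MIME_TYPE` and `convert_speech_format_to_mime_type` are hypothetical.

```python
from typing import Optional

# Hypothetical lookup table. "mp3" uses audio/mpeg, as the review suggests;
# the other three values are taken unchanged from the diff.
_SPEECH_FORMAT_TO_MIME_TYPE = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "pcm": "audio/pcm",
    "opus": "audio/opus",
}


def convert_speech_format_to_mime_type(speech_format: str) -> Optional[str]:
    """Return the MIME type for a DashScope speech format, or None if unknown."""
    return _SPEECH_FORMAT_TO_MIME_TYPE.get(speech_format)
```

A dictionary keeps every mapping in one place, so changing a single MIME type is a one-line edit and a test can iterate over the table directly.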
    @skip_without_api_key
    @pytest.mark.skip("Not support now")

The skip reason "Not support now" is vague and doesn't explain why the functionality is no longer supported. Consider providing a more descriptive reason, such as "Streaming call functionality has been removed from instrumentation" or reference a specific issue or decision that explains the removal.
Suggested change:

    -@pytest.mark.skip("Not support now")
    +@pytest.mark.skip(
    +    "Streaming call functionality for SpeechSynthesizer V2 is currently not supported "
    +    "by the instrumentation test infrastructure (WebSocket-based streaming is disabled)."
    +)

    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s

The new guard condition invocation.monotonic_first_token_s >= invocation.monotonic_start_s prevents negative time-to-first-token calculations, but there is no test coverage for this edge case. Consider adding a test that verifies the TTFT attribute is not set when monotonic_first_token_s < monotonic_start_s to ensure this defensive check works as intended.
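A rough sketch of such a test, under stated assumptions: `FakeInvocation` and `compute_ttft_s` are hypothetical stand-ins that mirror only the condition shown in the diff. The real code in span_utils.py sets a span attribute rather than returning a value, so an actual test would assert on the span's attributes instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeInvocation:
    # Hypothetical stand-in exposing only the two fields the guard reads.
    monotonic_start_s: Optional[float] = None
    monotonic_first_token_s: Optional[float] = None


def compute_ttft_s(invocation: FakeInvocation) -> Optional[float]:
    # Same three-part condition as the diff: both timestamps must be present
    # and the first token must not precede the start.
    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s
    ):
        return invocation.monotonic_first_token_s - invocation.monotonic_start_s
    return None


def test_ttft_guard():
    # Normal case: first token arrives after the start.
    assert compute_ttft_s(FakeInvocation(10.0, 10.5)) == 0.5
    # Boundary case: equal timestamps are allowed and give zero.
    assert compute_ttft_s(FakeInvocation(10.0, 10.0)) == 0.0
    # Edge case from the review: first token precedes start, so no value.
    assert compute_ttft_s(FakeInvocation(10.0, 9.0)) is None
    # Missing timestamps also produce no value.
    assert compute_ttft_s(FakeInvocation(None, 1.0)) is None
    assert compute_ttft_s(FakeInvocation(1.0, None)) is None
```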
    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s

The PR title and description focus on "Fix MIME type inference logic for speech synthesis instrumentation," but this file change is in a general utility module that applies to all LLM invocations, not just speech synthesis. While the change is valid (preventing negative TTFT calculations), it should either be mentioned in the PR description or potentially split into a separate commit/PR for clarity.
    ### Fixed

    - Fix MIME type inference logic for speech synthesis instrumentation

The CHANGELOG entry only mentions "Fix MIME type inference logic" but doesn't document other significant changes in this PR: removal of streaming_call wrapper functionality, change from Base64Blob to Blob for audio content, and the guard condition added in span_utils.py. Consider expanding the changelog entry to provide a more complete picture of the changes, or split these into separate entries under different categories (e.g., "Removed", "Changed", "Fixed").
Suggested change:

    -### Fixed
    -- Fix MIME type inference logic for speech synthesis instrumentation
    +### Changed
    +- Change audio content from `Base64Blob` to `Blob` for speech synthesis instrumentation
    +  ([#115](https://github.com/alibaba/loongsuite-python-agent/pull/115))
    +### Removed
    +- Remove deprecated `streaming_call` wrapper functionality from DashScope instrumentation
    +  ([#115](https://github.com/alibaba/loongsuite-python-agent/pull/115))
    +### Fixed
    +- Fix MIME type inference logic for speech synthesis instrumentation and add guard condition in
    +  `span_utils.py` to avoid creating invalid spans