Fix MIME type inference logic for speech synthesis instrumentation #115

Open
Cirilla-zmh wants to merge 2 commits into alibaba:main from Cirilla-zmh:minghui/multimodal

Conversation

@Cirilla-zmh
Collaborator

Change-Id: Iaafae7e5f7ad99f0df0cec5652415aa0231b9e25
Co-developed-by: Cursor <noreply@cursor.com>

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Fix unit tests

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Change-Id: Iaafae7e5f7ad99f0df0cec5652415aa0231b9e25
Co-developed-by: Cursor <noreply@cursor.com>
Change-Id: I50db0082ded6ae85b5d29e1237e8b817b964cd33
Co-developed-by: Cursor <noreply@cursor.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes MIME type inference logic for speech synthesis instrumentation by dynamically determining the correct MIME type based on the audio format parameter, rather than using hardcoded values. Additionally, it removes support for the V2 streaming_call functionality and improves time-to-first-token calculation safety.

Changes:

  • Added _convert_speech_format_to_mime_type() helper function to dynamically map DashScope audio format strings (wav, mp3, pcm, opus) to their corresponding MIME types
  • Changed from Base64Blob to Blob type for audio content storage, now using raw bytes instead of base64-encoded strings
  • Removed instrumentation for SpeechSynthesizerV2.streaming_call method (no longer supported)
  • Added guard condition to prevent negative time-to-first-token calculations in span_utils.py
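
To illustrate the second bullet, the difference between storing raw bytes and storing base64-encoded text can be sketched as follows. The two dataclasses are hypothetical stand-ins written for this example, not the actual `Blob` and `Base64Blob` types from the library, and the audio bytes are a placeholder.

```python
import base64
from dataclasses import dataclass


@dataclass
class RawAudioPart:
    """Hypothetical stand-in for a part that stores raw bytes."""

    mime_type: str
    content: bytes


@dataclass
class Base64AudioPart:
    """Hypothetical stand-in for a part that stores base64 text."""

    mime_type: str
    content: str


# Placeholder bytes for illustration only; this is not a valid WAV file.
audio_bytes = b"RIFF\x00\x00\x00\x00WAVE"

raw_part = RawAudioPart(mime_type="audio/wav", content=audio_bytes)
b64_part = Base64AudioPart(
    mime_type="audio/wav",
    content=base64.b64encode(audio_bytes).decode("ascii"),
)

# Decoding the base64 text recovers the bytes held by the raw part.
assert base64.b64decode(b64_part.content) == raw_part.content
```

Storing raw bytes skips the encode step at capture time and leaves any encoding decision to whatever consumes the part later.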

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `util/opentelemetry-util-genai/src/opentelemetry/util/genai/span_utils.py` | Added guard condition to prevent negative TTFT calculations when first token time precedes start time |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_speech_synthesis.py` | Disabled streaming_call test as functionality is no longer supported |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/multimodal.py` | Added MIME type converter, changed from Base64Blob to Blob, added mime_type parameters to response handlers |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/__init__.py` | Exported new _convert_speech_format_to_mime_type function |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/patch/speech_synthesis.py` | Integrated MIME type inference, removed streaming_call wrapper |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/patch/__init__.py` | Removed streaming_call wrapper export |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/__init__.py` | Removed streaming_call instrumentation and uninstrumentation |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/README.rst` | Added Speech Synthesis V1 and V2 to supported APIs documentation |
| `instrumentation-loongsuite/loongsuite-instrumentation-dashscope/CHANGELOG.md` | Added changelog entry for MIME type fix |

Comment on lines +815 to +833
def _convert_speech_format_to_mime_type(speech_format: str) -> Optional[str]:
    """Convert from speech format to mime type.

    Args:
        speech_format: speech format of DashScope

    Returns:
        the mime type of speech
    """
    if speech_format == "wav":
        return "audio/wav"
    elif speech_format == "mp3":
        return "audio/mp3"
    elif speech_format == "pcm":
        return "audio/pcm"
    elif speech_format == "opus":
        return "audio/opus"
    else:
        return None

Copilot AI Feb 5, 2026

The new function _convert_speech_format_to_mime_type lacks test coverage. Since the repository has comprehensive test coverage for other utility functions and this function contains logic that could fail silently if formats are not mapped correctly, it should have unit tests to verify all supported format conversions including edge cases (unknown formats).

Copilot uses AI. Check for mistakes.
    if speech_format == "wav":
        return "audio/wav"
    elif speech_format == "mp3":
        return "audio/mp3"

Copilot AI Feb 5, 2026

Consider using "audio/mpeg" instead of "audio/mp3" as the MIME type for MP3 format. While "audio/mp3" is widely recognized, "audio/mpeg" is the official IANA-registered MIME type for MP3 files. Other parts of the codebase (see util/opentelemetry-util-genai/src/opentelemetry/util/genai/_multimodal_upload/pre_uploader.py:331) use "audio/mpeg" as the standard mapping for mp3 extensions.

Suggested change
-        return "audio/mp3"
+        return "audio/mpeg"
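
One possible way to apply this suggestion is a table-driven lookup, sketched here. This is an alternative shape rather than the PR's implementation: the names `_SPEECH_FORMAT_TO_MIME_TYPE` and `convert_speech_format_to_mime_type` are hypothetical, and the other three values are taken unchanged from the diff.

```python
from typing import Optional

# Hypothetical lookup table. The "mp3" entry follows the suggested change
# to "audio/mpeg"; the remaining entries mirror the diff under review.
_SPEECH_FORMAT_TO_MIME_TYPE = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "pcm": "audio/pcm",
    "opus": "audio/opus",
}


def convert_speech_format_to_mime_type(speech_format: str) -> Optional[str]:
    """Return the MIME type for a speech format, or None if it is unknown."""
    return _SPEECH_FORMAT_TO_MIME_TYPE.get(speech_format)
```

A dictionary keeps the mapping in one place, so adding a format later touches a single line instead of another `elif` branch.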



@skip_without_api_key
@pytest.mark.skip("Not support now")

Copilot AI Feb 5, 2026

The skip reason "Not support now" is vague and doesn't explain why the functionality is no longer supported. Consider providing a more descriptive reason, such as "Streaming call functionality has been removed from instrumentation" or reference a specific issue or decision that explains the removal.

Suggested change
- @pytest.mark.skip("Not support now")
+ @pytest.mark.skip(
+     "Streaming call functionality for SpeechSynthesizer V2 is currently not supported "
+     "by the instrumentation test infrastructure (WebSocket-based streaming is disabled)."
+ )

    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s

Copilot AI Feb 5, 2026

The new guard condition invocation.monotonic_first_token_s >= invocation.monotonic_start_s prevents negative time-to-first-token calculations, but there is no test coverage for this edge case. Consider adding a test that verifies the TTFT attribute is not set when monotonic_first_token_s < monotonic_start_s to ensure this defensive check works as intended.
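
A rough sketch of such a test follows. Only the guard condition and the two attribute names come from the diff; `FakeInvocation` and `compute_ttft_s` are hypothetical stand-ins for the real invocation type and for the code in `span_utils.py` that sets the TTFT attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeInvocation:
    """Hypothetical stand-in carrying only the two timestamps used here."""

    monotonic_start_s: Optional[float] = None
    monotonic_first_token_s: Optional[float] = None


def compute_ttft_s(invocation: FakeInvocation) -> Optional[float]:
    """Return time to first token in seconds, or None when the guard fails."""
    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s
    ):
        return invocation.monotonic_first_token_s - invocation.monotonic_start_s
    return None


def test_ttft_is_skipped_when_first_token_precedes_start():
    invocation = FakeInvocation(monotonic_start_s=10.0, monotonic_first_token_s=9.0)
    assert compute_ttft_s(invocation) is None


def test_ttft_is_computed_when_timestamps_are_ordered():
    invocation = FakeInvocation(monotonic_start_s=10.0, monotonic_first_token_s=10.5)
    assert compute_ttft_s(invocation) == 0.5


def test_ttft_is_skipped_when_a_timestamp_is_missing():
    assert compute_ttft_s(FakeInvocation(monotonic_start_s=10.0)) is None
```

In the actual test suite, the assertion would presumably check that the TTFT attribute is absent from the recorded span rather than inspect a return value.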

    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s

Copilot AI Feb 5, 2026

The PR title and description focus on "Fix MIME type inference logic for speech synthesis instrumentation," but this file change is in a general utility module that applies to all LLM invocations, not just speech synthesis. While the change is valid (preventing negative TTFT calculations), it should either be mentioned in the PR description or potentially split into a separate commit/PR for clarity.

Comment on lines +16 to +18
### Fixed

- Fix MIME type inference logic for speech synthesis instrumentation

Copilot AI Feb 5, 2026

The CHANGELOG entry only mentions "Fix MIME type inference logic" but doesn't document other significant changes in this PR: removal of streaming_call wrapper functionality, change from Base64Blob to Blob for audio content, and the guard condition added in span_utils.py. Consider expanding the changelog entry to provide a more complete picture of the changes, or split these into separate entries under different categories (e.g., "Removed", "Changed", "Fixed").

Suggested change
- ### Fixed
- - Fix MIME type inference logic for speech synthesis instrumentation
+ ### Changed
+ - Change audio content from `Base64Blob` to `Blob` for speech synthesis instrumentation
+   ([#115](https://github.com/alibaba/loongsuite-python-agent/pull/115))
+ ### Removed
+ - Remove deprecated `streaming_call` wrapper functionality from DashScope instrumentation
+   ([#115](https://github.com/alibaba/loongsuite-python-agent/pull/115))
+ ### Fixed
+ - Fix MIME type inference logic for speech synthesis instrumentation and add guard condition in
+   `span_utils.py` to avoid creating invalid spans

3 participants