Fix MIME type inference logic for speech synthesis instrumentation by Cirilla-zmh · Pull Request #115 · alibaba/loongsuite-python-agent

Cirilla-zmh · 2026-02-05T10:20:41Z

Change-Id: Iaafae7e5f7ad99f0df0cec5652415aa0231b9e25
Co-developed-by: Cursor noreply@cursor.com

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Fix unit tests

Does This PR Require a Core Repo Change?

No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

Change-Id: I50db0082ded6ae85b5d29e1237e8b817b964cd33 Co-developed-by: Cursor <noreply@cursor.com>

Copilot

Pull request overview

This PR fixes MIME type inference logic for speech synthesis instrumentation by dynamically determining the correct MIME type based on the audio format parameter, rather than using hardcoded values. Additionally, it removes support for the V2 streaming_call functionality and improves time-to-first-token calculation safety.

Changes:

- Added `_convert_speech_format_to_mime_type()` helper function to dynamically map DashScope audio format strings (wav, mp3, pcm, opus) to their corresponding MIME types
- Changed from `Base64Blob` to `Blob` type for audio content storage, now using raw bytes instead of base64-encoded strings
- Removed instrumentation for `SpeechSynthesizerV2.streaming_call` method (no longer supported)
- Added guard condition to prevent negative time-to-first-token calculations in `span_utils.py`
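To make the "raw bytes instead of base64-encoded strings" point concrete, here is a minimal sketch using only the standard library. It does not use the `Blob` or `Base64Blob` types from the PR (their constructors are not shown on this page); `raw_audio` is a made-up stand-in for synthesized audio.

```python
import base64

# Stand-in for synthesized audio bytes (hypothetical payload, 3072 bytes).
raw_audio = bytes(range(256)) * 12

# Base64 turns every 3 input bytes into 4 ASCII characters.
encoded = base64.b64encode(raw_audio).decode("ascii")

raw_size = len(raw_audio)
encoded_size = len(encoded)

# Decoding restores the original bytes exactly.
round_trip = base64.b64decode(encoded)

print(raw_size, encoded_size, round_trip == raw_audio)
```

Storing the raw bytes avoids the roughly one-third size overhead and the extra encode/decode step; whether that matters in practice depends on how the content is exported.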

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/span_utils.py | Added guard condition to prevent negative TTFT calculations when first token time precedes start time |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_speech_synthesis.py | Disabled streaming_call test as functionality is no longer supported |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/multimodal.py | Added MIME type converter, changed from Base64Blob to Blob, added mime_type parameters to response handlers |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/__init__.py | Exported new `_convert_speech_format_to_mime_type` function |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/patch/speech_synthesis.py | Integrated MIME type inference, removed streaming_call wrapper |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/patch/__init__.py | Removed streaming_call wrapper export |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/__init__.py | Removed streaming_call instrumentation and uninstrumentation |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/README.rst | Added Speech Synthesis V1 and V2 to supported APIs documentation |
| instrumentation-loongsuite/loongsuite-instrumentation-dashscope/CHANGELOG.md | Added changelog entry for MIME type fix |

Copilot · 2026-02-05T11:13:10Z

...te-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/multimodal.py

+def _convert_speech_format_to_mime_type(speech_format: str) -> Optional[str]:
+    """Convert from speech format to mime type.
+
+    Args:
+        speech_format: speech format of DashScope
+
+    Returns:
+        the mime type of speech
+    """
+    if speech_format == "wav":
+        return "audio/wav"
+    elif speech_format == "mp3":
+        return "audio/mp3"
+    elif speech_format == "pcm":
+        return "audio/pcm"
+    elif speech_format == "opus":
+        return "audio/opus"
+    else:
+        return None


The new function _convert_speech_format_to_mime_type lacks test coverage. Since the repository has comprehensive test coverage for other utility functions and this function contains logic that could fail silently if formats are not mapped correctly, it should have unit tests to verify all supported format conversions including edge cases (unknown formats).
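A minimal sketch of what such tests could look like. The helper is copied locally from the diff above so the example runs on its own; the test class name and the choice of unknown inputs (`"flac"`, `""`, `"WAV"`) are assumptions for illustration, not part of the PR. The expected `"audio/mp3"` value mirrors the code as submitted and would need updating if the mapping changes.

```python
import unittest
from typing import Optional


def _convert_speech_format_to_mime_type(speech_format: str) -> Optional[str]:
    """Local copy of the helper from the diff, so the sketch is standalone."""
    if speech_format == "wav":
        return "audio/wav"
    elif speech_format == "mp3":
        return "audio/mp3"
    elif speech_format == "pcm":
        return "audio/pcm"
    elif speech_format == "opus":
        return "audio/opus"
    else:
        return None


class ConvertSpeechFormatTest(unittest.TestCase):
    def test_known_formats(self):
        expected = {
            "wav": "audio/wav",
            "mp3": "audio/mp3",
            "pcm": "audio/pcm",
            "opus": "audio/opus",
        }
        for fmt, mime in expected.items():
            with self.subTest(fmt=fmt):
                self.assertEqual(_convert_speech_format_to_mime_type(fmt), mime)

    def test_unknown_formats_return_none(self):
        # The comparison is exact, so an upper-case "WAV" is also unmapped.
        for fmt in ("flac", "", "WAV"):
            with self.subTest(fmt=fmt):
                self.assertIsNone(_convert_speech_format_to_mime_type(fmt))
```

In the repository these cases would more likely be written with the project's existing pytest conventions; `unittest` is used here only to keep the sketch dependency-free.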

Copilot · 2026-02-05T11:13:11Z

...te-instrumentation-dashscope/src/opentelemetry/instrumentation/dashscope/utils/multimodal.py

+    if speech_format == "wav":
+        return "audio/wav"
+    elif speech_format == "mp3":
+        return "audio/mp3"


Consider using "audio/mpeg" instead of "audio/mp3" as the MIME type for MP3 format. While "audio/mp3" is widely recognized, "audio/mpeg" is the official IANA-registered MIME type for MP3 files. Other parts of the codebase (see util/opentelemetry-util-genai/src/opentelemetry/util/genai/_multimodal_upload/pre_uploader.py:331) use "audio/mpeg" as the standard mapping for mp3 extensions.

Suggested change

-        return "audio/mp3"
+        return "audio/mpeg"
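One way the suggestion could be applied is a table-driven lookup, sketched below. The names `_SPEECH_FORMAT_TO_MIME_TYPE` and `convert_speech_format_to_mime_type` are hypothetical and not taken from the PR; only the `"audio/mpeg"` value for `mp3` comes from the review comment.

```python
from typing import Optional

# Hypothetical lookup table; "mp3" maps to "audio/mpeg" per the review suggestion.
_SPEECH_FORMAT_TO_MIME_TYPE = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "pcm": "audio/pcm",
    "opus": "audio/opus",
}


def convert_speech_format_to_mime_type(speech_format: str) -> Optional[str]:
    """Return the MIME type for a DashScope speech format, or None if unmapped."""
    return _SPEECH_FORMAT_TO_MIME_TYPE.get(speech_format)
```

Keeping the mapping in one dictionary makes it easier to compare against other mappings in the codebase and to add formats later without growing an if/elif chain.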

Copilot · 2026-02-05T11:13:11Z

instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_speech_synthesis.py



-@skip_without_api_key
+@pytest.mark.skip("Not support now")


The skip reason "Not support now" is vague and doesn't explain why the functionality is no longer supported. Consider providing a more descriptive reason, such as "Streaming call functionality has been removed from instrumentation" or reference a specific issue or decision that explains the removal.

Suggested change

-@pytest.mark.skip("Not support now")
+@pytest.mark.skip(
+    "Streaming call functionality for SpeechSynthesizer V2 is currently not supported "
+    "by the instrumentation test infrastructure (WebSocket-based streaming is disabled)."
+)

Copilot · 2026-02-05T11:13:11Z

util/opentelemetry-util-genai/src/opentelemetry/util/genai/span_utils.py

    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
+        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s


The new guard condition invocation.monotonic_first_token_s >= invocation.monotonic_start_s prevents negative time-to-first-token calculations, but there is no test coverage for this edge case. Consider adding a test that verifies the TTFT attribute is not set when monotonic_first_token_s < monotonic_start_s to ensure this defensive check works as intended.
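The guard and the suggested edge-case check can be sketched in isolation as follows. `FakeInvocation` and `time_to_first_token_s` are hypothetical stand-ins: the real logic lives in `span_utils.py` and sets a span attribute rather than returning a value, and the subtraction shown here is an assumption about how the TTFT is derived from the two fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeInvocation:
    """Hypothetical stand-in exposing only the two fields used by the guard."""
    monotonic_start_s: Optional[float] = None
    monotonic_first_token_s: Optional[float] = None


def time_to_first_token_s(invocation: FakeInvocation) -> Optional[float]:
    """Return the TTFT in seconds, or None when it cannot be computed safely."""
    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s
    ):
        return invocation.monotonic_first_token_s - invocation.monotonic_start_s
    # Missing timestamps, or a first token recorded before the start: skip.
    return None


normal = time_to_first_token_s(FakeInvocation(10.0, 10.5))
inverted = time_to_first_token_s(FakeInvocation(10.0, 9.0))
missing = time_to_first_token_s(FakeInvocation(None, 1.0))
```

A real test would instead assert that the TTFT attribute is absent from the exported span when the first-token time precedes the start time.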

Copilot · 2026-02-05T11:13:11Z

util/opentelemetry-util-genai/src/opentelemetry/util/genai/span_utils.py

    if (
        invocation.monotonic_first_token_s is not None
        and invocation.monotonic_start_s is not None
+        and invocation.monotonic_first_token_s >= invocation.monotonic_start_s


The PR title and description focus on "Fix MIME type inference logic for speech synthesis instrumentation," but this file change is in a general utility module that applies to all LLM invocations, not just speech synthesis. While the change is valid (preventing negative TTFT calculations), it should either be mentioned in the PR description or potentially split into a separate commit/PR for clarity.

Copilot · 2026-02-05T11:13:12Z

instrumentation-loongsuite/loongsuite-instrumentation-dashscope/CHANGELOG.md

+### Fixed
+
+- Fix MIME type inference logic for speech synthesis instrumentation


The CHANGELOG entry only mentions "Fix MIME type inference logic" but doesn't document other significant changes in this PR: removal of streaming_call wrapper functionality, change from Base64Blob to Blob for audio content, and the guard condition added in span_utils.py. Consider expanding the changelog entry to provide a more complete picture of the changes, or split these into separate entries under different categories (e.g., "Removed", "Changed", "Fixed").

Suggested change

-### Fixed
-- Fix MIME type inference logic for speech synthesis instrumentation
+### Changed
+- Change audio content from `Base64Blob` to `Blob` for speech synthesis instrumentation
+  ([#115](https://github.com/alibaba/loongsuite-python-agent/pull/115))
+### Removed
+- Remove deprecated `streaming_call` wrapper functionality from DashScope instrumentation
+  ([#115](https://github.com/alibaba/loongsuite-python-agent/pull/115))
+### Fixed
+- Fix MIME type inference logic for speech synthesis instrumentation and add guard condition in
+  `span_utils.py` to avoid creating invalid spans

Cirilla-zmh added 2 commits February 5, 2026 18:16

Fix MIME type inference logic for speech synthesis instrumentation

f9ef91a

Change-Id: Iaafae7e5f7ad99f0df0cec5652415aa0231b9e25 Co-developed-by: Cursor <noreply@cursor.com>

Fix ttft

aafa0bd

Change-Id: I50db0082ded6ae85b5d29e1237e8b817b964cd33 Co-developed-by: Cursor <noreply@cursor.com>

github-actions bot assigned 123liuziming, Cirilla-zmh and ralf0131 Feb 5, 2026

123liuziming requested a review from Copilot February 5, 2026 11:06

Copilot started reviewing on behalf of 123liuziming February 5, 2026 11:06

Copilot AI reviewed Feb 5, 2026

Cirilla-zmh wants to merge 2 commits into alibaba:main from Cirilla-zmh:minghui/multimodal

3 participants
