UI strings coming from mw.msg() (message-based strings) should be pre-segmented and synthesized using the existing PreSynthesizeMessages maintenance script (T387284), and then referenced by their message key via ApiWikispeechListen (which will need to be extended to support message-based requests).
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Enable message-key input in listen API | mediawiki/extensions/Wikispeech | master | +224 -26 |

| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | None | | T396579 ☂ Play interface |
| Open | | Viktoria_Hillerud_WMSE | T407474 Enable message-key input in listen API |
Event Timeline
Change #1196801 had a related patch set uploaded (by Viktoria Hillerud WMSE; author: Viktoria Hillerud WMSE):
[mediawiki/extensions/Wikispeech@master] Store utterances from MW:s message system in Wikispeech segmentation storage
I am a bit confused about whether or not the wikispeech-listen action should receive a 'message-key' parameter if we want the action to automatically call PreSynthesizeMessages::synthesizeErrorMessage().
The message utterance should be created by the maintenance script beforehand. The API should just fetch the correct message utterance. We need to tell it where the utterance comes from, similarly to revision for page content. You should be able to use the message keys for this, since they are unique identifiers.
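The lookup described here (page utterances keyed by revision, message utterances keyed by message key) can be sketched as follows. This is a hypothetical Python illustration, not the extension's actual code (which is PHP); all class and field names are made up.

```python
# Hypothetical sketch: resolving a stored utterance either by page revision
# or by message key. Messages are pre-synthesized by a maintenance script,
# so a miss on a message lookup is an error, not a trigger to synthesize.

class UtteranceStore:
    def __init__(self):
        # (revision_id, segment_hash) -> utterance for page content,
        # (message_key, segment_hash) -> utterance for messages
        self.page_utterances = {}
        self.message_utterances = {}

    def find(self, segment_hash, revision_id=None, message_key=None):
        """Fetch a stored utterance; the message key acts as the unique
        identifier, playing the role that the revision does for pages."""
        if message_key is not None:
            return self.message_utterances.get((message_key, segment_hash))
        return self.page_utterances.get((revision_id, segment_hash))

store = UtteranceStore()
store.message_utterances[("wikispeech-error-loading-audio-title", "abc123")] = {"audio": b"..."}
hit = store.find("abc123", message_key="wikispeech-error-loading-audio-title")
miss = store.find("abc123", message_key="not-a-message")
```

Keeping both lookups behind one method mirrors the idea that the API only needs to be told where the utterance comes from.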
Since this is related to T387284 - Pre-synthesise messages, I noticed an issue in the way the script handles message utterances:
When running:

$this->utteranceGenerator->getUtterance( null, $voice, $language, 0, $segment, $messageKey );

findMessageUtterance() is correctly triggered when $pageId === 0, and that method does find an existing utterance. However, the result seems to be ignored; the code still proceeds to synthesize and store a duplicate.
I suspect a bug in findMessageUtterance() or retrieveUtteranceMetadata(), and will investigate further, as this likely affects other parts of the system (e.g. ApiWikispeechListen.php) in the same way.
The issue occurred because the database contained references to utterances (rows) for certain messageKey values (e.g., 'wikispeech-error-loading-audio-title'), but the corresponding audio files were missing from the file backend, because I had manually removed them earlier... 🤦
getUtterance() retrieved these metadata entries from the database (via retrieveUtteranceMetadata()), but when trying to load the actual audio using loadUtteranceAudio(), it failed, since the .opus files no longer existed.
I removed the outdated utterance rows for the problematic messageKeys directly from the database and once removed, the system could correctly regenerate utterances using PreSynthesizeMessages.php, ensuring that both the database and audio files were in sync again.
So it turns out the bug wasn't in the code logic itself but was caused by a desynchronization between the utterance table and the audio files, due to manual file deletion 😅
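The desynchronization described here (metadata rows surviving in the database after their audio files were removed from the file backend) could be detected with a simple consistency check. This is a hypothetical Python sketch; the path layout and field names are assumptions, not the extension's actual storage scheme.

```python
# Hypothetical sketch: find utterance rows whose metadata exists in the
# database but whose .opus audio file is missing from the file backend.
import os

def check_utterance_sync(rows, audio_dir):
    """Return message keys that have metadata but no audio file."""
    orphaned = []
    for row in rows:
        path = os.path.join(audio_dir, row["message_key"] + ".opus")
        if not os.path.exists(path):
            orphaned.append(row["message_key"])
    return orphaned

rows = [
    {"message_key": "wikispeech-error-loading-audio-title"},
    {"message_key": "some-other-key"},
]
# Against a directory that does not exist, every row is orphaned:
orphans = check_utterance_sync(rows, "/nonexistent-audio-dir")
```

Reporting the orphaned keys with a clear error message (rather than failing opaquely at playback time) is the kind of improvement discussed in the following comments.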
It might be worth it to create a task to handle this. I don't think there needs to be a way to fix it automatically, but at least an error message that makes it clear what the issue is. I'd guess this is not just an issue for the message utterances, but for utterances in general.
Yes.
I don't think this is something that's likely to happen without manually moving things around. However it sounds like when it does happen it's quite opaque. It could save people time and confusion down the line.
I tried this in Special:ApiSandbox, and if I give a non-existing message key it still runs:

```
{
    "wikispeech-listen": {
        "message-key": "not-a-message",
        "segment-hash": "6740979a8e53384c70f9d55b464163690233f7e0702f0297064776b3e3f8cebd",
        "audio": "[...]",
        "tokens": [
            {
                "endtime": 1130,
                "expanded": "⧼not a message⧽",
                "orth": "⧼not-a-message⧽"
            },
            {
                "endtime": 1535,
                "orth": ""
            }
        ]
    }
}
```
Also, the name of this task isn't really right. From what I can tell, what it does is allow you to use messages as input for the listen API.
Yes, but the task title talks about "segmentation storage" (which I don't know what it refers to).
I changed the title of the task, hoping it will make things clearer. Regarding the error you got when giving a non-existing message key as a parameter: good point, and I will fix that by checking that the message key has already been pre-synthesized, right?
I think you have to check that the message key exists at all. If you ask MW for a message and it can't find it, you get "⧼...⧽" with the message key. I think this is because in some places it's better to have something, even if it's a cryptic message key, than nothing at all. In this case we don't want that, however.
Just to make sure I understand your point correctly:
I will add a check to ensure the key actually exists in MW's message system, so we don't get the fallback ⧼not-a-message⧽.
That said, I’m also performing a check to verify that the utterance has been pre-synthesized, i.e., that PreSynthesizeMessages.php has been run and audio exists.
I believe both validations are necessary to:
- First: to ensure the message exists
- Then: to ensure the utterance has been pre-synthesized (retrieveUtteranceMetadata() + loadUtteranceAudio())
Otherwise, it seems like running the pre-synthesizing script would serve no real purpose. Just wanted to check if you agree with this approach?
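The two validations outlined above can be sketched like this. This is a hypothetical Python illustration, not the actual PHP implementation: step 1 stands in for checking MediaWiki's message system (which renders unknown keys as "⧼key⧽"), and step 2 for retrieveUtteranceMetadata() plus loadUtteranceAudio().

```python
# Hypothetical sketch of the two-step validation discussed above:
# 1. the message key must exist in the message system at all, and
# 2. its utterance must have been pre-synthesized (metadata + audio present).

def validate_message_request(message_key, known_messages, utterance_store):
    if message_key not in known_messages:
        # Step 1 failed: MW would only give us the "⧼...⧽" fallback.
        raise ValueError(f"Unknown message key: {message_key}")
    utterance = utterance_store.get(message_key)
    if utterance is None or utterance.get("audio") is None:
        # Step 2 failed: the pre-synthesize script has not produced audio.
        raise LookupError(f"Message not pre-synthesized: {message_key}")
    return utterance

known = {"vector-toc-label"}
store = {"vector-toc-label": {"audio": b"opus-bytes"}}
ok = validate_message_request("vector-toc-label", known, store)
```

Separating the two checks also lets the API report distinct errors for "no such message" and "message not yet synthesized".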
When creating this task, T408367 - Handle missing audio files for existing utterance entries, I didn't realize that I was already about to implement the error handling for when a message is out of sync with the database (i.e. the audio has been manually removed, but the metadata still exists in the database), so I am thinking about closing that task.
I think most of the additions to ApiWikispeechListen should live somewhere else. They're not API logic. This is the same reason that we're moving logic out of it in T408813. I think the new code also makes sense to put in UtteranceGenerator. It could deal with all utterances, be they from page content or messages.
When you use the API to retrieve utterances the response can include multiple segments. For content you only ever have one. Have you considered how this may impact network usage? If you have a multi sentence message it'll fetch the whole thing before it can start to play.
Also, the logic for playing utterances we have now works one segment at a time. This could mean you need to make more changes to that to support messages. Not sure how much that would be though.
Just to clarify, are you suggesting that instead of returning all segments (with audio) for a message key in one API call, we should:
- either return only one segment at a time, like we do with page content, or
- return only metadata first, and let the client fetch each segment’s audio individually?
The current implementation sends all segments + audio in one response. Do you see a need to change this for performance or to better match how the client handles playback (one segment at a time)?
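The second alternative mentioned here (metadata first, audio per segment) can be sketched as follows. This is a hypothetical Python illustration of the request split, not the extension's actual API; the field names are assumptions.

```python
# Hypothetical sketch: instead of returning all segments with audio in one
# response, return segment metadata first and let the client fetch each
# segment's audio on demand, matching one-segment-at-a-time playback.

def list_segments(segments):
    """First request: metadata only, no audio payload."""
    return [{"hash": s["hash"], "endtime": s["endtime"]} for s in segments]

def fetch_segment_audio(segments, segment_hash):
    """Follow-up request: audio for a single segment."""
    for s in segments:
        if s["hash"] == segment_hash:
            return s["audio"]
    return None

segments = [
    {"hash": "h1", "endtime": 1130, "audio": b"a1"},
    {"hash": "h2", "endtime": 1535, "audio": b"a2"},
]
meta = list_segments(segments)
first_audio = fetch_segment_audio(segments, meta[0]["hash"])
```

The trade-off is one round-trip per segment versus a larger up-front payload; for short messages the single-response approach may well be fine, as the following comments conclude.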
I'm not sure we need to, but I was wondering if you've investigated it. For instance how long are the longest messages? I'd guess most of them are short (a few words to a sentence), but there may be longer ones.
I took a look at the i18n messages we're targeting, and most of them are very short, typically a few words, like "Main menu", "Tools", "Appearance", and similar.
Here are some examples from the current en.json:
- vector-main-menu-label: "Main menu"
- vector-toc-label: "Contents"
- vector-feature-custom-font-size-0-label: "Small"
- skin-theme-night-label: "Dark"
Based on this, I think it's safe to assume these messages won't cause any significant payload issues when fetching all segments + audio at once.
But it's definitely good that you raised this. If longer messages are added later, for example when DOM strings (T402622) are introduced, since those could contain configurable strings written by a wiki maintainer(?), it might be worth revisiting.
So either we do this in this patch, or we create a follow-up task about it. What would you prefer, @Sebastian_Berlin-WMSE?
Change #1196801 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Enable message-key input in listen API