UI strings coming from mw.msg() (message-based strings) should be pre-segmented and synthesized using the existing PreSynthesizeMessages maintenance script (T387284), and then referenced by their message key via ApiWikispeechListen (which will need to be extended to support message-based requests).
Description
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Enable message-key input in listen API | mediawiki/extensions/Wikispeech | master | +224 -26 |

| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | None | | T396579 ☂ Play interface |
| Open | | Viktoria_Hillerud_WMSE | T407474 Enable message-key input in listen API |
Event Timeline
Change #1196801 had a related patch set uploaded (by Viktoria Hillerud WMSE; author: Viktoria Hillerud WMSE):
[mediawiki/extensions/Wikispeech@master] Store utterances from MW:s message system in Wikispeech segmentation storage
I am a bit confused about whether or not the wikispeech-listen action should receive a 'message-key' parameter if we want the action to automatically call PreSynthesizeMessages::synthesizeErrorMessage().
The message utterance should be created by the maintenance script beforehand. The API should just fetch the correct message utterance. We need to tell it where the utterance comes from, similarly to revision for page content. You should be able to use the message keys for this, since they are unique identifiers.
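The lookup described here (page utterances keyed by revision, message utterances keyed by message key) can be sketched as follows. This is a hypothetical Python illustration, not the extension's actual code (which is PHP); all class and field names are made up.

```python
# Hypothetical sketch: resolving a stored utterance either by page revision
# or by message key. Messages are pre-synthesized by a maintenance script,
# so a miss on a message lookup is an error, not a trigger to synthesize.

class UtteranceStore:
    def __init__(self):
        # (revision_id, segment_hash) -> utterance for page content,
        # (message_key, segment_hash) -> utterance for messages
        self.page_utterances = {}
        self.message_utterances = {}

    def find(self, segment_hash, revision_id=None, message_key=None):
        """Fetch a stored utterance; the message key acts as the unique
        identifier, playing the role that the revision does for pages."""
        if message_key is not None:
            return self.message_utterances.get((message_key, segment_hash))
        return self.page_utterances.get((revision_id, segment_hash))

store = UtteranceStore()
store.message_utterances[("wikispeech-error-loading-audio-title", "abc123")] = {"audio": b"..."}
hit = store.find("abc123", message_key="wikispeech-error-loading-audio-title")
miss = store.find("abc123", message_key="not-a-message")
```

Keeping both lookups behind one method mirrors the idea that the API only needs to be told where the utterance comes from.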
Since this is related to T387284 - Pre-synthesise messages, I noticed an issue in the way the script handles message utterances:
When running:

$this->utteranceGenerator->getUtterance( null, $voice, $language, 0, $segment, $messageKey );

findMessageUtterance() is correctly triggered when $pageId === 0, and that method does find an existing utterance. However, the result seems to be ignored; the code still proceeds to synthesize and store a duplicate.
I suspect a bug in findMessageUtterance() or retrieveUtteranceMetadata(), and will investigate further, as this likely affects other parts of the system (e.g. ApiWikispeechListen.php) in the same way.
The issue occurred because the database contained references to utterances (rows) for certain messageKey values (e.g., 'wikispeech-error-loading-audio-title'), but the corresponding audio files were missing from the file backend, because I had manually removed them earlier... 🤦
getUtterance() retrieved these metadata entries from the database (via retrieveUtteranceMetadata()), but when trying to load the actual audio using loadUtteranceAudio(), it failed, since the .opus files no longer existed.
I removed the outdated utterance rows for the problematic messageKeys directly from the database and once removed, the system could correctly regenerate utterances using PreSynthesizeMessages.php, ensuring that both the database and audio files were in sync again.
So it turns out the bug wasn't in the code logic itself but was caused by a desynchronization between the utterance table and the audio files, due to manual file deletion 😅
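The desynchronization described here (metadata rows surviving in the database after their audio files were removed from the file backend) could be detected with a simple consistency check. This is a hypothetical Python sketch; the path layout and field names are assumptions, not the extension's actual storage scheme.

```python
# Hypothetical sketch: find utterance rows whose metadata exists in the
# database but whose .opus audio file is missing from the file backend.
import os

def check_utterance_sync(rows, audio_dir):
    """Return message keys that have metadata but no audio file."""
    orphaned = []
    for row in rows:
        path = os.path.join(audio_dir, row["message_key"] + ".opus")
        if not os.path.exists(path):
            orphaned.append(row["message_key"])
    return orphaned

rows = [
    {"message_key": "wikispeech-error-loading-audio-title"},
    {"message_key": "some-other-key"},
]
# Against a directory that does not exist, every row is orphaned:
orphans = check_utterance_sync(rows, "/nonexistent-audio-dir")
```

Reporting the orphaned keys with a clear error message (rather than failing opaquely at playback time) is the kind of improvement discussed in the following comments.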
It might be worth it to create a task to handle this. I don't think there needs to be a way to fix it automatically, but at least an error message that makes it clear what the issue is. I'd guess this is not just an issue for the message utterances, but for utterances in general.
Yes.
I don't think this is something that's likely to happen without manually moving things around. However it sounds like when it does happen it's quite opaque. It could save people time and confusion down the line.
I tried this in Special:ApiSandbox, and if I give a non-existing message key it still runs:

```
{
    "wikispeech-listen": {
        "message-key": "not-a-message",
        "segment-hash": "6740979a8e53384c70f9d55b464163690233f7e0702f0297064776b3e3f8cebd",
        "audio": "[...]",
        "tokens": [
            {
                "endtime": 1130,
                "expanded": "⧼not a message⧽",
                "orth": "⧼not-a-message⧽"
            },
            {
                "endtime": 1535,
                "orth": ""
            }
        ]
    }
}
```
Also, the name of this task isn't really right. From what I can tell, what it does is allow you to use messages as input for the listen API.
Yes, but the task title talks about "segmentation storage" (which I don't know what it refers to).
I changed the title of the task, hoping it will make things clearer. Regarding the error you got when giving a non-existing message key as a parameter: good point, and I will fix that by checking that the message key has already been pre-synthesized, right?
I think you have to check that the message key exists at all. If you ask MW for a message and it can't find it, you get "⧼...⧽" with the message key. I think this is because in some places it's better to have something, even if it's a cryptic message key, than nothing at all. In this case we don't want that, however.
Just to make sure I understand your point correctly:
I will add a check to ensure the key actually exists in MW's message system, so we don't get the fallback ⧼not-a-message⧽.
That said, I’m also performing a check to verify that the utterance has been pre-synthesized, i.e., that PreSynthesizeMessages.php has been run and audio exists.
I believe both validations are necessary to:
- First: to ensure the message exists
- Then: to ensure the utterance has been pre-synthesized (retrieveUtteranceMetadata() + loadUtteranceAudio())
Otherwise, it seems like running the pre-synthesizing script would serve no real purpose. Just wanted to check if you agree with this approach?
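The two validations outlined above can be sketched like this. This is a hypothetical Python illustration, not the actual PHP implementation: step 1 stands in for checking MediaWiki's message system (which renders unknown keys as "⧼key⧽"), and step 2 for retrieveUtteranceMetadata() plus loadUtteranceAudio().

```python
# Hypothetical sketch of the two-step validation discussed above:
# 1. the message key must exist in the message system at all, and
# 2. its utterance must have been pre-synthesized (metadata + audio present).

def validate_message_request(message_key, known_messages, utterance_store):
    if message_key not in known_messages:
        # Step 1 failed: MW would only give us the "⧼...⧽" fallback.
        raise ValueError(f"Unknown message key: {message_key}")
    utterance = utterance_store.get(message_key)
    if utterance is None or utterance.get("audio") is None:
        # Step 2 failed: the pre-synthesize script has not produced audio.
        raise LookupError(f"Message not pre-synthesized: {message_key}")
    return utterance

known = {"vector-toc-label"}
store = {"vector-toc-label": {"audio": b"opus-bytes"}}
ok = validate_message_request("vector-toc-label", known, store)
```

Separating the two checks also lets the API report distinct errors for "no such message" and "message not yet synthesized".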
When creating this task, T408367 - Handle missing audio files for existing utterance entries, I didn't realize that I was already about to implement the error handling for when a message is out of sync with the database (i.e. the audio has been manually removed, but the metadata still exists in the database), so I am thinking about closing that task.
I think most of the additions to ApiWikispeechListen should live somewhere else. They're not API logic. This is the same reason that we're moving logic out of it in T408813. I think the new code also makes sense to put in UtteranceGenerator. It could deal with all utterances, be they from page content or messages.
When you use the API to retrieve utterances the response can include multiple segments. For content you only ever have one. Have you considered how this may impact network usage? If you have a multi sentence message it'll fetch the whole thing before it can start to play.
Also, the logic for playing utterances we have now works one segment at a time. This could mean you need to make more changes to that to support messages. Not sure how much that would be though.
Just to clarify, are you suggesting that instead of returning all segments (with audio) for a message key in one API call, we should:
- either return only one segment at a time, like we do with page content, or
- return only metadata first, and let the client fetch each segment’s audio individually?
The current implementation sends all segments + audio in one response. Do you see a need to change this for performance or to better match how the client handles playback (one segment at a time)?
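The second alternative mentioned here (metadata first, audio per segment) can be sketched as follows. This is a hypothetical Python illustration of the request split, not the extension's actual API; the field names are assumptions.

```python
# Hypothetical sketch: instead of returning all segments with audio in one
# response, return segment metadata first and let the client fetch each
# segment's audio on demand, matching one-segment-at-a-time playback.

def list_segments(segments):
    """First request: metadata only, no audio payload."""
    return [{"hash": s["hash"], "endtime": s["endtime"]} for s in segments]

def fetch_segment_audio(segments, segment_hash):
    """Follow-up request: audio for a single segment."""
    for s in segments:
        if s["hash"] == segment_hash:
            return s["audio"]
    return None

segments = [
    {"hash": "h1", "endtime": 1130, "audio": b"a1"},
    {"hash": "h2", "endtime": 1535, "audio": b"a2"},
]
meta = list_segments(segments)
first_audio = fetch_segment_audio(segments, meta[0]["hash"])
```

The trade-off is one round-trip per segment versus a larger up-front payload; for short messages the single-response approach may well be fine, as the following comments conclude.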
I'm not sure we need to, but I was wondering if you've investigated it. For instance how long are the longest messages? I'd guess most of them are short (a few words to a sentence), but there may be longer ones.
I took a look at the i18n messages we're targeting, and most of them are very short, typically a few words, like "Main menu", "Tools", "Appearance", and similar.
Here are some examples from the current en.json:
- vector-main-menu-label: "Main menu"
- vector-toc-label: "Contents"
- vector-feature-custom-font-size-0-label: "Small"
- skin-theme-night-label: "Dark"
Based on this, I think it's safe to assume these messages won't cause any significant payload issues when fetching all segments + audio at once.
But it's definitely good that you raised this. If longer messages are added later, for example when DOM strings (T402622) are introduced, since those could contain configurable strings written by a wiki maintainer(?), it might be worth revisiting.
So either we do this in this patch, or we create a follow-up task about it. What would you prefer, @Sebastian_Berlin-WMSE?
Change #1196801 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Enable message-key input in listen API