transcribe: add model path to vosk Model #12479
LocalStack Community integration with Pro: 2 files ±0, 2 suites ±0, 1m 20s ⏱️ (-1h 51m 30s). Results for commit 77f25ee. ± Comparison against base commit 073eab9. This pull request removes 4325 tests. ♻️ This comment has been updated with latest results.
Branch updated from 68ad955 to 827124f.
viren-nadkarni left a comment:
These tests have been a sticking point; hopefully this resolves it once and for all 🤞
```diff
 from vosk import KaldiRecognizer, Model  # noqa
-model = Model(model_name=model_name)
+model = Model(model_path=str(model_path), model_name=model_name)
```
I understand it's implemented in C, but did you look into why the cache detection didn't work?
I am not entirely sure at the moment, but the changes in this PR are based on two key observations:
- The vosk `Model` may attempt to download a model when the model directory is empty because an incorrect `model_path` is being used. We want to make sure that we use `VOSK_MODEL_PATH` as the path.
- When checking the LocalStack cache, we only verify whether the model path exists. It's possible that the directory exists but doesn't actually contain any model files.
I have added a log statement that reports when a cached model is used, along with its path.
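A minimal sketch of the stricter cache check described in the second observation, assuming the model is stored in a single directory. `is_model_cached`, the logger name, and the directory name are hypothetical illustrations, not the actual LocalStack code.

```python
import logging
import tempfile
from pathlib import Path

LOG = logging.getLogger(__name__)


def is_model_cached(model_path: Path) -> bool:
    """Return True only if `model_path` is a directory with at least one entry.

    Hypothetical helper: checking existence alone is not enough, because an
    empty directory passes that check and the library may then re-download.
    """
    if not model_path.is_dir():
        return False
    # any() stops at the first entry, so even large model directories are cheap to check.
    cached = any(model_path.iterdir())
    if cached:
        LOG.info("Using cached vosk model at %s", model_path)
    return cached


# Usage with a throwaway directory (the model directory name is a placeholder).
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "example-vosk-model"
    print(is_model_cached(model_dir))  # False: the directory does not exist
    model_dir.mkdir()
    print(is_model_cached(model_dir))  # False: it exists but is empty
    (model_dir / "conf").mkdir()
    print(is_model_cached(model_dir))  # True
```

Note that this only rules out the empty-directory case; a partially extracted model would still pass.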
If the tests remain flaky, a next step could be to verify the checksum. Let me know your thoughts on this.
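For reference, checksum verification of a downloaded archive could look roughly like the sketch below. It uses streaming SHA-256 from the standard library; where the expected digest would come from (pinned in code or published next to the model) is an open assumption, and `sha256_of` / `verify_download` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a large model archive is never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA-256 hex digest matches `expected_sha256` (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage: here the expected digest is computed from known bytes for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "model.zip"  # placeholder name
    archive.write_bytes(b"example model bytes")
    expected = hashlib.sha256(b"example model bytes").hexdigest()
    print(verify_download(archive, expected))  # True
    print(verify_download(archive, "0" * 64))  # False
```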
> The vosk `Model` may attempt to download a model when the model directory is empty due to an incorrect `model_path` being used. We want to make sure that we use `VOSK_MODEL_PATH` as the path.
If the model path is now passed when instantiating `Model`, is it still necessary to set `VOSK_MODEL_PATH`?
> When checking the LocalStack cache, we only verify whether the model path exists. It's possible that the directory exists but doesn't actually contain any model files.
Any idea what would lead to this situation, where empty directories are created?
I think our CI networks are fairly resilient and shouldn't corrupt the data stream; otherwise this would show up in several of our other tests. Adding checksum verification is probably unnecessary; I suspect the root cause is something else.
Yes, absolutely, thanks for pointing that out 🙌 I have removed `VOSK_MODEL_PATH` in b6c7b46.
One possible cause of the failures is that we set up the cache directory from an environment variable, which needs to be set before the `vosk` module is imported. We currently import `vosk` in several different places, which could explain why the cache path is sometimes not set.
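To illustrate why import order matters, here is a self-contained sketch using a hypothetical stand-in module (`demo_model_lib`, reading a hypothetical `DEMO_MODEL_PATH` variable). It assumes, as the comment above suggests, that the real library reads its cache variable once, at import time; it is not vosk's code.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

ENV_VAR = "DEMO_MODEL_PATH"  # hypothetical variable name

# Hypothetical stand-in for a library that reads its model directory at import time.
MODULE_SOURCE = f"""
import os

# Evaluated exactly once, when the module is first imported.
MODEL_DIR = os.getenv("{ENV_VAR}")
"""


def write_demo_module(directory: Path, name: str = "demo_model_lib") -> str:
    """Write the stand-in module into `directory` and return its module name."""
    (directory / f"{name}.py").write_text(MODULE_SOURCE)
    return name


def demo() -> tuple:
    """Return MODEL_DIR as seen by an import before the variable is set, a repeated
    import after setting it, and a fresh import after setting it."""
    with tempfile.TemporaryDirectory() as tmp:
        name = write_demo_module(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new file is discoverable
        try:
            os.environ.pop(ENV_VAR, None)
            early = importlib.import_module(name).MODEL_DIR  # variable not set yet
            os.environ[ENV_VAR] = "/tmp/vosk-cache"
            # Same module object from sys.modules: the module body does not run again.
            cached = importlib.import_module(name).MODEL_DIR
            # Dropping the cached module forces a fresh import that sees the variable.
            sys.modules.pop(name, None)
            fresh = importlib.import_module(name).MODEL_DIR
            return early, cached, fresh
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            os.environ.pop(ENV_VAR, None)


print(demo())  # (None, None, '/tmp/vosk-cache')
```

Because Python caches modules in `sys.modules`, whichever import runs first in the process decides the value that is seen; setting the variable later, in a different code path, has no effect on a module that is already loaded.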
I am currently working on investigating
Branch updated from f34abe5 to 280df94.
For the latest runs of the PR, the pipeline is green. Current findings for
As discussed, @alexrashed, do you think there is anything else I should verify? Please let me know your thoughts here, @alexrashed @viren-nadkarni.
alexrashed left a comment:
Thanks for the thorough investigation of the issues with the transcribe tests! 💯
The fix for the model caching will hopefully resolve some of the instabilities in the transcribe tests. As for the issues with the Python package download, I can also only imagine that they are caused by some kind of problem on the side of PyPI or its CDN, or by rate limiting of the CI/CD runners. In my opinion, we can merge this PR and unskip the tests to see whether they stay stable on master.
Successful GitHub run on
Motivation
Flaky CI test runs for transcribe:
Changes
- Fixed the vosk `Model` class downloading issue by providing `model_path`, which prevents re-downloading of the model and ensures that an accurate and consistent model path is used.
- Added a check that the model path is non-empty, along with a log statement for the path.
- Enabled the skipped flaky tests: #12473.
Testing
Ran the pipeline several times to verify its stability:
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/a3f797d5-461a-4a7b-ad47-e977efb253ee
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/5b74f6a1-242b-4549-bce6-610541b95388
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/23981c99-96e2-4eb7-be83-a800d0fb7b9b
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/c98da336-23f4-4696-8c8a-e223a8323d40
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/0162b75f-3be3-4ba1-9dcb-222472b0b4f3
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32055/workflows/f6c63c2f-747c-4dbc-8e6d-49c673eeb6fc
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32093/workflows/db5d09e3-792d-4d91-b8d0-b7822cd09c7a
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32102/workflows/48508602-6c4c-44ca-b5fe-1a68efb97293
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32164/workflows/42109e13-4c81-41e6-b744-a32d1e81bc0a
✅ https://app.circleci.com/pipelines/github/localstack/localstack/32167/workflows/c483268e-802b-4a41-86ff-8775df25cec0
TODO
- `localstack.packages.api.PackageException: Installation of vosk 0.3.43 failed.` error.
- `vosk` installation failures without model path changes: Test PyPI `vosk` installation failures #12510
- ✅ Successful runs: transcribe: add model path to vosk Model #12479 (comment)