Description
The steps of transcription and translation currently appear to be relatively tightly coupled. We can see that the subtitles generated by the transcription are processed in the translation step.
# file: openlrc/openlrc.py
def process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans):
    ...
    if skip_trans:
        shutil.copy(transcribed_opt_sub.filename, final_json_path)
        transcribed_opt_sub.filename = final_json_path
        return transcribed_opt_sub
    ...
The subtitle files are then finally generated in the translation worker:
def translation_worker(self, transcription_queue, target_lang, skip_trans, bilingual_sub):
    ...
    # Handle translation
    final_subtitle = process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans)
    # Generate and move subtitle files
    generate_subtitle_files(final_subtitle, base_name, subtitle_format)
    ...
This seems to violate the single-responsibility principle (SRP): the translation worker both translates and produces the final output files.
At the same time, even when skip_trans=True is specified, the translation thread is still started. Users pay the extra performance overhead even though they are not using translation.
I wish we could decouple the two steps of transcription and translation:
- The translation step no longer processes transcribed files.
- The translation thread is no longer started when skip_trans=True is specified.
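A rough sketch of what I have in mind, not the actual openlrc code: `transcribe`, `translate`, and `run` are hypothetical stand-ins, and `generate_subtitle_files` here is a simplified placeholder that only shares its name with the real function. The point is that the translation worker only translates, file generation is a separate step, and no thread is created when translation is skipped.

```python
import queue
import threading


def transcribe(path):
    # Hypothetical placeholder for the real transcription step.
    return {"source": path, "lines": [f"transcript of {path}"]}


def translate(subtitle, target_lang):
    # Hypothetical placeholder for the real translation step.
    lines = [f"[{target_lang}] {line}" for line in subtitle["lines"]]
    return {**subtitle, "lines": lines}


def generate_subtitle_files(subtitle, outputs):
    # Simplified placeholder: output generation is its own step,
    # independent of whether translation happened.
    outputs.append(subtitle)


def run(paths, target_lang, skip_trans):
    """Return (generated subtitles, whether a translation thread was started)."""
    outputs = []

    if skip_trans:
        # No queue and no translation thread: transcribe, then generate.
        for path in paths:
            generate_subtitle_files(transcribe(path), outputs)
        return outputs, False

    transcription_queue = queue.Queue()
    translated = []

    def translation_worker():
        # The worker only translates; it never writes output files.
        while True:
            item = transcription_queue.get()
            if item is None:  # sentinel: transcription is finished
                break
            translated.append(translate(item, target_lang))

    worker = threading.Thread(target=translation_worker)
    worker.start()
    for path in paths:
        transcription_queue.put(transcribe(path))
    transcription_queue.put(None)
    worker.join()

    for subtitle in translated:
        generate_subtitle_files(subtitle, outputs)
    return outputs, True
```

With this shape, `run(paths, "fr", skip_trans=True)` never touches `threading` at all, and the copy-the-file special case inside the translation step is no longer needed.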
I am not familiar with NLP, but if you agree, I can try to implement this improvement.