Fix Amazon Transcribe streaming + low-latency captions; auto-build AWS SDK for C++ 1.11.710 (libcurl) on Windows/macOS/Linux#275
Folotu wants to merge 25 commits into
- Add AWS Transcribe SDK streaming client + SigV4 eventstream auth enforcement
- Improve transcript assembly (items/punctuation), partial/final handling, and pacing
- Fix trimming bug that dropped first character
- Add roots.pem + SSL/Windows helpers
Pull request overview
This PR adds AWS Transcribe streaming support for Windows with low-latency captions, fixes authentication issues, and enhances subtitle output capabilities.
Key changes:
- Implements true streaming AWS Transcribe integration with SigV4 event-stream authentication and partial result stabilization
- Adds file-based subtitle output with deduplication to avoid writing partial results
- Includes Windows build automation for AWS SDK C++ with libcurl support
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| src/whisper-utils/cloud-speech.h | Defines CloudSpeechProcessor interface and AWS streaming state structures for multi-provider cloud transcription |
| src/whisper-utils/cloud-speech.cpp | Core implementation of AWS Transcribe streaming with thread-safe audio submission, transcript consumption, and fallback providers |
| src/whisper-utils/aws-memory-manager.h | Custom AWS SDK memory manager wrapper to avoid CRT allocation issues |
| src/whisper-utils/ssl-utils.h/cpp | Utility to resolve PEM certificate path for AWS SDK HTTPS connections |
| src/whisper-utils/whisper-processing.h/cpp | Integrates cloud speech inference path with fallback to local Whisper and streaming transcript updates in main loop |
| src/whisper-utils/vad-processing.cpp | Feeds resampled 16kHz audio continuously to Amazon streaming session and reduces buffering for low-latency |
| src/whisper-utils/token-buffer-thread.cpp | Flushes partial results immediately to avoid caption lag from cloud streaming |
| src/transcription-utils.cpp | Fixes logical bug in remove_leading_trailing_nonalpha using AND instead of OR |
| src/transcription-filter.h | Adds extern declaration for transcription_filter_info |
| src/transcription-filter.cpp | Manages cloud speech processor lifecycle and implements file output open/close logic with runtime switching support |
| src/transcription-filter-data.h | Adds cloud speech configuration, processor, and file output state fields |
| src/transcription-filter-callbacks.cpp | Routes captions to file output and implements finals-only file writing with deduplication |
| src/transcription-filter-utils.cpp | Refactors text source creation to ensure source exists in current scene on multiple events |
| src/transcription-filter-properties.h/cpp | Adds cloud speech provider UI group with credentials, region, model, and fallback options |
| src/plugin-main.c | Initializes and shuts down AWS SDK on plugin load/unload |
| data/locale/en-US.ini | Adds localized strings for cloud speech options and file output |
| CMakeLists.txt | Integrates optional AWS Transcribe SDK detection and cURL client configuration |
| .github/scripts/Build-Windows.ps1 | Adds BuildAwsSdk switch to trigger AWS SDK build automation |
| .github/scripts/Build-AwsSdk-Windows.ps1 | New script to clone, configure, and build AWS SDK C++ with cURL for TranscribeStreaming |
| README.md | Documents AWS Transcribe build requirements and optional SDK automation |
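The transcription-utils.cpp row above describes a boolean-operator fix in a trimming routine. The exact condition that changed is not shown in this conversation, so the following is only a sketch of a trimming loop in which both conditions must hold before a character is skipped. `trim_nonalpha` and `is_text_byte` are hypothetical names, and treating bytes >= 0x80 as text (to keep UTF-8 characters) is an assumption of this sketch.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in, not the plugin's remove_leading_trailing_nonalpha.
// Assumption: bytes >= 0x80 count as text so UTF-8 characters are preserved.
static bool is_text_byte(unsigned char c) {
    return c >= 0x80 || std::isalpha(c) != 0;
}

std::string trim_nonalpha(const std::string &s) {
    std::size_t first = 0;
    // Keep skipping only while in bounds AND the current byte is not text.
    while (first < s.size() && !is_text_byte(static_cast<unsigned char>(s[first])))
        ++first;
    std::size_t last = s.size();
    // Same rule from the right, never crossing the left boundary.
    while (last > first && !is_text_byte(static_cast<unsigned char>(s[last - 1])))
        --last;
    return s.substr(first, last - first);
}
```

A string that already starts with a letter comes back with its first character intact, which is the behavior the trimming fix is meant to restore.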
This is very cool. I'll try to do a more thorough review when I can, but at a quick glance this looks pretty good. I would want it to work on macOS and Linux as well as Windows, but I'd be happy to help get that working and tested. This plugin is intended to be used with locally running models without sending data to the cloud, but given there's already the option for cloud translations, I don't think it's unreasonable to also have the option to use cloud transcription, as long as it's not the default.
Add macOS/Linux AWS SDK build scripts (transcribestreaming only) and integrate into zsh build so ENABLE_AWS_TRANSCRIBE_SDK is defined in CI builds. Cache aws-sdk-built-curl in workflows. Also guard Amazon streaming paths when SDK is absent to avoid silent no-output.
Disable setup-rust-toolchain caching to avoid cargo metadata failures when no top-level Cargo.toml exists, and apply clang-format to the touched C++ sources.
Fix Ubuntu CI failure where aws-sdk-cpp exported targets reference ZLIB::ZLIB but the target is not defined unless ZLIB is found before find_package(AWSSDK).
Run cmake-format on CMakeLists.txt to satisfy the CI cmake-format check.
Use aws/core/utils/threading/Semaphore.h (lowercase path) so Linux builds don't fail on case-sensitive filesystems.
Totally fair points. I agree the primary goal of this plugin is local transcription. The reason I opened this PR here is that the existing cloud path was effectively broken for AWS Transcribe, and I wasn’t sure of the best place to land this work given the inactive development state of CloudVocal/aws_transcribe. Since this repo already supports cloud translation, adding an optional cloud transcription provider felt consistent, as long as it’s not the default and doesn’t impact local-only users. Re: macOS/Linux: yes, I’d really appreciate your help testing. I did a quick validation in a macOS VM and it appeared to work, but I’d much prefer confirmation on native hardware and on Linux as well. I also updated CI/build scripts so AWS Transcribe support can be built on macOS/Linux (and should no longer be Windows-only). Screenshots from the macOS VM run:
Finally getting around to testing this, thanks for being patient. It builds and runs fine on Linux, fwiw, and I'll get it built for macOS soon, as I now have a MacBook I can use for testing. I just need to dust off my AWS account, then I'll test the cloud transcription part of it on both platforms, and then we can get it merged.
CloudSpeechConfig cloud_speech_config;
std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
These two lines at least need to be surrounded by a check, because if ENABLE_AWS_TRANSCRIBE_SDK is not defined the types won't exist.
Suggested change:
- CloudSpeechConfig cloud_speech_config;
- std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
+ #ifdef ENABLE_AWS_TRANSCRIBE_SDK
+ CloudSpeechConfig cloud_speech_config;
+ std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
+ #endif
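A guard on the members usually implies the same guard at every use site. The sketch below shows one way that could look. The struct `filter_data_sketch`, the function `cloud_speech_ready`, and the stand-in type bodies are all hypothetical; only the two member names and the macro come from the lines under review.

```cpp
#include <memory>
#include <string>

#ifdef ENABLE_AWS_TRANSCRIBE_SDK
// Stand-in declarations for this sketch only; the PR defines the real types elsewhere.
struct CloudSpeechConfig { std::string region; };
class CloudSpeechProcessor {};
#endif

// Hypothetical, cut-down version of the filter state.
struct filter_data_sketch {
    bool use_cloud_speech = false;
#ifdef ENABLE_AWS_TRANSCRIBE_SDK
    CloudSpeechConfig cloud_speech_config;
    std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
#endif
};

// Use sites need the same guard, plus a fallback branch when the SDK is absent.
bool cloud_speech_ready(const filter_data_sketch &d) {
#ifdef ENABLE_AWS_TRANSCRIBE_SDK
    return d.use_cloud_speech && d.cloud_speech_processor != nullptr;
#else
    (void)d; // no SDK compiled in: the cloud path can never be ready
    return false;
#endif
}
```

Funnelling callers through one helper like this keeps the `#ifdef` blocks out of most of the code, though whether that fits the plugin's structure is a judgment call for the authors.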
Tabby left a comment
A bunch of things around the properties could do with being fixed, as well as the way the logging is used.
It also doesn't seem to be working for me, as far as I can tell. It logs that it's initialised the provider...
info: [obs-localvocal] Initializing cloud speech processor
info: Cloud speech processor initialized for provider: 0
...but there's no indication in the logs that it's using the cloud transcription, and no indication of why not either, so it could do with more logging around which transcription path is being used and why.
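One way to get that kind of logging is to make the path decision a single function that returns both the chosen path and a human-readable reason, and log the reason whenever the decision changes. This is a sketch under assumptions: the enum, the struct, and the three boolean inputs are hypothetical and do not reflect how the PR actually selects a path.

```cpp
#include <string>

enum class TranscriptionPath { Local, Cloud };

struct PathDecision {
    TranscriptionPath path;
    std::string reason; // intended to be written to the log verbatim
};

// All three parameters are hypothetical inputs for this sketch.
PathDecision choose_path(bool cloud_enabled, bool sdk_compiled_in, bool has_credentials) {
    if (!cloud_enabled)
        return {TranscriptionPath::Local, "cloud transcription is disabled in settings"};
    if (!sdk_compiled_in)
        return {TranscriptionPath::Local, "plugin was built without the AWS SDK"};
    if (!has_credentials)
        return {TranscriptionPath::Local, "cloud credentials are missing"};
    return {TranscriptionPath::Cloud, "cloud transcription is enabled and configured"};
}
```

Logging `reason` once per change, rather than once per audio chunk, would answer "which path and why" without flooding the log.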
obs_properties_add_group(ppts, "cloud_speech_group", MT_("cloud_speech_group"),
    OBS_GROUP_CHECKABLE, cloud_speech_group);
The second argument to obs_properties_add_group is the name of a boolean property which controls whether or not the group is visible. It would be better to call this property use_cloud_speech and get rid of the extra boolean property you've added further down with that name
Suggested change:
- obs_properties_add_group(ppts, "cloud_speech_group", MT_("cloud_speech_group"),
-     OBS_GROUP_CHECKABLE, cloud_speech_group);
+ obs_properties_add_group(ppts, "use_cloud_speech", MT_("use_cloud_speech"),
+     OBS_GROUP_CHECKABLE, cloud_speech_group);
// Add main cloud speech enable checkbox
obs_properties_add_bool(cloud_speech_group, "use_cloud_speech", MT_("use_cloud_speech"));
With the suggested change above, we can get rid of this
Suggested change:
- // Add main cloud speech enable checkbox
- obs_properties_add_bool(cloud_speech_group, "use_cloud_speech", MT_("use_cloud_speech"));
// Add callback to show/hide cloud speech options
obs_property_t *cloud_speech_group_prop = obs_properties_get(ppts, "cloud_speech_group");
obs_property_set_modified_callback(cloud_speech_group_prop, cloud_speech_options_callback);
Add an explanation that is visible when the group is disabled
Suggested change:
- obs_property_set_modified_callback(cloud_speech_group_prop, cloud_speech_options_callback);
+ obs_property_set_modified_callback(cloud_speech_group_prop, cloud_speech_options_callback);
+ // add explanation text
+ obs_properties_add_text(cloud_speech_group_prop, "cloud_speech_explanation",
+     MT_("translate_cloud_explanation"), OBS_TEXT_INFO);
cloud_speech_group="Cloud Speech-to-Text"
use_cloud_speech="Use Cloud Speech-to-Text"
Following from my comments on transcription-filter-properties.cpp, get rid of the extra boolean variable name and add an explanation. Also I prefer that the group be called "Cloud Transcription" rather than "Cloud Speech-to-Text" but we can add that phrase to the description in case it's not clear to people what transcription is
Suggested change:
- cloud_speech_group="Cloud Speech-to-Text"
- use_cloud_speech="Use Cloud Speech-to-Text"
+ use_cloud_speech="Cloud Transcription"
+ cloud_speech_explanation="Cloud transcription (speech-to-text) requires an active internet connection and API keys to the translation provider. If enabled, this will be used instead of local transcription."
blog(LOG_ERROR, "curl_global_init failed: %s",
    curl_easy_strerror(curl_init_result));
Can you please replace all the uses of blog with obs_log (declared in plugin-support.h), as this ensures all log messages are prefixed with the plugin name to make it easier to tell which log messages are from this plugin
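The two log lines quoted earlier in this review show the difference: one carries the `[obs-localvocal]` prefix and the other does not. To make the effect of a prefixing logger concrete, here is a standalone sketch. `format_prefixed` is a hypothetical helper written for illustration; it is not OBS's `obs_log` implementation, and the 1024-byte buffer is an arbitrary choice.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Plugin name as it appears in the prefixed log line quoted in this review.
static const char *kPluginName = "obs-localvocal";

// Hypothetical illustration of a prefixing logger: formats the message
// printf-style, then prepends "[plugin-name] ".
std::string format_prefixed(const char *format, ...) {
    char message[1024]; // arbitrary size; longer messages are truncated
    va_list args;
    va_start(args, format);
    std::vsnprintf(message, sizeof(message), format, args);
    va_end(args);
    return std::string("[") + kPluginName + "] " + message;
}
```

With a helper of this shape, `format_prefixed("curl_global_init failed: %s", "timeout")` yields `[obs-localvocal] curl_global_init failed: timeout`, so every message can be traced back to the plugin.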


This PR restores and hardens Amazon Transcribe streaming support (SigV4 event-stream) and improves the real-time caption/subtitle experience across Windows, macOS, and Ubuntu.
It also updates CI/build plumbing so AWS Transcribe support can be built out-of-the-box (including in GitHub Actions) without requiring contributors to manually build/install the AWS SDK first.
Key changes
1) Amazon Transcribe streaming auth + low-latency pipeline
- Enforces the EVENTSTREAM_SIGV4_SIGNER auth scheme (prevents “request signature does not match”).
- [...] StartStreamTranscriptionAsync session open
2) Output handling improvements (subtitles + file output)
3) Build + CI: AWS SDK setup is now automated (all platforms)
To avoid “works on Windows but not on macOS/Linux” and to make CI green, the AWS SDK build is now automated:
- Adds .github/scripts/Build-AwsSdk-Windows.ps1 and integrates it into Build-Windows.ps1 so AWS can be built when needed.
- Adds .github/scripts/Build-AwsSdk-macOS.zsh
- Adds .github/scripts/Build-AwsSdk-Linux.zsh
- Updates .github/scripts/.build.zsh so macOS/Linux builds auto-build aws-sdk-cpp (TranscribeStreaming only) into a repo-local prefix aws-sdk-built-curl/.
- Caches aws-sdk-built-curl/ to avoid rebuilding every matrix entry.
4) Non-Windows correctness fixes required for CI
- Fixes include-path casing (threading/ vs Threading/).
- Ensures ZLIB::ZLIB exists before find_package(AWSSDK ...) because the exported AWS targets can reference it on Linux.
- Avoids cargo metadata being executed at repo root (no top-level Cargo.toml).
Why this change is needed
- [...] AWSSDK transcribestreaming, which meant the Amazon streaming path was compiled out and could lead to “no output” behavior depending on configuration.
How to test
Windows
- Run .\.github\scripts\Build-Windows.ps1 -Configuration Release
- Copy release\Release\* into your OBS install directory.
macOS / Ubuntu
- [...] aws-sdk-built-curl/ when missing.
- [...] AWS SDK found - enabling full AWS Transcribe support
Opt-out / Notes
- [...] ENABLE_AWS_TRANSCRIBE=OFF or BUILD_AWS_SDK=0 for macOS/Linux builds.