Fix Amazon Transcribe streaming + low-latency captions; auto-build AWS SDK for C++ 1.11.710 (libcurl) on Windows/macOS/Linux#275

Open
Folotu wants to merge 25 commits into royshil:master from Folotu:master

@Folotu Folotu commented Dec 29, 2025

This PR restores and hardens Amazon Transcribe streaming support (SigV4 event-stream) and improves the real-time caption/subtitle experience across Windows, macOS, and Ubuntu.

It also updates CI/build plumbing so AWS Transcribe support can be built out-of-the-box (including in GitHub Actions) without requiring contributors to manually build/install the AWS SDK first.

Key changes

1) Amazon Transcribe streaming auth + low-latency pipeline

  • Fixes SigV4 auth for Transcribe event-stream by forcing the EVENTSTREAM_SIGV4_SIGNER auth scheme (prevents “request signature does not match”).
  • Implements a true low-latency Transcribe path:
    • Keeps a long-lived StartStreamTranscriptionAsync session open
    • Continuously feeds 16kHz mono PCM
    • Emits partial/final updates to OBS as they arrive (lower lag than segment-based inference).
  • Enables Partial Results Stabilization and improves partial handling to reduce flicker/duplication while keeping partials responsive.
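The "continuously feeds 16kHz mono PCM" step can be illustrated with a minimal sketch. It assumes the audio arrives as normalized float samples and that the stream expects signed 16-bit PCM in fixed-duration chunks; the helper names `float_to_pcm16` and `chunk_pcm16` and the 100 ms chunk size are hypothetical and are not taken from the PR's code. It needs C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert normalized float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range input is clamped instead of being allowed to wrap around.
inline std::vector<int16_t> float_to_pcm16(const std::vector<float> &samples)
{
	std::vector<int16_t> pcm;
	pcm.reserve(samples.size());
	for (float s : samples) {
		const float clamped = std::clamp(s, -1.0f, 1.0f);
		pcm.push_back(static_cast<int16_t>(std::lround(clamped * 32767.0f)));
	}
	return pcm;
}

// Split a PCM buffer into fixed-duration chunks (hypothetical helper).
// At 16 kHz mono, chunk_ms = 100 yields 1600 samples per chunk; the last
// chunk may be shorter.
inline std::vector<std::vector<int16_t>> chunk_pcm16(const std::vector<int16_t> &pcm,
						     int sample_rate_hz, int chunk_ms)
{
	std::vector<std::vector<int16_t>> chunks;
	const std::size_t per_chunk =
		static_cast<std::size_t>(sample_rate_hz) * static_cast<std::size_t>(chunk_ms) / 1000;
	if (per_chunk == 0)
		return chunks;
	for (std::size_t i = 0; i < pcm.size(); i += per_chunk) {
		const std::size_t end = std::min(pcm.size(), i + per_chunk);
		chunks.emplace_back(pcm.begin() + static_cast<std::ptrdiff_t>(i),
				    pcm.begin() + static_cast<std::ptrdiff_t>(end));
	}
	return chunks;
}
```

Smaller chunks lower the delay before audio reaches the service but raise the number of events sent, so the chunk duration is a tuning choice rather than a fixed requirement.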

2) Output handling improvements (subtitles + file output)

  • Fixes runtime switching between subtitle output targets and file output:
    • File output opens correctly without restart
    • File output writes finals-only (no partial spam) and avoids duplication.
  • Improves subtitle display behavior to keep updates responsive.

3) Build + CI: AWS SDK setup is now automated (all platforms)

To avoid “works on Windows but not on macOS/Linux” and to make CI green, the AWS SDK build is now automated:

  • Windows
  • macOS + Linux
    • Adds:
      • .github/scripts/Build-AwsSdk-macOS.zsh
      • .github/scripts/Build-AwsSdk-Linux.zsh
    • Integrates them into .github/scripts/.build.zsh so macOS/Linux builds auto-build aws-sdk-cpp (TranscribeStreaming only) into a repo-local prefix aws-sdk-built-curl/.
    • Adds CI caching for aws-sdk-built-curl/ to avoid rebuilding every matrix entry.

4) Non-Windows correctness fixes required for CI

  • Linux case-sensitive include fix for AWS semaphore header (threading/ vs Threading/).
  • CMake fix: ensure ZLIB::ZLIB exists before find_package(AWSSDK ...) because the exported AWS targets can reference it on Linux.
  • CI workflow fixes:
    • Avoid cargo metadata being executed at repo root (no top-level Cargo.toml).
    • Formatting fixes (clang-format/cmake-format) so formatting checks pass.

Why this change is needed

  • Windows could be made to work once the AWS SDK was built with the correct HTTP client support and the event-stream signing scheme was forced.
  • macOS/Linux builds initially did not build or find AWSSDK transcribestreaming, which meant the Amazon streaming path was compiled out and could lead to “no output” behavior depending on configuration.
  • This PR makes AWS Transcribe support consistent across platforms and prevents silent “no-op” states.

How to test

Windows

  1. Build:
    • .\.github\scripts\Build-Windows.ps1 -Configuration Release
  2. Install output into OBS:
    • Copy release\Release\* into your OBS install directory.
  3. In OBS → LocalVocal:
    • Enable Cloud Speech
    • Select Amazon Transcribe
    • Enter Access Key / Secret Key (and optional Session Token) + Region

macOS / Ubuntu

  • CI artifacts should build with AWS Transcribe support enabled by default.
  • Locally, building through the repo scripts should auto-build the AWS SDK (TranscribeStreaming only) into aws-sdk-built-curl/ when missing.
  • Verify in build logs:
    • AWS SDK found - enabling full AWS Transcribe support

Opt-out / Notes

  • Contributors can skip AWS SDK build/support if desired:
    • Set ENABLE_AWS_TRANSCRIBE=OFF or BUILD_AWS_SDK=0 for macOS/Linux builds.
  • Other providers (Whisper/OpenAI/Google/Azure/custom) remain unchanged aside from shared output plumbing/CI fixes.

Copilot AI left a comment

Pull request overview

This PR adds AWS Transcribe streaming support for Windows with low-latency captions, fixes authentication issues, and enhances subtitle output capabilities.

Key changes:

  • Implements true streaming AWS Transcribe integration with SigV4 event-stream authentication and partial result stabilization
  • Adds file-based subtitle output with deduplication to avoid writing partial results
  • Includes Windows build automation for AWS SDK C++ with libcurl support

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 18 comments.

| File | Description |
| --- | --- |
| src/whisper-utils/cloud-speech.h | Defines CloudSpeechProcessor interface and AWS streaming state structures for multi-provider cloud transcription |
| src/whisper-utils/cloud-speech.cpp | Core implementation of AWS Transcribe streaming with thread-safe audio submission, transcript consumption, and fallback providers |
| src/whisper-utils/aws-memory-manager.h | Custom AWS SDK memory manager wrapper to avoid CRT allocation issues |
| src/whisper-utils/ssl-utils.h/cpp | Utility to resolve PEM certificate path for AWS SDK HTTPS connections |
| src/whisper-utils/whisper-processing.h/cpp | Integrates cloud speech inference path with fallback to local Whisper and streaming transcript updates in main loop |
| src/whisper-utils/vad-processing.cpp | Feeds resampled 16kHz audio continuously to Amazon streaming session and reduces buffering for low-latency |
| src/whisper-utils/token-buffer-thread.cpp | Flushes partial results immediately to avoid caption lag from cloud streaming |
| src/transcription-utils.cpp | Fixes logical bug in remove_leading_trailing_nonalpha using AND instead of OR |
| src/transcription-filter.h | Adds extern declaration for transcription_filter_info |
| src/transcription-filter.cpp | Manages cloud speech processor lifecycle and implements file output open/close logic with runtime switching support |
| src/transcription-filter-data.h | Adds cloud speech configuration, processor, and file output state fields |
| src/transcription-filter-callbacks.cpp | Routes captions to file output and implements finals-only file writing with deduplication |
| src/transcription-filter-utils.cpp | Refactors text source creation to ensure source exists in current scene on multiple events |
| src/transcription-filter-properties.h/cpp | Adds cloud speech provider UI group with credentials, region, model, and fallback options |
| src/plugin-main.c | Initializes and shuts down AWS SDK on plugin load/unload |
| data/locale/en-US.ini | Adds localized strings for cloud speech options and file output |
| CMakeLists.txt | Integrates optional AWS Transcribe SDK detection and cURL client configuration |
| .github/scripts/Build-Windows.ps1 | Adds BuildAwsSdk switch to trigger AWS SDK build automation |
| .github/scripts/Build-AwsSdk-Windows.ps1 | New script to clone, configure, and build AWS SDK C++ with cURL for TranscribeStreaming |
| README.md | Documents AWS Transcribe build requirements and optional SDK automation |

Tabby (Collaborator) commented Jan 5, 2026

This is very cool. I'll try and do a more thorough review when I can but at a quick glance this looks pretty good. I would want it to work on MacOS and Linux as well as Windows but I'd be happy to help get that working and tested

This plugin is intended to be used with locally running models without sending data to the cloud, but given there's already the option for cloud translations, I don't think it unreasonable to also have the option to use cloud transcription as well, as long as it's not the default

Add macOS/Linux AWS SDK build scripts (transcribestreaming only) and integrate into zsh build so ENABLE_AWS_TRANSCRIBE_SDK is defined in CI builds. Cache aws-sdk-built-curl in workflows. Also guard Amazon streaming paths when SDK is absent to avoid silent no-output.
Disable setup-rust-toolchain caching to avoid cargo metadata failures when no top-level Cargo.toml exists, and apply clang-format to the touched C++ sources.
Fix Ubuntu CI failure where aws-sdk-cpp exported targets reference ZLIB::ZLIB but the target is not defined unless ZLIB is found before find_package(AWSSDK).
Run cmake-format on CMakeLists.txt to satisfy the CI cmake-format check.
Use aws/core/utils/threading/Semaphore.h (lowercase path) so Linux builds don't fail on case-sensitive filesystems.
@Folotu Folotu changed the title from "Fix AWS Transcribe auth and add low-latency streaming captions (Windows, AWS SDK C++ 1.11.710 + libcurl)" to "Fix Amazon Transcribe streaming + low-latency captions; auto-build AWS SDK for C++ 1.11.710 (libcurl) on Windows/macOS/Linux" on Feb 1, 2026
Folotu (Author) commented Feb 1, 2026

> This is very cool. I'll try and do a more thorough review when I can but at a quick glance this looks pretty good. I would want it to work on MacOS and Linux as well as Windows but I'd be happy to help get that working and tested
>
> This plugin is intended to be used with locally running models without sending data to the cloud, but given there's already the option for cloud translations, I don't think it unreasonable to also have the option to use cloud transcription as well, as long as it's not the default

Totally fair points.

I agree the primary goal of this plugin is local transcription. I opened this PR here because the existing cloud path was effectively broken for AWS Transcribe, and I wasn’t sure where best to land this work given that CloudVocal/aws_transcribe is no longer actively developed. Since this repo already supports cloud translation, adding an optional cloud transcription provider felt consistent, as long as it’s not the default and doesn’t impact local-only users.

Re: macOS/Linux - yes, I’d really appreciate your help testing. I did a quick validation in a macOS VM and it appeared to work, but I’d much prefer confirmation on native hardware and on Linux as well. I also updated CI/build scripts so AWS Transcribe support can be built on macOS/Linux (and should no longer be Windows-only).

Screenshots from the macOS VM run:

[Images: image, Screenshot 2026-02-01 010026]

Tabby (Collaborator) commented Apr 2, 2026

Finally getting around to testing this, thanks for being patient. It builds and runs fine on Linux fwiw, and I'll get it built for MacOS soon as I now have a MacBook I can use for testing

I just need to dust off my AWS account and then I'll test the cloud transcription part of it on both platforms and then we can get it merged

Comment on lines +187 to +188
CloudSpeechConfig cloud_speech_config;
std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
Collaborator

These two lines at least need to be surrounded by a check because if ENABLE_AWS_TRANSCRIBE_SDK is not defined the types won't exist

Suggested change
- CloudSpeechConfig cloud_speech_config;
- std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
+ #ifdef ENABLE_AWS_TRANSCRIBE_SDK
+ CloudSpeechConfig cloud_speech_config;
+ std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
+ #endif
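
Guarding the member declarations is only half of the fix, since every place that reads those members needs the same guard. The sketch below illustrates this with a trimmed-down, hypothetical struct (`FilterDataSketch`) and accessor (`has_cloud_processor`); the empty `CloudSpeechConfig` and `CloudSpeechProcessor` definitions are stand-ins so the example compiles on its own, and are not the PR's real types.

```cpp
#include <memory>

#ifdef ENABLE_AWS_TRANSCRIBE_SDK
// Stand-ins for the real types, which only exist when the SDK is enabled.
struct CloudSpeechConfig {};
class CloudSpeechProcessor {};
#endif

// Hypothetical, trimmed-down version of the filter data struct.
struct FilterDataSketch {
	bool active = false;
#ifdef ENABLE_AWS_TRANSCRIBE_SDK
	CloudSpeechConfig cloud_speech_config;
	std::unique_ptr<CloudSpeechProcessor> cloud_speech_processor;
#endif
};

// Use sites need the same guard, so callers get a safe answer in builds
// where the members do not exist.
inline bool has_cloud_processor(const FilterDataSketch &data)
{
#ifdef ENABLE_AWS_TRANSCRIBE_SDK
	return data.cloud_speech_processor != nullptr;
#else
	(void)data; // unused when the SDK is compiled out
	return false;
#endif
}
```

Funnelling access through one guarded helper keeps the `#ifdef` blocks out of the calling code.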

@Tabby Tabby left a comment

A bunch of things around the properties could do with being fixed, as well as the way the logging is used

It also doesn't seem to be working for me as far as I can tell? It logs that it's initialised the provider...

info: [obs-localvocal] Initializing cloud speech processor
info: Cloud speech processor initialized for provider: 0

...but there's no indication in the logs that it's using the cloud transcription and no indication of why not either, so it could do with more logging around which transcription path is being used and why
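
One way to get the logging asked for here is to make the path decision return its reason alongside the result, so a single log line can report both. This is a sketch only: the settings struct, its fields, and the function names are hypothetical, and the conditions the plugin actually checks may differ.

```cpp
#include <string>

enum class TranscriptionPath { Local, Cloud };

// The chosen path together with a human-readable reason for the choice.
struct PathDecision {
	TranscriptionPath path;
	std::string reason;
};

// Hypothetical snapshot of the state that influences the decision.
struct CloudSettingsSketch {
	bool use_cloud_speech = false;
	bool sdk_compiled_in = false;
	bool has_credentials = false;
	bool processor_initialized = false;
};

inline PathDecision choose_transcription_path(const CloudSettingsSketch &s)
{
	if (!s.use_cloud_speech)
		return {TranscriptionPath::Local, "cloud speech is disabled in settings"};
	if (!s.sdk_compiled_in)
		return {TranscriptionPath::Local, "plugin was built without the AWS SDK"};
	if (!s.has_credentials)
		return {TranscriptionPath::Local, "access key or secret key is missing"};
	if (!s.processor_initialized)
		return {TranscriptionPath::Local, "cloud speech processor failed to initialize"};
	return {TranscriptionPath::Cloud, "cloud speech is enabled and initialized"};
}

// Build the message to pass to the logger once per configuration change.
inline std::string describe(const PathDecision &d)
{
	const char *name = d.path == TranscriptionPath::Cloud ? "cloud" : "local";
	return std::string("Using ") + name + " transcription: " + d.reason;
}
```

Logging the result of `describe` whenever settings change would show both which path is active and why a fallback to local transcription happened.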

Comment on lines +747 to +748
obs_properties_add_group(ppts, "cloud_speech_group", MT_("cloud_speech_group"),
OBS_GROUP_CHECKABLE, cloud_speech_group);
Collaborator

The second argument to obs_properties_add_group is the name of a boolean property which controls whether or not the group is visible. It would be better to call this property use_cloud_speech and get rid of the extra boolean property you've added further down with that name

Suggested change
- obs_properties_add_group(ppts, "cloud_speech_group", MT_("cloud_speech_group"),
- OBS_GROUP_CHECKABLE, cloud_speech_group);
+ obs_properties_add_group(ppts, "use_cloud_speech", MT_("use_cloud_speech"),
+ OBS_GROUP_CHECKABLE, cloud_speech_group);

Comment on lines +753 to +756

// Add main cloud speech enable checkbox
obs_properties_add_bool(cloud_speech_group, "use_cloud_speech", MT_("use_cloud_speech"));

Collaborator

With the suggested change above, we can get rid of this

Suggested change
- // Add main cloud speech enable checkbox
- obs_properties_add_bool(cloud_speech_group, "use_cloud_speech", MT_("use_cloud_speech"));


// Add callback to show/hide cloud speech options
obs_property_t *cloud_speech_group_prop = obs_properties_get(ppts, "cloud_speech_group");
obs_property_set_modified_callback(cloud_speech_group_prop, cloud_speech_options_callback);
Collaborator

Add an explanation that is visible when the group is disabled

Suggested change
- obs_property_set_modified_callback(cloud_speech_group_prop, cloud_speech_options_callback);
+ obs_property_set_modified_callback(cloud_speech_group_prop, cloud_speech_options_callback);
+ // add explanation text
+ obs_properties_add_text(cloud_speech_group, "cloud_speech_explanation",
+ MT_("cloud_speech_explanation"), OBS_TEXT_INFO);

Comment thread data/locale/en-US.ini
Comment on lines +26 to +27
cloud_speech_group="Cloud Speech-to-Text"
use_cloud_speech="Use Cloud Speech-to-Text"
Collaborator

Following from my comments on transcription-filter-properties.cpp, get rid of the extra boolean variable name and add an explanation. Also I prefer that the group be called "Cloud Transcription" rather than "Cloud Speech-to-Text" but we can add that phrase to the description in case it's not clear to people what transcription is

Suggested change
- cloud_speech_group="Cloud Speech-to-Text"
- use_cloud_speech="Use Cloud Speech-to-Text"
+ use_cloud_speech="Cloud Transcription"
+ cloud_speech_explanation="Cloud transcription (speech-to-text) requires an active internet connection and API keys for the transcription provider. If enabled, this will be used instead of local transcription."

Comment on lines +57 to +58
blog(LOG_ERROR, "curl_global_init failed: %s",
curl_easy_strerror(curl_init_result));
Collaborator

Can you please replace all the uses of blog with obs_log (declared in plugin-support.h), as this ensures all log messages are prefixed with the plugin name to make it easier to tell which log messages are from this plugin
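
The effect of the prefix can be seen in the log excerpt quoted earlier, where only the first line carries "[obs-localvocal]". The sketch below shows the general idea of a prefixing wrapper. `format_prefixed` is a hypothetical stand-in that returns the formatted string so it can run without OBS; it is not the real `obs_log`, which lives in plugin-support.h and whose implementation is not shown here.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a prefixing logger: formats the message
// printf-style and prepends "[plugin_name] " to it.
inline std::string format_prefixed(const char *plugin_name, const char *format, ...)
{
	va_list args;
	va_start(args, format);

	// First pass on a copy of the argument list measures the length.
	va_list args_copy;
	va_copy(args_copy, args);
	const int needed = std::vsnprintf(nullptr, 0, format, args_copy);
	va_end(args_copy);

	std::string message;
	if (needed > 0) {
		std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
		std::vsnprintf(buf.data(), buf.size(), format, args);
		message.assign(buf.data(), static_cast<std::size_t>(needed));
	}
	va_end(args);

	return std::string("[") + plugin_name + "] " + message;
}
```

With a wrapper like this, call sites keep their existing format strings and arguments, and only the function name changes.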
