Repo sponsors: Recall.ai - API for desktop recording

If you’re looking for a hosted desktop recording API, consider checking out Recall.ai, an API that records Zoom, Google Meet, Microsoft Teams, in-person meetings, and more.

LocalVocal - Speech AI assistant OBS Plugin


Introduction

LocalVocal lets you transcribe speech to text locally on your machine and simultaneously translate it into any language. ✅ No GPU required, ✅ no cloud costs, ✅ no network and ✅ no downtime! Privacy first - all data stays on your machine.

The plugin runs OpenAI's Whisper to process real-time speech and predict a transcription, utilizing Whisper.cpp from ggerganov to run the model efficiently on CPUs and GPUs. Translation is done with CTranslate2.

Usage

   
https://youtu.be/ns4cP9HFTxQ https://youtu.be/4llyfNi9FGs https://youtu.be/R04w02qG26o


Current Features:

  • Transcribe audio to text in real time in 100 languages
  • Display captions on screen using text sources
  • Send captions to a .txt or .srt file (to be read by external tools or for video playback), with or without aggregation
  • Captions synced with OBS recording timestamps
  • Send captions over an RTMP stream to e.g. YouTube or Twitch
  • Bring your own Whisper model (any GGML)
  • Translate captions in real time to major languages (using cloud providers, Whisper's built-in translation, or NMT models)
  • CUDA, hipBLAS (AMD ROCm), Apple Arm64, AVX & SSE acceleration support
  • Filter out or replace any part of the produced captions
  • Partial transcriptions for a streaming-captions experience
  • Hundreds of fine-tuned Whisper models for dozens of languages from HuggingFace

Download

Check out the latest releases for downloads and install instructions.

Available Versions

LocalVocal is available in multiple versions to cater to different hardware configurations and operating systems. Below is a brief explanation of the different versions you can download:

  • Windows (please ensure you have the latest MSVC runtime installed)
  • macOS
    • Intel (x86_64): This version is for Mac computers with Intel processors. See macOS variants
    • Apple Silicon (arm64): This version is optimized for Mac computers with Apple Silicon (M1, M2, etc.) processors. See macOS variants
  • Linux x86_64: This version is for Linux systems with x86_64 architecture.

Make sure to download the version that matches your system's hardware and operating system for the best performance.

Whisper backends are now loaded dynamically when the plugin starts, which has two major benefits:

  • Better CPU performance and compatibility - Whisper can automatically select the best CPU backend that works on your system out of all the ones available. This means the plugin can take full advantage of newer CPUs with more features, while also remaining usable on older hardware than before (prior to v0.5.0, users were assumed to have at least an AVX2-capable CPU)
  • More stability - if a backend cannot be used on your system, whether due to unavailable CPU features, missing dependencies, or something else, it simply isn't loaded instead of causing a crash

To ensure the plugin works "out-of-the-box", it is configured by default to use the CPU only (this also applies to users upgrading from versions older than v0.5.0). This avoids immediate crashes on startup if, for any reason, your GPU cannot be used by one of the Whisper backends (e.g. the Metal backend on Apple crashes outright if it cannot allocate a buffer to load a model into).

If you want to use GPU acceleration, open the plugin settings and select your desired GPU acceleration backend.

Generic variants

These variants should run well on any system regardless of hardware configuration. They contain the following Whispercpp backends:

  • CPU
    • Generic x86_64
    • Generic x86_64 with SSE4.2
    • Sandy Bridge (CPU with SSE4.2, AVX)
    • Haswell (CPU with SSE4.2, AVX, F16C, AVX2, BMI2, FMA)
    • Sky Lake (CPU with SSE4.2, AVX, F16C, AVX2, BMI2, FMA, AVX512)
    • Ice Lake (CPU with SSE4.2, AVX, F16C, AVX2, BMI2, FMA, AVX512, AVX512_VBMI, AVX512_VNNI)
    • Alder Lake (CPU with SSE4.2, AVX, F16C, AVX2, BMI2, FMA, AVX_VNNI)
    • Sapphire Rapids (CPU with SSE4.2, AVX, F16C, AVX2, BMI2, FMA, AVX512, AVX512_VBMI, AVX512_VNNI, AVX512_BF16, AMX_TILE, AMX_INT8)
  • OpenBLAS - Used in conjunction with a CPU backend to accelerate processing speed
  • Vulkan - Standard cross-platform graphics library allowing GPU-accelerated processing on GPUs that aren't supported by CUDA or ROCm (can also work with integrated GPUs)
  • OpenCL (currently Linux only) - Industry standard parallel compute library that may be faster than Vulkan on supported GPUs

NVIDIA optimized variants

These variants contain all the backends from the generic variant, plus a CUDA backend that provides accelerated performance on supported NVIDIA GPUs. If the OpenCL backend is available on your platform, it also uses the CUDA OpenCL library instead of the generic one.

Make sure you have the latest NVIDIA GPU drivers installed; you will likely also need CUDA Toolkit v12.8.0 or newer.
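You can confirm the driver is present, and see the highest CUDA version it supports, with the nvidia-smi utility that ships with the NVIDIA driver:

nvidia-smi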

If installing on Linux and you don't need the entire CUDA toolkit, you can instead install either the cuda-runtime-12-8 package to get all the runtime libraries and drivers, or the cuda-libraries-12-8 package to get just the runtime libraries.
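On Debian/Ubuntu systems this might look like the following, assuming NVIDIA's CUDA apt repository is already configured:

sudo apt install cuda-libraries-12-8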

AMD optimized variants

These variants contain all the backends from the generic variant, plus a hipBLAS backend using AMD's ROCm framework that accelerates computation on supported AMD GPUs.

Please ensure you have a compatible AMD GPU driver installed.

macOS variants

These variants come with the following backends available:

  • CPU
    • The same x86_64 variants as listed in Generic variants for Intel CPUs
    • m1, m2/m3, and m4 variants for ARM CPUs
  • Accelerate - Used in conjunction with a CPU backend to accelerate processing speed
  • Metal - Uses the system's GPU for accelerated processing
  • CoreML - Special backend that uses Apple's CoreML instead of Whisper's normal model processing, running on either the Metal or CPU backends

Models

The plugin ships with the Tiny.en model, and will automatically download other Whisper models through a dropdown in the settings. There's also an option to select an external GGML Whisper model file if you have one on disk.

If using CoreML on Apple, it will also automatically download the appropriate CoreML encoder model for your selected model.

Get more models from https://ggml.ggerganov.com/ and HuggingFace, or follow the instructions on whisper.cpp to create your own models or download others, such as distilled models.
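For example, one way to fetch a stock GGML model is directly from the ggerganov/whisper.cpp HuggingFace repository (swap the filename for the model you want):

$ curl -L -o ggml-small.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin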

Building

The plugin was built and tested on macOS (Intel & Apple Silicon), Windows (with and without NVIDIA CUDA) and Linux.

Start by cloning this repo to a directory of your choice.

macOS

The CI pipeline scripts can be used locally: just call the zsh build script, which builds for the architecture specified in $MACOS_ARCH (either x86_64 or arm64).

$ MACOS_ARCH="x86_64" ./.github/scripts/build-macos -c Release

Install

The above script should succeed, and the plugin files (e.g. obs-localvocal.plugin) will reside in the ./release/Release folder under the repository root. Copy the .plugin file to the OBS plugins directory, e.g. ~/Library/Application Support/obs-studio/plugins.
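Assuming the default paths above, the copy is:

$ cp -R release/Release/obs-localvocal.plugin ~/Library/Application\ Support/obs-studio/plugins/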

To get a .pkg installer file, run for example:

$ ./.github/scripts/package-macos -c Release

(Note that the outputs may end up in the Release folder rather than the install folder that package-macos expects, in which case you will need to rename the folder from build_x86_64/Release to build_x86_64/install.)
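In that case, the rename is simply:

$ mv build_x86_64/Release build_x86_64/install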

Linux

Using pre-compiled variants

  1. Clone the repository and, if not using Ubuntu, install the development versions of these dependencies using your distribution's package manager (see the example after this list):

    • libcurl
    • libsimde
    • libssl
    • icu
    • openblas (preferably the OpenMP variant rather than the pthreads variant)
    • OpenCL
    • Vulkan

    Installing ccache is also recommended if you are likely to build the plugin multiple times.
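    Package names vary by distribution; on a Debian/Ubuntu derivative the set above might look like this (the package names are assumptions, check your distribution):

    # package names may differ on your distribution
    sudo apt install libcurl4-openssl-dev libssl-dev libsimde-dev libicu-dev \
        libopenblas-openmp-dev ocl-icd-opencl-dev libvulkan-dev ccache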

  2. Install Rust via rustup (recommended) or your distribution's package manager

  3. Set the ACCELERATION environment variable to one of generic, nvidia, or amd (defaults to generic if unset)

    export ACCELERATION="nvidia"
  4. Then, from the repo directory, build the plugin by running:

    ./.github/scripts/build-linux

    If you can't use the CI build script for some reason, you can build the plugin as follows:

    cmake -B build_x86_64 --preset linux-x86_64 -DCMAKE_INSTALL_PREFIX=./release
    cmake --build build_x86_64 --target install
  5. Installing

    If you're using Ubuntu and the plugin was previously installed from a .deb package, copy the results to the standard OBS folders:

    sudo cp -R release/RelWithDebInfo/lib/* /usr/lib/
    sudo cp -R release/RelWithDebInfo/share/* /usr/share/

    Otherwise, follow the official OBS plugins guide and copy the results to your user plugins folder:

    mkdir -p ~/.config/obs-studio/plugins/obs-localvocal/bin/64bit
    cp -R release/RelWithDebInfo/lib/x86_64-linux-gnu/obs-plugins/* ~/.config/obs-studio/plugins/obs-localvocal/bin/64bit/
    mkdir -p ~/.config/obs-studio/plugins/obs-localvocal/data
    cp -R release/RelWithDebInfo/share/obs/obs-plugins/obs-localvocal/* ~/.config/obs-studio/plugins/obs-localvocal/data/

    Note: The lib path in the release folder varies depending on your Linux distribution (e.g. on Gentoo the plugin libraries are found in release/RelWithDebInfo/lib64/obs-plugins) but the destination directory to copy them into will always be the same.
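    If you're unsure where the plugin libraries ended up under release/, a quick search will locate them:

    find release/RelWithDebInfo -name 'obs-localvocal*'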

Building Whispercpp from source along with the plugin

If you can't use the CI build script for some reason, or simply prefer to build the Whispercpp dependency from source along with the plugin, follow the steps above but build the plugin using the following commands:

cmake -B build_x86_64 --preset linux-x86_64 -DLINUX_SOURCE_BUILD=ON -DCMAKE_INSTALL_PREFIX=./release
cmake --build build_x86_64 --target install

When building from source, the Vulkan and OpenCL development libraries are optional and will only be used in the build if they are installed. Similarly if the CUDA or ROCm toolkits are found, they will also be used and the relevant Whisper backends will be enabled.

The default for a full source build is to build both Whisper and the plugin optimized for the host system. To change this behaviour, add one or both of the following options to the CMake configure command (the first of the two commands above); a combined example follows the list:

  • to build all CPU backends add -DWHISPER_DYNAMIC_BACKENDS=ON
  • to build all CUDA kernels add -DWHISPER_BUILD_ALL_CUDA_ARCHITECTURES=ON
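For instance, a configure command enabling both options would be:

cmake -B build_x86_64 --preset linux-x86_64 -DLINUX_SOURCE_BUILD=ON -DWHISPER_DYNAMIC_BACKENDS=ON -DWHISPER_BUILD_ALL_CUDA_ARCHITECTURES=ON -DCMAKE_INSTALL_PREFIX=./release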

Windows

Use the CI scripts again, for example:

> .github/scripts/Build-Windows.ps1 -Configuration Release

The build should exist in the ./release folder under the repository root. You can manually install the files into the OBS directory:

> Copy-Item -Recurse -Force "release\Release\*" -Destination "C:\Program Files\obs-studio\"

Building with CUDA support on Windows

LocalVocal will now build with CUDA support automatically through a prebuilt binary of Whisper.cpp from https://github.com/locaal-ai/locaal-ai-dep-whispercpp. The CMake scripts will download all necessary files.

To build with CUDA, set the ACCELERATION environment variable (to cpu, hipblas, or cuda) and build regularly:

> $env:ACCELERATION="cuda"
> .github/scripts/Build-Windows.ps1 -Configuration Release