
fix: Windows CUDA detection and speaker/confidence bugs #9

Merged
namastex888 merged 8 commits into main from feat/cuda128-dependency-optimization
Dec 17, 2025
Conversation


@namastex888 namastex888 commented Dec 16, 2025

Summary

  • deps.py: Detect CPU-only PyTorch and recommend --torch-backend=auto for Windows
  • README.md: Add Windows install section with uv pip --torch-backend=auto
  • pyproject.toml: Remove unnecessary torch uv.sources (API doesn't depend on torch), update murmurai-core>=1.0.5
  • transcriber.py: Fix phantom speaker label and hardcoded 0.85 confidence when diarization/word_timestamps disabled

Test plan

  • Verify Windows install with uv pip install murmurai --torch-backend=auto
  • Confirm speaker field omitted when speaker_labels=false
  • Confirm confidence field omitted when no word-level data
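The last two checks boil down to building segment dicts conditionally. A minimal sketch of the transcriber.py fix, assuming hypothetical names (`build_segment`, `Word`) rather than the real API: the fields are attached only when the underlying data exists, instead of emitting a phantom speaker label and a hardcoded 0.85 confidence.

```python
# Sketch of the transcriber.py fix: attach "speaker" and "confidence"
# only when diarization / word-level timestamps actually produced data.
# build_segment and Word are illustrative names, not the real API.
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    probability: float


def build_segment(text, speaker=None, words=None, speaker_labels=False):
    segment = {"text": text}
    if speaker_labels and speaker is not None:
        segment["speaker"] = speaker  # only when diarization ran
    if words:  # word timestamps enabled and non-empty: average word probs
        segment["confidence"] = sum(w.probability for w in words) / len(words)
    return segment
```

With `speaker_labels=false` and no word data, the result is just `{"text": ...}`, matching the test plan above.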

- Upgrade to CUDA 12.8 + cuDNN 9 (7.6% faster vs 12.6)
- Remove cuDNN 8 detection (ctranslate2 >= 4.5.0 requires cuDNN 9)
- Remove redundant torch/torchaudio/torchvision deps (inherit from core)
- Tighten dependency floors to tested versions
- Fix Dockerfile to use uv.lock for reproducible builds

Dependency floor changes:
- fastapi: >=0.100 → >=0.110
- uvicorn: >=0.20 → >=0.25
- pydantic: >=2.0 → >=2.5
- pydantic-settings: >=2.0 → >=2.3
- httpx: >=0.25 → >=0.27
- aiosqlite: >=0.19 → >=0.20

Council reviewed: 4 perspectives (questioner, simplifier, operator, ergonomist)

Split dependency installation from source copy:
1. Copy manifests first (pyproject.toml, uv.lock)
2. Install deps with --no-install-project (cached layer)
3. Copy source code
4. Install project with --no-deps (fast)

This ensures source code changes don't invalidate the expensive
~2GB dependency installation layer.
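The four steps above can be sketched as a Dockerfile. This is a minimal sketch, not the PR's actual file: the base image tag and the `uv sync --frozen --no-dev --no-install-project` invocation appear elsewhere in this PR, but the uv bootstrap line and the final `uv pip install --no-deps .` are assumptions.

```dockerfile
FROM nvidia/cuda:12.8.0-cudnn-runtime-ubuntu22.04
WORKDIR /app

# Bootstrap uv (common pattern from uv's Docker docs; assumed here)
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# 1. Copy only the dependency manifests so this layer's cache key
#    depends on them alone, not on the source tree.
COPY pyproject.toml uv.lock ./

# 2. Install third-party deps from the lockfile; skip the project itself
#    so source edits never invalidate this ~2GB layer.
RUN uv sync --frozen --no-dev --no-install-project

# 3. Now bring in the source code.
COPY . .

# 4. Install just the project; its dependencies are already present.
RUN uv pip install --no-deps .
```

Only steps 3-4 rerun when source files change; the expensive step 2 layer stays cached.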

- Add Native Apple Silicon (MLX) as 100-star milestone
- Gate cu128 torch source to Linux/Windows only
- macOS falls back to PyPI CPU wheels

This fixes uv-based installation on macOS which was broken
by the cu128 index (no Darwin wheels available).
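Gating an index by platform can be expressed with environment markers in uv's source table. A sketch, assuming an index named `pytorch-cu128`; on macOS no source entry matches, so resolution falls back to PyPI's CPU wheels:

```toml
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [
    { index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]
```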
@gemini-code-assist

Summary of Changes

Hello @namastex888, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request focuses on improving the user experience and reliability of the application, particularly for Windows users and in the transcription output. It addresses issues with CUDA detection and installation on Windows, refines the logic for including speaker labels and confidence scores in transcription results, and updates core dependencies and the Docker build process to leverage newer CUDA versions and a more efficient package manager.

Highlights

  • Windows CUDA Support: Enhanced CUDA detection for Windows, providing clearer guidance for CPU-only PyTorch installations and recommending the --torch-backend=auto option for uv pip install.
  • Transcription Output Accuracy: Resolved bugs where speaker labels and confidence scores were incorrectly included or hardcoded when diarization or word-level timestamps were disabled, ensuring more accurate and conditional output.
  • Dependency and Build System Updates: Upgraded the Docker build process to use CUDA 12.8 and uv for dependency management, removing unnecessary PyTorch-related uv.sources and updating various package versions for improved stability and performance.
  • Documentation Improvement: Added a dedicated 'Windows Install' section to the README.md to guide users through the specific steps required for PyTorch with CUDA on Windows.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly improves Windows support by enhancing CUDA detection and providing clear installation instructions. The fixes in transcriber.py to correctly handle speaker labels and confidence scores are well-implemented and address important bugs. The Dockerfile has also been nicely refactored to use uv and optimize layer caching. I've identified two critical issues that need to be addressed: the Dockerfile is missing the ffmpeg system dependency, and the uv.lock file is out of sync with pyproject.toml. Once these are fixed, the PR will be in excellent shape.

FROM nvidia/cuda:12.8.0-cudnn-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /app


critical

The av Python package, a dependency of murmurai-core, is a wrapper around the FFmpeg libraries and requires them to be installed on the system. The previous Dockerfile correctly installed ffmpeg, but this step appears to have been missed in the refactor. Without these system libraries, the container build may fail or the application will crash at runtime when av is used.

WORKDIR /app

# Install system dependencies for audio processing (required by PyAV)
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg && rm -rf /var/lib/apt/lists/*

[[package]]
name = "murmurai-core"
-version = "1.0.1"
+version = "1.0.2"


critical

There's an inconsistency between your pyproject.toml and uv.lock. The pyproject.toml file requires murmurai-core>=1.0.4, but this lock file has resolved murmurai-core to version 1.0.2. This indicates the lock file is stale and will cause issues with reproducible builds.

Please regenerate it to match the dependencies in pyproject.toml by running uv lock or uv sync.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 902 to 905
[[package]]
name = "murmurai-core"
-version = "1.0.1"
+version = "1.0.2"
source = { registry = "https://pypi.org/simple" }


P1: Regenerate uv.lock for the bumped murmurai-core

pyproject.toml now requires murmurai-core>=1.0.4 (lines 27-35), but uv.lock still pins murmurai-core to 1.0.2 here. When the Dockerfile runs uv sync --frozen --no-dev --no-install-project, the locked 1.0.2 no longer satisfies the declared constraint, so builds that rely on the lockfile will fail until the lock is regenerated to match the new requirement.


@namastex888 namastex888 force-pushed the feat/cuda128-dependency-optimization branch from 2d643b9 to 736fea5 Compare December 17, 2025 01:32
@namastex888 namastex888 added the rc Release Candidate label Dec 17, 2025
- deps.py: Detect CPU-only PyTorch and recommend --torch-backend=auto for Windows
- README.md: Add Windows install section with uv pip --torch-backend=auto
- pyproject.toml: Remove unnecessary torch uv.sources (API doesn't depend on torch),
  update murmurai-core>=1.0.4
- transcriber.py: Fix phantom speaker label and hardcoded 0.85 confidence when
  diarization/word_timestamps disabled
@namastex888 namastex888 force-pushed the feat/cuda128-dependency-optimization branch from 736fea5 to 45a3f20 Compare December 17, 2025 02:11
@namastex888 namastex888 merged commit 9f061a9 into main Dec 17, 2025
4 checks passed
