fix: Windows CUDA detection and speaker/confidence bugs #9
- Upgrade to CUDA 12.8 + cuDNN 9 (7.6% faster vs 12.6)
- Remove cuDNN 8 detection (ctranslate2 >= 4.5.0 requires cuDNN 9)
- Remove redundant torch/torchaudio/torchvision deps (inherit from core)
- Tighten dependency floors to tested versions
- Fix Dockerfile to use uv.lock for reproducible builds

Dependency floor changes:
- fastapi: >=0.100 → >=0.110
- uvicorn: >=0.20 → >=0.25
- pydantic: >=2.0 → >=2.5
- pydantic-settings: >=2.0 → >=2.3
- httpx: >=0.25 → >=0.27
- aiosqlite: >=0.19 → >=0.20

Council reviewed: 4 perspectives (questioner, simplifier, operator, ergonomist)
Split dependency installation from source copy:
1. Copy manifests first (pyproject.toml, uv.lock)
2. Install deps with --no-install-project (cached layer)
3. Copy source code
4. Install project with --no-deps (fast)

This ensures source code changes don't invalidate the expensive ~2GB dependency installation layer.
- Add Native Apple Silicon (MLX) as 100-star milestone
- Gate cu128 torch source to Linux/Windows only
- macOS falls back to PyPI CPU wheels

This fixes uv-based installation on macOS, which was broken by the cu128 index (no Darwin wheels available).
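The platform gating in this commit lives in pyproject.toml, but the decision it encodes can be sketched in Python. This is only an illustration: `torch_index_for` is a hypothetical helper, not code from the PR, and the index URL is assumed to be the PyTorch CUDA 12.8 wheel index that "cu128" refers to.

```python
import sys
from typing import Optional

# Assumption: "cu128" refers to the PyTorch CUDA 12.8 wheel index.
CU128_INDEX = "https://download.pytorch.org/whl/cu128"


def torch_index_for(platform: str = sys.platform) -> Optional[str]:
    """Return the extra wheel index for torch, or None to use plain PyPI.

    Hypothetical helper: Linux and Windows get the cu128 index; every other
    platform (including macOS, which has no Darwin wheels there) falls back
    to the default PyPI wheels.
    """
    if platform.startswith("linux") or platform == "win32":
        return CU128_INDEX
    return None


if __name__ == "__main__":
    print(torch_index_for() or "PyPI (default index)")
```

Returning `None` for unknown platforms keeps the fallback conservative: anything that is not explicitly known to have CUDA wheels resolves from PyPI.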
Summary of Changes

Hello @namastex888, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving the user experience and reliability of the application, particularly for Windows users and in the transcription output. It addresses issues with CUDA detection and installation on Windows, refines the logic for including speaker labels and confidence scores in transcription results, and updates core dependencies and the Docker build process to leverage newer CUDA versions and a more efficient package manager.
Code Review
This pull request significantly improves Windows support by enhancing CUDA detection and providing clear installation instructions. The fixes in transcriber.py to correctly handle speaker labels and confidence scores are well-implemented and address important bugs. The Dockerfile has also been nicely refactored to use uv and optimize layer caching. I've identified two critical issues that need to be addressed: the Dockerfile is missing the ffmpeg system dependency, and the uv.lock file is out of sync with pyproject.toml. Once these are fixed, the PR will be in excellent shape.
FROM nvidia/cuda:12.8.0-cudnn-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /app
The av Python package, a dependency of murmurai-core, is a wrapper around the FFmpeg libraries and requires them to be installed on the system. The previous Dockerfile correctly installed ffmpeg, but this step appears to have been missed in the refactor. Without these system libraries, the container build may fail or the application will crash at runtime when av is used.
WORKDIR /app
# Install system dependencies for audio processing (required by PyAV)
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg && rm -rf /var/lib/apt/lists/*
 [[package]]
 name = "murmurai-core"
-version = "1.0.1"
+version = "1.0.2"
There's an inconsistency between your pyproject.toml and uv.lock. The pyproject.toml file requires murmurai-core>=1.0.4, but this lock file has resolved murmurai-core to version 1.0.2. This indicates the lock file is stale and will cause issues with reproducible builds.
Please regenerate it to match the dependencies in pyproject.toml by running uv lock or uv sync.
💡 Codex Review
Here are some automated review suggestions for this pull request.
 [[package]]
 name = "murmurai-core"
-version = "1.0.1"
+version = "1.0.2"
 source = { registry = "https://pypi.org/simple" }
Regenerate uv.lock for bumped murmurai-core
pyproject now requires murmurai-core>=1.0.4 (pyproject.toml lines 27‑35), but uv.lock still locks murmurai-core to 1.0.2 here; when the Dockerfile runs uv sync --frozen --no-dev --no-install-project, the locked 1.0.2 no longer satisfies the declared constraint, so builds that rely on the lockfile will fail until the lock is regenerated to match the new requirement.
2d643b9 to 736fea5
- deps.py: Detect CPU-only PyTorch and recommend --torch-backend=auto for Windows
- README.md: Add Windows install section with uv pip --torch-backend=auto
- pyproject.toml: Remove unnecessary torch uv.sources (API doesn't depend on torch), update murmurai-core>=1.0.4
- transcriber.py: Fix phantom speaker label and hardcoded 0.85 confidence when diarization/word_timestamps disabled
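The deps.py change is not shown on this page, so the following is only a sketch of how CPU-only PyTorch detection might look. It relies on `torch.version.cuda`, which is `None` on builds without CUDA support; the helper names and the exact hint text are assumptions.

```python
import sys
from typing import Optional


def is_cpu_only_build(cuda_version: Optional[str]) -> bool:
    """torch.version.cuda is None on PyTorch builds without CUDA support."""
    return cuda_version is None


def install_hint(cuda_version: Optional[str],
                 platform: str = sys.platform) -> Optional[str]:
    """Return a reinstall suggestion for Windows users on a CPU-only build.

    Hypothetical helper; the command mirrors the one in this PR's test plan.
    """
    if platform == "win32" and is_cpu_only_build(cuda_version):
        return "uv pip install murmurai --torch-backend=auto"
    return None


def detect_torch_cuda_version() -> Optional[str]:
    """Read the CUDA version torch was built with.

    Returns None both when torch is missing and when it is a CPU-only build.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    hint = install_hint(detect_torch_cuda_version())
    if hint:
        print(f"CPU-only PyTorch detected. Try: {hint}")
```

Passing the CUDA version in as a plain argument keeps the decision logic testable on machines where torch is not installed.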
736fea5 to 45a3f20
…cy-optimization
# Conflicts:
# pyproject.toml
Summary
- --torch-backend=auto for Windows
- uv pip --torch-backend=auto

Test plan
- uv pip install murmurai --torch-backend=auto
- speaker_labels=false
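The `speaker_labels=false` case in the test plan exercises the transcriber.py fix. The actual change is not visible here, so this is a sketch under assumptions: `format_segment` and its field names are hypothetical, and omitting the keys (rather than setting them to null) is one possible reading of the fix.

```python
from typing import Any, Optional


def format_segment(text: str,
                   speaker: Optional[str],
                   confidence: Optional[float],
                   *,
                   speaker_labels: bool,
                   word_timestamps: bool) -> dict[str, Any]:
    """Build one output segment without phantom fields.

    Hypothetical helper: the speaker key appears only when diarization was
    requested and produced a label, and confidence appears only when
    word-level data supplied a real score, never a hardcoded default.
    """
    segment: dict[str, Any] = {"text": text}
    if speaker_labels and speaker is not None:
        segment["speaker"] = speaker
    if word_timestamps and confidence is not None:
        segment["confidence"] = confidence
    return segment


if __name__ == "__main__":
    # With both features disabled, only the text is emitted.
    print(format_segment("hello", "SPEAKER_00", 0.85,
                         speaker_labels=False, word_timestamps=False))
```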