A lightweight and efficient CLI tool for converting audio interviews into structured text, featuring speaker diarization and multi-language support.
This repository is a packaged version of the Automatic Interviews Processing project.
The original project covers a broader workflow, including text preprocessing, transcript evaluation, text analysis, and topic modeling. This repository, however, focuses exclusively on the Audio-to-Text module, making it easily installable and deployable via PyPI.
Developed by David Friou as part of a semester project at LNCO Lab.
Before installing the package, ensure you have the following dependencies installed:

- Install FFMPEG from here; for Windows, you can follow a guide like this. Ensure that FFMPEG is added to your system's PATH.
- Install Strawberry Perl from here.
- Visual C++ Build Tools: if you encounter build errors during installation, install the Visual C++ Build Tools by following this guide or download them directly from Visual C++ Build Tools.
- Python >= 3.10 is required.
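Once the external tools are installed, you can quickly confirm they are visible on your PATH before running the pipeline. A minimal sketch (the `check_dependency` helper is illustrative, not part of the package):

```python
import shutil

def check_dependency(executable: str) -> bool:
    """Return True if `executable` is discoverable on the system PATH."""
    return shutil.which(executable) is not None

if __name__ == "__main__":
    # FFMPEG and Perl are the external tools required by the steps above.
    for tool in ("ffmpeg", "perl"):
        status = "found" if check_dependency(tool) else "MISSING from PATH"
        print(f"{tool}: {status}")
```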
```
pip install lnco-transcribe
```

Or, if the PyPI version isn't working, install from the bundled wheel:

```
pip install .\dist\lnco_transcribe-1.0.0-py3-none-any.whl
```
If you prefer to manage dependencies manually, create an environment and install them yourself:

Locked Environment Installation

This setup recreates the exact environment used during my semester project:

```
conda create --name tti python=3.10 --yes
conda activate tti
pip install -r freeze_requirements.txt
```
Flexible / Adaptive Installation

If you need more flexibility, such as updating certain packages or adapting the repository, replace the `pip install -r freeze_requirements.txt` step with:

```
pip install cython
pip install -c constraints.txt -r requirements.txt
```
The pipeline supports nested folder structures, making it easy to process multiple experiments and interviews. To use the pipeline:
- Simply indicate the path to the folder containing your audio files.
- The pipeline recursively processes all audio files within that folder and its subfolders.
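A rough sketch of how such recursive discovery can work with `pathlib` (the extension set mirrors the pipeline's documented defaults; the `find_audio_files` function is illustrative, not a package API):

```python
from pathlib import Path

# Default extensions accepted by the pipeline.
AUDIO_EXTENSIONS = {".m4a", ".mp4", ".wav"}

def find_audio_files(root: str) -> list[Path]:
    """Recursively collect audio files under `root`, including subfolders."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```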
To transcribe audio files with speaker diarization, use:
- Transcribe audio in its original language (specified with `--language`):

```
lnco-transcribe -d path_to_folder --whisper-model large-v3 --language en
```

- Transcribe and translate audio to English (e.g., from French to English):

```
lnco-transcribe -d path_to_folder --whisper-model large-v3 --language fr --task translate
```

If only `--language` is specified, the model will attempt to translate any detected language into the specified one. If you know in advance that the audio is in a certain language (e.g., French) and want to translate it into English, also specify `--task translate` to improve performance.
- You can view the list of all supported languages, along with their corresponding language codes, here: Languages
| Parameter | Description | Default |
|---|---|---|
| `-d, --directory` | Path to the directory containing audio files. | None |
| `--whisper-model` | Name of the Whisper model used for transcription. | None |
| `--language` | Language code for transcription (e.g., `fr` for French, `en` for English). | None |
| `--task` | Task to perform (`transcribe` or `translate`). | None |
| `-e, --extensions` | List of allowed audio file extensions. | `[".m4a", ".mp4", ".wav"]` |
| `--overwrite` | Overwrites existing transcriptions if specified. | False |
Run `lnco-transcribe --help` for the full list of options, and/or see run_diarize.py for additional information.
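If you drive the CLI from a script, assembling the argument list explicitly keeps batch runs reproducible. A sketch based on the flags shown above (`build_transcribe_command` and the commented-out subprocess call are illustrative, not a package API):

```python
import subprocess

def build_transcribe_command(directory, model="large-v3",
                             language=None, task=None, overwrite=False):
    """Assemble an lnco-transcribe invocation from Python values."""
    cmd = ["lnco-transcribe", "-d", str(directory), "--whisper-model", model]
    if language:
        cmd += ["--language", language]
    if task:
        cmd += ["--task", task]
    if overwrite:
        cmd.append("--overwrite")
    return cmd

# Example (uncomment to actually run the tool):
# subprocess.run(build_transcribe_command("path_to_folder", language="fr",
#                                         task="translate"), check=True)
```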
The tool generates transcripts in two structured formats:
- Text Format: Simplified and easy-to-read files for manual review.
- CSV Format: A structured format ideal for analysis, with columns such as:
- Experiment name (derived from the name of the parent folder).
- File name.
- Participant ID.
- Timestamps for each segment.
- Speaker roles and transcription content.
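Because the CSV output is plain tabular data, downstream analysis needs no special tooling. A sketch using the standard library `csv` module; the header names (`speaker`, `text`) are assumptions based on the columns listed above, so adjust them to match the actual files:

```python
import csv
from collections import defaultdict

def words_per_speaker(csv_path: str) -> dict[str, int]:
    """Tally transcribed word counts per speaker role from a transcript CSV.

    Assumes (hypothetical) header columns named 'speaker' and 'text'.
    """
    counts: dict[str, int] = defaultdict(int)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["speaker"]] += len(row["text"].split())
    return dict(counts)
```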
Contains the same outputs after additional preprocessing steps:
- Removal of vocalized fillers.
- Visual cleaning of the text.
- Prediction of speaker roles in an interview set-up (Participant & Interviewer).
For a more modular approach, you can use the preprocessing notebook.
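To illustrate the filler-removal step, here is a naive regex-based sketch (the filler list and `remove_fillers` function are illustrative assumptions; the package's actual preprocessing is more involved):

```python
import re

# A small, assumed list of vocalized fillers; the real pipeline's list differs.
FILLER_PATTERN = re.compile(r"\b(?:euh+|um+|uh+|hmm+)\b[,.]?\s*", re.IGNORECASE)

def remove_fillers(text: str) -> str:
    """Strip vocalized fillers and collapse the leftover whitespace."""
    cleaned = FILLER_PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```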
This section focuses on converting raw audio data into text through transcription and diarization, enabling subsequent analysis.
- Preprocessing and Conversion:
  - pre_analysis.ipynb: Analyzes audio files and experiment structure.
  - MTS_to_audio.py: Converts `.MTS` videos into `.wav` format for processing.
- Transcription & Diarization:
  - run_diarize.py: The main script for batch-processing transcription and speaker diarization.
  - whisper_diarization/: Source code from the Whisper-Diarization framework. (See Mentions)
  - nemo_msdd_configs/: YAML configuration files for diarization tasks.
- Transcript Preprocessing:
  - preprocessing.ipynb: Modular workflow for cleaning and preparing transcripts for further analysis.
- utils/:
  - format_helpers.py and preprocessing_helpers.py: Assist with structured formatting and transcript preprocessing.
This package relies heavily on the Whisper-Diarization framework to handle transcription and diarization of audio files into structured text formats, which is licensed under the BSD 2-Clause License.
```bibtex
@unpublished{hassouna2024whisperdiarization,
  title={Whisper Diarization: Speaker Diarization Using OpenAI Whisper},
  author={Ashraf, Mahmoud},
  year={2024}
}
```
For additional details, visit the Whisper-Diarization GitHub repository.