This project provides a real-time speech-to-speech translation system using Deepgram for speech recognition, LangChain for language translation, and ElevenLabs for voice synthesis.
- Real-time speech recognition using Deepgram
- Language translation using LangChain and OpenAI's GPT model (not using OpenAI's real-time model, so it won't be pricey)
- Voice synthesis using ElevenLabs
- Python 3.12.0
- Deepgram API key
- OpenAI API key
- ElevenLabs API key
-
Clone the repository:
git clone https://github.com/yourusername/speech-to-speech-realtime-translation.git cd speech-to-speech-realtime-translation
-
Install dependencies using Poetry or pip:
poetry install
or
pip install -r requirements.txt
-
Create a
.env
file based on the .env.example file and add your API keys:cp .env.example .env
Fill in the
.env
file with your API keys:OPENAI_API_KEY=your_openai_api_key DEEPGRAM_API_KEY=your_deepgram_api_key ELEVEN_API_KEY=your_elevenlabs_api_key
-
Run the main script:
poetry run python main.py
-
Follow the prompts to enter the input and output languages.
- main.py: Entry point of the application.
- stt_streaming.py: Handles real-time speech recognition and translation.
- llm.py: Contains the logic for language translation using LangChain and OpenAI.
- voice_synthesis.py: Handles voice synthesis using ElevenLabs.
- requirements.txt: Lists the required Python packages.
- pyproject.toml: Configuration file for Poetry.
- .env.example: Example environment variables file.
This project is licensed under the MIT License.