This directory contains example implementations demonstrating how to use the Sayna SDKs for building real-time voice-enabled applications. All examples should be built using the official Sayna SDKs located in ../saysdk.
Sayna provides real-time voice processing services including Speech-to-Text (STT), Text-to-Speech (TTS), and LiveKit integration for multi-participant voice rooms. The platform enables developers to build voice-enabled AI agents, phone systems, transcription services, and interactive voice applications.
The Sayna SDK monorepo (../saysdk) contains three official client libraries:
- **JavaScript SDK**
  - Location: `../saysdk/js-sdk/`
  - Package: `@sayna-ai/js-sdk`
  - Platform: Browser-only (ES module)
  - Documentation: `../saysdk/js-sdk/README.md`
  - Purpose: Client-side voice room connections with automatic token handling and audio playback
- **Node.js SDK**
  - Location: `../saysdk/node-sdk/`
  - Package: `@sayna-ai/node-sdk`
  - Platform: Node.js 18+
  - Documentation: `../saysdk/node-sdk/README.md`
  - Purpose: Server-side real-time voice processing via WebSocket and REST APIs
- **Python SDK**
  - Location: `../saysdk/python-sdk/`
  - Package: `sayna-client`
  - Platform: Python 3.9+
  - Documentation: `../saysdk/python-sdk/README.md`
  - Purpose: Async Python client for the voice API with full Pydantic type safety
Detailed development rules for each SDK are defined in the .cursor/rules/ directory:
| Rule File | Purpose |
|---|---|
| `.cursor/rules/node-sdk.mdc` | Node.js SDK patterns, API methods, error handling |
| `.cursor/rules/js-sdk.mdc` | Browser SDK lifecycle, token configuration, audio handling |
| `.cursor/rules/python-sdk.mdc` | Python async patterns, Pydantic models, callback registration |
| `.cursor/rules/api-reference.mdc` | REST/WebSocket API reference, authentication, room ownership |
Choose the appropriate SDK based on your implementation requirements.

Use the **JavaScript SDK** when:
- Building browser-based voice interfaces
- Implementing real-time voice chat in web applications
- Integrating voice capabilities into React, Vue, or other frontend frameworks
- Users need to speak directly through their browser

Use the **Node.js SDK** when:
- Building backend services that process voice
- Implementing webhook handlers for SIP events
- Creating API endpoints that generate LiveKit tokens
- Processing audio server-side before streaming
- Integrating with telephony systems

Use the **Python SDK** when:
- Building Python-based backend services
- Integrating with Python AI/ML pipelines
- Creating async voice processing applications
- Implementing webhook receivers in FastAPI or Flask
For web applications, combine the JavaScript SDK (frontend) with Node.js or Python SDK (backend):
- Backend generates LiveKit tokens using `getLiveKitToken()` / `get_livekit_token()`
- Frontend receives the token via an API endpoint
- Frontend connects to the voice room using the JavaScript SDK
- Backend handles webhooks and server-side processing
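The backend half of this pattern can be sketched as a plain async handler. `get_livekit_token()` is the method name documented for the Python SDK, but its exact signature and return shape are assumptions here (check `../saysdk/python-sdk/README.md`); a stub stands in for the real client so the flow stays visible:

```python
import asyncio

class StubSaynaClient:
    """Stand-in for the real sayna-client; only the call shape is illustrated."""
    async def get_livekit_token(self, room, identity):
        return {"token": f"jwt-for-{identity}@{room}", "url": "wss://livekit.example"}

async def token_endpoint(client, room: str, identity: str) -> dict:
    # Token generation stays server-side so the Sayna API key never
    # reaches the browser; the frontend only ever sees the short-lived token.
    grant = await client.get_livekit_token(room=room, identity=identity)
    return {"token": grant["token"], "livekit_url": grant["url"]}

resp = asyncio.run(token_endpoint(StubSaynaClient(), "demo-room", "alice"))
```

In a real service the handler body would sit inside a FastAPI/Express route, with the authenticated user's identity taken from the session rather than a parameter.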
For telephony or headless applications, use Node.js or Python SDK:
- Server connects via WebSocket to Sayna API
- Server sends/receives audio for STT/TTS
- Server handles all voice processing logic
- LiveKit is used only for multi-participant scenarios
For complex applications requiring both client and server voice handling:
- Backend maintains WebSocket connection to Sayna
- Backend joins LiveKit rooms as participant
- Frontend users connect via JavaScript SDK
- Backend processes and responds to user speech
- OpenAPI Specification: `../sayna/docs/openapi.yaml` - Complete REST API schema
- WebSocket Protocol: `../sayna/docs/websocket.md` - Message formats and flow
- API Reference: `../sayna/docs/api-reference.md` - Human-readable guide
- Authentication: `../sayna/docs/authentication.md` - JWT and auth strategies
- LiveKit Integration: `../sayna/docs/livekit_integration.md` - Room management
- SIP Routing: `../sayna/docs/sip_routing.md` - Webhook configuration
Located in `../sayna/docs/`:
- STT Providers: `deepgram-stt.md`, `google-stt.md`
- TTS Providers: `elevenlabs-tts.md`, `google-tts.md`, `deepgram-tts.md`, `cartesia-tts.md`
- Website: https://docs.sayna.ai
- Quickstart Guide: `../docs/quickstart.mdx`
- Architecture Overview: `../docs/guides/architecture.mdx`
- Operations Guide: `../docs/guides/operations.mdx`
| Feature | JS SDK | Node.js SDK | Python SDK |
|---|---|---|---|
| REST API calls | Via backend | Full support | Full support |
| WebSocket streaming | N/A | Full support | Full support |
| LiveKit client | Built-in | Server-side | Server-side |
| Token generation | Via backend | `getLiveKitToken()` | `get_livekit_token()` |
| STT processing | Via backend | `onAudioInput()` | `on_audio_input()` |
| TTS synthesis | Via backend | `speak()` | `speak()` |
| Webhook receiver | N/A | `WebhookReceiver` | `WebhookReceiver` |
| SIP calling | Via backend | `sipCall()` | `sip_call()` |
| Room management | Via LiveKit | Full REST API | Full REST API |
| Type safety | TypeScript | TypeScript | Pydantic |
The Sayna API provides two communication modes:
- **REST API** - Stateless HTTP endpoints for:
  - Health checks and voice catalog
  - One-shot TTS synthesis
  - LiveKit token generation
  - Room and participant management
  - SIP webhook configuration
  - Recording downloads
- **WebSocket API** - Stateful streaming for:
  - Real-time STT transcription
  - Streaming TTS audio
  - Bidirectional messaging
  - Session management
The standard connection pattern for WebSocket-based SDKs:
1. Create the client with configuration (URL, STT config, TTS config)
2. Register event callbacks for responses
3. Call `connect()` to establish the WebSocket
4. Wait for the `ready` state before sending commands
5. Use API methods for voice interaction
6. Call `disconnect()` for cleanup
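The lifecycle above can be sketched in Python. The class below is a stand-in that only mirrors the documented step order; the real client class, constructor arguments, and callback registration API come from `sayna-client` (see `../saysdk/python-sdk/README.md`):

```python
import asyncio

class FakeVoiceClient:
    """Shape-only stand-in for the real sayna-client WebSocket client."""

    def __init__(self, url, stt_config, tts_config):   # step 1: configure
        self.url = url
        self.callbacks = {}
        self.ready = False

    def on(self, event, callback):                     # step 2: register callbacks
        self.callbacks[event] = callback

    async def connect(self):                           # step 3: open the WebSocket
        await asyncio.sleep(0)                         # pretend network I/O
        self.ready = True                              # step 4: ready state reached

    async def speak(self, text):                       # step 5: voice interaction
        if not self.ready:
            raise RuntimeError("wait for ready before sending commands")
        if cb := self.callbacks.get("audio"):
            cb(f"<audio for {text!r}>")

    async def disconnect(self):                        # step 6: cleanup
        self.ready = False

async def main():
    received = []
    client = FakeVoiceClient("wss://example", stt_config={}, tts_config={})
    client.on("audio", received.append)  # callbacks registered before connect()
    await client.connect()
    try:
        await client.speak("hello")
    finally:
        await client.disconnect()        # always clean up, even on errors
    return received

chunks = asyncio.run(main())
```

The important invariants are the ordering (callbacks before `connect()`, commands only after `ready`) and the guaranteed `disconnect()` in the `finally` block.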
Sayna enforces tenant isolation through room metadata:
- Room names are passed unchanged (no SDK-level prefixing)
- The server manages `metadata.auth_id` for ownership
- Room listings are scoped to the authenticated context
- 403 errors indicate cross-tenant access attempts
- 404 errors may mask access denial for security
For SIP event webhooks:
- HMAC-SHA256 signature verification required
- Timestamp-based replay protection (5-minute window)
- Raw request body must be used for signature validation
- Minimum 16-character secret required
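A generic verifier covering these four rules can be written with the standard library alone. The exact header names and signed-payload layout are defined in `../sayna/docs/sip_routing.md`; the `timestamp + "." + body` layout below is an illustrative assumption, not the documented format:

```python
import hashlib
import hmac
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 300  # the documented 5-minute window

def verify_webhook(raw_body: bytes, timestamp: str, signature: str,
                   secret: str, now: Optional[float] = None) -> bool:
    # Enforce the minimum 16-character secret up front.
    if len(secret) < 16:
        raise ValueError("webhook secret must be at least 16 characters")
    # Reject stale (or far-future) timestamps to block replayed requests.
    now = time.time() if now is None else now
    if abs(now - float(timestamp)) > REPLAY_WINDOW_SECONDS:
        return False
    # Sign the *raw* request body: re-serialized JSON rarely matches byte-for-byte.
    message = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking the signature via timing.
    return hmac.compare_digest(expected, signature)
```

Note that both SDKs ship a `WebhookReceiver` for this (see the feature table above); the sketch is only for understanding what that verification must do.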
- Use environment variables for API keys and secrets
- Never commit credentials to version control
- Separate configuration from business logic
- Document required environment variables
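A fail-fast configuration loader makes missing variables obvious at startup rather than at first API call. The variable names here are hypothetical placeholders; use whatever names your deployment actually documents:

```python
import os

# Hypothetical names for illustration only.
REQUIRED_VARS = ("SAYNA_API_KEY", "SAYNA_WEBHOOK_SECRET")

def load_config(env=None):
    """Return required settings, listing *every* missing variable at once."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: "
                           + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting all missing variables in one error saves a restart-per-variable loop during deployment.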
- Catch and handle SDK-specific error types
- Provide meaningful error messages to users
- Implement retry logic for transient failures
- Log errors appropriately for debugging
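Retry logic for transient failures can be sketched as a small wrapper with exponential backoff. The stdlib exception types below are placeholders; the actual error classes each SDK raises are listed in its README and should be substituted in:

```python
import time

def with_retries(operation, attempts=3, base_delay=0.5,
                 retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Retry transient failures; anything outside `retryable`
    (e.g. an auth error) propagates immediately."""
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

The `sleep` parameter is injectable so tests can run without real delays; production code simply omits it.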
- Always disconnect WebSocket connections when finished
- Clean up audio resources and event listeners
- Handle component unmount in frontend frameworks
- Implement graceful shutdown in backend services
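The "always disconnect" rule is easiest to enforce with a context manager, so cleanup runs even when the session body raises. This works with any object exposing async `connect()`/`disconnect()`; the real `sayna-client` may already ship its own context-manager support, which should be preferred if so:

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def voice_session(client):
    await client.connect()
    try:
        yield client
    finally:
        await client.disconnect()  # runs even if the body raises

# Minimal stand-in client to demonstrate the guarantee.
class StubClient:
    def __init__(self):
        self.log = []
    async def connect(self):
        self.log.append("connect")
    async def disconnect(self):
        self.log.append("disconnect")

async def demo():
    client = StubClient()
    try:
        async with voice_session(client):
            raise RuntimeError("simulated failure mid-session")
    except RuntimeError:
        pass
    return client.log

log = asyncio.run(demo())
```

The same shape maps to frontend frameworks: run `disconnect()` in the component's unmount/cleanup hook.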
- Match sample rates between configuration and audio source
- Use appropriate encoding for the target platform
- Implement proper buffering for streaming
- Handle audio interruptions gracefully
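Buffering for streaming usually means cutting raw PCM into fixed-duration frames before sending. The 16 kHz / 16-bit / 20 ms defaults below are common telephony-style values, not Sayna-mandated ones; match them to your configured STT input format:

```python
def chunk_audio(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20,
                bytes_per_sample: int = 2):
    """Yield fixed-duration frames of raw mono PCM for streaming.
    16 kHz * 16-bit * 20 ms => 640-byte frames; the final frame may be short.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Mismatched sample rates between this chunker and the STT configuration are a classic source of garbled transcripts, which is why the rule above puts rate matching first.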
- Test with local development server when possible
- Verify webhook signatures in test environment
- Test error scenarios and edge cases
- Validate type safety with TypeScript/mypy
When building examples, consider organizing by use case.

**Basics**:
- Health check and voice catalog exploration
- Simple TTS synthesis
- Basic STT transcription
- Token generation for LiveKit

**Voice rooms**:
- Browser-based voice rooms
- Multi-participant conversations
- Real-time transcription display
- Voice command interfaces

**Telephony**:
- Inbound SIP call handling
- Outbound call initiation
- Call transfer workflows
- Webhook processing

**AI agents**:
- Voice-enabled chatbots
- Conversational AI interfaces
- Speech-to-text pipelines
- Text-to-speech response generation
For implementation details, always consult the SDK README files:
- JavaScript SDK: `../saysdk/js-sdk/README.md`
- Node.js SDK: `../saysdk/node-sdk/README.md`
- Python SDK: `../saysdk/python-sdk/README.md`
- SDK CLAUDE.md: `../saysdk/CLAUDE.md`
For API specifications:
- OpenAPI Schema: `../sayna/docs/openapi.yaml`
- WebSocket Protocol: `../sayna/docs/websocket.md`
- Public Docs: https://docs.sayna.ai
Always use the latest SDK versions when building examples. Do not pin to specific versions unless there is a compatibility requirement.