Merged
32 changes: 27 additions & 5 deletions README.md
@@ -48,6 +48,7 @@
Flock is an advanced **DuckDB** extension that seamlessly integrates analytics with semantic analysis through declarative SQL queries. Designed for modern data analysis needs, Flock empowers users to work with structured and unstructured data, combining OLAP workflows with the capabilities of **LLMs** (Large Language Models) and **RAG** (Retrieval-Augmented Generation) pipelines.

To cite the project:

```
@article{10.14778/3750601.3750685,
author = {Dorbani, Anas and Yasser, Sunny and Lin, Jimmy and Mhedhbi, Amine},
@@ -66,21 +67,35 @@ To cite the project:
## 🔥 Features

- **Declarative SQL Interface**: Perform text generation, classification, summarization, filtering, and embedding generation using SQL queries.
- **Multi-Provider Support**: Easily integrate with OpenAI, Azure, and Ollama for your AI needs.
- **Multi-Provider Support**: Easily integrate with **OpenAI**, **Azure**, **Ollama**, and **Anthropic/Claude** for your AI needs.
- **End-to-End RAG Pipelines**: Enable retrieval and augmentation workflows for enhanced analytics.
- **Map and Reduce Functions**: Intuitive APIs for combining semantic tasks and data analytics directly in DuckDB.
- **Multimodal Analytics**: First-class support for text, images, and audio (via transcription) directly in SQL.
- **LLM Observability**: Built-in metrics tracking for tokens, latency, and call counts across Flock LLM functions.
- **Browser & WASM Support**: Run Flock-powered DuckDB workloads in the browser via DuckDB-WASM.

## ✨ Key Highlights (v0.4.0 and later)

- **Anthropic/Claude Provider**: Use Claude models as a **fourth provider**, alongside OpenAI, Azure, and Ollama, with full support for structured output and image analysis.
- **WASM Support**: Compile Flock as a DuckDB-WASM loadable extension to run in the browser, enabling client-side analytics and demos without server infrastructure.
- **LLM Metrics Tracking**: Track token usage, API latency, and execution time through dedicated functions like `flock_get_metrics()` for better cost and performance monitoring.
- **Audio Transcription**: Send audio inputs to OpenAI or Azure and obtain text transcripts using the same `context_columns` abstraction (with `type: 'audio'`).
- **DuckDB v1.4.4**: Upgraded to DuckDB **1.4.4**, inheriting the latest performance and stability improvements.
- **Architecture Improvements**: Centralized bind data and RAII-based storage guards reduce duplication and improve robustness across scalar and aggregate functions.
- **Developer Experience**: Interactive build scripts, improved extension CI tooling, and GitHub Copilot agent instructions streamline local development and contributions.
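
As a quick sketch of the metrics feature, usage data might be inspected directly from SQL (assuming `flock_get_metrics()` is exposed as a table function; the exact column set is described in the Flock documentation):

```sql
-- Inspect accumulated LLM metrics after running Flock functions
-- (illustrative; the actual schema may differ)
SELECT * FROM flock_get_metrics();
```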

<p align="right"><a href="#readme-top">🔝 back to top</a></p>

## 🚀 Getting Started

### 📝 Prerequisites

1. **DuckDB**: Version 1.1.1 or later. Install it from the official [DuckDB installation guide](https://duckdb.org/docs/installation/).
1. **DuckDB**: Version **1.4.4 or later**. Install it from the official [DuckDB installation guide](https://duckdb.org/docs/installation/).
2. **Supported Providers**: Ensure you have credentials or API keys for at least one of the supported providers:
- OpenAI
- Azure
- Ollama
- Anthropic/Claude
3. **Supported OS**:
- Linux
- macOS
@@ -110,17 +125,20 @@ Flock is a **Community Extension** available directly from DuckDB's community catalog
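
Installing from the community catalog uses DuckDB's standard community-extension commands (a sketch; see the DuckDB community extensions page for the authoritative steps):

```sql
INSTALL flock FROM community;
LOAD flock;
```
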
If you want to build Flock from source or contribute to the project, you can use our automated build script:

1. Clone the repository with submodules:

```bash
git clone --recursive https://github.com/dais-polymtl/flock.git
cd flock
```

Or if you've already cloned without submodules:

```bash
git submodule update --init --recursive
```

2. Run the build and run script:

```bash
./scripts/build_and_run.sh
```
@@ -136,6 +154,7 @@ If you want to build Flock from source or contribute to the project, you can use
3. The script will launch DuckDB with the Flock extension ready to use. Make sure to check the [documentation](https://dais-polymtl.github.io/flock/docs/what-is-flock) for usage examples.

**Requirements for building from source:**

- CMake (3.5 or later)
- C++ compiler (GCC, Clang, or MSVC)
- Build system (Ninja or Make)
@@ -160,6 +179,9 @@ SELECT llm_complete(

Explore more usage examples in the [documentation](https://dais-polymtl.github.io/flock/docs/what-is-flock).

If you are a contributor or want to work on Flock itself, see the dedicated
[Developer Guide](https://dais-polymtl.github.io/flock/docs/developer-guide) for build, testing, and contribution details.

<p align="right"><a href="#readme-top">🔝 back to top</a></p>

## 🛣️ Roadmap
@@ -187,6 +209,6 @@ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file

## ✨ Team

This project is under active development by the [**Data & AI Systems Laboratory (DAIS Lab)**](https://github.com/dais-polymtl) at [**Polytechnique Montréal**](https://www.polymtl.ca/).
This project is under active development by the [**Data & AI Systems Laboratory (DAIS Lab)**](https://github.com/dais-polymtl) at **Polytechnique Montréal**.

<p align="right"><a href="#readme-top">🔝 back to top</a></p>
233 changes: 233 additions & 0 deletions docs/docs/audio-support.md
@@ -0,0 +1,233 @@
---
title: Audio Transcription
sidebar_position: 7
---

# Audio Transcription in Flock

Flock supports audio transcription in SQL by sending audio inputs to compatible providers and returning text transcripts
that you can join, filter, and analyze like any other column.

import TOCInline from '@theme/TOCInline';

<TOCInline toc={toc} />

## Overview

With audio support you can:

- Transcribe spoken content (meetings, calls, notes) directly in DuckDB.
- Combine transcripts with structured data for analytics.
- Feed transcripts into `llm_complete`, `llm_filter`, or `llm_embedding` for downstream tasks (summarization,
classification, similarity search, RAG, etc.).

Flock uses the same `context_columns` abstraction as for images, but with `type: 'audio'` and a required
`transcription_model`.

## Supported Providers

Audio transcription is supported for:

- **OpenAI** – via the `audio/transcriptions` endpoint (e.g., Whisper models).
- **Azure OpenAI** – via the Azure audio transcription endpoint.

The following providers **do not** support audio transcription:

- **Anthropic/Claude** – not supported; calls will raise an error.
- **Ollama** – not supported; calls will raise an error.

Refer to the provider-specific getting-started guides for API key setup:

- [OpenAI](/docs/getting-started/openai)
- [Azure](/docs/getting-started/azure)
- [Anthropic](/docs/getting-started/anthropic) (for completions/vision only, no audio)

## Using Audio in Context Columns

To use audio in Flock functions, specify `type: 'audio'` and provide a `transcription_model` in the `context_columns`
array. The audio must be accessible as a file path or URL (depending on the provider).

### Context Column Structure for Audio

```sql
'context_columns': [
{
'data': audio_path,
'type': 'audio',
'transcription_model': 'whisper-1'
}
]
```

Each audio context column supports:

- **`data`** _(required)_: SQL column containing the audio source (local file path or URL, depending on provider).
- **`type`** _(required for audio)_: Must be set to `'audio'`.
- **`transcription_model`** _(required when `type = 'audio'`)_: Provider-specific transcription model name.
- **`name`** _(optional)_: Alias for referencing in prompts after transcription.

### Validation Rules

Flock enforces the following rules at bind time:

- If `type = 'audio'`, then `transcription_model` **must** be provided, otherwise an error is raised.
- If `transcription_model` is provided but `type` is not `'audio'`, Flock raises an error.
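
For example, omitting the transcription model for an audio column should fail at bind time (a sketch; the exact error message may differ, and `audio_files` is a hypothetical table):

```sql
-- Invalid: 'type' is 'audio' but no 'transcription_model' is provided,
-- so Flock raises an error at bind time, before any API call is made.
SELECT llm_complete(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Transcribe this audio.',
        'context_columns': [
            {'data': file_path, 'type': 'audio'}
        ]
    }
) AS transcript
FROM audio_files;  -- hypothetical table
```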

## Basic Transcription Example

The most common pattern is to transcribe audio into text, then store or further process the transcript.

```sql
-- Transcribe a list of audio files with OpenAI
SELECT
audio_id,
file_path,
llm_complete(
{'model_name': 'gpt-4o'},
{
'prompt': 'Transcribe the following audio file verbatim.',
'context_columns': [
{
'data': file_path,
'type': 'audio',
'transcription_model': 'whisper-1'
}
]
}
) AS transcript
FROM (VALUES
    (1, '/data/audio/meeting_01.mp3'),
    (2, '/data/audio/meeting_02.mp3')
) AS t(audio_id, file_path);
```

## Summarizing Transcripts

After transcription, you can treat the transcript as regular text and chain additional LLM calls.

```sql
WITH raw_transcripts AS (
SELECT
audio_id,
llm_complete(
{'model_name': 'gpt-4o'},
{
'prompt': 'Transcribe the following audio file verbatim.',
'context_columns': [
{
'data': file_path,
'type': 'audio',
'transcription_model': 'whisper-1'
}
]
}
) AS transcript
    FROM (VALUES
        (1, '/data/audio/support_call_01.wav'),
        (2, '/data/audio/support_call_02.wav')
    ) AS t(audio_id, file_path)
)
SELECT
audio_id,
llm_complete(
{'model_name': 'gpt-4o'},
{
'prompt': 'Summarize this call in 3 bullet points.',
'context_columns': [
{'data': transcript, 'name': 'call'}
]
}
) AS call_summary
FROM raw_transcripts;
```

## Filtering Based on Audio Content

You can also use `llm_filter` to flag or select rows based on the audio’s content:

```sql
-- Flag calls that mention cancellations
SELECT
audio_id,
customer_id,
file_path
FROM (VALUES
    (1, 101, '/data/audio/call_01.wav'),
    (2, 102, '/data/audio/call_02.wav'),
    (3, 103, '/data/audio/call_03.wav')
) AS t(audio_id, customer_id, file_path)
WHERE llm_filter(
{'model_name': 'gpt-4o'},
{
'prompt': 'Does this call mention cancelling a subscription? Answer true or false.',
'context_columns': [
{
'data': file_path,
'type': 'audio',
'transcription_model': 'whisper-1'
}
]
}
);
```

## Embeddings from Audio (via Text)

There is no direct audio embedding API in Flock. Instead, you can:

1. Transcribe audio into text.
2. Generate embeddings from the transcript using `llm_embedding`.

```sql
WITH transcripts AS (
SELECT
audio_id,
llm_complete(
{'model_name': 'gpt-4o'},
{
'prompt': 'Transcribe the following audio file.',
'context_columns': [
{
'data': file_path,
'type': 'audio',
'transcription_model': 'whisper-1'
}
]
}
) AS transcript
    FROM (VALUES
        (1, '/data/audio/note_01.m4a'),
        (2, '/data/audio/note_02.m4a')
    ) AS t(audio_id, file_path)
),
audio_embeddings AS (
SELECT
audio_id,
llm_embedding(
{'model_name': 'text-embedding-3-small'},
{
'context_columns': [
{'data': transcript}
]
}
) AS embedding
FROM transcripts
)
SELECT * FROM audio_embeddings;
```

## Function Support for Audio

Audio transcription is available in the following functions (via `type: 'audio'` + `transcription_model`):

| Function | Audio Support | Description |
| --------------- | ------------- | -------------------------------------------- |
| `llm_complete` | ✅ Full | Transcribe and optionally transform content |
| `llm_filter` | ✅ Full | Filter rows based on audio-derived semantics |
| `llm_reduce` | ✅ Full | Summarize or aggregate transcripts |
| `llm_rerank` | ✅ Via text | Rerank based on derived text features |
| `llm_first` | ✅ Via text | Pick top row based on transcript criteria |
| `llm_last` | ✅ Via text | Pick bottom row based on transcript criteria |
| `llm_embedding` | ✅ Via text | Embeddings over transcripts (not raw audio) |
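
Building on the table above, `llm_reduce` can aggregate several recordings into one summary. The sketch below follows the parameter shapes of the earlier examples; the file paths are illustrative:

```sql
-- Combine all recordings into a single aggregated summary
SELECT llm_reduce(
    {'model_name': 'gpt-4o'},
    {
        'prompt': 'Produce one combined summary of these recordings.',
        'context_columns': [
            {
                'data': file_path,
                'type': 'audio',
                'transcription_model': 'whisper-1'
            }
        ]
    }
) AS combined_summary
FROM (VALUES
    ('/data/audio/standup_01.mp3'),
    ('/data/audio/standup_02.mp3')
) AS t(file_path);
```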

For image-specific workflows, see the [Image Support](/docs/image-support) page.