Merged
23 commits
92267ac
fixed the readme links (#210)
anasdorbani Sep 17, 2025
ead7a55
Add build script and update installation docs
anasdorbani Jan 23, 2026
20afc02
Update README.md
anasdorbani Jan 24, 2026
efdd00a
Update scripts/build_and_run.sh
anasdorbani Jan 24, 2026
005a480
Update scripts/build_and_run.sh
anasdorbani Jan 24, 2026
858f36d
Update scripts/build_and_run.sh
anasdorbani Jan 24, 2026
3c031a6
Added LLM metrics tracking (#233)
anasdorbani Feb 17, 2026
89c5bb8
Enhanced audio transcription support with improved null safety and er…
anasdorbani Feb 17, 2026
9f9a54d
Fix llm_filter to support prompts without context_columns (#220)
anasdorbani Feb 17, 2026
cc781a7
Refactored storage attachment with RAII guard and retry mechanism (#222)
anasdorbani Feb 17, 2026
5e23632
Refactor LLM functions to use centralized LlmFunctionBindData structu…
anasdorbani Feb 18, 2026
8dec44b
Fix duplicate check for CREATE MODEL and CREATE PROMPT to check all t…
anasdorbani Feb 18, 2026
23fa537
Upgrade to DuckDB v1.4.4 and Update GH Action (#234)
anasdorbani Feb 19, 2026
2325643
Add Anthropic/Claude provider support (#225)
hfmsio Feb 23, 2026
92d09f3
Fix and extend Anthropic provider support (#235)
anasdorbani Feb 24, 2026
cbf5645
Add WASM support (#229)
iZarrios Feb 28, 2026
0ca0b89
Enable WASM CI (#237)
anasdorbani Mar 2, 2026
decaaa1
Add `.github/copilot-instructions.md` for Copilot coding agent onboar…
Copilot Mar 2, 2026
17c61b3
Replace busy-wait with sleep_for in StorageAttachmentGuard::Wait (#240)
Copilot Mar 2, 2026
840a83a
Update src/core/config/prompt.cpp
anasdorbani Mar 2, 2026
6f49ca5
Remove per-row runtime_error from CastInputsToJson in aggregate funct…
Copilot Mar 2, 2026
b378d9f
Update src/core/config/config.cpp
anasdorbani Mar 3, 2026
4fd919c
Fixed config redundant declarations (#244)
anasdorbani Mar 3, 2026
127 changes: 127 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,127 @@
# Copilot Instructions for Flock

## Overview

**Flock** is a C++ DuckDB extension that integrates LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation) into DuckDB through declarative SQL queries. It supports OpenAI, Azure, and Ollama providers and enables semantic functions such as `llm_complete`, `llm_filter`, `llm_embedding`, and hybrid search directly from SQL.

- **Language**: C++17
- **Build system**: CMake (3.5+) with DuckDB's extension CI tools (`extension-ci-tools/`)
- **Dependency manager**: vcpkg (managed via `vcpkg.json`)
- **Key dependencies**: `nlohmann-json`, `curl`, `gtest` (see `vcpkg.json`)
- **DuckDB version targeted**: v1.4.4 (see `MainDistributionPipeline.yml`)

## Repository Layout

```
.
├── CMakeLists.txt # Top-level CMake (builds static + loadable extension, unit tests)
├── extension_config.cmake # DuckDB extension load config (references this repo)
├── vcpkg.json # vcpkg dependency manifest
├── Makefile # Convenience targets (lf_setup, lf_setup_dev)
├── scripts/
│ ├── build_and_run.sh # Interactive guided build + run script
│ ├── build_project.sh # Non-interactive build script
│ └── setup_vcpkg.sh # vcpkg bootstrap
├── src/
│ ├── flock_extension.cpp # Extension entry point (LoadInternal, FlockExtension::Load)
│ ├── include/flock/ # Public headers
│ ├── core/ # Config, common utilities
│ ├── functions/ # Scalar and aggregate SQL functions
│ ├── model_manager/ # Provider integrations (OpenAI, Azure, Ollama)
│ ├── prompt_manager/ # Prompt management
│ ├── secret_manager/ # API key/secret handling
│ ├── registry/ # Model/prompt registries
│ ├── metrics/ # Metrics collection
│ └── custom_parser/ # Custom SQL parser extension
├── test/
│ ├── unit/ # C++ unit tests (GTest), built via CMake
│ └── integration/ # Python integration tests (pytest, uv)
├── duckdb/ # DuckDB source submodule
├── extension-ci-tools/ # DuckDB extension CI/build helpers submodule
├── .clang-format # clang-format config (LLVM style, IndentWidth=4, ColumnLimit=0)
├── .cmake-format # cmake-format config
└── .pre-commit-config.yaml # Pre-commit hooks: clang-format v18.1.8, cmake-format v0.6.13
```

## Building

Always ensure submodules are initialised before building:

```bash
git submodule update --init --recursive
```

### Setup vcpkg (first time or after clean)

```bash
bash scripts/setup_vcpkg.sh
export VCPKG_TOOLCHAIN_PATH="$(pwd)/vcpkg/scripts/buildsystems/vcpkg.cmake"
```

### Release build

```bash
mkdir -p build/release
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
-DEXTENSION_STATIC_BUILD=1 \
-DVCPKG_BUILD=1 \
-DCMAKE_TOOLCHAIN_FILE="$VCPKG_TOOLCHAIN_PATH" \
-DVCPKG_MANIFEST_DIR="$(pwd)" \
-DDUCKDB_EXTENSION_CONFIGS="$(pwd)/extension_config.cmake" \
-S duckdb -B build/release
cmake --build build/release --config Release
```

Use `-G "Unix Makefiles"` if Ninja is not available. The DuckDB binary will be at `build/release/duckdb`.

### Debug build

Replace `Release` with `Debug` and `build/release` with `build/debug` in the commands above. Debug builds enable AddressSanitizer automatically.

## Running Unit Tests

Unit tests use GTest and are built as part of the CMake build. After building:

```bash
cd build/release # or build/debug
ctest --output-on-failure
```

Or run the test binary directly: `./flock_tests`

## Running Integration Tests

Integration tests use Python/pytest and are in `test/integration/`. They require a running DuckDB binary with the Flock extension loaded and provider credentials set in a `.env` file (see `test/integration/.env-example`).

```bash
cd test/integration
uv sync # install Python deps (requires uv)
uv run pytest
```

## Code Style & Linting

- **C++**: `clang-format` v18.1.8 (config in `.clang-format`, LLVM-based, indent=4, no column limit)
- **CMake**: `cmake-format` v0.6.13 (config in `.cmake-format`)
- Run pre-commit on staged files: `pre-commit run` or `pre-commit run --all-files`
- Install dev tools: `make lf_setup_dev`

Always run `clang-format` on modified C++ files before committing. The CI pipeline enforces both `format` and `tidy` checks (`code-quality-check` job in `MainDistributionPipeline.yml`).

## CI Pipeline

Defined in `.github/workflows/MainDistributionPipeline.yml`:

- **duckdb-stable-build**: Builds extension binaries for all platforms using DuckDB v1.4.4 CI tools.
- **code-quality-check**: Runs `clang-format` and `clang-tidy` checks.

Triggered on push to `main`/`dev` when `src/`, `test/`, `CMakeLists.txt`, or workflow files change, and on `workflow_dispatch`.

## Key Notes

- The extension entry point is `src/flock_extension.cpp` → `FlockExtension::Load` → `LoadInternal`.
- All SQL functions are registered via `flock::Config::Configure(loader)` in `src/core/config/`.
- New scalar functions go in `src/functions/scalar/`; aggregate functions in `src/functions/aggregate/`.
- Provider implementations live in `src/model_manager/providers/`.
- Public headers for the extension are under `src/include/flock/`.
- The `duckdb/` and `extension-ci-tools/` directories are git submodules — do not modify them.
1 change: 0 additions & 1 deletion .github/workflows/MainDistributionPipeline.yml
@@ -29,7 +29,6 @@ jobs:
duckdb_version: v1.4.4
ci_tools_version: v1.4.4
extension_name: flock
exclude_archs: 'wasm_mvp;wasm_threads;wasm_eh'

code-quality-check:
name: Code Quality Check
26 changes: 18 additions & 8 deletions CMakeLists.txt
@@ -16,7 +16,9 @@ include_directories(src/include)
add_subdirectory(src)

# Find dependencies
find_package(CURL REQUIRED)
if(NOT EMSCRIPTEN)
find_package(CURL REQUIRED)
endif()
find_package(nlohmann_json CONFIG REQUIRED)

# Build the DuckDB static and loadable extensions
@@ -43,12 +45,18 @@ if(CMAKE_BUILD_TYPE STREQUAL "Debug")
endif()

# Link libraries for the static extension
target_link_libraries(${EXTENSION_NAME} CURL::libcurl
nlohmann_json::nlohmann_json)
if(NOT EMSCRIPTEN)
target_link_libraries(${EXTENSION_NAME} CURL::libcurl)
endif()
target_link_libraries(${EXTENSION_NAME} nlohmann_json::nlohmann_json)

# Link libraries for the loadable extension
target_link_libraries(${LOADABLE_EXTENSION_NAME} CURL::libcurl
nlohmann_json::nlohmann_json)
if(NOT EMSCRIPTEN)
target_link_libraries(${LOADABLE_EXTENSION_NAME} CURL::libcurl)
endif()
target_link_libraries(${LOADABLE_EXTENSION_NAME} nlohmann_json::nlohmann_json)

# WASM builds use EM_JS with synchronous XMLHttpRequest for HTTP

# Install the extension
install(
@@ -63,6 +71,8 @@ if(CMAKE_BUILD_TYPE STREQUAL "Coverage")
add_link_options(-fprofile-instr-generate -fcoverage-mapping)
endif()

# Add the test directory
enable_testing()
add_subdirectory(test/unit)
if(NOT EMSCRIPTEN)
# Add the test directory if not on WASM
enable_testing()
add_subdirectory(test/unit)
endif()
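
The `if(NOT EMSCRIPTEN)` guards above only remove libcurl from the link step; the C++ sources presumably need a matching compile-time switch. A minimal sketch of that pattern, assuming a hypothetical helper `HttpBackendName` (not part of Flock) and illustrative labels `"xhr"` and `"curl"`:

```cpp
#include <string>

// Hypothetical helper (not from the Flock sources): reports which HTTP
// backend this translation unit was compiled for. The returned labels are
// illustrative only.
inline std::string HttpBackendName() {
#ifdef __EMSCRIPTEN__
    // Emscripten defines __EMSCRIPTEN__; libcurl is not linked in that build.
    return "xhr";
#else
    // Native builds link CURL::libcurl, matching the CMake guards above.
    return "curl";
#endif
}
```

On a native toolchain the preprocessor keeps only the second branch, so no curl-free code path is compiled in; under Emscripten the opposite holds.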
1 change: 1 addition & 0 deletions _codeql_detected_source_root
103 changes: 65 additions & 38 deletions src/core/config/config.cpp
@@ -1,7 +1,8 @@
#include "flock/core/config.hpp"
#include "filesystem.hpp"
#include "duckdb/common/file_system.hpp"
#include "flock/secret_manager/secret_manager.hpp"
#include <chrono>
#include <thread>
#include <fmt/format.h>

namespace flock {
@@ -11,15 +12,15 @@ duckdb::DatabaseInstance* Config::db;
std::string Config::get_schema_name() { return "flock_config"; }

std::filesystem::path Config::get_global_storage_path() {
#ifdef _WIN32
const char* homeDir = getenv("USERPROFILE");
#ifdef __EMSCRIPTEN__
return std::filesystem::path("opfs://flock_data/flock.db");
#else
const char* homeDir = getenv("HOME");
#endif
if (homeDir == nullptr) {
const auto& home = duckdb::FileSystem::GetHomeDirectory(nullptr);
if (home.empty()) {
throw std::runtime_error("Could not find home directory");
}
return std::filesystem::path(homeDir) / ".duckdb" / "flock_storage" / "flock.db";
return std::filesystem::path(home) / ".duckdb" / "flock_storage" / "flock.db";
#endif
}

duckdb::Connection Config::GetConnection(duckdb::DatabaseInstance* db) {
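
The path construction in `get_global_storage_path` can be tried in isolation. The sketch below is a stand-in, not the extension's code: the hypothetical function `GlobalStoragePathFor` takes the home directory as a parameter instead of asking DuckDB's `FileSystem` for it, and it omits the `__EMSCRIPTEN__` branch.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Config::get_global_storage_path(): the home
// directory is passed in rather than looked up through DuckDB.
std::filesystem::path GlobalStoragePathFor(const std::string& home) {
    if (home.empty()) {
        // Same failure mode as in the diff: no home directory, no storage path.
        throw std::runtime_error("Could not find home directory");
    }
    // operator/ inserts the platform's preferred separator between components.
    return std::filesystem::path(home) / ".duckdb" / "flock_storage" / "flock.db";
}
```

For example, `GlobalStoragePathFor("/home/alice")` yields a path whose file name is `flock.db` and whose parent directory is `flock_storage`; an empty argument throws `std::runtime_error`.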
@@ -30,42 +31,58 @@ duckdb::Connection Config::GetConnection(duckdb::DatabaseInstance* db) {
return con;
}

duckdb::Connection Config::GetGlobalConnection() {
const duckdb::DuckDB db(Config::get_global_storage_path().string());
duckdb::Connection con(*db.instance);
return con;
}

void Config::SetupGlobalStorageLocation() {
const auto flock_global_path = get_global_storage_path();
const auto flockDir = flock_global_path.parent_path();
if (!std::filesystem::exists(flockDir)) {
try {
std::filesystem::create_directories(flockDir);
} catch (const std::filesystem::filesystem_error& e) {
std::cerr << "Error creating directories: " << e.what() << std::endl;
void Config::SetupGlobalStorageLocation(duckdb::DatabaseInstance* db_instance) {
if (!db_instance) {
return;
}
#ifdef __EMSCRIPTEN__
// WASM: Client registers OPFS files before loading extension
return;
#endif
auto& fs = duckdb::FileSystem::GetFileSystem(*db_instance);
const std::string dir_path = get_global_storage_path().parent_path().string();
try {
if (!dir_path.empty() && !fs.DirectoryExists(dir_path)) {
fs.CreateDirectory(dir_path);
}
} catch (const std::exception& e) {
std::cerr << "Error creating directory " << dir_path << ": " << e.what() << std::endl;
}
}

void Config::ConfigSchema(duckdb::Connection& con, std::string& schema_name) {
auto result = con.Query(duckdb_fmt::format(" SELECT * "
" FROM information_schema.schemata "
" WHERE schema_name = '{}'; ",
schema_name));
if (result->RowCount() == 0) {
con.Query(duckdb_fmt::format("CREATE SCHEMA {};", schema_name));
}
con.Query(duckdb_fmt::format("CREATE SCHEMA IF NOT EXISTS {};", schema_name));
}

void Config::ConfigureGlobal() {
auto con = Config::GetGlobalConnection();
void Config::ConfigureGlobal(duckdb::DatabaseInstance* db_instance) {
if (!db_instance) {
return;
}
// Use the already-attached flock_storage database
auto con = Config::GetConnection(db_instance);
// Switch to flock_storage so ConfigureTables creates tables there.
// We switch back to memory afterward to avoid leaving the connection
// pointing at flock_storage, which would affect subsequent queries.
auto use_result = con.Query("USE flock_storage;");
if (use_result->HasError()) {
std::cerr << "Failed to USE flock_storage: " << use_result->GetError() << std::endl;
return;
}
ConfigureTables(con, ConfigType::GLOBAL);
con.Query("USE memory;");
}

void Config::ConfigureLocal(duckdb::DatabaseInstance& db) {
auto con = Config::GetConnection(&db);
ConfigureTables(con, ConfigType::LOCAL);

const std::string global_path = get_global_storage_path().string();
auto result = con.Query(
duckdb_fmt::format("ATTACH DATABASE '{}' AS flock_storage;", global_path));
if (result->HasError()) {
std::cerr << "Failed to attach flock_storage: " << result->GetError() << std::endl;
}
}

void Config::ConfigureTables(duckdb::Connection& con, const ConfigType type) {
@@ -81,11 +98,23 @@ void Config::Configure(duckdb::ExtensionLoader& loader) {
Registry::Register(loader);
SecretManager::Register(loader);
auto& db = loader.GetDatabaseInstance();
if (const auto db_path = db.config.options.database_path; db_path != get_global_storage_path().string()) {
SetupGlobalStorageLocation();
ConfigureGlobal();
const auto db_path = db.config.options.database_path;
const std::string global_path = get_global_storage_path().string();

// If the main database is already at the global storage path, still attach for WASM :memory: case
if (db_path == global_path) {
auto con = GetConnection(&db);
ConfigureTables(con, ConfigType::LOCAL);
ConfigureTables(con, ConfigType::GLOBAL);
#ifdef __EMSCRIPTEN__
ConfigureLocal(db);
#endif
return;
}

SetupGlobalStorageLocation(&db);
ConfigureLocal(db);
ConfigureGlobal(&db);
}

void Config::AttachToGlobalStorage(duckdb::Connection& con, bool read_only) {
@@ -116,11 +145,7 @@ bool Config::StorageAttachmentGuard::TryDetach() {
}

void Config::StorageAttachmentGuard::Wait(int milliseconds) {
auto start = std::chrono::steady_clock::now();
auto duration = std::chrono::milliseconds(milliseconds);
while (std::chrono::steady_clock::now() - start < duration) {
// Busy-wait until the specified duration has elapsed
}
std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
}
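
The two versions of `Wait` block for the same duration; the difference is that the loop keeps a core spinning while `std::this_thread::sleep_for` lets the scheduler run other threads. A small sketch of the sleep-based form, using a hypothetical helper `SleepAndMeasure` that returns the elapsed time so the blocking can be observed:

```cpp
#include <chrono>
#include <thread>

// Illustrative helper (not from the Flock sources): sleeps for the requested
// number of milliseconds and returns how long the call actually blocked.
std::chrono::milliseconds SleepAndMeasure(int milliseconds) {
    const auto start = std::chrono::steady_clock::now();
    // The thread is descheduled here instead of polling the clock in a loop.
    std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}
```

`sleep_for` may block longer than requested because of scheduling delays, so callers should treat the argument as a lower bound rather than an exact duration.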

Config::StorageAttachmentGuard::StorageAttachmentGuard(duckdb::Connection& con, bool read_only)
@@ -130,7 +155,9 @@ Config::StorageAttachmentGuard::StorageAttachmentGuard(duckdb::Connection& con,
attached = true;
return;
}
Wait(RETRY_DELAY_MS);
if (attempt < MAX_RETRIES - 1) {
Wait(RETRY_DELAY_MS);
}
}
Config::AttachToGlobalStorage(connection, read_only);
attached = true;
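
The `attempt < MAX_RETRIES - 1` check means the delay runs only between attempts, so a failing sequence does not pay for one extra wait after the final try. A generic sketch of that loop shape, assuming a hypothetical helper `RetryWithDelay` whose callbacks stand in for the attach attempt and for `Wait(RETRY_DELAY_MS)`:

```cpp
#include <functional>

// Hypothetical retry helper (not from the Flock sources): calls `attempt` up
// to `max_retries` times and invokes `wait` only between attempts, never
// after the last one. Returns true as soon as an attempt succeeds.
bool RetryWithDelay(int max_retries,
                    const std::function<bool()>& attempt,
                    const std::function<void()>& wait) {
    for (int i = 0; i < max_retries; ++i) {
        if (attempt()) {
            return true;
        }
        if (i < max_retries - 1) {
            wait();  // skipped on the final iteration
        }
    }
    return false;
}
```

With `max_retries = 3` and an attempt that always fails, this performs three attempts and two waits. Unlike this sketch, the constructor in the diff follows the loop with one more `AttachToGlobalStorage` call rather than returning a failure flag.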