Virtual Vector Filesystem (vvfs)

A high-performance, AI-enhanced virtual filesystem implementation in Go with embedded LibSQL, designed for modern file organization and management with advanced indexing, concurrent operations, and machine learning capabilities.

Warning

This repository is a work in progress; use at your own risk. The project is in pre-alpha, and the API is subject to change. Check back frequently: the project is moving fast and will hopefully enter alpha soon.

🚀 Key Features

Embedded Database Engine

Single Binary: No external database server required
LibSQL: SQLite fork with modern features and vector support
Compiled Extensions: FTS5, JSON1, R*Tree, Vector, SQLean modules
Production Ready: Optimized for performance and reliability

Advanced Search & AI Features

Vector Search: Native LibSQL vector operations for semantic similarity
Full-Text Search: FTS5 virtual tables for document content indexing
Spatial Queries: R*Tree indexing for GPS-enabled files
Text Processing: SQLean text normalization and fuzzy matching
Statistical Analysis: SQLean statistical functions for search ranking

🌟 Features

Core Filesystem Operations

Hierarchical Directory Structures - Advanced tree-based file organization
Concurrent File Operations - High-performance parallel processing using goroutines
Intelligent File Organization - Automated categorization and workflow management
Conflict Resolution - Smart handling of file conflicts with multiple strategies
Git Integration - Seamless version control operations within the filesystem
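The conflict strategies are not spelled out here. As an illustration only, the following minimal Go sketch covers three common ones: skip, overwrite, and rename with a numeric suffix. The `Strategy` type, its values, and `resolveName` are hypothetical and not part of the vvfs API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Strategy is a hypothetical conflict-resolution mode, not vvfs's actual type.
type Strategy int

const (
	Skip      Strategy = iota // keep the existing file, drop the incoming one
	Overwrite                 // replace the existing file
	Rename                    // keep both by suffixing the incoming name
)

// resolveName returns the destination name for an incoming file and whether
// the write should proceed. exists reports whether a name is already taken.
func resolveName(name string, s Strategy, exists func(string) bool) (string, bool) {
	if !exists(name) {
		return name, true
	}
	switch s {
	case Overwrite:
		return name, true
	case Rename:
		ext := filepath.Ext(name)
		base := strings.TrimSuffix(name, ext)
		// Try "base (1).ext", "base (2).ext", ... until a free name is found.
		for i := 1; ; i++ {
			candidate := fmt.Sprintf("%s (%d)%s", base, i, ext)
			if !exists(candidate) {
				return candidate, true
			}
		}
	default: // Skip
		return name, false
	}
}

func main() {
	taken := map[string]bool{"report.pdf": true, "report (1).pdf": true}
	exists := func(n string) bool { return taken[n] }

	fmt.Println(resolveName("report.pdf", Rename, exists)) // report (2).pdf true
	fmt.Println(resolveName("report.pdf", Skip, exists))   // report.pdf false
	fmt.Println(resolveName("notes.txt", Skip, exists))    // notes.txt true
}
```

Passing the existence check in as a function keeps the sketch testable and lets a caller back it with `os.Stat`, an index lookup, or both.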

Advanced Indexing & Search

Spatial Indexing - KD-tree based spatial indexing for efficient file location
Bitmap Indexing - Roaring bitmaps for ultra-fast set operations
Multi-dimensional Indexing - Eytzinger layout optimization for cache efficiency
Path-based Indexing - Hierarchical path indexing for rapid traversal

AI/ML Integration

Open-Source Models - Production-ready GGUF models (Qwen3, Llama 3.2)
Native GGUF Support - Direct llama.cpp integration for optimal performance
Hardware Acceleration - GPU/CPU optimization with automatic detection
Production Hardened - Comprehensive error handling and resource management
Commercial Friendly - Licenses that allow commercial use (Apache 2.0 for the Qwen3 models; the Llama 3.2 Community License for Llama 3.2)

Database & Persistence

Embedded LibSQL Integration - Single-binary embedded database
Workspace Management - Multi-workspace support with isolated configurations
Metadata Persistence - Comprehensive file metadata storage
Central Database - Shared metadata across workspaces

Developer Experience

Hexagonal Architecture - Clean, testable, and maintainable code structure
Comprehensive Testing - Extensive test suites with table-driven tests
Structured Logging - Zerolog integration for observability
Configuration Management - Viper-based configuration with multiple sources
CLI Integration - Command-line interface for filesystem operations

🚀 Quick Start

Prerequisites

Go 1.25 or later
SQLite3 development libraries (optional, for enhanced performance)

Installation

# Clone the repository
git clone https://github.com/ZanzyTHEbar/virtual-vectorfs.git
cd virtual-vectorfs

# Install dependencies
go mod download

# Run tests
go test ./...

# Build the project
go build ./...

Database Setup (Embedded LibSQL)

Virtual VectorFS uses embedded LibSQL with all advanced features compiled into the single binary.

Quick Start (Embedded)

# Build with all features
make build-libsql-amd64
make build-app-amd64

# Run the single binary
./bin/vvfs-amd64

Custom Build

# Build LibSQL static libraries
make build-libsql-amd64  # or build-libsql-arm64

# Build application
make build-app-amd64

# Run smoke tests
make smoke-test

Compiled Database Features

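Virtual VectorFS statically compiles FTS5, JSON1, R*Tree, native vector support, and the SQLean modules into the binary; each is detailed under Core SQLite Features, LibSQL Native Features, and SQLean Extensions below.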

AI Model Setup

Virtual VectorFS uses openly available GGUF models whose licenses allow commercial use (see the License column under Model Specifications).

Prerequisites

# Install Hugging Face CLI
pipx install 'huggingface_hub[cli]'
# Or: pip install 'huggingface_hub[cli]'
# (quotes stop shells such as zsh from expanding the brackets)

# Optional: Install pv for progress bars
sudo pacman -S pv  # Arch Linux
sudo apt install pv  # Ubuntu/Debian

Download Models

# Enhanced download (recommended) - parallel, checksums, caching
make models-download-v2

# Or basic sequential download
make models-download

# Verify models
make models-validate

# Check model information
make models-info

Model Specifications

Model	Purpose	Size	Context	License
Qwen3-Embed-0.6B	Text Embeddings	265 MB	2K tokens	Apache 2.0
Qwen3-Chat-1.7B	Conversational AI	1.2 GB	32K tokens	Apache 2.0
Llama 3.2 Vision	Vision-Language	1.9 GB	8K tokens	Llama 3.2 License

Total Size: ~3.4 GB (all models)

Enhanced Download Features (v2)

The models-download-v2 target provides advanced features:

✅ Parallel Downloads - 3x faster (10 min vs 30 min)
✅ SHA256 Verification - Automatic integrity checks
✅ Model Caching - CI/CD optimization (~3 sec on cached builds)
✅ Progress Bars - Real-time download status (requires pv)
✅ Update Detection - Check for new model versions
✅ Custom Repositories - Enterprise/air-gapped support

# Configuration options
PARALLEL_DOWNLOADS=5 make models-download-v2
MODEL_CACHE_DIR="/opt/cache" make models-download-v2
CUSTOM_MODEL_REPO="myorg/models" make models-download-v2

AI/ML Features

Core SQLite Features

✅ FTS5: Full-text search with virtual tables and ranking
✅ JSON1: Complete JSON manipulation and querying
✅ R*Tree: Spatial indexing for GPS coordinates

LibSQL Native Features

✅ Vector Operations: Native vector data types and similarity functions
✅ Vector Search: Cosine, L2, and other distance metrics
✅ Vector Indexing: Efficient storage and retrieval

SQLean Extensions (Compiled-in)

✅ Math: sqrt(), pow(), ceil(), floor(), exp(), log()
✅ Stats: median(), percentile(), stddev(), advanced aggregations
✅ Text: concat_ws(), trim(), text normalization functions
✅ Fuzzy: damerau_levenshtein(), jaro_winkler(), string similarity
✅ Crypto: sha256(), md5(), cryptographic hash functions
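The fuzzy functions are useful for ranking near-miss file names. To show what an edit distance with transpositions measures, here is a minimal Go sketch of the restricted Damerau-Levenshtein distance, also called optimal string alignment (OSA). SQLean's `damerau_levenshtein()` may implement the unrestricted variant and differ on some inputs, so treat this as an illustration, not a reference implementation.

```go
package main

import "fmt"

// osaDistance counts the insertions, deletions, substitutions, and adjacent
// transpositions needed to turn a into b, editing each substring at most once.
// It works on runes so a multi-byte UTF-8 character counts as one symbol.
func osaDistance(a, b string) int {
	s, t := []rune(a), []rune(b)
	d := make([][]int, len(s)+1)
	for i := range d {
		d[i] = make([]int, len(t)+1)
		d[i][0] = i // delete all of s[:i]
	}
	for j := 0; j <= len(t); j++ {
		d[0][j] = j // insert all of t[:j]
	}
	for i := 1; i <= len(s); i++ {
		for j := 1; j <= len(t); j++ {
			cost := 1
			if s[i-1] == t[j-1] {
				cost = 0
			}
			d[i][j] = min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
			// Adjacent transposition, e.g. "ro" -> "or".
			if i > 1 && j > 1 && s[i-1] == t[j-2] && s[i-2] == t[j-1] {
				d[i][j] = min(d[i][j], d[i-2][j-2]+1)
			}
		}
	}
	return d[len(s)][len(t)]
}

func main() {
	fmt.Println(osaDistance("reprot", "report"))   // 1
	fmt.Println(osaDistance("kitten", "sitting")) // 3
}
```

On `"ca"` vs `"abc"` this restricted variant returns 3, whereas the unrestricted Damerau-Levenshtein distance is 2; that is the kind of case where results may diverge.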

Advanced Usage Examples

Vector Search

-- Vector similarity search
SELECT * FROM files
WHERE vector_distance_cos(embedding, vector32('[1,2,3]')) < 0.8;

Full-Text Search

-- FTS5 content search
SELECT * FROM files_fts WHERE files_fts MATCH 'database vector';
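`'database vector'` matches rows containing both terms, because FTS5 treats adjacent phrases as an implicit AND. When the terms come from user input, quoting each one as an FTS5 string (with embedded double quotes doubled) keeps characters such as `-`, `*`, or `:` from being parsed as query syntax. Here is a minimal sketch. `ftsQuery` is a hypothetical helper, and its result is meant to be bound as a parameter (`... MATCH ?`) rather than concatenated into SQL.

```go
package main

import (
	"fmt"
	"strings"
)

// ftsQuery turns free-form search terms into an FTS5 MATCH expression in which
// every term is a quoted string, so all terms must appear (implicit AND) and
// none is interpreted as an operator, column filter, or prefix query.
func ftsQuery(input string) string {
	terms := strings.Fields(input)
	for i, t := range terms {
		terms[i] = `"` + strings.ReplaceAll(t, `"`, `""`) + `"`
	}
	return strings.Join(terms, " ")
}

func main() {
	fmt.Println(ftsQuery("database vector"))     // "database" "vector"
	fmt.Println(ftsQuery(`title:draft "v2" -x`)) // "title:draft" """v2""" "-x"
}
```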

Spatial Queries

-- R*Tree GPS queries
SELECT * FROM file_gps_rtree
WHERE min_lat <= 40.7 AND max_lat >= 40.7
  AND min_lon <= -74.0 AND max_lon >= -74.0;
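The `WHERE` clause above is a point-in-rectangle test: a row matches when its bounding box spans both the query latitude and longitude. The R*Tree index speeds up finding candidates, but the predicate itself is equivalent to this linear scan. The `BBox` type mirrors the column names in the query and is otherwise an assumption.

```go
package main

import "fmt"

// BBox mirrors the min_lat/max_lat/min_lon/max_lon columns of file_gps_rtree.
type BBox struct {
	ID                             int64
	MinLat, MaxLat, MinLon, MaxLon float64
}

// contains reports whether the point (lat, lon) lies inside b, edges included,
// matching the <= / >= comparisons in the SQL.
func (b BBox) contains(lat, lon float64) bool {
	return b.MinLat <= lat && b.MaxLat >= lat && b.MinLon <= lon && b.MaxLon >= lon
}

func main() {
	boxes := []BBox{
		{ID: 1, MinLat: 40.6, MaxLat: 40.8, MinLon: -74.1, MaxLon: -73.9},   // around New York
		{ID: 2, MinLat: 34.0, MaxLat: 34.1, MinLon: -118.3, MaxLon: -118.2}, // around Los Angeles
	}
	for _, b := range boxes {
		if b.contains(40.7, -74.0) {
			fmt.Println("match:", b.ID) // match: 1
		}
	}
}
```

One difference to keep in mind: by default SQLite's R*Tree stores coordinates as 32-bit floats and rounds boxes outward, so the indexed query can return a few edge cases that this float64 check would reject.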

SQLean Text Processing

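-- Normalized text search: concat_ws('', ...) joins its arguments with an
-- empty separator, producing the pattern '%report%'
SELECT * FROM files
WHERE file_name_normalized LIKE concat_ws('', '%', 'report', '%');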
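However the pattern is assembled, this query is meant to perform a "contains" match such as `'%report%'`. When the search term comes from user input, any `%` or `_` inside it would otherwise act as a wildcard. Here is a minimal sketch that escapes them for use with `LIKE ? ESCAPE '\'`. The `containsPattern` helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// containsPattern builds a LIKE pattern matching any value that contains term.
// Backslash, % and _ in term are escaped so they match literally; bind the
// result as a parameter in "... LIKE ? ESCAPE '\'".
func containsPattern(term string) string {
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return "%" + r.Replace(term) + "%"
}

func main() {
	fmt.Println(containsPattern("report")) // %report%
	fmt.Println(containsPattern("q1_50%")) // %q1\_50\%%
}
```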

Statistical Analysis

-- Statistical aggregations
SELECT median(vector_distance_cos(embedding, query_vector)) as median_distance
FROM search_results;

Basic Usage

package main

import (
    "context"
    "log"

    "github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/db"
    "github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/filesystem"
    "github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/ports"
)

func main() {
    // Create database provider
    centralDB, err := db.NewCentralDBProvider()
    if err != nil {
        log.Fatal(err)
    }
    defer centralDB.Close()

    // Create terminal interactor
    interactor := ports.NewTerminalInteractor()

    // Create filesystem manager
    fs, err := filesystem.New(interactor, centralDB)
    if err != nil {
        log.Fatal(err)
    }

    // Index a directory
    ctx := context.Background()
    err = fs.IndexDirectory(ctx, "/path/to/directory", filesystem.DefaultIndexOptions())
    if err != nil {
        log.Fatal(err)
    }

    // Build directory tree with analysis
    tree, analysis, err := fs.BuildDirectoryTreeWithAnalysis(ctx, "/path/to/directory", filesystem.DefaultTraversalOptions())
    if err != nil {
        log.Fatal(err)
    }

    log.Printf("Indexed %d files, %d directories", analysis.FileCount, analysis.DirectoryCount)
    _ = tree
}

📖 Documentation

Architecture Overview

The project follows a hexagonal architecture (ports and adapters) pattern:

├── ports/           # Application ports (interfaces)
├── filesystem/      # Core filesystem business logic
│   ├── interfaces/  # Service interfaces
│   ├── services/    # Service implementations
│   ├── types/       # Data types and DTOs
│   ├── options/     # Configuration options
│   └── common/      # Shared utilities
├── trees/           # Tree data structures and algorithms
├── indexing/        # Advanced indexing implementations
├── embedding/       # AI/ML embedding providers
├── db/              # Database providers and interfaces
├── memory/          # In-memory data structures
└── config/          # Configuration management

Key Components

Filesystem Services

DirectoryService - Directory indexing and tree building
FileOperations - File manipulation operations
OrganizationService - Intelligent file organization
ConflictResolver - File conflict detection and resolution
GitService - Git repository operations

Advanced Features

ConcurrentTraverser - High-performance parallel directory traversal
KDTree - Spatial indexing for file locations
RoaringBitmaps - Efficient set operations for file indexing
go-llama.cpp - Native GGUF model execution via llama.cpp bindings

🔧 Configuration

Create a configuration file at ~/.config/vvfs/config.toml:

[database]
type = "sqlite3"
dsn = "file:~/vvfs/central.db"

[filesystem]
cache_dir = "~/.config/vvfs/.cache"
max_concurrent_operations = 10

[embedding]
default_provider = "llama"
# Defaults to ollama model directory
model_path = "~/.config/vvfs/models"

[logging]
level = "info"
format = "json"

🧪 Testing

Run the comprehensive test suite:

# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run integration tests
go test -tags=integration ./...

# Run specific test
go test -run TestConcurrentTraverser ./vvfs/filesystem/

📊 Performance

The filesystem is optimized for high-performance operations:

Concurrent Processing - Utilizes all available CPU cores
Memory-Efficient - Streaming operations for large file sets
Cache-Optimized - Eytzinger layout for improved cache locality
Database Performance - Connection pooling and prepared statements

🤝 Contributing

We welcome contributions! Please follow these steps:

Fork the repository
Create a feature branch: git checkout -b feature/your-feature
Write tests for your changes
Ensure all tests pass: go test ./...
Follow conventional commit format for commits
Submit a pull request

Development Guidelines

Code Style - Follow standard Go formatting (go fmt)
Testing - Write table-driven tests for new functionality
Documentation - Update documentation for API changes
Performance - Include benchmarks for performance-critical code
Security - Validate inputs and handle errors properly

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Roaring Bitmaps - For efficient bitmap operations
llama.cpp - For native GGUF model inference
go-llama.cpp - Go bindings for llama.cpp
Turso - For distributed SQLite database
Go Community - For the excellent standard library and ecosystem

🔗 Related Projects

go-fuse - FUSE filesystem implementation
bleve - Full-text search library
badger - Key-value database

📞 Support

For questions and support:

Open an issue on GitHub
Check the documentation for detailed guides
Join our community discussions

Built with ❤️ in Go
