Virtual Vector Filesystem (vvfs)

A high-performance, AI-enhanced virtual filesystem implementation in Go with embedded LibSQL, designed for modern file organization and management with advanced indexing, concurrent operations, and machine learning capabilities.

Warning

This repository is a work in progress; use at your own risk. The project is in a pre-alpha stage and the API is subject to change. Check back frequently: the project is moving fast and should enter alpha soon.

🚀 Key Features

Embedded Database Engine

  • Single Binary: No external database server required
  • LibSQL: SQLite fork with modern features and vector support
  • Compiled Extensions: FTS5, JSON1, R*Tree, Vector, SQLean modules
  • Production Ready: Optimized for performance and reliability

Advanced Search & AI Features

  • Vector Search: Native LibSQL vector operations for semantic similarity
  • Full-Text Search: FTS5 virtual tables for document content indexing
  • Spatial Queries: R*Tree indexing for GPS-enabled files
  • Text Processing: SQLean text normalization and fuzzy matching
  • Statistical Analysis: SQLean statistical functions for search ranking

🌟 Features

Core Filesystem Operations

  • Hierarchical Directory Structures - Advanced tree-based file organization
  • Concurrent File Operations - High-performance parallel processing using goroutines
  • Intelligent File Organization - Automated categorization and workflow management
  • Conflict Resolution - Smart handling of file conflicts with multiple strategies
  • Git Integration - Seamless version control operations within the filesystem
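
As an illustration of the goroutine-based concurrency mentioned above, here is a minimal sketch of a bounded worker pool that stats files in parallel. It is not the vvfs implementation: `fileInfo`, `listFiles`, and `statAll` are hypothetical names, and only the Go standard library is used.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// fileInfo is a hypothetical record type for this sketch.
type fileInfo struct {
	Path string
	Size int64
}

// listFiles walks root sequentially and returns the paths of regular files.
func listFiles(root string) ([]string, error) {
	var paths []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			paths = append(paths, p)
		}
		return nil
	})
	return paths, err
}

// statAll stats the given paths using at most `workers` goroutines.
// Paths that cannot be stat'ed are skipped; result order is not guaranteed.
func statAll(paths []string, workers int) []fileInfo {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan fileInfo)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if st, err := os.Stat(p); err == nil {
					results <- fileInfo{Path: p, Size: st.Size()}
				}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	var out []fileInfo
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	paths, err := listFiles(".")
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	var total int64
	for _, fi := range statAll(paths, 4) {
		total += fi.Size
	}
	fmt.Printf("%d files, %d bytes\n", len(paths), total)
}
```

Bounding the number of workers keeps the count of simultaneous system calls predictable, which is the same idea as the `max_concurrent_operations` setting in the configuration section.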

Advanced Indexing & Search

  • Spatial Indexing - KD-tree based spatial indexing for efficient file location
  • Bitmap Indexing - Roaring bitmaps for ultra-fast set operations
  • Multi-dimensional Indexing - Eytzinger layout optimization for cache efficiency
  • Path-based Indexing - Hierarchical path indexing for rapid traversal
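
To make the Eytzinger layout concrete: it stores a sorted array in breadth-first order of an implicit binary search tree, so the first few levels of every search share the same cache lines. The sketch below is a generic illustration over integers, not vvfs's index code; `buildEytzinger` and `contains` are hypothetical names.

```go
package main

import "fmt"

// buildEytzinger copies a sorted slice into breadth-first (Eytzinger) order.
// Index 0 is unused; the root is at index 1 and the children of k are 2k and 2k+1.
func buildEytzinger(sorted []int) []int {
	out := make([]int, len(sorted)+1)
	next := 0
	var fill func(k int)
	fill = func(k int) {
		if k >= len(out) {
			return
		}
		fill(2 * k) // left subtree receives the smaller values
		out[k] = sorted[next]
		next++
		fill(2*k + 1) // right subtree receives the larger values
	}
	fill(1)
	return out
}

// contains performs a binary search by walking the implicit tree.
func contains(tree []int, target int) bool {
	k := 1
	for k < len(tree) {
		switch {
		case tree[k] == target:
			return true
		case tree[k] < target:
			k = 2*k + 1
		default:
			k = 2 * k
		}
	}
	return false
}

func main() {
	tree := buildEytzinger([]int{1, 2, 3, 4, 5, 6, 7})
	fmt.Println(tree[1:])          // [4 2 6 1 3 5 7]
	fmt.Println(contains(tree, 5)) // true
	fmt.Println(contains(tree, 8)) // false
}
```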

AI/ML Integration

  • Open-Source Models - Production-ready GGUF models (Qwen3, Llama 3.2)
  • Native GGUF Support - Direct llama.cpp integration for optimal performance
  • Hardware Acceleration - GPU/CPU optimization with automatic detection
  • Production Hardened - Comprehensive error handling and resource management
  • Commercial Friendly - Permissive licenses for commercial use (Apache 2.0)

Database & Persistence

  • Embedded LibSQL Integration - Single-binary embedded database
  • Workspace Management - Multi-workspace support with isolated configurations
  • Metadata Persistence - Comprehensive file metadata storage
  • Central Database - Shared metadata across workspaces

Developer Experience

  • Hexagonal Architecture - Clean, testable, and maintainable code structure
  • Comprehensive Testing - Extensive test suites with table-driven tests
  • Structured Logging - Zerolog integration for observability
  • Configuration Management - Viper-based configuration with multiple sources
  • CLI Integration - Command-line interface for filesystem operations

🚀 Quick Start

Prerequisites

  • Go 1.25 or later
  • SQLite3 development libraries (optional, for enhanced performance)

Installation

# Clone the repository
git clone https://github.com/ZanzyTHEbar/virtual-vectorfs.git
cd virtual-vectorfs

# Install dependencies
go mod download

# Run tests
go test ./...

# Build the project
go build ./...

Database Setup (Embedded LibSQL)

Virtual VectorFS uses embedded LibSQL with all advanced features compiled into the single binary.

Quick Start (Embedded)

# Build with all features
make build-libsql-amd64
make build-app-amd64

# Run the single binary
./bin/vvfs-amd64

Custom Build

# Build LibSQL static libraries
make build-libsql-amd64  # or build-libsql-arm64

# Build application
make build-app-amd64

# Run smoke tests
make smoke-test

Compiled Database Features

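Virtual VectorFS compiles the following feature groups statically into the binary. They are listed further down under Core SQLite Features, LibSQL Native Features, and SQLean Extensions (Compiled-in).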

AI Model Setup

Virtual VectorFS uses open-source GGUF models with permissive licenses for commercial use.

Prerequisites

# Install Hugging Face CLI
pipx install "huggingface_hub[cli]"
# Or: pip install "huggingface_hub[cli]"
# (the quotes stop shells such as zsh from treating the brackets as a glob pattern)

# Optional: Install pv for progress bars
sudo pacman -S pv  # Arch Linux
sudo apt install pv  # Ubuntu/Debian

Download Models

# Enhanced download (recommended) - parallel, checksums, caching
make models-download-v2

# Or basic sequential download
make models-download

# Verify models
make models-validate

# Check model information
make models-info

Model Specifications

Model             Purpose            Size    Context     License
Qwen3-Embed-0.6B  Text Embeddings    265 MB  2K tokens   Apache 2.0
Qwen3-Chat-1.7B   Conversational AI  1.2 GB  32K tokens  Apache 2.0
Llama 3.2 Vision  Vision-Language    1.9 GB  8K tokens   Llama 3.2 License

Total Size: ~3.4 GB (all models)

Enhanced Download Features (v2)

The models-download-v2 target provides advanced features:

  • Parallel Downloads - 3x faster (10 min vs 30 min)
  • SHA256 Verification - Automatic integrity checks
  • Model Caching - CI/CD optimization (~3 sec on cached builds)
  • Progress Bars - Real-time download status (requires pv)
  • Update Detection - Check for new model versions
  • Custom Repositories - Enterprise/air-gapped support

# Configuration options
PARALLEL_DOWNLOADS=5 make models-download-v2
MODEL_CACHE_DIR="/opt/cache" make models-download-v2
CUSTOM_MODEL_REPO="myorg/models" make models-download-v2
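
The SHA256 verification step listed above can be reproduced by hand for any downloaded file. The following is a minimal sketch using only the Go standard library; it is not the code behind `make models-download-v2`, and `hashReader`, `hashBytes`, and `verifyFile` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// hashReader streams r through SHA-256 and returns the lowercase hex digest.
// Streaming avoids loading multi-gigabyte model files into memory.
func hashReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashBytes is a convenience wrapper for in-memory data.
func hashBytes(b []byte) string {
	sum, _ := hashReader(bytes.NewReader(b)) // reading from memory cannot fail
	return sum
}

// verifyFile compares the digest of the file at path with wantHex (lowercase hex).
func verifyFile(path, wantHex string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	got, err := hashReader(f)
	if err != nil {
		return err
	}
	if got != wantHex {
		return fmt.Errorf("checksum mismatch for %s: got %s, want %s", path, got, wantHex)
	}
	return nil
}

func main() {
	// Usage: go run . <file> <expected-sha256>
	if len(os.Args) != 3 {
		fmt.Println("usage: verify <file> <expected-sha256>")
		return
	}
	if err := verifyFile(os.Args[1], os.Args[2]); err != nil {
		fmt.Println("FAILED:", err)
		os.Exit(1)
	}
	fmt.Println("OK")
}
```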

AI/ML Features

Core SQLite Features

  • FTS5: Full-text search with virtual tables and ranking
  • JSON1: Complete JSON manipulation and querying
  • R*Tree: Spatial indexing for GPS coordinates

LibSQL Native Features

  • Vector Operations: Native vector data types and similarity functions
  • Vector Search: Cosine, L2, and other distance metrics
  • Vector Indexing: Efficient storage and retrieval

SQLean Extensions (Compiled-in)

  • Math: sqrt(), pow(), ceil(), floor(), exp(), log()
  • Stats: median(), percentile(), stddev(), advanced aggregations
  • Text: concat_ws(), trim(), text normalization functions
  • Fuzzy: damerau_levenshtein(), jaro_winkler(), string similarity
  • Crypto: sha256(), md5(), cryptographic hash functions

Advanced Usage Examples

Vector Search

-- Vector similarity search
SELECT * FROM files
WHERE vector_distance_cos(embedding, vector32('[1,2,3]')) < 0.8;
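
For intuition about the `0.8` threshold: cosine distance is conventionally 1 minus cosine similarity, so 0 means the vectors point the same way, 1 means they are orthogonal, and 2 means they are opposite. The sketch below computes that quantity in plain Go, assuming `vector_distance_cos` follows this convention; `cosineDistance` is a hypothetical helper, not part of vvfs or LibSQL.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineDistance returns 1 - cos(angle between a and b).
// It fails for mismatched lengths, empty vectors, and zero vectors.
func cosineDistance(a, b []float32) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("cosine distance is undefined for a zero vector")
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB)), nil
}

func main() {
	query := []float32{1, 2, 3}
	candidates := [][]float32{{1, 2, 3}, {3, 2, 1}, {-1, -2, -3}}
	for _, c := range candidates {
		d, err := cosineDistance(query, c)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%v -> distance %.3f, within 0.8: %v\n", c, d, d < 0.8)
	}
}
```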

Full-Text Search

-- FTS5 content search
SELECT * FROM files_fts WHERE files_fts MATCH 'database vector';

Spatial Queries

-- R*Tree GPS queries
SELECT * FROM file_gps_rtree
WHERE min_lat <= 40.7 AND max_lat >= 40.7
  AND min_lon <= -74.0 AND max_lon >= -74.0;
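
The query above selects every bounding box that contains the point (40.7, -74.0). The same predicate, written out in Go as a linear scan, looks like the sketch below. An R*Tree answers this without visiting every row; the sketch only illustrates the containment test, and `bbox` and `boxesContaining` are hypothetical names.

```go
package main

import "fmt"

// bbox mirrors the four R*Tree columns used in the query.
type bbox struct {
	MinLat, MaxLat, MinLon, MaxLon float64
}

// containsPoint reports whether (lat, lon) lies inside the box, edges included.
func (b bbox) containsPoint(lat, lon float64) bool {
	return b.MinLat <= lat && b.MaxLat >= lat &&
		b.MinLon <= lon && b.MaxLon >= lon
}

// boxesContaining returns the indexes of all boxes that contain the point.
func boxesContaining(boxes []bbox, lat, lon float64) []int {
	var hits []int
	for i, b := range boxes {
		if b.containsPoint(lat, lon) {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	boxes := []bbox{
		{MinLat: 40.0, MaxLat: 41.0, MinLon: -75.0, MaxLon: -73.0}, // contains the point
		{MinLat: 51.0, MaxLat: 52.0, MinLon: -1.0, MaxLon: 1.0},    // elsewhere
	}
	fmt.Println(boxesContaining(boxes, 40.7, -74.0)) // [0]
}
```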

SQLean Text Processing

-- Normalized text search: builds the pattern '%report%'
SELECT * FROM files
WHERE file_name_normalized LIKE concat_ws('', '%', 'report', '%');

Statistical Analysis

-- Statistical aggregations
SELECT median(vector_distance_cos(embedding, query_vector)) as median_distance
FROM search_results;

Basic Usage

package main

import (
    "context"
    "log"

    "github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/filesystem"
    "github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/db"
    "github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/ports"
)

func main() {
    // Create database provider
    centralDB, err := db.NewCentralDBProvider()
    if err != nil {
        log.Fatal(err)
    }
    defer centralDB.Close()

    // Create terminal interactor
    interactor := ports.NewTerminalInteractor()

    // Create filesystem manager
    fs, err := filesystem.New(interactor, centralDB)
    if err != nil {
        log.Fatal(err)
    }

    // Index a directory
    ctx := context.Background()
    err = fs.IndexDirectory(ctx, "/path/to/directory", filesystem.DefaultIndexOptions())
    if err != nil {
        log.Fatal(err)
    }

    // Build directory tree with analysis
    tree, analysis, err := fs.BuildDirectoryTreeWithAnalysis(ctx, "/path/to/directory", filesystem.DefaultTraversalOptions())
    if err != nil {
        log.Fatal(err)
    }

    log.Printf("Indexed %d files, %d directories", analysis.FileCount, analysis.DirectoryCount)
    _ = tree
}

📖 Documentation

Architecture Overview

The project follows a hexagonal architecture (ports and adapters) pattern:

├── ports/           # Application ports (interfaces)
├── filesystem/      # Core filesystem business logic
│   ├── interfaces/  # Service interfaces
│   ├── services/    # Service implementations
│   ├── types/       # Data types and DTOs
│   ├── options/     # Configuration options
│   └── common/      # Shared utilities
├── trees/           # Tree data structures and algorithms
├── indexing/        # Advanced indexing implementations
├── embedding/       # AI/ML embedding providers
├── db/              # Database providers and interfaces
├── memory/          # In-memory data structures
└── config/          # Configuration management
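
The ports-and-adapters idea behind this layout can be sketched in a few lines: core logic depends on an interface (the port), and concrete storage plugs in behind it (the adapter). Every name below (`MetadataStore`, `memoryStore`, `Indexer`) is hypothetical and chosen for illustration; the real interfaces live in the packages listed above and may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// MetadataStore is a hypothetical port: the core only knows this interface.
type MetadataStore interface {
	Put(path string, size int64) error
	Count() int
}

// memoryStore is an in-memory adapter, handy for tests.
// A database-backed adapter would satisfy the same interface.
type memoryStore struct {
	files map[string]int64
}

func newMemoryStore() *memoryStore {
	return &memoryStore{files: make(map[string]int64)}
}

func (m *memoryStore) Put(path string, size int64) error {
	if path == "" {
		return errors.New("empty path")
	}
	m.files[path] = size
	return nil
}

func (m *memoryStore) Count() int { return len(m.files) }

// Indexer is core logic. It never imports a concrete storage package.
type Indexer struct {
	store MetadataStore
}

// IndexAll records every file and stops at the first storage error.
func (ix *Indexer) IndexAll(files map[string]int64) error {
	for path, size := range files {
		if err := ix.store.Put(path, size); err != nil {
			return fmt.Errorf("index %q: %w", path, err)
		}
	}
	return nil
}

func main() {
	store := newMemoryStore()
	ix := &Indexer{store: store}
	if err := ix.IndexAll(map[string]int64{"a.txt": 10, "b.txt": 20}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("indexed files:", store.Count()) // indexed files: 2
}
```

Because `Indexer` sees only the interface, a test can swap in the in-memory adapter without touching a database.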

Key Components

Filesystem Services

  • DirectoryService - Directory indexing and tree building
  • FileOperations - File manipulation operations
  • OrganizationService - Intelligent file organization
  • ConflictResolver - File conflict detection and resolution
  • GitService - Git repository operations

Advanced Features

  • ConcurrentTraverser - High-performance parallel directory traversal
  • KDTree - Spatial indexing for file locations
  • RoaringBitmaps - Efficient set operations for file indexing
  • go-llama.cpp - Native GGUF model execution via llama.cpp bindings
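
As background for the KDTree component, the sketch below builds a two-dimensional k-d tree and runs a nearest-neighbour query with branch pruning. It is a textbook version, not the vvfs `KDTree` type, and how vvfs maps files to coordinates is not covered here. `point`, `node`, `build`, and `findNearest` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type point [2]float64

type node struct {
	p           point
	left, right *node
}

// build constructs a k-d tree, alternating the split axis per level.
// It sorts pts in place, so pass a copy if the original order matters.
func build(pts []point, depth int) *node {
	if len(pts) == 0 {
		return nil
	}
	axis := depth % 2
	sort.Slice(pts, func(i, j int) bool { return pts[i][axis] < pts[j][axis] })
	mid := len(pts) / 2
	return &node{
		p:     pts[mid],
		left:  build(pts[:mid], depth+1),
		right: build(pts[mid+1:], depth+1),
	}
}

// sqDist returns the squared Euclidean distance between a and b.
func sqDist(a, b point) float64 {
	dx, dy := a[0]-b[0], a[1]-b[1]
	return dx*dx + dy*dy
}

func nearest(n *node, target point, depth int, best point, bestD float64) (point, float64) {
	if n == nil {
		return best, bestD
	}
	if d := sqDist(n.p, target); d < bestD {
		best, bestD = n.p, d
	}
	axis := depth % 2
	diff := target[axis] - n.p[axis]
	near, far := n.left, n.right
	if diff > 0 {
		near, far = n.right, n.left
	}
	best, bestD = nearest(near, target, depth+1, best, bestD)
	// Visit the far side only if the splitting plane is closer than the best match so far.
	if diff*diff < bestD {
		best, bestD = nearest(far, target, depth+1, best, bestD)
	}
	return best, bestD
}

// findNearest returns the closest stored point; ok is false for an empty tree.
func findNearest(root *node, target point) (p point, ok bool) {
	if root == nil {
		return point{}, false
	}
	p, _ = nearest(root, target, 0, root.p, math.Inf(1))
	return p, true
}

func main() {
	root := build([]point{{2, 3}, {5, 4}, {9, 6}, {4, 7}, {8, 1}, {7, 2}}, 0)
	p, _ := findNearest(root, point{9, 2})
	fmt.Println(p) // [8 1]
}
```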

🔧 Configuration

Create a configuration file at ~/.config/vvfs/config.toml:

[database]
type = "sqlite3"
dsn = "file:~/vvfs/central.db"

[filesystem]
cache_dir = "~/.config/vvfs/.cache"
max_concurrent_operations = 10

[embedding]
default_provider = "llama"
# Defaults to ollama model directory
model_path = "~/.config/vvfs/models"

[logging]
level = "info"
format = "json"
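
Several paths in this example start with `~`. Go's standard library does not expand `~` when opening files, and this README does not say whether vvfs expands it when loading the configuration. If you read these values in your own code, a helper along these lines may be useful. This is a sketch with hypothetical names (`expandHome`, `expandDSN`), not part of the vvfs API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~" or "~/" with the given home directory.
// Other forms, such as "~user/...", are returned unchanged.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") {
		return filepath.Join(home, path[2:])
	}
	return path
}

// expandDSN applies expandHome to the path part of a "file:" DSN.
func expandDSN(dsn, home string) string {
	const prefix = "file:"
	if strings.HasPrefix(dsn, prefix) {
		return prefix + expandHome(strings.TrimPrefix(dsn, prefix), home)
	}
	return expandHome(dsn, home)
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	fmt.Println(expandDSN("file:~/vvfs/central.db", home))
	fmt.Println(expandHome("~/.config/vvfs/models", home))
}
```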

🧪 Testing

Run the comprehensive test suite:

# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run integration tests
go test -tags=integration ./...

# Run specific test
go test -run TestConcurrentTraverser ./vvfs/filesystem/

📊 Performance

The filesystem is optimized for high-performance operations:

  • Concurrent Processing - Utilizes all available CPU cores
  • Memory-Efficient - Streaming operations for large file sets
  • Cache-Optimized - Eytzinger layout for improved cache locality
  • Database Performance - Connection pooling and prepared statements

🤝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Write tests for your changes
  4. Ensure all tests pass: go test ./...
  5. Follow conventional commit format for commits
  6. Submit a pull request

Development Guidelines

  • Code Style - Follow standard Go formatting (go fmt)
  • Testing - Write table-driven tests for new functionality
  • Documentation - Update documentation for API changes
  • Performance - Include benchmarks for performance-critical code
  • Security - Validate inputs and handle errors properly

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Roaring Bitmaps - For efficient bitmap operations
  • llama.cpp - For native GGUF model inference
  • go-llama.cpp - Go bindings for llama.cpp
  • Turso - For LibSQL, the SQLite fork used as the embedded database
  • Go Community - For the excellent standard library and ecosystem

🔗 Related Projects

  • go-fuse - FUSE filesystem implementation
  • bleve - Full-text search library
  • badger - Key-value database

📞 Support

For questions and support:

  • Open an issue on GitHub
  • Check the documentation for detailed guides
  • Join our community discussions

Built with ❤️ in Go
