A high-performance, AI-enhanced virtual filesystem implementation in Go with embedded LibSQL, designed for modern file organization and management with advanced indexing, concurrent operations, and machine learning capabilities.
Warning
This repository is a work in progress; use at your own risk. The project is in a pre-alpha stage and the API is subject to change. Check back frequently: the project is moving fast and will hopefully enter alpha soon.
- Single Binary: No external database server required
- LibSQL: SQLite fork with modern features and vector support
- Compiled Extensions: FTS5, JSON1, R*Tree, Vector, SQLean modules
- Production Ready: Optimized for performance and reliability
- Vector Search: Native LibSQL vector operations for semantic similarity
- Full-Text Search: FTS5 virtual tables for document content indexing
- Spatial Queries: R*Tree indexing for GPS-enabled files
- Text Processing: SQLean text normalization and fuzzy matching
- Statistical Analysis: SQLean statistical functions for search ranking
- Hierarchical Directory Structures - Advanced tree-based file organization
- Concurrent File Operations - High-performance parallel processing using goroutines
- Intelligent File Organization - Automated categorization and workflow management
- Conflict Resolution - Smart handling of file conflicts with multiple strategies
- Git Integration - Seamless version control operations within the filesystem
- Spatial Indexing - KD-tree based spatial indexing for efficient file location
- Bitmap Indexing - Roaring bitmaps for ultra-fast set operations (see the sketch after this feature list)
- Multi-dimensional Indexing - Eytzinger layout optimization for cache efficiency
- Path-based Indexing - Hierarchical path indexing for rapid traversal
- Open-Source Models - Production-ready GGUF models (Qwen3, Llama 3.2)
- Native GGUF Support - Direct llama.cpp integration for optimal performance
- Hardware Acceleration - GPU/CPU optimization with automatic detection
- Production Hardened - Comprehensive error handling and resource management
- Commercial Friendly - Permissive licenses for commercial use (Apache 2.0)
- Embedded LibSQL Integration - Single-binary embedded database
- Workspace Management - Multi-workspace support with isolated configurations
- Metadata Persistence - Comprehensive file metadata storage
- Central Database - Shared metadata across workspaces
- Hexagonal Architecture - Clean, testable, and maintainable code structure
- Comprehensive Testing - Extensive test suites with table-driven tests
- Structured Logging - Zerolog integration for observability
- Configuration Management - Viper-based configuration with multiple sources
- CLI Integration - Command-line interface for filesystem operations
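For a sense of how the bitmap indexing can work in practice, here is a minimal, hypothetical sketch using the RoaringBitmap/roaring Go library (an acknowledged dependency); the file-ID scheme and attribute bitmaps below are illustrative, not the project's internal data model:

```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring"
)

func main() {
	// Hypothetical scheme: every indexed file gets a uint32 ID, and one bitmap
	// is maintained per attribute (extension, tag, "modified this week", ...).
	goFiles := roaring.BitmapOf(1, 2, 5, 8, 13)  // files with extension .go
	recentFiles := roaring.BitmapOf(2, 3, 8, 21) // files modified this week

	// Set intersection answers "recently modified .go files" without a scan.
	both := roaring.And(goFiles, recentFiles)

	fmt.Println(both.ToArray())        // [2 8]
	fmt.Println(both.GetCardinality()) // 2
}
```

To build the project you will need: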
- Go 1.25 or later
- SQLite3 development libraries (optional, for enhanced performance)
```bash
# Clone the repository
git clone https://github.com/ZanzyTHEbar/virtual-vectorfs.git
cd virtual-vectorfs

# Install dependencies
go mod download

# Run tests
go test ./...

# Build the project
go build ./...
```
Virtual VectorFS uses embedded LibSQL with all advanced features compiled into the single binary.
```bash
# Build with all features
make build-libsql-amd64
make build-app-amd64

# Run the single binary
./bin/vvfs-amd64

# Build LibSQL static libraries
make build-libsql-amd64 # or build-libsql-arm64

# Build application
make build-app-amd64

# Run smoke tests
make smoke-test
```
All LibSQL extensions and SQLean modules are statically linked into the resulting binary, so no external database server or runtime extension loading is required; the full list of compiled SQL features appears below.
Virtual VectorFS uses open-source GGUF models with permissive licenses for commercial use.
```bash
# Install Hugging Face CLI
pipx install huggingface_hub[cli]
# Or: pip install huggingface_hub[cli]

# Optional: Install pv for progress bars
sudo pacman -S pv   # Arch Linux
sudo apt install pv # Ubuntu/Debian

# Enhanced download (recommended) - parallel, checksums, caching
make models-download-v2

# Or basic sequential download
make models-download

# Verify models
make models-validate

# Check model information
make models-info
```
| Model | Purpose | Size | Context | License |
|---|---|---|---|---|
| Qwen3-Embed-0.6B | Text Embeddings | 265 MB | 2K tokens | Apache 2.0 |
| Qwen3-Chat-1.7B | Conversational AI | 1.2 GB | 32K tokens | Apache 2.0 |
| Llama 3.2 Vision | Vision-Language | 1.9 GB | 8K tokens | Llama 3.2 License |
Total Size: ~3.4 GB (all models)
The `models-download-v2` target provides advanced features:
- ✅ Parallel Downloads - 3x faster (10 min vs 30 min)
- ✅ SHA256 Verification - Automatic integrity checks
- ✅ Model Caching - CI/CD optimization (~3 sec on cached builds)
- ✅ Progress Bars - Real-time download status (requires `pv`)
- ✅ Update Detection - Check for new model versions
- ✅ Custom Repositories - Enterprise/air-gapped support
```bash
# Configuration options
PARALLEL_DOWNLOADS=5 make models-download-v2
MODEL_CACHE_DIR="/opt/cache" make models-download-v2
CUSTOM_MODEL_REPO="myorg/models" make models-download-v2
```
Virtual VectorFS includes these statically compiled features:

- ✅ FTS5: Full-text search with virtual tables and ranking
- ✅ JSON1: Complete JSON manipulation and querying
- ✅ R*Tree: Spatial indexing for GPS coordinates
- ✅ Vector Operations: Native vector data types and similarity functions
- ✅ Vector Search: Cosine, L2, and other distance metrics
- ✅ Vector Indexing: Efficient storage and retrieval
- ✅ Math: `sqrt()`, `pow()`, `ceil()`, `floor()`, `exp()`, `log()`
- ✅ Stats: `median()`, `percentile()`, `stddev()`, advanced aggregations
- ✅ Text: `concat_ws()`, `trim()`, text normalization functions
- ✅ Fuzzy: `damerau_levenshtein()`, `jaro_winkler()`, string similarity
- ✅ Crypto: `sha256()`, `md5()`, cryptographic hash functions
Example queries using these features:

```sql
-- Vector similarity search
SELECT * FROM files
WHERE vector_distance_cos(embedding, vector32('[1,2,3]')) < 0.8;

-- FTS5 content search
SELECT * FROM files_fts WHERE files_fts MATCH 'database vector';

-- R*Tree GPS queries
SELECT * FROM file_gps_rtree
WHERE min_lat <= 40.7 AND max_lat >= 40.7
  AND min_lon <= -74.0 AND max_lon >= -74.0;

-- Normalized text search
SELECT * FROM files
WHERE file_name_normalized LIKE concat_ws('%', 'report', '%');

-- Statistical aggregations
SELECT median(vector_distance_cos(embedding, query_vector)) as median_distance
FROM search_results;
```
A minimal Go program that indexes a directory and builds an analyzed tree:

```go
package main

import (
	"context"
	"log"

	"github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/db"
	"github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/filesystem"
	"github.com/ZanzyTHEbar/virtual-vectorfs/vvfs/ports"
)

func main() {
	// Create database provider
	centralDB, err := db.NewCentralDBProvider()
	if err != nil {
		log.Fatal(err)
	}
	defer centralDB.Close()

	// Create terminal interactor
	interactor := ports.NewTerminalInteractor()

	// Create filesystem manager
	fs, err := filesystem.New(interactor, centralDB)
	if err != nil {
		log.Fatal(err)
	}

	// Index a directory
	ctx := context.Background()
	err = fs.IndexDirectory(ctx, "/path/to/directory", filesystem.DefaultIndexOptions())
	if err != nil {
		log.Fatal(err)
	}

	// Build directory tree with analysis
	tree, analysis, err := fs.BuildDirectoryTreeWithAnalysis(ctx, "/path/to/directory", filesystem.DefaultTraversalOptions())
	if err != nil {
		log.Fatal(err)
	}

	log.Printf("Indexed %d files, %d directories", analysis.FileCount, analysis.DirectoryCount)
	_ = tree
}
```
The project follows a hexagonal architecture (ports and adapters) pattern:
```
vvfs/
├── ports/          # Application ports (interfaces)
├── filesystem/     # Core filesystem business logic
│   ├── interfaces/ # Service interfaces
│   ├── services/   # Service implementations
│   ├── types/      # Data types and DTOs
│   ├── options/    # Configuration options
│   └── common/     # Shared utilities
├── trees/          # Tree data structures and algorithms
├── indexing/       # Advanced indexing implementations
├── embedding/      # AI/ML embedding providers
├── db/             # Database providers and interfaces
├── memory/         # In-memory data structures
└── config/         # Configuration management
```
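To make the ports-and-adapters split concrete, here is a hypothetical sketch of a port and one adapter. The names (`UserPrompter`, `ConsolePrompter`) are illustrative only, not the project's actual interfaces; the point is that core services depend on the small interface in `ports/`, never on a concrete UI or database type:

```go
package ports

import (
	"context"
	"fmt"
)

// UserPrompter is an illustrative port: core filesystem services ask questions
// through this interface without knowing who answers them.
type UserPrompter interface {
	Confirm(ctx context.Context, prompt string) (bool, error)
	Notify(ctx context.Context, message string) error
}

// ConsolePrompter is an illustrative adapter backed by stdin/stdout; tests can
// swap in a fake that records prompts instead.
type ConsolePrompter struct{}

func (c *ConsolePrompter) Confirm(ctx context.Context, prompt string) (bool, error) {
	fmt.Printf("%s [y/N]: ", prompt)
	var answer string
	fmt.Scanln(&answer)
	return answer == "y" || answer == "Y", nil
}

func (c *ConsolePrompter) Notify(ctx context.Context, message string) error {
	fmt.Println(message)
	return nil
}
```

Key components implemented in the codebase: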
- DirectoryService - Directory indexing and tree building
- FileOperations - File manipulation operations
- OrganizationService - Intelligent file organization
- ConflictResolver - File conflict detection and resolution
- GitService - Git repository operations
- ConcurrentTraverser - High-performance parallel directory traversal (see the sketch below)
- KDTree - Spatial indexing for file locations
- RoaringBitmaps - Efficient set operations for file indexing
- go-llama.cpp - Native GGUF model execution via llama.cpp bindings
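The ConcurrentTraverser itself lives in the filesystem package; as a rough illustration of the underlying pattern (a bounded worker pool over the standard library's directory walker, not the project's actual implementation), a sketch might look like this:

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"runtime"
	"sync"
)

// walkConcurrently visits every regular file under root and hands each path to
// fn from a bounded pool of goroutines. Illustrative sketch only.
func walkConcurrently(root string, fn func(path string)) error {
	var wg sync.WaitGroup
	sem := make(chan struct{}, runtime.NumCPU()) // cap concurrent workers

	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil || d.IsDir() {
			return walkErr
		}
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fn(path)
		}()
		return nil
	})

	wg.Wait()
	return err
}

func main() {
	_ = walkConcurrently(".", func(path string) { fmt.Println(path) })
}
```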
Create a configuration file at `~/.config/vvfs/config.toml`:
```toml
[database]
type = "sqlite3"
dsn = "file:~/vvfs/central.db"

[filesystem]
cache_dir = "~/.config/vvfs/.cache"
max_concurrent_operations = 10

[embedding]
default_provider = "llama"
# Defaults to ollama model directory
model_path = "~/.config/vvfs/models"

[logging]
level = "info"
format = "json"
```
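Since configuration is Viper-based (see the feature list above), loading a file like this could look roughly as follows. This is a hedged sketch: the struct fields mirror the TOML above but are not the project's actual config schema:

```go
package main

import (
	"log"

	"github.com/spf13/viper"
)

// Config mirrors the TOML layout above; field names are illustrative.
type Config struct {
	Database struct {
		Type string `mapstructure:"type"`
		DSN  string `mapstructure:"dsn"`
	} `mapstructure:"database"`
	Filesystem struct {
		CacheDir                string `mapstructure:"cache_dir"`
		MaxConcurrentOperations int    `mapstructure:"max_concurrent_operations"`
	} `mapstructure:"filesystem"`
	Logging struct {
		Level  string `mapstructure:"level"`
		Format string `mapstructure:"format"`
	} `mapstructure:"logging"`
}

func main() {
	v := viper.New()
	v.SetConfigName("config") // config.toml
	v.SetConfigType("toml")
	v.AddConfigPath("$HOME/.config/vvfs")
	v.SetDefault("filesystem.max_concurrent_operations", 10)

	if err := v.ReadInConfig(); err != nil {
		log.Fatalf("read config: %v", err)
	}

	var cfg Config
	if err := v.Unmarshal(&cfg); err != nil {
		log.Fatalf("unmarshal config: %v", err)
	}
	log.Printf("logging level: %s", cfg.Logging.Level)
}
```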
Run the comprehensive test suite:
```bash
# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run integration tests
go test -tags=integration ./...

# Run specific test
go test -run TestConcurrentTraverser ./vvfs/filesystem/
```
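New functionality is expected to ship with table-driven tests (see the contribution guidelines below). As a minimal example of the style, with a hypothetical helper standing in for the function under test:

```go
package fsutil

import "testing"

// normalizeExt is a hypothetical stand-in for whatever function is under test.
func normalizeExt(name string) string {
	for i := len(name) - 1; i >= 0; i-- {
		if name[i] == '.' {
			return name[i:]
		}
	}
	return ""
}

func TestNormalizeExt(t *testing.T) {
	tests := []struct {
		name string
		in   string
		want string
	}{
		{"simple extension", "report.pdf", ".pdf"},
		{"no extension", "Makefile", ""},
		{"multiple dots", "archive.tar.gz", ".gz"},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := normalizeExt(tt.in); got != tt.want {
				t.Errorf("normalizeExt(%q) = %q, want %q", tt.in, got, tt.want)
			}
		})
	}
}
```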
The filesystem is optimized for high-performance operations:
- Concurrent Processing - Utilizes all available CPU cores
- Memory-Efficient - Streaming operations for large file sets
- Cache-Optimized - Eytzinger layout for improved cache locality (sketched below)
- Database Performance - Connection pooling and prepared statements
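The Eytzinger layout mentioned above stores a sorted array in the order a breadth-first binary search visits it, so each probe's likely next elements sit close together in memory and prefetch well. A minimal sketch of building and searching such a layout (not the project's actual index code):

```go
package main

import "fmt"

// buildEytzinger copies a sorted slice into a 1-indexed array (index 0 unused)
// laid out in breadth-first search order.
func buildEytzinger(sorted []int) []int {
	out := make([]int, len(sorted)+1)
	var place func(i, k int) int
	place = func(i, k int) int {
		if k <= len(sorted) {
			i = place(i, 2*k) // left subtree holds the smaller values
			out[k] = sorted[i]
			i++
			i = place(i, 2*k+1) // then the right subtree
		}
		return i
	}
	place(0, 1)
	return out
}

// lowerBound returns the smallest element >= x by walking the implicit tree.
func lowerBound(tree []int, x int) (int, bool) {
	k, best, found := 1, 0, false
	for k < len(tree) {
		if tree[k] >= x {
			best, found = tree[k], true
			k = 2 * k // keep looking left for a smaller candidate
		} else {
			k = 2*k + 1 // too small, go right
		}
	}
	return best, found
}

func main() {
	tree := buildEytzinger([]int{1, 3, 5, 7, 9, 11, 13})
	fmt.Println(tree[1:])            // [7 3 11 1 5 9 13]
	fmt.Println(lowerBound(tree, 6)) // 7 true
}
```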
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Write tests for your changes
- Ensure all tests pass: `go test ./...`
- Follow conventional commit format for commits
- Submit a pull request
- Code Style - Follow standard Go formatting (`go fmt`)
- Testing - Write table-driven tests for new functionality
- Documentation - Update documentation for API changes
- Performance - Include benchmarks for performance-critical code
- Security - Validate inputs and handle errors properly
This project is licensed under the MIT License - see the LICENSE file for details.
- Roaring Bitmaps - For efficient bitmap operations
- llama.cpp - For native GGUF model inference
- go-llama.cpp - Go bindings for llama.cpp
- Turso - For distributed SQLite database
- Go Community - For the excellent standard library and ecosystem
- go-fuse - FUSE filesystem implementation
- bleve - Full-text search library
- badger - Key-value database
For questions and support:
- Open an issue on GitHub
- Check the documentation for detailed guides
- Join our community discussions
Built with ❤️ in Go