IO-optimized-pgvector

Overview

This repository provides an optimized implementation of pgvector with SSD-aware enhancements for high-performance vector search. The optimizations focus on I/O efficiency, SSD parallelism, query reordering, and locality-preserving indexing.

Branches

This repository has six branches:

  • main: Contains the README file.
  • pg_orig: Vanilla (original) PostgreSQL + pgvector.
  • pg_iou: Asynchronous I/O using io_uring.
  • pg_async_iou: Asynchronous I/O + overlapping execution.
  • pg_colocation: Partitioning + locality-aware insertion.
  • pg_async_iou_colocation: pg_async_iou + pg_colocation.
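
A minimal sketch of switching to one of the branches listed above before building. The helper names `is_known_branch` and `switch_branch` are hypothetical and not part of this repository, and the sketch assumes the current directory is already a local clone.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): succeeds only if the
# argument is one of the six branch names listed above.
is_known_branch() {
    case "$1" in
        main|pg_orig|pg_iou|pg_async_iou|pg_colocation|pg_async_iou_colocation)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Switch to the requested branch, refusing names that are not in the list.
# Assumes the current directory is a local clone of this repository.
switch_branch() {
    if is_known_branch "$1"; then
        git checkout "$1"
    else
        echo "unknown branch: $1" >&2
        return 1
    fi
}
```

For example, `switch_branch pg_async_iou` would check out the asynchronous I/O variant, while a misspelled name fails before git is invoked.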

Prerequisites

To install the required dependencies, run:

apt install zlib1g-dev flex bison libreadline-dev gdb rsync liburing-dev

PostgreSQL & pgvector

This project builds upon the following open-source projects:

ANN-Benchmark

For benchmarking ANN search performance, we use:

Datasets

The following datasets are used for evaluation:

How to Benchmark

To reproduce the benchmark experiments:

  1. Check out the desired branch and set up PostgreSQL with pgvector following the installation guide.
  2. Prepare the datasets using the links above.
  3. Run the ANN benchmarks to evaluate approximate nearest neighbor (ANN) search performance.
  4. Run a CREATE INDEX command to measure indexing performance.
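
Step 4 could be sketched as a small shell helper that prints a timed CREATE INDEX script for psql. The helper name `build_index_sql`, the table name `items`, and the column name `embedding` are assumptions for illustration. `hnsw`, `ivfflat`, and `vector_l2_ops` are standard pgvector names, but which index types each branch actually modifies is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: print a psql script that times one index build.
# Arguments: table name, vector column, index method (hnsw or ivfflat).
build_index_sql() {
    table="$1"; column="$2"; method="$3"
    case "$method" in
        hnsw|ivfflat) ;;
        *) echo "unsupported index method: $method" >&2; return 1 ;;
    esac
    # \timing makes psql report the elapsed time of each statement.
    printf '%s\n' '\timing on'
    printf 'CREATE INDEX ON %s USING %s (%s vector_l2_ops);\n' \
        "$table" "$method" "$column"
}
```

Piping the output into a psql session connected to the benchmark database, for example `build_index_sql items embedding hnsw | psql`, runs the build and reports its elapsed time. Connection settings are left to the usual psql options or environment variables.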