This repository provides an optimized implementation of pgvector with SSD-aware enhancements for high-performance vector search. The optimizations focus on improving I/O efficiency, exploiting SSD parallelism, reordering queries, and preserving locality in the index.
This repository contains six branches:
- main: Contains the README file.
- pg_orig: Vanilla (original) PostgreSQL + pgvector.
- pg_iou: Asynchronous I/O using io_uring.
- pg_async_iou: Asynchronous I/O + overlapping execution.
- pg_colocation: Partitioning + locality-aware insertion.
- pg_async_iou_colocation: Combines pg_async_iou and pg_colocation.
To install the required dependencies, run:
apt install zlib1g-dev flex bison libreadline-dev gdb rsync liburing-dev
This project builds upon the following open-source projects:
- PostgreSQL: https://github.com/postgres/postgres
- pgvector: https://github.com/pgvector/pgvector
For benchmarking ANN search performance, we use:
- ANN-Benchmarks: https://github.com/erikbern/ann-benchmarks
The following datasets are used for evaluation:
- DBpedia: Hugging Face Dataset
- Deep: Skoltech Deep1B
- GloVe: Stanford NLP GloVe
- NYTimes: UCI Bag of Words
- COCO: Skoltech COCO
To reproduce the benchmark experiments:
- Check out the desired branch, then set up PostgreSQL with pgvector by following the installation guide.
- Prepare datasets using the links above.
- Run ANN-Benchmarks to evaluate approximate nearest neighbor (ANN) search performance.
- Run the CREATE INDEX command to measure indexing performance.
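
The index-build measurement in the last step could be scripted roughly as follows. This is a minimal sketch, not the procedure used in this repository: the helper names `build_index_sql` and `time_statement`, the table name `items`, the column name `embedding`, and the parameter values are all hypothetical. The README does not say which index type was benchmarked, so the example assumes pgvector's `hnsw` method with the `vector_l2_ops` operator class.

```python
import time
from typing import Callable, Dict


def build_index_sql(table: str, column: str, method: str,
                    opclass: str, options: Dict[str, int]) -> str:
    """Build a pgvector CREATE INDEX statement.

    Identifiers are inserted as-is (no escaping), so pass trusted names only.
    """
    sql = f"CREATE INDEX ON {table} USING {method} ({column} {opclass})"
    if options:
        with_clause = ", ".join(f"{key} = {value}" for key, value in options.items())
        sql += f" WITH ({with_clause})"
    return sql + ";"


def time_statement(execute: Callable[[str], object], sql: str) -> float:
    """Run `sql` through the supplied callable and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    execute(sql)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical table/column names; parameter values are illustrative only.
    statement = build_index_sql("items", "embedding", "hnsw", "vector_l2_ops",
                                {"m": 16, "ef_construction": 64})
    # Stand-in executor that only prints the statement. Replace it with the
    # execute method of a real database cursor to time an actual index build.
    elapsed = time_statement(print, statement)
    print(f"elapsed: {elapsed:.3f} s")
```

Passing the executor as a callable keeps the timing logic independent of any particular database driver, so the same helper can wrap whichever client is used to connect to the PostgreSQL instance built from the chosen branch.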