@@ -22,13 +22,53 @@ We have a number of precomputed data sets. All data sets have been pre-split int

| Dataset | Dimensions | Train size | Test size | Neighbors | Distance |
| ------- | ---------: | ---------: | --------: | --------: | -------- |
- | [LAION-1M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 1,000,000 | 10,000 | 100 | Angular |
- | [LAION-10M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 10,000,000 | 10,000 | 100 | Angular |
- | [LAION-20M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 20,000,000 | 10,000 | 100 | Angular |
- | [LAION-40M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 40,000,000 | 10,000 | 100 | Angular |
- | [LAION-100M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 100,000,000 | 10,000 | 100 | Angular |
- | [LAION-200M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 200,000,000 | 10,000 | 100 | Angular |
- | [LAION-400M: from LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 400,000,000 | 10,000 | 100 | Angular |
+ | **LAION Image Embeddings (512D)** | | | | | |
+ | [LAION-1M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 1,000,000 | 10,000 | 100 | Cosine |
+ | [LAION-10M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 10,000,000 | 10,000 | 100 | Cosine |
+ | [LAION-20M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 20,000,000 | 10,000 | 100 | Cosine |
+ | [LAION-40M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 40,000,000 | 10,000 | 100 | Cosine |
+ | [LAION-100M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 100,000,000 | 10,000 | 100 | Cosine |
+ | [LAION-200M: subset of LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 200,000,000 | 10,000 | 100 | Cosine |
+ | [LAION-400M: from LAION 400M English (image embeddings)](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 400,000,000 | 10,000 | 100 | Cosine |
+ | **LAION Image Embeddings (768D)** | | | | | |
+ | [LAION-1M: 768D image embeddings](https://laion.ai/blog/laion-400-open-dataset/) | 768 | 1,000,000 | 10,000 | 100 | Cosine |
+ | [LAION-1B: 768D image embeddings](https://laion.ai/blog/laion-400-open-dataset/) | 768 | 1,000,000,000 | 10,000 | 100 | Cosine |
+ | **Standard Benchmarks** | | | | | |
+ | [GloVe-25: Word vectors](http://ann-benchmarks.com) | 25 | 1,183,514 | 10,000 | 100 | Cosine |
+ | [GloVe-100: Word vectors](http://ann-benchmarks.com) | 100 | 1,183,514 | 10,000 | 100 | Cosine |
+ | [Deep Image-96: CNN image features](http://ann-benchmarks.com) | 96 | 9,990,000 | 10,000 | 100 | Cosine |
+ | [GIST-960: Image descriptors](http://ann-benchmarks.com) | 960 | 1,000,000 | 1,000 | 100 | L2 |
+ | **Text and Knowledge Embeddings** | | | | | |
+ | [DBpedia OpenAI-1M: Knowledge embeddings](https://www.dbpedia.org/) | 1,536 | 1,000,000 | 10,000 | 100 | Cosine |
+ | [LAION Small CLIP: Small CLIP embeddings](https://laion.ai/blog/laion-400-open-dataset/) | 512 | 100,000 | 1,000 | 100 | Cosine |
+ | **Yandex Datasets** | | | | | |
+ | [Yandex T2I: Text-to-image embeddings](https://research.yandex.com/) | 200 | 1,000,000 | 100,000 | 100 | Dot |
+ | **Random and Synthetic** | | | | | |
+ | Random-100: Small synthetic dataset | 100 | 100 | 9 | 9 | Cosine |
+ | Random-100-Euclidean: Small synthetic dataset | 100 | 100 | 9 | 9 | L2 |
+ | **Filtered Search Datasets** | | | | | |
+ | H&M-2048: Fashion product embeddings (with filters) | 2,048 | 105,542 | 2,000 | 100 | Cosine |
+ | H&M-2048: Fashion product embeddings (no filters) | 2,048 | 105,542 | 2,000 | 100 | Cosine |
+ | ArXiv-384: Academic paper embeddings (with filters) | 384 | 2,205,995 | 10,000 | 100 | Cosine |
+ | ArXiv-384: Academic paper embeddings (no filters) | 384 | 2,205,995 | 10,000 | 100 | Cosine |
+ | Random Match Keyword-100: Synthetic keyword matching (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Match Keyword-100: Synthetic keyword matching (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Match Int-100: Synthetic integer matching (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Match Int-100: Synthetic integer matching (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Range-100: Synthetic range queries (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Range-100: Synthetic range queries (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Geo Radius-100: Synthetic geo queries (with filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Geo Radius-100: Synthetic geo queries (no filters) | 100 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Match Keyword-2048: Large synthetic keyword matching (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Match Keyword-2048: Large synthetic keyword matching (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Match Int-2048: Large synthetic integer matching (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Match Int-2048: Large synthetic integer matching (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Range-2048: Large synthetic range queries (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Range-2048: Large synthetic range queries (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Geo Radius-2048: Large synthetic geo queries (with filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Geo Radius-2048: Large synthetic geo queries (no filters) | 2,048 | 100,000 | 1,000 | 100 | Cosine |
+ | Random Match Keyword Small Vocab-256: Small vocabulary keyword matching (with filters) | 256 | 1,000,000 | 10,000 | 100 | Cosine |
+ | Random Match Keyword Small Vocab-256: Small vocabulary keyword matching (no filters) | 256 | 1,000,000 | 10,000 | 100 | Cosine |

## 🐳 Docker Usage
@@ -39,41 +79,43 @@ The easiest way to run vector-db-benchmark is using Docker. We provide pre-built

```bash
# Pull the latest image
- docker pull redis-performance/vector-db-benchmark:latest
+ docker pull filipe958/vector-db-benchmark:latest

# Run with help
- docker run --rm redis-performance/vector-db-benchmark:latest run.py --help
+ docker run --rm filipe958/vector-db-benchmark:latest run.py --help

# Basic Redis benchmark with local Redis
- docker run --rm --network=host redis-performance/vector-db-benchmark:latest \
-   run.py --host localhost --engines redis --dataset random-100 --experiment redis-m-16-ef-64
+ docker run --rm --network=host filipe958/vector-db-benchmark:latest \
+   run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple

# With results output (mount current directory)
docker run --rm -v $(pwd)/results:/app/results --network=host \
-   redis-performance/vector-db-benchmark:latest \
-   run.py --host localhost --engines redis --dataset random-100 --experiment redis-m-16-ef-64
+   filipe958/vector-db-benchmark:latest \
+   run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
```
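
The results-mount command above bind-mounts a host `results` directory into `/app/results`. With `-v`, Docker creates a missing host directory itself, typically owned by root, which can make the output files awkward to read or delete. A minimal sketch of one way around that, assuming you prefer to create the directory up front as your own user; `results_mount` is a hypothetical helper name and not part of vector-db-benchmark:

```bash
# Hypothetical helper (not part of vector-db-benchmark): create the host
# results directory if it is missing and print a value for `docker run -v`.
results_mount() {
  dir="${1:-$(pwd)/results}"
  mkdir -p "$dir" || return 1
  # /app/results is the container path used in the README command above.
  printf '%s:/app/results\n' "$dir"
}

# Example (commented out so that sourcing this snippet has no side effects):
# docker run --rm -v "$(results_mount)" --network=host \
#   filipe958/vector-db-benchmark:latest \
#   run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
```

Quoting the `$(results_mount)` substitution also keeps the mount argument intact when the working directory contains spaces.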

- ### Using Docker Compose
+ ### Using with Redis

- For a complete setup with Redis included:
+ For testing with Redis, start a Redis container first:

```bash
- # Start Redis
- docker-compose up redis
+ # Start Redis container
+ docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm

# Run benchmark against Redis
- docker-compose run --rm vector-db-benchmark run.py --host redis --engines redis --dataset random-100 --experiment redis-m-16-ef-64
+ docker run --rm --network=host filipe958/vector-db-benchmark:latest \
+   run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple

# Or use the convenience script
- ./docker-run.sh -H redis -e redis -d random-100 -x redis-m-16-ef-64
+ ./docker-run.sh -H localhost -e redis -d random-100 -x redis-default-simple
+
+ # Clean up Redis container when done
+ docker stop redis-test && docker rm redis-test
```
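
`docker run -d` returns as soon as the container has started, so Redis may not yet be accepting connections when the benchmark command runs. A minimal sketch of a readiness wait, assuming the `redis-test` container name from the commands above and that the Redis image ships `redis-cli`; the `wait_for` helper and the `WAIT_INTERVAL` variable are hypothetical names, not part of the benchmark tooling:

```bash
# Hypothetical helper: retry a command until it succeeds or the attempt
# budget is exhausted. Returns 0 on success, 1 if every attempt failed.
wait_for() {
  attempts="$1"
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts; defaults to 1 second.
    sleep "${WAIT_INTERVAL:-1}"
    i=$((i + 1))
  done
  return 1
}

# Example (commented out; it needs the redis-test container to be running):
# wait_for 30 docker exec redis-test redis-cli ping && echo "Redis is ready"
```

Running the wait between the "Start Redis container" and "Run benchmark against Redis" steps avoids a failed first run caused only by startup timing.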

### Available Docker Images

- - **Latest**: `redis-performance/vector-db-benchmark:latest`
- - **Specific versions**: `redis-performance/vector-db-benchmark:v1.0.0`
- - **Development builds**: `redis-performance/vector-db-benchmark:update-redisearch-{sha}`
+ - **Latest**: `filipe958/vector-db-benchmark:latest`

For detailed Docker setup and publishing information, see [DOCKER_SETUP.md](DOCKER_SETUP.md).