Merged
24 changes: 23 additions & 1 deletion .github/workflows/ci.yml
@@ -13,7 +13,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
+        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
         image_tag: ["4.4.1.0-100000032025101610"]
         init_sql: ["ALTER SYSTEM ob_vector_memory_limit_percentage = 30; SET GLOBAL ob_query_timeout=100000000;"]
         test_filter: ["tests/test_hybrid_search.py::HybridSearchTest"]
@@ -65,3 +65,25 @@ jobs:
      - name: Run tests
        run: |
          make test TEST_FILTER='${{ matrix.test_filter }}'

  test-embedded-seekdb:
    name: Test embedded SeekDB
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v6

      - name: Install uv
        uses: astral-sh/setup-uv@v6
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: uv sync --dev

      - name: Install pyseekdb (optional dependency for embedded SeekDB)
        run: uv pip install pyseekdb

      - name: Run embedded SeekDB tests
        run: |
          uv run python -m pytest tests/test_seekdb_embedded.py -v
73 changes: 72 additions & 1 deletion README.md
@@ -18,6 +18,12 @@ uv sync
pip install pyobvector==0.2.23
```

- for **embedded SeekDB** support (local SeekDB without server):

```shell
pip install pyobvector[pyseekdb]
```

## Build Doc

You can build document locally with `sphinx`:
@@ -33,10 +39,11 @@ For detailed release notes and changelog, see [RELEASE_NOTES.md](RELEASE_NOTES.m

## Usage

-`pyobvector` supports three modes:
+`pyobvector` supports four modes:

- `Milvus compatible mode`: You can use the `MilvusLikeClient` class to use vector storage in a way similar to the Milvus API
- `SQLAlchemy hybrid mode`: You can use the vector storage function provided by the `ObVecClient` class and execute the relational database statement with the SQLAlchemy library. In this mode, you can regard `pyobvector` as an extension of SQLAlchemy.
- `Embedded SeekDB mode`: Use `ObVecClient` or `SeekdbRemoteClient` with local embedded SeekDB (no server). Same API as remote: `create_table`, `insert`, `ann_search`, etc. Requires optional dependency: `pip install pyobvector[pyseekdb]`.
- `Hybrid Search mode`: You can use the `HybridSearch` class to perform hybrid search that combines full-text search and vector similarity search, with Elasticsearch-compatible query syntax.

### Milvus compatible mode
@@ -264,6 +271,70 @@ engine = create_async_engine(connection_str)

- For further usage in pure `SQLAlchemy` mode, please refer to [SQLAlchemy](https://www.sqlalchemy.org/)

### Embedded SeekDB mode

Use the same ObClient/ObVecClient API with **embedded SeekDB** (local file, no server). Install the optional dependency:

```shell
pip install pyobvector[pyseekdb]
```

- connect with path or with an existing `pyseekdb.Client`:

```python
from pyobvector import SeekdbRemoteClient, ObVecClient
from pyobvector.client.ob_client import ObClient

# Option 1: path to SeekDB data directory
client = SeekdbRemoteClient(path="./seekdb_data", database="test")

# Option 2: use an existing pyseekdb.Client
import pyseekdb
pyseekdb_client = pyseekdb.Client(path="./seekdb_data", database="test")
client = SeekdbRemoteClient(pyseekdb_client=pyseekdb_client)

# Option 3: ObVecClient directly
client = ObVecClient(path="./seekdb_data", db_name="test")

assert isinstance(client, ObVecClient)
assert isinstance(client, ObClient)
```

- create table, insert, and ann search (same API as remote):

```python
from sqlalchemy import Column, Integer, VARCHAR
from pyobvector import VECTOR, VectorIndex, l2_distance

client.drop_table_if_exist("vec_table")
client.create_table(
    table_name="vec_table",
    columns=[
        Column("id", Integer, primary_key=True),
        Column("title", VARCHAR(255)),
        Column("vec", VECTOR(3)),
    ],
    indexes=[VectorIndex("vec_idx", "vec", params="distance=l2, type=hnsw, lib=vsag")],
    mysql_organization="heap",
)
client.insert("vec_table", data=[
    {"id": 1, "title": "doc A", "vec": [1.0, 1.0, 1.0]},
    {"id": 2, "title": "doc B", "vec": [1.0, 2.0, 3.0]},
])
res = client.ann_search(
    "vec_table",
    vec_data=[1.0, 2.0, 3.0],
    vec_column_name="vec",
    distance_func=l2_distance,
    with_dist=True,
    topk=5,
    output_column_names=["id", "title"],
)
client.drop_table_if_exist("vec_table")
```

- See `tests/test_seekdb_embedded.py` for more examples.
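
To make the ordering that the `ann_search` call above should produce more concrete, here is a plain-Python sketch that ranks the two sample rows by Euclidean (L2) distance to the query vector. It only illustrates the metric and does not call `pyobvector`; the helper names `l2` and `rank_by_l2` are hypothetical. An HNSW index is approximate, so a real search is not guaranteed to reproduce an exact ranking on larger tables.

```python
import math

# Rows and query copied from the README example above.
rows = [
    {"id": 1, "title": "doc A", "vec": [1.0, 1.0, 1.0]},
    {"id": 2, "title": "doc B", "vec": [1.0, 2.0, 3.0]},
]
query = [1.0, 2.0, 3.0]


def l2(a, b):
    """Euclidean distance between two equal-length vectors (hypothetical helper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_l2(rows, query, topk):
    """Exact nearest-neighbour ranking: smallest distance first, at most topk rows."""
    scored = [(row["id"], row["title"], l2(row["vec"], query)) for row in rows]
    scored.sort(key=lambda item: item[2])
    return scored[:topk]


ranked = rank_by_l2(rows, query, topk=5)
for row_id, title, dist in ranked:
    print(row_id, title, round(dist, 4))
```

"doc B" is identical to the query, so its distance is 0 and it ranks first; "doc A" follows at a distance of √5. Because `topk=5` exceeds the table size, both rows are returned.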

### Hybrid Search Mode

`pyobvector` supports hybrid search that combines full-text search and vector similarity search, with query syntax compatible with Elasticsearch. This allows you to perform semantic search with both keyword matching and vector similarity in a single query.
1 change: 1 addition & 0 deletions pyobvector/__init__.py
@@ -64,6 +64,7 @@
from .json_table import OceanBase

__all__ = [
"SeekdbRemoteClient",
"ObVecClient",
"MilvusLikeClient",
"ObVecJsonTableClient",
53 changes: 52 additions & 1 deletion pyobvector/client/__init__.py
@@ -5,8 +5,11 @@
2. `SQLAlchemy hybrid mode`: You can use the vector storage function provided by the
`ObVecClient` class and execute the relational database statement with the SQLAlchemy library.
In this mode, you can regard `pyobvector` as an extension of SQLAlchemy.
3. `Embedded SeekDB`: ObClient/ObVecClient support path= or pyseekdb_client= for embedded
SeekDB (pip install pyobvector[pyseekdb]). Same API as remote: create_table, insert, etc.

-* ObVecClient MySQL client in SQLAlchemy hybrid mode
+* SeekdbRemoteClient Connect to embedded (path= / pyseekdb_client=) or remote; returns ObVecClient
+* ObVecClient MySQL/SeekDB client in SQLAlchemy hybrid mode (uri, path, or pyseekdb_client)
* MilvusLikeClient Milvus compatible client
* VecIndexType VecIndexType is used to specify vector index type for MilvusLikeClient
* IndexParam Specify vector index parameters for MilvusLikeClient
@@ -31,6 +34,9 @@
* FtsIndexParam Full Text Search index parameter
"""

import os
from typing import Any

from .ob_vec_client import ObVecClient
from .milvus_like_client import MilvusLikeClient
from .ob_vec_json_table_client import ObVecJsonTableClient
@@ -40,7 +46,52 @@
from .partitions import *
from .fts_index_param import FtsParser, FtsIndexParam


def _resolve_password(password: str) -> str:
    return password or os.environ.get("SEEKDB_PASSWORD", "")


def SeekdbRemoteClient(
    path: str | None = None,
    uri: str | None = None,
    host: str | None = None,
    port: int | None = None,
    tenant: str = "test",
    database: str = "test",
    user: str | None = None,
    password: str = "",
    pyseekdb_client: Any | None = None,
    **kwargs: Any,
) -> Any:
    """
    Connect to embedded SeekDB (path= or pyseekdb_client=) or remote OceanBase/SeekDB (uri/host=).
    Returns ObVecClient with the same API (create_table, insert, ann_search, etc.).
    Embedded requires: pip install pyobvector[pyseekdb]
    """
    password = _resolve_password(password)
    if pyseekdb_client is not None:
        return ObVecClient(pyseekdb_client=pyseekdb_client, **kwargs)
    if path is not None:
        return ObVecClient(path=path, db_name=database, **kwargs)
    if uri is None and host is not None:
        port = port if port is not None else 2881
        uri = f"{host}:{port}"
    if uri is None:
        uri = "127.0.0.1:2881"
    ob_user = user if user is not None else "root"
    if "@" not in ob_user:
        ob_user = f"{ob_user}@{tenant}"
    return ObVecClient(
        uri=uri,
        user=ob_user,
        password=password,
        db_name=database,
        **kwargs,
    )
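
Reviewer note: the remote branch of `SeekdbRemoteClient` is easier to reason about when its argument resolution is isolated from `ObVecClient`. The sketch below mirrors only the defaulting rules visible in this diff (password fallback to `SEEKDB_PASSWORD`, `host`/`port` folded into `uri`, `@tenant` appended to the user). The function name `resolve_remote_target` and its dict return value are hypothetical and not part of `pyobvector`.

```python
import os


def resolve_remote_target(uri=None, host=None, port=None,
                          user=None, tenant="test", password=""):
    """Hypothetical helper: reproduce the remote-connection defaults from the diff."""
    # An empty password falls back to the SEEKDB_PASSWORD environment variable.
    password = password or os.environ.get("SEEKDB_PASSWORD", "")
    # host/port are only consulted when no explicit uri is given.
    if uri is None and host is not None:
        uri = f"{host}:{port if port is not None else 2881}"
    if uri is None:
        uri = "127.0.0.1:2881"
    # The user defaults to root; a tenant suffix is added unless one is present.
    ob_user = user if user is not None else "root"
    if "@" not in ob_user:
        ob_user = f"{ob_user}@{tenant}"
    return {"uri": uri, "user": ob_user, "password": password}


defaults = resolve_remote_target()
custom = resolve_remote_target(host="db.local", port=3306, user="admin@sys")
print(defaults["uri"], defaults["user"])
print(custom["uri"], custom["user"])
```

Two consequences of these rules are worth noting: an explicit `uri` always wins over `host`/`port`, and in the function under review the embedded branches (`pyseekdb_client=` and `path=`) return before any of this resolution is used.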


__all__ = [
"SeekdbRemoteClient",
"ObVecClient",
"MilvusLikeClient",
"ObVecJsonTableClient",
5 changes: 2 additions & 3 deletions pyobvector/client/collection_schema.py
@@ -1,7 +1,6 @@
"""FieldSchema & CollectionSchema definition module to be compatible with Milvus."""

import copy
-from typing import Optional
from sqlalchemy import Column
from .schema_type import DataType, convert_datatype_to_sqltype
from .exceptions import *
@@ -129,8 +128,8 @@ class CollectionSchema:

     def __init__(
         self,
-        fields: Optional[list[FieldSchema]] = None,
-        partitions: Optional[ObPartition] = None,
+        fields: list[FieldSchema] | None = None,
+        partitions: ObPartition | None = None,
         description: str = "",  # ignored in oceanbase
         **kwargs,
     ):
5 changes: 2 additions & 3 deletions pyobvector/client/fts_index_param.py
@@ -1,7 +1,6 @@
"""A module to specify fts index parameters"""

from enum import Enum
-from typing import Optional, Union


class FtsParser(Enum):
@@ -28,13 +27,13 @@ def __init__(
         self,
         index_name: str,
         field_names: list[str],
-        parser_type: Optional[Union[FtsParser, str]] = None,
+        parser_type: FtsParser | str | None = None,
     ):
         self.index_name = index_name
         self.field_names = field_names
         self.parser_type = parser_type

-    def param_str(self) -> Optional[str]:
+    def param_str(self) -> str | None:
         """Convert parser type to string format for SQL."""
         if self.parser_type is None:
             return None  # Default Space parser, no need to specify
3 changes: 1 addition & 2 deletions pyobvector/client/index_param.py
@@ -1,7 +1,6 @@
"""A module to specify vector index parameters for MilvusLikeClient"""

from enum import Enum
-from typing import Union


class VecIndexType(Enum):
@@ -42,7 +41,7 @@ def __init__(
         self,
         index_name: str,
         field_name: str,
-        index_type: Union[VecIndexType, str],
+        index_type: VecIndexType | str,
         **kwargs,
     ):
         self.index_name = index_name