
Commit 5008989

feat: add opensearch auth configuration and refactor modules (#4)
* feat: linting code
* feat: refactor
* feat: add open search auth
* chore: complement documentation
* chore: fix default values
1 parent 3a91ff9 commit 5008989

29 files changed (+486 −203)

.env.dist

Lines changed: 3 additions & 1 deletion

@@ -4,4 +4,6 @@ S3_BUCKET = "clone-ingestion-messages"
 OPENSEARCH_INDEX = "clone-vector-index"
 OPENSEARCH_CLUSTER_URL = "https://"
 IS_LOCAL = True
-HF_HOME=/tmp/
+HF_HOME = /tmp/
+OPENSEARCH_USER = "user"
+OPENSEARCH_PASS = "pass"
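The two new variables pair with the auth support added elsewhere in this commit. A minimal sketch of how they might be read and handed to an OpenSearch client (the helper below is illustrative, not part of the commit; opensearch-py accepts such a tuple via its `http_auth` argument):

```python
def opensearch_auth(env: dict) -> tuple:
    """Build the (user, password) tuple for HTTP basic auth.

    In opensearch-py this tuple is typically passed as the
    `http_auth` argument when constructing the client.
    """
    return (env.get("OPENSEARCH_USER"), env.get("OPENSEARCH_PASS"))
```

Usage would look like `OpenSearch(hosts=[url], http_auth=opensearch_auth(environ))`.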

.flake8

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+[flake8]
+extend-ignore=E203,E501
+extend-exclude= env/

.flaskenv

Lines changed: 1 addition & 0 deletions

@@ -1,2 +1,3 @@
 FLASK_APP = "main"
 FLASK_RUN_PORT = "8000"
+FLASK_DEBUG=true

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -158,3 +158,4 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+.envrc

Makefile

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+lint:
+	@pip install black isort flake8
+	@echo "\n--->Sorting imports"
+	@isort .
+	@echo "\n----->Formating code"
+	@black .
+	@echo "\n------>Linting code"
+	@flake8 .
+
+test:
+	@pip3 install coverage pytest
+	@coverage run -m pytest
+	@coverage report
+.PHONY: test

README.md

Lines changed: 36 additions & 24 deletions

@@ -1,31 +1,38 @@
 # Clone Vector Search
+
 ## Overview
 
 This service provides endpoints for handling and vectorizing S3 objects, and consume and populate an OpenSearch index, inspired by hexagonal architecture principles.
 
 ## Table of Contents
-* [Project Structure](#project-structure)
-* [Tech Stack](#tech-stack)
-* [Installation](#installation)
-* [Running the service](#running-the-service)
-* [Building the Docker image](#building-the-docker-image)
-* [Code Contribution](#code-contribution)
+
+- [Clone Vector Search](#clone-vector-search)
+  - [Overview](#overview)
+  - [Table of Contents](#table-of-contents)
+  - [Project structure](#project-structure)
+  - [Tech Stack](#tech-stack)
+  - [Installation](#installation)
+  - [Running the Service](#running-the-service)
+  - [Building the Docker Image](#building-the-docker-image)
+  - [Code Contribution](#code-contribution)
 
 ## Project structure
 
-* **service**: Contains the third party services access logic.
-* **usecase**: Contains business logic layer.
-* **controller**: Contains the Flask API endpoint handlers.
+- **service**: Contains the third party services access logic.
+- **usecase**: Contains business logic layer.
+- **controller**: Contains the Flask API endpoint handlers.
 [⇧ back to top](#table-of-contents)
 
 ## Tech Stack
-* Python
-* Flask
-* boto3
-* Llama-Index
+
+- Python
+- Flask
+- boto3
+- Llama-Index
 [⇧ back to top](#table-of-contents)
 
 ## Installation
+
 1. Clone the repository
 
 ```Bash
@@ -44,39 +51,44 @@ source env/bin/activate
 ```Bash
 pip install -r requirements.txt
 ```
+
 [⇧ back to top](#table-of-contents)
 
 ## Running the Service
+
 1. Set Environment Variables (if applicable) in [.env](.env) and [.flaskenv](.flaskenv) files:
 2. Create the opensearch index. The application will create the needed mapping.
-3. In order to run this service locally, you'll need localstack in order to mock some AWS Services.
-* Once you have localstack installed and running, create a `clone-ingestion-messages` bucket:
+3. In order to run this service locally, you'll need localstack in order to mock some AWS Services.
+   - Once you have localstack installed and running, create a `clone-ingestion-messages` bucket:
 `aws --endpoint-url=http://localhost:4566 s3 mb s3://clone-ingestion-messages`
-* Add the required test files by running:
+   - Add the required test files by running:
 `aws --endpoint-url=http://localhost:4566 s3 cp /path/to/your/file/filename.json s3://clone-ingestion-messages/key/to/file.json`
 4. Start the Flask Server:
 
 ```Bash
 flask run
 ```
+
 [⇧ back to top](#table-of-contents)
 
 ## Building the Docker Image
+
 ```Bash
 docker compose up --build
 ```
+
 [⇧ back to top](#table-of-contents)
 
 ## Code Contribution
 
 Ensure you adhere to the following conventions when working with code in the Clone Vector Search project:
 
-* **Relate every commit to a ticket**: If the commit is not related to a ticket, the branch name contains the related ticket.
-* **Work on one feature for each PR**: Do not crowd unrelated features in one PR.
-* **Every line of code in your commits must be production-ready**: Do not create incomplete, work-in-progress commits.
-* **Ensure the branching strategy is simple**:
-* Create a feature branch and then merge it with the main branch.
-* Do not create extra branches beside the feature or fix branches to merge with the main.
-* Remove any feature or fix branches after you merge the changes.
+- **Relate every commit to a ticket**: If the commit is not related to a ticket, the branch name contains the related ticket.
+- **Work on one feature for each PR**: Do not crowd unrelated features in one PR.
+- **Every line of code in your commits must be production-ready**: Do not create incomplete, work-in-progress commits.
+- **Ensure the branching strategy is simple**:
+  - Create a feature branch and then merge it with the main branch.
+  - Do not create extra branches beside the feature or fix branches to merge with the main.
+  - Remove any feature or fix branches after you merge the changes.
 
-[⇧ back to top](#table-of-contents)
+[⇧ back to top](#table-of-contents)

config.py

Lines changed: 20 additions & 2 deletions

@@ -1,27 +1,45 @@
 from os import environ
+
 from dotenv import load_dotenv
 
 load_dotenv()
 
 
+IS_LOCAL = environ.get("IS_LOCAL")
+
+
 class Config:
+    """
+    Configuration class
+    """
+
     DEBUG = environ.get("DEBUG")
     LOG_LEVEL = environ.get("LOG_LEVEL")
     S3_BUCKET = environ.get("S3_BUCKET")
     OPENSEARCH_INDEX = environ.get("OPENSEARCH_INDEX")
     OPENSEARCH_CLUSTER_URL = environ.get("OPENSEARCH_CLUSTER_URL")
     IS_LOCAL = environ.get("IS_LOCAL")
     S3_URL = None
+    S3_INDEX_PATH = environ.get("S3_INDEX_PATH")
+    OPENSEARCH_USER = environ.get("OPENSEARCH_USER")
+    OPENSEARCH_PASS = environ.get("OPENSEARCH_PASS")
 
 
 class DevelopmentConfig(Config):
+    """
+    Development configuration
+    """
+
+    DEBUG = True
     LOG_LEVEL = "DEBUG"
     OPENSEARCH_CLUSTER_URL = "http://host.docker.internal:9200"
     OPENSEARCH_INDEX = "clone-vector-index"
-    S3_BUCKET = "clone-ingestion-messages"
+    OPENSEARCH_USER = "clonAISearch"
+    OPENSEARCH_PASS = "user"
+    S3_BUCKET = "pass"
     IS_LOCAL = True
     S3_URL = "http://host.docker.internal:4566"
     AWS_ACCESS_KEY_ID = "test"
     AWS_SECRET_ACCESS_KEY = "test"
     AWS_DEFAULT_REGION = "us-east-1"
-
+    S3_INDEX_PATH = "/indexes"
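With `IS_LOCAL` now read at module level, a common pattern is to pick the config class at startup and load it with Flask's `app.config.from_object`. A sketch under that assumption (the selector function and the trimmed classes below are illustrative, not part of the commit):

```python
from os import environ


class Config:
    """Trimmed stand-in for the base config class."""
    DEBUG = environ.get("DEBUG")
    OPENSEARCH_USER = environ.get("OPENSEARCH_USER")


class DevelopmentConfig(Config):
    """Trimmed stand-in for the development config class."""
    DEBUG = True
    OPENSEARCH_CLUSTER_URL = "http://host.docker.internal:9200"


IS_LOCAL = environ.get("IS_LOCAL")


def select_config(is_local):
    """Return the config class to load, e.g.
    app.config.from_object(select_config(IS_LOCAL))."""
    return DevelopmentConfig if is_local else Config
```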

controller/controller.py

Lines changed: 0 additions & 28 deletions
This file was deleted.
File renamed without changes.

core/abstracts/controller.py

Lines changed: 21 additions & 0 deletions

@@ -0,0 +1,21 @@
+from abc import ABC, abstractmethod
+from typing import Any, Dict, Tuple
+
+
+class AbstractVectorController(ABC):
+    """
+    Abstract class for main controller class.
+    """
+
+    @abstractmethod
+    def vectoring(self, request: Dict[str, Any]) -> Tuple[Dict[str, str], int]:
+        """
+        Abstract method to handle vectorization requests.
+
+        Args:
+            request (Dict[str, Any]): Request body.
+
+        Returns:
+            Tuple[Dict[str, str], int]: Tuple containing a JSON response indicating success or failure of the vectorization process and an HTTP status code.
+        """
+        pass

core/abstracts/services.py

Lines changed: 46 additions & 0 deletions

@@ -0,0 +1,46 @@
+from abc import ABC, abstractmethod
+
+
+# Abstract base class for S3 service
+class AbstractS3Service(ABC):
+    """
+    Abstract class for s3 services.
+    """
+
+    @abstractmethod
+    def get_object(self, bucket_name: str, object_key: str) -> dict:
+        """
+        Abstract method to get an object from S3.
+
+        Args:
+            bucket_name (str): Name of the S3 bucket.
+            object_key (str): Key of the object in the S3 bucket.
+
+        Returns:
+            dict: Dictionary containing the loaded JSON content of the S3 object.
+        """
+        pass
+
+
+class AbstractLlamaIndexService(ABC):
+    """
+    Abstract class for llama index services.
+    """
+
+    @abstractmethod
+    def vector_store_index(
+        self, twin_id: str, source_name: str, file_uuid: str, documents: list
+    ) -> str:
+        """
+        Abstract method to indexing documents and store vectors in OpenSearch.
+
+        Args:
+            twin_id (str): Identifier for the twin.
+            source_name (str): Name of the data source.
+            file_uuid (str): UUID of the file containing the documents.
+            documents (list): List of dictionaries representing documents.
+
+        Returns:
+            str: Index summary
+        """
+        pass
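A concrete implementation of `AbstractS3Service` is not shown in this commit; a hypothetical sketch using a boto3-style client (the class name and injected-client design are assumptions) could look like:

```python
import json
from abc import ABC, abstractmethod


class AbstractS3Service(ABC):
    """Trimmed copy of the abstract above, to keep the sketch self-contained."""

    @abstractmethod
    def get_object(self, bucket_name: str, object_key: str) -> dict: ...


class S3Service(AbstractS3Service):
    """Hypothetical concrete service; the boto3-style client is injected so
    it can point at localstack (S3_URL) in development or a stub in tests."""

    def __init__(self, client):
        self.client = client

    def get_object(self, bucket_name: str, object_key: str) -> dict:
        # boto3's get_object returns the object payload as a stream under "Body"
        response = self.client.get_object(Bucket=bucket_name, Key=object_key)
        return json.loads(response["Body"].read())
```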

core/abstracts/usescases.py

Lines changed: 22 additions & 0 deletions

@@ -0,0 +1,22 @@
+from abc import ABC, abstractmethod
+
+
+class AbstractVectorizeUsecase(ABC):
+    """
+    Abstract class for use vectorize use cases.
+
+    """
+
+    @abstractmethod
+    def vectorize_and_index(self, bucket_name: str, object_key: str) -> str:
+        """
+        Abstract method to vectorize and index documents.
+
+        Args:
+            bucket_name (str): Name of the S3 bucket containing the document.
+            object_key (str): Key of the document object in the S3 bucket.
+
+        Returns:
+            str: The indexed document.
+        """
+        pass
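A concrete use case implementing this interface would tie the two service abstractions together: fetch the JSON payload from S3, then hand it to the indexing service. A sketch under assumptions (the payload field names and class wiring below are hypothetical, not taken from the commit):

```python
class VectorizeUsecase:
    """Hypothetical concrete AbstractVectorizeUsecase; services are injected."""

    def __init__(self, s3_service, index_service):
        self.s3_service = s3_service
        self.index_service = index_service

    def vectorize_and_index(self, bucket_name: str, object_key: str) -> str:
        # Fetch the JSON document batch from S3 (field names are assumptions)
        payload = self.s3_service.get_object(bucket_name, object_key)
        # Delegate indexing and return the index summary string
        return self.index_service.vector_store_index(
            payload["twin_id"],
            payload["source_name"],
            payload["file_uuid"],
            payload["documents"],
        )
```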
File renamed without changes.

core/controller/vector.py

Lines changed: 47 additions & 0 deletions

@@ -0,0 +1,47 @@
+from logging import Logger
+from typing import Any, Dict, Tuple
+
+from flask import jsonify
+
+from core.abstracts.controller import AbstractVectorController
+from core.abstracts.usescases import AbstractVectorizeUsecase
+
+
+class VectorController(AbstractVectorController):
+    """
+    Controller for vectorization operations.
+    """
+
+    def __init__(self, usecase: AbstractVectorizeUsecase, logger: Logger):
+        """
+        Initialize the Controller.
+
+        Args:
+            usecase (AbstractUsecase): An instance of a class implementing the AbstractUsecase interface.
+        """
+        self.usecase = usecase
+        self.logger = logger
+
+    def vectoring(self, request: Dict[str, Any]) -> Tuple[Dict[str, str], int]:
+        """
+        Handle vectorization requests.
+
+        This method expects a POST request with JSON data containing S3 bucket and object key information.
+        It delegates vectorization and indexing tasks to the use case, and returns appropriate responses.
+
+        Args:
+            request (Dict[str, Any]): Request body.
+
+        Returns:
+            Tuple[Dict[str, str], int]: Tuple containing a JSON response indicating success or failure of the vectorization process and an HTTP status code.
+        """
+        record = request["Records"][0]["s3"]
+        s3_bucket = record["bucket"]["name"]
+        s3_object_key = record["object"]["key"]
+
+        try:
+            self.usecase.vectorize_and_index(s3_bucket, s3_object_key)
+            return jsonify({"message": "Object vectorization succeeded!"}), 200
+        except Exception as e:
+            self.logger.error(f"Failed to vectorize object {s3_object_key}")
+            return jsonify({"error": str(e)}), 500
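`vectoring` unpacks the standard S3 event notification shape (`Records[0].s3.bucket.name` / `Records[0].s3.object.key`). A minimal sketch of that extraction, with a sample event whose values are illustrative:

```python
def parse_s3_event(event: dict) -> tuple:
    """Extract (bucket, key) from the first record of an S3 event notification."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]


# Sample event of the shape the controller expects
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "clone-ingestion-messages"},
                "object": {"key": "key/to/file.json"},
            }
        }
    ]
}
```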
File renamed without changes.
