
Commit 5008989

feat: add opensearch auth configuration and refactor modules (#4)
* feat: linting code
* feat: refactor
* feat: add open search auth
* chore: complement documentation
* chore: fix default values
1 parent 3a91ff9 commit 5008989

29 files changed (+486 −203)

.env.dist

Lines changed: 3 additions & 1 deletion

@@ -4,4 +4,6 @@ S3_BUCKET = "clone-ingestion-messages"
 OPENSEARCH_INDEX = "clone-vector-index"
 OPENSEARCH_CLUSTER_URL = "https://"
 IS_LOCAL = True
-HF_HOME=/tmp/
+HF_HOME = /tmp/
+OPENSEARCH_USER = "user"
+OPENSEARCH_PASS = "pass"
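The two new variables pair with the auth support added elsewhere in this commit. A minimal sketch of how they might be read and handed to an OpenSearch client (the helper below is illustrative, not part of the commit; opensearch-py accepts such a tuple via its `http_auth` argument):

```python
def opensearch_auth(env: dict) -> tuple:
    """Build the (user, password) tuple for HTTP basic auth.

    In opensearch-py this tuple is typically passed as the
    `http_auth` argument when constructing the client.
    """
    return (env.get("OPENSEARCH_USER"), env.get("OPENSEARCH_PASS"))
```

Usage would look like `OpenSearch(hosts=[url], http_auth=opensearch_auth(environ))`.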

.flake8

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@
+[flake8]
+extend-ignore=E203,E501
+extend-exclude= env/

.flaskenv

Lines changed: 1 addition & 0 deletions

@@ -1,2 +1,3 @@
 FLASK_APP = "main"
 FLASK_RUN_PORT = "8000"
+FLASK_DEBUG=true

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -158,3 +158,4 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+.envrc

Makefile

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+lint:
+	@pip install black isort flake8
+	@echo "\n--->Sorting imports"
+	@isort .
+	@echo "\n----->Formating code"
+	@black .
+	@echo "\n------>Linting code"
+	@flake8 .
+
+test:
+	@pip3 install coverage pytest
+	@coverage run -m pytest
+	@coverage report
+.PHONY: test

README.md

Lines changed: 36 additions & 24 deletions

@@ -1,31 +1,38 @@
 # Clone Vector Search
+
 ## Overview
 
 This service provides endpoints for handling and vectorizing S3 objects, and consume and populate an OpenSearch index, inspired by hexagonal architecture principles.
 
 ## Table of Contents
-* [Project Structure](#project-structure)
-* [Tech Stack](#tech-stack)
-* [Installation](#installation)
-* [Running the service](#running-the-service)
-* [Building the Docker image](#building-the-docker-image)
-* [Code Contribution](#code-contribution)
+
+- [Clone Vector Search](#clone-vector-search)
+  - [Overview](#overview)
+  - [Table of Contents](#table-of-contents)
+  - [Project structure](#project-structure)
+  - [Tech Stack](#tech-stack)
+  - [Installation](#installation)
+  - [Running the Service](#running-the-service)
+  - [Building the Docker Image](#building-the-docker-image)
+  - [Code Contribution](#code-contribution)
 
 ## Project structure
 
-* **service**: Contains the third party services access logic.
-* **usecase**: Contains business logic layer.
-* **controller**: Contains the Flask API endpoint handlers.
+- **service**: Contains the third party services access logic.
+- **usecase**: Contains business logic layer.
+- **controller**: Contains the Flask API endpoint handlers.
 [⇧ back to top](#table-of-contents)
 
 ## Tech Stack
-* Python
-* Flask
-* boto3
-* Llama-Index
+
+- Python
+- Flask
+- boto3
+- Llama-Index
 [⇧ back to top](#table-of-contents)
 
 ## Installation
+
 1. Clone the repository
 
 ```Bash
@@ -44,39 +51,44 @@ source env/bin/activate
 ```Bash
 pip install -r requirements.txt
 ```
+
 [⇧ back to top](#table-of-contents)
 
 ## Running the Service
+
 1. Set Environment Variables (if applicable) in [.env](.env) and [.flaskenv](.flaskenv) files:
 2. Create the opensearch index. The application will create the needed mapping.
-3. In order to run this service locally, you'll need localstack in order to mock some AWS Services.
-* Once you have localstack installed and running, create a `clone-ingestion-messages` bucket:
+3. In order to run this service locally, you'll need localstack in order to mock some AWS Services.
+   - Once you have localstack installed and running, create a `clone-ingestion-messages` bucket:
 `aws --endpoint-url=http://localhost:4566 s3 mb s3://clone-ingestion-messages`
-* Add the required test files by running:
+   - Add the required test files by running:
 `aws --endpoint-url=http://localhost:4566 s3 cp /path/to/your/file/filename.json s3://clone-ingestion-messages/key/to/file.json`
 4. Start the Flask Server:
 
 ```Bash
 flask run
 ```
+
 [⇧ back to top](#table-of-contents)
 
 ## Building the Docker Image
+
 ```Bash
 docker compose up --build
 ```
+
 [⇧ back to top](#table-of-contents)
 
 ## Code Contribution
 
 Ensure you adhere to the following conventions when working with code in the Clone Vector Search project:
 
-* **Relate every commit to a ticket**: If the commit is not related to a ticket, the branch name contains the related ticket.
-* **Work on one feature for each PR**: Do not crowd unrelated features in one PR.
-* **Every line of code in your commits must be production-ready**: Do not create incomplete, work-in-progress commits.
-* **Ensure the branching strategy is simple**:
-* Create a feature branch and then merge it with the main branch.
-* Do not create extra branches beside the feature or fix branches to merge with the main.
-* Remove any feature or fix branches after you merge the changes.
+- **Relate every commit to a ticket**: If the commit is not related to a ticket, the branch name contains the related ticket.
+- **Work on one feature for each PR**: Do not crowd unrelated features in one PR.
+- **Every line of code in your commits must be production-ready**: Do not create incomplete, work-in-progress commits.
+- **Ensure the branching strategy is simple**:
+  - Create a feature branch and then merge it with the main branch.
+  - Do not create extra branches beside the feature or fix branches to merge with the main.
+  - Remove any feature or fix branches after you merge the changes.
 
-[⇧ back to top](#table-of-contents)
+[⇧ back to top](#table-of-contents)

config.py

Lines changed: 20 additions & 2 deletions

@@ -1,27 +1,45 @@
 from os import environ
+
 from dotenv import load_dotenv
 
 load_dotenv()
 
 
+IS_LOCAL = environ.get("IS_LOCAL")
+
+
 class Config:
+    """
+    Configuration class
+    """
+
     DEBUG = environ.get("DEBUG")
     LOG_LEVEL = environ.get("LOG_LEVEL")
     S3_BUCKET = environ.get("S3_BUCKET")
     OPENSEARCH_INDEX = environ.get("OPENSEARCH_INDEX")
     OPENSEARCH_CLUSTER_URL = environ.get("OPENSEARCH_CLUSTER_URL")
     IS_LOCAL = environ.get("IS_LOCAL")
     S3_URL = None
+    S3_INDEX_PATH = environ.get("S3_INDEX_PATH")
+    OPENSEARCH_USER = environ.get("OPENSEARCH_USER")
+    OPENSEARCH_PASS = environ.get("OPENSEARCH_PASS")
 
 
 class DevelopmentConfig(Config):
+    """
+    Development configuration
+    """
+
+    DEBUG = True
     LOG_LEVEL = "DEBUG"
     OPENSEARCH_CLUSTER_URL = "http://host.docker.internal:9200"
     OPENSEARCH_INDEX = "clone-vector-index"
-    S3_BUCKET = "clone-ingestion-messages"
+    OPENSEARCH_USER = "clonAISearch"
+    OPENSEARCH_PASS = "user"
+    S3_BUCKET = "pass"
     IS_LOCAL = True
     S3_URL = "http://host.docker.internal:4566"
     AWS_ACCESS_KEY_ID = "test"
     AWS_SECRET_ACCESS_KEY = "test"
     AWS_DEFAULT_REGION = "us-east-1"
-
+    S3_INDEX_PATH = "/indexes"
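With `IS_LOCAL` now read at module level, a common pattern is to pick the config class at startup and load it with Flask's `app.config.from_object`. A sketch under that assumption (the selector function and the trimmed classes below are illustrative, not part of the commit):

```python
from os import environ


class Config:
    """Trimmed stand-in for the base config class."""
    DEBUG = environ.get("DEBUG")
    OPENSEARCH_USER = environ.get("OPENSEARCH_USER")


class DevelopmentConfig(Config):
    """Trimmed stand-in for the development config class."""
    DEBUG = True
    OPENSEARCH_CLUSTER_URL = "http://host.docker.internal:9200"


IS_LOCAL = environ.get("IS_LOCAL")


def select_config(is_local):
    """Return the config class to load, e.g.
    app.config.from_object(select_config(IS_LOCAL))."""
    return DevelopmentConfig if is_local else Config
```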

controller/controller.py

Lines changed: 0 additions & 28 deletions
This file was deleted.
File renamed without changes.

core/abstracts/controller.py

Lines changed: 21 additions & 0 deletions

@@ -0,0 +1,21 @@
+from abc import ABC, abstractmethod
+from typing import Any, Dict, Tuple
+
+
+class AbstractVectorController(ABC):
+    """
+    Abstract class for main controller class.
+    """
+
+    @abstractmethod
+    def vectoring(self, request: Dict[str, Any]) -> Tuple[Dict[str, str], int]:
+        """
+        Abstract method to handle vectorization requests.
+
+        Args:
+            request (Dict[str, Any]): Request body.
+
+        Returns:
+            Tuple[Dict[str, str], int]: Tuple containing a JSON response indicating success or failure of the vectorization process and an HTTP status code.
+        """
+        pass

core/abstracts/services.py

Lines changed: 46 additions & 0 deletions

@@ -0,0 +1,46 @@
+from abc import ABC, abstractmethod
+
+
+# Abstract base class for S3 service
+class AbstractS3Service(ABC):
+    """
+    Abstract class for s3 services.
+    """
+
+    @abstractmethod
+    def get_object(self, bucket_name: str, object_key: str) -> dict:
+        """
+        Abstract method to get an object from S3.
+
+        Args:
+            bucket_name (str): Name of the S3 bucket.
+            object_key (str): Key of the object in the S3 bucket.
+
+        Returns:
+            dict: Dictionary containing the loaded JSON content of the S3 object.
+        """
+        pass
+
+
+class AbstractLlamaIndexService(ABC):
+    """
+    Abstract class for llama index services.
+    """
+
+    @abstractmethod
+    def vector_store_index(
+        self, twin_id: str, source_name: str, file_uuid: str, documents: list
+    ) -> str:
+        """
+        Abstract method to indexing documents and store vectors in OpenSearch.
+
+        Args:
+            twin_id (str): Identifier for the twin.
+            source_name (str): Name of the data source.
+            file_uuid (str): UUID of the file containing the documents.
+            documents (list): List of dictionaries representing documents.
+
+        Returns:
+            str: Index summary
+        """
+        pass
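A concrete implementation of `AbstractS3Service` is not shown in this commit; a hypothetical sketch using a boto3-style client (the class name and injected-client design are assumptions) could look like:

```python
import json
from abc import ABC, abstractmethod


class AbstractS3Service(ABC):
    """Trimmed copy of the abstract above, to keep the sketch self-contained."""

    @abstractmethod
    def get_object(self, bucket_name: str, object_key: str) -> dict: ...


class S3Service(AbstractS3Service):
    """Hypothetical concrete service; the boto3-style client is injected so
    it can point at localstack (S3_URL) in development or a stub in tests."""

    def __init__(self, client):
        self.client = client

    def get_object(self, bucket_name: str, object_key: str) -> dict:
        # boto3's get_object returns the object payload as a stream under "Body"
        response = self.client.get_object(Bucket=bucket_name, Key=object_key)
        return json.loads(response["Body"].read())
```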

core/abstracts/usescases.py

Lines changed: 22 additions & 0 deletions

@@ -0,0 +1,22 @@
+from abc import ABC, abstractmethod
+
+
+class AbstractVectorizeUsecase(ABC):
+    """
+    Abstract class for use vectorize use cases.
+
+    """
+
+    @abstractmethod
+    def vectorize_and_index(self, bucket_name: str, object_key: str) -> str:
+        """
+        Abstract method to vectorize and index documents.
+
+        Args:
+            bucket_name (str): Name of the S3 bucket containing the document.
+            object_key (str): Key of the document object in the S3 bucket.
+
+        Returns:
+            str: The indexed document.
+        """
+        pass
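A concrete use case implementing this interface would tie the two service abstractions together: fetch the JSON payload from S3, then hand it to the indexing service. A sketch under assumptions (the payload field names and class wiring below are hypothetical, not taken from the commit):

```python
class VectorizeUsecase:
    """Hypothetical concrete AbstractVectorizeUsecase; services are injected."""

    def __init__(self, s3_service, index_service):
        self.s3_service = s3_service
        self.index_service = index_service

    def vectorize_and_index(self, bucket_name: str, object_key: str) -> str:
        # Fetch the JSON document batch from S3 (field names are assumptions)
        payload = self.s3_service.get_object(bucket_name, object_key)
        # Delegate indexing and return the index summary string
        return self.index_service.vector_store_index(
            payload["twin_id"],
            payload["source_name"],
            payload["file_uuid"],
            payload["documents"],
        )
```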
File renamed without changes.

core/controller/vector.py

Lines changed: 47 additions & 0 deletions

@@ -0,0 +1,47 @@
+from logging import Logger
+from typing import Any, Dict, Tuple
+
+from flask import jsonify
+
+from core.abstracts.controller import AbstractVectorController
+from core.abstracts.usescases import AbstractVectorizeUsecase
+
+
+class VectorController(AbstractVectorController):
+    """
+    Controller for vectorization operations.
+    """
+
+    def __init__(self, usecase: AbstractVectorizeUsecase, logger: Logger):
+        """
+        Initialize the Controller.
+
+        Args:
+            usecase (AbstractUsecase): An instance of a class implementing the AbstractUsecase interface.
+        """
+        self.usecase = usecase
+        self.logger = logger
+
+    def vectoring(self, request: Dict[str, Any]) -> Tuple[Dict[str, str], int]:
+        """
+        Handle vectorization requests.
+
+        This method expects a POST request with JSON data containing S3 bucket and object key information.
+        It delegates vectorization and indexing tasks to the use case, and returns appropriate responses.
+
+        Args:
+            request (Dict[str, Any]): Request body.
+
+        Returns:
+            Tuple[Dict[str, str], int]: Tuple containing a JSON response indicating success or failure of the vectorization process and an HTTP status code.
+        """
+        record = request["Records"][0]["s3"]
+        s3_bucket = record["bucket"]["name"]
+        s3_object_key = record["object"]["key"]
+
+        try:
+            self.usecase.vectorize_and_index(s3_bucket, s3_object_key)
+            return jsonify({"message": "Object vectorization succeeded!"}), 200
+        except Exception as e:
+            self.logger.error(f"Failed to vectorize object {s3_object_key}")
+            return jsonify({"error": str(e)}), 500
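`vectoring` unpacks the standard S3 event notification shape (`Records[0].s3.bucket.name` / `Records[0].s3.object.key`). A minimal sketch of that extraction, with a sample event whose values are illustrative:

```python
def parse_s3_event(event: dict) -> tuple:
    """Extract (bucket, key) from the first record of an S3 event notification."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]


# Sample event of the shape the controller expects
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "clone-ingestion-messages"},
                "object": {"key": "key/to/file.json"},
            }
        }
    ]
}
```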
File renamed without changes.
