
Commit 6ddd568

removed extensive code and add reranker in retriever in model client
1 parent 4cc0106 commit 6ddd568

File tree

5 files changed: +64 -151 lines changed


docs/source/developer_notes/base_data_class.rst

+1-1
@@ -300,7 +300,7 @@ The ``exclude`` parameter works the same across all methods.
 
 **DataClassFormatType**
 
-For data class format, we have :class:``core.base_data_class.DataClassFormatType`` along with ``format_class_str`` method to specify the format type for the data format methods.
+For data class format, we have :class:`DataClassFormatType<core.base_data_class.DataClassFormatType>` along with the ``format_class_str`` method to specify the format type for the data format methods.
 
 .. code-block:: python
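To illustrate the cross-reference fixed above, here is a minimal sketch of calling ``format_class_str`` with ``DataClassFormatType`` (a sketch only: the import paths and the ``SIGNATURE_JSON`` member are assumed from the library's tutorials, and the rendered output is not reproduced verbatim).

.. code-block:: python

    from dataclasses import dataclass, field

    from lightrag.core import DataClass
    from lightrag.core.base_data_class import DataClassFormatType


    @dataclass
    class TriviaAnswer(DataClass):
        answer: str = field(default="", metadata={"desc": "The final answer"})


    # Render the class "signature" in JSON form, ready to embed in a prompt.
    print(TriviaAnswer.format_class_str(DataClassFormatType.SIGNATURE_JSON))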

docs/source/developer_notes/index.rst

+5
@@ -52,6 +52,11 @@ A `Prompt` will work with `DataClass` to ease data interaction with the LLM mode
 A `Retriever` will work with databases to retrieve context and overcome the hallucination and knowledge limitations of LLM, following the paradigm of Retrieval-Augmented Generation (RAG).
 An `Agent` will work with tools and an LLM planner for enhanced ability to reason, plan, and act on real-world tasks.
 
+
+Additionally, what shines in LightRAG is that all orchestrator components, like `Retriever`, `Embedder`, `Generator`, and `Agent`, are model-agnostic.
+You can easily make each component work with different models from different providers by switching out the `ModelClient` and its `model_kwargs`.
+
+
 We will introduce the libraries starting from the core base classes, then move to the RAG essentials, and finally to the agent essentials.
 With these building blocks, we will further introduce optimizing, where the optimizer uses building blocks such as Generator for auto-prompting and retriever for dynamic few-shot in-context learning (ICL).
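To make the model-agnostic claim concrete, a minimal sketch of such a switch (a sketch under the assumption that the ``OpenAIClient`` and ``GroqAPIClient`` integrations listed later in this commit are installed and their API keys are set; the template and model names are illustrative):

.. code-block:: python

    from lightrag.core.generator import Generator
    from lightrag.components.model_client import OpenAIClient, GroqAPIClient

    template = r"""User: {{input_str}}"""

    # The same orchestrator component pointed at two different providers:
    # only the ModelClient and its model_kwargs change.
    gpt_generator = Generator(
        model_client=OpenAIClient(),
        model_kwargs={"model": "gpt-3.5-turbo"},
        template=template,
    )
    llama_generator = Generator(
        model_client=GroqAPIClient(),
        model_kwargs={"model": "llama3-8b-8192"},
        template=template,
    )

    print(gpt_generator(prompt_kwargs={"input_str": "What is LLM?"}))
    print(llama_generator(prompt_kwargs={"input_str": "What is LLM?"}))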

docs/source/developer_notes/model_client.rst

+54-147
@@ -6,32 +6,37 @@ ModelClient
 
 .. `Li Yin <https://github.com/liyin2015>`_
 
-What you will learn?
+.. What you will learn?
 
-1. What is ``ModelClient`` and why is it designed this way?
-2. How to intergrate your own ``ModelClient``?
-3. How to use ``ModelClient`` directly?
+.. 1. What is ``ModelClient`` and why is it designed this way?
+.. 2. How to integrate your own ``ModelClient``?
+.. 3. How to use ``ModelClient`` directly?
+
+
+:ref:`ModelClient<core-model_client>` is the standardized protocol and base class for all model inference SDKs (either via APIs or local) to communicate with LightRAG internal components.
+Therefore, by switching out the ``ModelClient`` in a ``Generator``, ``Embedder``, or ``Retriever`` (those components that take models), you can make these functional components model-agnostic.
 
-:ref:`ModelClient<core-model_client>` is the standardized protocol and base class for all model inference SDKs (either via APIs or local) to communicate with LightRAG internal components/classes.
-Because so, by switching off ``ModelClient`` in a ``Generator`` or ``Embedder`` component, you can make your prompt or ``Retriever`` model-agnostic.
 
 
 .. figure:: /_static/images/model_client.png
     :align: center
     :alt: ModelClient
     :width: 400px
 
-    The interface to internal components in LightRAG
+    The bridge between all model inference SDKs and internal components in LightRAG
 
 .. note::
 
-    All users are encouraged to customize your own ``ModelClient`` whenever you need to do so. You can refer our code in ``components.model_client`` dir.
+    All users are encouraged to customize their own ``ModelClient`` whenever needed. You can refer to our code in the ``components.model_client`` directory.
+
 
 Model Inference SDKs
 ------------------------
-With cloud API providers like OpenAI, Groq, Anthropic, it often comes with a `sync` and an `async` client via their SDKs.
+
+With cloud API providers like OpenAI, Groq, and Anthropic, you often get both a `sync` and an `async` client via their SDKs.
 For example:
 
+
 .. code-block:: python
 
     from openai import OpenAI, AsyncOpenAI
@@ -42,128 +47,32 @@ For example:
     # sync call using APIs
     response = sync_client.chat.completions.create(...)
 
-For local models, such as using `huggingface transformers`, you need to create this model inference SDKs yourself.
-How you do this is highly flexible. Here is an example to use local embedding model (e.g. ``thenlper/gte-base``) as a model (Refer :class:`components.model_client.transformers_client.TransformerEmbedder` for details).
+For local models, such as using `huggingface transformers`, you need to create these model inference SDKs yourself.
+How you do this is highly flexible.
+Here is an example of using a local embedding model (e.g., ``thenlper/gte-base``) as a model (refer to :class:`TransformerEmbedder<components.model_client.transformers_client.TransformerEmbedder>` for details).
 It really is just normal model inference code.
 
-.. code-block:: python
-
-    from transformers import AutoTokenizer, AutoModel
-
-    class TransformerEmbedder:
-        models: Dict[str, type] = {}
-
-        def __init__(self, model_name: Optional[str] = "thenlper/gte-base"):
-            super().__init__()
-            if model_name is not None:
-                self.init_model(model_name=model_name)
-
-        @lru_cache(None)
-        def init_model(self, model_name: str):
-            try:
-                self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-                self.model = AutoModel.from_pretrained(model_name)
-                # register the model
-                self.models[model_name] = self.model
-            except Exception as e:
-                log.error(f"Error loading model {model_name}: {e}")
-                raise e
-
-        def infer_gte_base_embedding(
-            self,
-            input=Union[str, List[str]],
-            tolist: bool = True,
-        ):
-            model = self.models.get("thenlper/gte-base", None)
-            if model is None:
-                # initialize the model
-                self.init_model("thenlper/gte-base")
-
-            if isinstance(input, str):
-                input = [input]
-            # Tokenize the input texts
-            batch_dict = self.tokenizer(
-                input, max_length=512, padding=True, truncation=True, return_tensors="pt"
-            )
-            outputs = model(**batch_dict)
-            embeddings = average_pool(
-                outputs.last_hidden_state, batch_dict["attention_mask"]
-            )
-            # (Optionally) normalize embeddings
-            embeddings = F.normalize(embeddings, p=2, dim=1)
-            if tolist:
-                embeddings = embeddings.tolist()
-            return embeddings
-
-        def __call__(self, **kwargs):
-            if "model" not in kwargs:
-                raise ValueError("model is required")
-            # load files and models, cache it for the next inference
-            model_name = kwargs["model"]
-            # inference the model
-            if model_name == "thenlper/gte-base":
-                return self.infer_gte_base_embedding(kwargs["input"])
-            else:
-                raise ValueError(f"model {model_name} is not supported")
-
 
 
 ModelClient Protocol
 -----------------------------------------------------------------------------------------------------------
-A model client can be used to manage different types of models, we defined a ``ModelType`` to categorize the model type.
+A model client can be used to manage different types of models; we defined a :class:`ModelType<core.types.ModelType>` to categorize the model type.
 
 .. code-block:: python
 
     class ModelType(Enum):
         EMBEDDER = auto()
         LLM = auto()
+        RERANKER = auto()
         UNDEFINED = auto()
 
-We designed 6 abstract methods in the ``ModelClient`` class to be implemented by the subclass model type.
-We will use :class:`components.model_client.OpenAIClient` along with the above ``TransformerEmbedder`` as examples.
-
-First, we offer two methods to initialize the model SDKs:
-
-.. code-block:: python
-
-    def init_sync_client(self):
-        raise NotImplementedError(
-            f"{type(self).__name__} must implement _init_sync_client method"
-        )
-
-    def init_async_client(self):
-        raise NotImplementedError(
-            f"{type(self).__name__} must implement _init_async_client method"
-        )
+We designed 6 abstract methods in the `ModelClient` class that can be implemented by subclasses to integrate with different model inference SDKs.
+We will use :class:`OpenAIClient<components.model_client.OpenAIClient>` as the cloud API example and :class:`TransformersClient<components.model_client.transformers_client.TransformersClient>`, along with the local inference code :class:`TransformerEmbedder<components.model_client.transformers_client.TransformerEmbedder>`, as an example for local model clients.
 
-This is how `OpenAIClient` implements these methods along with ``__init__`` method:
-
-.. code-block:: python
-
-    class OpenAIClient(ModelClient):
-
-        def __init__(self, api_key: Optional[str] = None):
-            super().__init__()
-            self._api_key = api_key
-            self.sync_client = self.init_sync_client()
-            self.async_client = None  # only initialize if the async call is called
-
-        def init_sync_client(self):
-            api_key = self._api_key or os.getenv("OPENAI_API_KEY")
-            if not api_key:
-                raise ValueError("Environment variable OPENAI_API_KEY must be set")
-            return OpenAI(api_key=api_key)
-
-        def init_async_client(self):
-            api_key = self._api_key or os.getenv("OPENAI_API_KEY")
-            if not api_key:
-                raise ValueError("Environment variable OPENAI_API_KEY must be set")
-            return AsyncOpenAI(api_key=api_key)
+First, we offer two methods, `init_async_client` and `init_sync_client`, for subclasses to initialize the SDK client.
+You can refer to :class:`OpenAIClient<components.model_client.OpenAIClient>` to see how these methods, along with the `__init__` method, are implemented.
 
 This is how ``TransformerClient`` does the same thing:
 
@@ -183,8 +92,7 @@ This is how ``TransformerClient`` does the same thing:
     def init_sync_client(self):
         return TransformerEmbedder()
 
-
-Second. we use `convert_inputs_to_api_kwargs` for subclass to convert LightRAG inputs into the `api_kwargs` (SDKs arguments).
+Second, we use `convert_inputs_to_api_kwargs` for subclasses to convert LightRAG inputs into the `api_kwargs` (SDK arguments).
 
 .. code-block:: python
 
@@ -228,6 +136,15 @@ This is how `OpenAIClient` implements this method:
             raise ValueError(f"model_type {model_type} is not supported")
         return final_model_kwargs
 
+.. For embedding, as `Embedder` takes both `str` and `List[str]` as input, we need to convert the input to a list of strings.
+.. For LLM, as `Generator` takes a `prompt_kwargs` (dict) and converts it into a single string, we need to convert the input to a list of messages.
+.. For Rerankers, you can refer to :class:`CohereAPIClient<components.model_client.cohere_client.CohereAPIClient>` for an example.
+
+
+For embedding, as ``Embedder`` takes both `str` and `List[str]` as input, we need to convert the input to a list of strings acceptable by the SDK.
+For LLM, as ``Generator`` takes a `prompt_kwargs` (dict) and converts it into a single string, we need to convert the input to a list of messages.
+For Rerankers, you can refer to :class:`CohereAPIClient<components.model_client.cohere_client.CohereAPIClient>` for an example.
+
 This is how ``TransformerClient`` does the same thing:
 
 .. code-block:: python
@@ -245,37 +162,15 @@ This is how ``TransformerClient`` does the same thing:
         else:
             raise ValueError(f"model_type {model_type} is not supported")
 
-In addition, you can add any method that parse the SDK specific output to a format compatible with LightRAG components.
-Typically an LLM needs to use `parse_chat_completion` to parse the completion to texts and `parse_embedding_response` to parse the embedding response to a structure LightRAG components can understand.
-
-.. code-block:: python
-
-    def parse_chat_completion(self, completion: Any) -> str:
-        raise NotImplementedError(
-            f"{type(self).__name__} must implement parse_chat_completion method"
-        )
+In addition, you can add any method that parses the SDK-specific output to a format compatible with LightRAG components.
+Typically, an LLM needs to use `parse_chat_completion` to parse the completion to text and `parse_embedding_response` to parse the embedding response to a structure that LightRAG components can understand.
+You can refer to :class:`OpenAIClient<components.model_client.openai_client.OpenAIClient>` for API embedding model integration and :class:`TransformersClient<components.model_client.transformers_client.TransformersClient>` for local embedding model integration.
 
-    def parse_embedding_response(self, response: Any) -> EmbedderOutput:
-        r"""Parse the embedding response to a structure LightRAG components can understand."""
-        raise NotImplementedError(
-            f"{type(self).__name__} must implement parse_embedding_response method"
-        )
 
-You can refer to :class:`components.model_client.openai_client.OpenAIClient` for API embedding model integration and :class:`components.model_client.transformers_client.TransformersClient` for local embedding model integration.
+Lastly, the `call` and `acall` methods are used to call model inference via their own arguments.
+We encourage subclasses to provide error handling and retry mechanisms in these methods.
 
-Then `call` and `acall` methods to call Model inference via their own arguments.
-We encourage the subclass provides error handling and retry mechanism in these methods.
-
-.. code-block:: python
-
-    def call(self, api_kwargs: Dict = {}, model_type: ModelType = ModelType.UNDEFINED):
-        raise NotImplementedError(f"{type(self).__name__} must implement _call method")
-
-    async def acall(
-        self, api_kwargs: Dict = {}, model_type: ModelType = ModelType.UNDEFINED
-    ):
-        pass
 
 The `OpenAIClient` example:
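The ``OpenAIClient`` example code itself is elided between this hunk and the next. As a minimal sketch of what such a ``call`` can look like (assuming the OpenAI SDK's ``chat.completions.create`` and ``embeddings.create`` endpoints; the class name is hypothetical and error handling/retries are omitted):

.. code-block:: python

    from typing import Any, Dict

    from lightrag.core.model_client import ModelClient
    from lightrag.core.types import ModelType


    class OpenAIStyleClient(ModelClient):
        """Hypothetical sketch; assumes init_sync_client() has set self.sync_client."""

        def call(self, api_kwargs: Dict = {}, model_type: ModelType = ModelType.UNDEFINED) -> Any:
            # Dispatch the prepared api_kwargs to the matching SDK endpoint.
            if model_type == ModelType.EMBEDDER:
                return self.sync_client.embeddings.create(**api_kwargs)
            if model_type == ModelType.LLM:
                return self.sync_client.chat.completions.create(**api_kwargs)
            raise ValueError(f"model_type {model_type} is not supported")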

@@ -296,21 +191,28 @@ The `TransformerClient` example:
     def call(self, api_kwargs: Dict = {}, model_type: ModelType = ModelType.UNDEFINED):
         return self.sync_client(**api_kwargs)
 
-
-Our library currently integrated with 5 providers: OpenAI, Groq, Anthropic, Huggingface, and Google.
+Our library currently integrates with six providers: OpenAI, Groq, Anthropic, Huggingface, Google, and Cohere.
 Please check out :ref:`ModelClient Integration<components-model_client>`.
 
+
+
 Use ModelClient directly
 -----------------------------------------------------------------------------------------------------------
-Though ``ModelClient`` is often managed in a ``Generator`` or ``Embedder`` component, you can use it directly if you ever plan to write your own component.
-Here is an example to use ``OpenAIClient`` directly, first on LLM model:
+
+
+Though ``ModelClient`` is often managed in a ``Generator``, ``Embedder``, or ``Retriever`` component, you can use it directly if you plan to write your own component.
+Here is an example of using ``OpenAIClient`` directly, first on an LLM model:
+
 
 .. code-block:: python
 
     from lightrag.components.model_client import OpenAIClient
     from lightrag.core.types import ModelType
     from lightrag.utils import setup_env
 
+    setup_env()
+
     openai_client = OpenAIClient()
 
     query = "What is the capital of France?"
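The remainder of the LLM example is elided from this hunk. A minimal sketch of how the flow continues, using the ``convert_inputs_to_api_kwargs`` and ``call`` methods described above (the prompt string and ``model_kwargs`` values here are illustrative, not the tutorial's exact ones):

.. code-block:: python

    prompt = f"User: {query}\n"
    model_kwargs = {"model": "gpt-3.5-turbo", "temperature": 0.5, "max_tokens": 100}

    api_kwargs = openai_client.convert_inputs_to_api_kwargs(
        input=prompt,
        model_kwargs=model_kwargs,
        model_type=ModelType.LLM,
    )
    print(f"api_kwargs: {api_kwargs}")

    response = openai_client.call(api_kwargs=api_kwargs, model_type=ModelType.LLM)
    response_text = openai_client.parse_chat_completion(response)
    print(f"response_text: {response_text}")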
@@ -361,6 +263,10 @@ The output will be:
     api_kwargs: {'model': 'text-embedding-3-small', 'dimensions': 8, 'encoding_format': 'float', 'input': ['What is the capital of France?', 'What is the capital of France?']}
     reponse_embedder_output: EmbedderOutput(data=[Embedding(embedding=[0.6175549, 0.24047995, 0.4509756, 0.37041178, -0.33437008, -0.050995983, -0.24366009, 0.21549304], index=0), Embedding(embedding=[0.6175549, 0.24047995, 0.4509756, 0.37041178, -0.33437008, -0.050995983, -0.24366009, 0.21549304], index=1)], model='text-embedding-3-small', usage=Usage(prompt_tokens=14, total_tokens=14), error=None, raw_response=None)
 
+
+.. TODO: add optional package introduction here
+
+
 .. admonition:: API reference
    :class: highlight
 

@@ -370,3 +276,4 @@ The output will be:
    - :class:`components.model_client.groq_client.GroqAPIClient`
    - :class:`components.model_client.anthropic_client.AnthropicAPIClient`
    - :class:`components.model_client.google_client.GoogleGenAIClient`
+   - :class:`components.model_client.cohere_client.CohereAPIClient`
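To recap the protocol this file documents, a minimal end-to-end sketch of a custom client (``EchoClient`` and its behavior are invented purely for illustration; a real client would wrap an actual SDK and implement the async side too):

.. code-block:: python

    from typing import Any, Dict

    from lightrag.core.model_client import ModelClient
    from lightrag.core.types import ModelType


    class EchoClient(ModelClient):
        """Hypothetical sync-only client that echoes prompts back."""

        def __init__(self):
            super().__init__()
            self.sync_client = self.init_sync_client()

        def init_sync_client(self):
            # A local callable stands in for a real SDK client.
            return lambda **api_kwargs: {"text": api_kwargs.get("input", "")}

        def init_async_client(self):
            raise NotImplementedError("EchoClient is sync-only")

        def convert_inputs_to_api_kwargs(
            self,
            input: Any = None,
            model_kwargs: Dict = {},
            model_type: ModelType = ModelType.UNDEFINED,
        ) -> Dict:
            if model_type == ModelType.LLM:
                return {"input": str(input), **model_kwargs}
            raise ValueError(f"model_type {model_type} is not supported")

        def parse_chat_completion(self, completion: Any) -> str:
            return completion["text"]

        def call(self, api_kwargs: Dict = {}, model_type: ModelType = ModelType.UNDEFINED):
            return self.sync_client(**api_kwargs)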

docs/source/developer_notes/output_parsers.rst

+3-2
@@ -1,7 +1,9 @@
 Parser
 =============
 
-In this note, we will explain LightRAG parser and output parsers.
+Parser is the `interpreter` of the LLM output.
+
+
 
 Context
 ----------------
@@ -21,7 +23,6 @@ It is an important step for the LLM applications to interact with the external w
 - to list to support multiple choice selection.
 - to json/yaml which will be extracted to dict, and optionally further to a data class instance to support cases like function calls.
 
-Parsing is the `interpreter` of the LLM output.
 
 Scope and Design
 ------------------
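The hunk above describes converting LLM output to json/yaml, then to a dict, and optionally to a data class instance. To make that concrete, a minimal, library-independent sketch of the json-to-dict step (``parse_llm_json`` and ``FunctionCall`` are invented here for illustration; LightRAG's own output parsers cover this and more):

.. code-block:: python

    import json
    from dataclasses import dataclass


    @dataclass
    class FunctionCall:
        name: str
        kwargs: dict


    def parse_llm_json(raw: str) -> dict:
        """Extract the first JSON object from raw LLM text and load it as a dict."""
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end == -1:
            raise ValueError(f"no JSON object found in: {raw!r}")
        return json.loads(raw[start : end + 1])


    # LLMs often wrap JSON in prose or code fences; the parser recovers the dict,
    # which can then back a data class instance for cases like function calls.
    raw_response = 'Sure! ```json\n{"name": "add", "kwargs": {"a": 2, "b": 3}}\n```'
    data = parse_llm_json(raw_response)
    print(FunctionCall(**data))  # FunctionCall(name='add', kwargs={'a': 2, 'b': 3})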

docs/source/get_started/installation.rst

+1-1
@@ -53,7 +53,7 @@ Or, you can load it yourself with ``python-dotenv``:
 
 This setup ensures that LightRAG can access all necessary configurations during runtime.
 
-1. Install Optional Packages
+4. Install Optional Packages
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 
