codefuse-ai
diff --git a/Diff for: ‎README.md
+142-85 b/Diff for: ‎README.md
+142-85
diff --git a/Diff for: ‎README_CN.md
+95-48 b/Diff for: ‎README_CN.md
+95-48
diff --git a/Diff for: ‎docs/1.what-is-model-cache.md
+132 b/Diff for: ‎docs/1.what-is-model-cache.md
+132
diff --git a/Diff for: ‎docs/2.model-cache-features.md
+29 b/Diff for: ‎docs/2.model-cache-features.md
+29
@@ -15,72 +15,98 @@ ModelCache
 </div>
 
 ## Contents
+
+- [Contents](#contents)
 - [新闻](#新闻)
 - [项目简介](#项目简介)
+- [架构大图](#架构大图)
 - [快速部署](#快速部署)
+  - [环境依赖](#环境依赖)
+  - [启动服务](#启动服务)
+    - [启动 Demo](#启动-demo)
+    - [启动标准服务](#启动标准服务)
 - [服务访问](#服务访问)
+  - [写入 cache](#写入-cache)
+  - [查询 cache](#查询-cache)
+  - [清空 cache](#清空-cache)
 - [文章](#文章)
-- [架构大图](#架构大图)
+- [功能对比](#功能对比)
 - [核心功能](#核心功能)
+- [Todo List](#todo-list)
+  - [Adapter](#adapter)
+  - [Embedding model\&inference](#embedding-modelinference)
+  - [Scalar Storage](#scalar-storage)
+  - [Vector Storage](#vector-storage)
+  - [Ranking](#ranking)
+  - [Service](#service)
 - [致谢](#致谢)
-- [Contributing](#Contributing)
+- [Contributing](#contributing)
+
 ## 新闻
-- 🔥🔥[2024.10.22] 增加1024程序员节任务。 
+
+- 🔥🔥[2024.10.22] 增加1024程序员节任务。
 - 🔥🔥[2024.04.09] 增加了多租户场景中Redis Search存储和检索embedding的能力，可以将Cache和向量数据库的交互耗时降低至10ms内。
 - 🔥🔥[2023.12.10] 增加llmEmb、onnx、paddlenlp、fasttext等LLM embedding框架，并增加timm 图片embedding框架，用于提供更丰富的embedding能力。
 - 🔥🔥[2023.11.20] codefuse-ModelCache增加本地存储能力, 适配了嵌入式数据库sqlite、faiss，方便用户快速启动测试。
 - [2023.10.31] codefuse-ModelCache...
+
 ## 项目简介
+
 Codefuse-ModelCache 是一个开源的大模型语义缓存系统，通过缓存已生成的模型结果，降低类似请求的响应时间，提升用户体验。该项目从服务优化角度出发，引入缓存机制，在资源有限和对实时性要求较高的场景下，帮助企业和研究机构降低推理部署成本、提升模型性能和效率、提供规模化大模型服务。我们希望通过开源，分享交流大模型语义Cache的相关技术。
+
+## 架构大图
+
+![modelcache modules](docs/modelcache_modules_20240409.png)
+
 ## 快速部署
-项目中启动服务脚本分为flask4modelcache.py 和 flask4modelcache_demo.py，其中：
 
-- flask4modelcache_demo.py 为快速测试服务，内嵌了sqlite和faiss，用户无需关心数据库相关事宜。
-- flask4modelcache.py 为正常服务，需用户具备mysql和milvus等数据库服务。
+项目中启动服务脚本分为 `flask4modelcache.py` 和 `flask4modelcache_demo.py`，其中：
+
+- `flask4modelcache_demo.py` 为快速测试服务，内嵌了 SQLite 和 FAISS，用户无需关心数据库相关事宜。
+- `flask4modelcache.py` 为正常服务，需用户具备 MySQL 和 Milvus 等数据库服务。
+
 ### 环境依赖
-- python版本: 3.8及以上
+
+- python版本: 3.8 及以上
 - 依赖包安装：
-```shell
-pip install -r requirements.txt 
-```
-### 服务启动
-#### 方式一：Demo服务启动
-- 离线模型bin文件下载， 参考地址：[https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main)，并将下载的bin文件，放到 model/text2vec-base-chinese 文件夹中。
-- 执行flask4modelcache_demo.py启动服务。
-```shell
-cd CodeFuse-ModelCache
-```
-```shell
-python flask4modelcache_demo.py
-```
 
-#### 方式二：通过 docker-compose 启动服务
-- 离线模型bin文件下载， 参考地址：[https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main)，并将下载的bin文件，放到 model/text2vec-base-chinese 文件夹中。
+  ```shell
+  pip install -r requirements.txt 
+  ```
 
-- 配置 docker network，只需执行一次
-```shell
-docker network create modelcache
-```
-- 执行 docker-compose 命令
-```shell
-# 首次运行本地不存在 modelcache 镜像、或 Dockerfile 变更时
-docker-compose up --build
+### 启动服务
 
-# 非首次运行，且 Dockerfile 无变更
-docker-compose up
-```
-#### 方式三：不通过 docker-compose 启动服务
-在启动服务前，应该进行如下环境配置：
-1. 安装关系数据库 mysql， 导入sql创建数据表，sql文件:```reference_doc/create_table.sql```
-2. 安装向量数据库milvus
+#### 启动 Demo
+
+- 离线模型 bin 文件下载， 参考地址：[Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main)，并将下载的 bin 文件，放到 `model/text2vec-base-chinese` 文件夹中。
+- 执行 `flask4modelcache_demo.py` 启动服务。
+
+  ```shell
+  cd CodeFuse-ModelCache
+  ```
+
+  ```shell
+  python flask4modelcache_demo.py
+  ```
+
+#### 启动标准服务
+
+在启动标准服务前，应该进行如下环境配置：
+
+1. 安装关系数据库 MySQL， 导入 SQL 创建数据表，MySQL 文件:```reference_doc/create_table.sql```。
+2. 安装向量数据库 Milvus。
 3. 在配置文件中添加数据库访问信息，配置文件为：
    1. ```modelcache/config/milvus_config.ini```
    2. ```modelcache/config/mysql_config.ini```
-4. 离线模型bin文件下载， 参考地址：[https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main)，并将下载的bin文件，放到 model/text2vec-base-chinese 文件夹中
+4. 离线模型 bin 文件下载， 参考地址：[Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main)，并将下载的 bin 文件，放到 `model/text2vec-base-chinese` 文件夹中。
 5. 通过flask4modelcache.py脚本启动后端服务。
+
 ## 服务访问
-当前服务以restful API方式提供3个核心功能：数据写入，cache查询和cache数据清空。请求demo 如下：
-### cache写入
+
+当前服务以 restful API 方式提供 3 个核心功能：数据写入，cache 查询和 cache 数据清空。请求 demo 如下：
+
+### 写入 cache
+
 ```python
 import json
 import requests
@@ -93,7 +119,9 @@ data = {'type': type, 'scope': scope, 'chat_info': chat_info}
 headers = {"Content-Type": "application/json"}
 res = requests.post(url, headers=headers, json=json.dumps(data))
 ```
-### cache查询
+
+### 查询 cache
+
 ```python
 import json
 import requests
@@ -106,7 +134,9 @@ data = {'type': type, 'scope': scope, 'query': query}
 headers = {"Content-Type": "application/json"}
 res = requests.post(url, headers=headers, json=json.dumps(data))
 ```
-### cache清空
+
+### 清空 cache
+
 ```python
 import json
 import requests
@@ -119,12 +149,14 @@ data = {'type': type, 'scope': scope, 'remove_type': remove_type}
 headers = {"Content-Type": "application/json"}
 res = requests.post(url, headers=headers, json=json.dumps(data))
 ```
+
 ## 文章
+
 https://mp.weixin.qq.com/s/ExIRu2o7yvXa6nNLZcCfhQ
-## 架构大图
-![modelcache modules](docs/modelcache_modules_20240409.png)
+
 ## 功能对比
-功能方面，为了解决huggingface网络问题并提升推理速度，增加了embedding本地推理能力。鉴于SqlAlchemy框架存在一些限制，我们对关系数据库交互模块进行了重写，以更灵活地实现数据库操作。在实践中，大型模型产品需要与多个用户和多个模型对接，因此在ModelCache中增加了对多租户的支持，同时也初步兼容了系统指令和多轮会话。
+
+功能方面，为了解决 Hugging Face 网络问题并提升推理速度，增加了 embedding 本地推理能力。鉴于 SqlAlchemy 框架存在一些限制，我们对关系数据库交互模块进行了重写，以更灵活地实现数据库操作。在实践中，大型模型产品需要与多个用户和多个模型对接，因此在 ModelCache 中增加了对多租户的支持，同时也初步兼容了系统指令和多轮会话。
 
 <table>
   <tr>
@@ -248,7 +280,8 @@ https://mp.weixin.qq.com/s/ExIRu2o7yvXa6nNLZcCfhQ
 </table>
 
 ## 核心功能
-在ModelCache中，沿用了GPTCache的主要思想，包含了一系列核心模块：adapter、embedding、similarity和data_manager。adapter模块主要功能是处理各种任务的业务逻辑，并且能够将embedding、similarity、data_manager等模块串联起来；embedding模块主要负责将文本转换为语义向量表示，它将用户的查询转换为向量形式，并用于后续的召回或存储操作；rank模块用于对召回的向量进行相似度排序和评估；data_manager模块主要用于管理数据库。同时，为了更好的在工业界落地，我们做了架构和功能上的升级，如下：
+
+在ModelCache  中，沿用了 GPTCache 的主要思想，包含了一系列核心模块：adapter、embedding、similarity 和 data_manager。adapter模块主要功能是处理各种任务的业务逻辑，并且能够将  embedding、similarity、data_manager等模块串联起来；embedding  模块主要负责将文本转换为语义向量表示，它将用户的查询转换为向量形式，并用于后续的召回或存储操作；rank 模块用于对召回的向量进行相似度排序和评估；data_manager 模块主要用于管理数据库。同时，为了更好的在工业界落地，我们做了架构和功能上的升级，如下：
 
 - [x] 架构调整（轻量化集成）：以类redis的缓存模式嵌入到大模型产品中，提供语义缓存能力，不会干扰LLM调用和安全审核等功能，适配所有大模型服务。
 - [x] 多种模型加载方案：
@@ -267,24 +300,38 @@ https://mp.weixin.qq.com/s/ExIRu2o7yvXa6nNLZcCfhQ
    - 增加model字段和数据统计字段，用于功能拓展。
 
 ## Todo List
+
 ### Adapter
-- [ ] register adapter for Milvus：根据scope中的model参数，初始化对应Collection 并且执行load操作。
+
+- [ ] register adapter for Milvus：根据 scope 中的 model 参数，初始化对应 Collection 并且执行 load 操作。
+
 ### Embedding model&inference
-- [ ] inference优化：优化embedding推理速度，适配fastertransformer, TurboTransformers, ByteTransformer等推理引擎。
+
+- [ ] inference 优化：优化 embedding 推理速度，适配fastertransformer、TurboTransformers 和 ByteTransformer 等推理引擎。
 - [ ] 兼容huggingface模型和modelscope模型，提供更多模型加载方式。
+
 ### Scalar Storage
+
 - [ ] Support MongoDB。
 - [ ] Support ElasticSearch。
+
 ### Vector Storage
+
 - [ ] 在多模态场景中适配faiss存储。
+
 ### Ranking
+
 - [ ] 增加Rank模型，对embedding召回后的数据，进行精排。
+
 ### Service
+
 - [ ] 支持fastapi。
 - [ ] 增加前端界面，用于测试。
 
 ## 致谢
+
 本项目参考了以下开源项目，在此对相关项目和研究开发人员表示感谢。<br />[GPTCache](https://github.com/zilliztech/GPTCache)
 
 ## Contributing
+
 ModelCache是一个非常有趣且有用的项目，我们相信这个项目有很大的潜力，无论你是经验丰富的开发者，还是刚刚入门的新手，都欢迎你为这个项目做出一些贡献，包括但不限于：提交问题和建议，参与代码编写，完善文档和示例。你的参与将会使这个项目变得更好，同时也会为开源社区做出贡献。
@@ -0,0 +1,132 @@
+# What is ModelCache
+
+In ModelCache, we adopted the main idea of GPTCache,  includes core modules: adapter, embedding, similarity, and data_manager. The adapter module is responsible for handling the business logic of various tasks and can connect the embedding, similarity, and data_manager modules. The embedding module is mainly responsible for converting text into semantic vector representations, it transforms user queries into vector form.The rank module is used for sorting and evaluating the similarity of the recalled vectors. The data_manager module is primarily used for managing the database. In order to better facilitate industrial applications, we have made architectural and functional upgrades as follows:
+
+## Architecture
+
+![modelcache modules](modelcache_modules_20240409.png)
+
+## Function comparison
+
+We've implemented several key updates to our repository. We've resolved network issues with Hugging Face and improved inference speed by introducing local embedding capabilities. Due to limitations in SqlAlchemy, we've redesigned our relational database interaction module for more flexible operations. We've added multi-tenancy support to ModelCache, recognizing the need for multiple users and models in LLM products. Lastly, we've made initial adjustments for better compatibility with system commands and multi-turn dialogues.
+
+<table>
+  <tr>
+    <th rowspan="2">Module</th>
+    <th rowspan="2">Function</th>
+
+  </tr>
+  <tr>
+    <th>ModelCache</th>
+    <th>GPTCache</th>
+  </tr>
+  <tr>
+    <td rowspan="2">Basic Interface</td>
+    <td>Data query interface</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td>Data writing interface</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td rowspan="3">Embedding</td>
+    <td>Embedding model configuration</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td>Large model embedding layer</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>BERT model long text processing</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">Large model invocation</td>
+    <td>Decoupling from large models</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Local loading of embedding model</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">Data isolation</td>
+    <td>Model data isolation</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td>Hyperparameter isolation</td>
+    <td></td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="3">Databases</td>
+    <td>MySQL</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td>Milvus</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td>OceanBase</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="3">Session management</td>
+    <td>Single-turn dialogue</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td>System commands</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Multi-turn dialogue</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">Data management</td>
+    <td>Data persistence</td>
+    <td class="checkmark">&#9745; </td>
+    <td class="checkmark">&#9745; </td>
+  </tr>
+  <tr>
+    <td>One-click cache clearance</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td rowspan="2">Tenant management</td>
+    <td>Support for multi-tenancy</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Milvus multi-collection capability</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>Other</td>
+    <td>Long-short dialogue distinction</td>
+    <td class="checkmark">&#9745; </td>
+    <td></td>
+  </tr>
+</table>
@@ -0,0 +1,29 @@
+# ModelCache features
+
+This topic describes ModelCache features. In ModelCache, we incorporated the core principles of GPTCache. ModelCache has four modules: adapter, embedding, similarity, and data_manager.
+
+- The adapter module orchestrates the business logic for various tasks, integrate the embedding, similarity, and data_manager modules.
+- The embedding module converts text into semantic vector representations, and transforms user queries into vectors.
+- The rank module ranks and evaluate the similarity of recalled vectors.
+- The data_manager module manages the databases.
+
+To make ModelCache more suitable for industrial use, we made several improvements to its architecture and functionality:
+
+- [x] Architectural adjustment (lightweight integration):
+  - Embedded into LLM products using a Redis-like caching mode.
+  - Provided semantic caching without interfering with LLM calls, security audits, and other functions.
+  - Compatible with all LLM services.
+- [x] Multiple model loading:
+  - Supported local embedding model loading, and resolved Hugging Face network connectivity issues.
+  - Supported loading embedding layers from various pre-trained models.
+- [x] Data isolation
+  - Environment isolation: Read different database configurations based on the environment. Isolate  development, staging, and production environments.
+  - Multi-tenant data isolation: Dynamically create collections based on models for data isolation, addressing data separation issues in multi-model/service scenarios within large language model products.
+- [x] Supported system instruction: Adopted a concatenation approach to resolve issues with system instructions in the prompt paradigm.
+- [x] Long and short text differentiation: Long texts bring more challenges for similarity assessment. Added differentiation between long and short texts, allowing for separate threshold configurations.
+- [x] Milvus performance optimization: Adjusted Milvus consistency level to "Session" level for better performance.
+- [x] Data management:
+  - One-click cache clearing to enable easy data management after model upgrades.
+  - Recall of hit queries for subsequent data analysis and model iteration reference.
+  - Asynchronous log write-back for data analysis and statistics.
+  - Added model field and data statistics field to enhance features.