
Commit 7dd1e03: merge main branch
2 parents: 848e764 + 31da4b7

File tree: 93 files changed (+3396 / -186 lines)


.gitignore

Lines changed: 8 additions & 12 deletions
@@ -27,9 +27,6 @@ share/python-wheels/
 *.egg
 MANIFEST
 *.DS_Store
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec
 
@@ -85,14 +82,6 @@ ipython_config.py
 # pyenv
 .python-version
 
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/
 
 # Celery stuff
@@ -146,4 +135,11 @@ dmypy.json
 /embedding_npy
 /flask_server
 *.bin
-*ini
+**/maya_embedding_service
+
+*.ini
+
+**/multicache_serving.py
+**/modelcache_serving.py
+
+**/model/

README.md

Lines changed: 29 additions & 10 deletions
@@ -1,6 +1,6 @@
 <div align="center">
 <h1>
-Codefuse-ModelCache
+ModelCache
 </h1>
 </div>
 
@@ -25,6 +25,7 @@ Codefuse-ModelCache
 - [Acknowledgements](#Acknowledgements)
 - [Contributing](#Contributing)
 ## news
+- 🔥🔥[2024.04.09] Add Redis Search to store and retrieve embeddings in multi-tenant scenes; this can reduce the interaction time between the Cache and the vector database to 10ms.
 - 🔥🔥[2023.12.10] we integrate LLM embedding frameworks such as 'llmEmb', 'ONNX', 'PaddleNLP', 'FastText', alone with the image embedding framework 'timm', to bolster embedding functionality.
 - 🔥🔥[2023.11.20] codefuse-ModelCache has integrated local storage, such as sqlite and faiss, providing users with the convenience of quickly initiating tests.
 - [2023.08.26] codefuse-ModelCache...
@@ -39,20 +40,26 @@ The project's startup scripts are divided into flask4modelcache.py and flask4mod
 - Python version: 3.8 and above
 - Package Installation
 ```shell
-pip install requirements.txt
+pip install -r requirements.txt
 ```
 ### Service Startup
 #### Demo Service Startup
 1. Download the embedding model bin file from the following address: [https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place the downloaded bin file in the model/text2vec-base-chinese folder.
 2. Start the backend service using the flask4modelcache_dome.py script.
+```shell
+cd CodeFuse-ModelCache
+```
+```shell
+python flask4modelcache_demo.py
+```
 
 #### Normal Service Startup
 Before starting the service, the following environment configurations should be performed:
-1. Install the relational database MySQL and import the SQL file to create the data tables. The SQL file can be found at: reference_doc/create_table.sql
+1. Install the relational database MySQL and import the SQL file to create the data tables. The SQL file can be found at: ```reference_doc/create_table.sql```
 2. Install the vector database Milvus.
 3. Add the database access information to the configuration files:
-1. modelcache/config/milvus_config.ini
-2. modelcache/config/mysql_config.ini
+1. ```modelcache/config/milvus_config.ini```
+2. ```modelcache/config/mysql_config.ini```
 4. Download the embedding model bin file from the following address: [https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main). Place the downloaded bin file in the model/text2vec-base-chinese folder.
 5. Start the backend service using the flask4modelcache.py script.
 ## Service-Access
@@ -99,7 +106,7 @@ res = requests.post(url, headers=headers, json=json.dumps(data))
 ## Articles
 https://mp.weixin.qq.com/s/ExIRu2o7yvXa6nNLZcCfhQ
 ## modules
-![modelcache modules](docs/modelcache_modules_20231114.png)
+![modelcache modules](docs/modelcache_modules_20240409.png)
 ## Function-Comparison
 In terms of functionality, we have made several changes to the git repository. Firstly, we have addressed the network issues with huggingface and enhanced the inference speed by introducing local inference capabilities for embeddings. Additionally, considering the limitations of the SqlAlchemy framework, we have completely revamped the module responsible for interacting with relational databases, enabling more flexible database operations. In practical scenarios, LLM products often require integration with multiple users and multiple models. Hence, we have added support for multi-tenancy in the ModelCache, while also making preliminary compatibility adjustments for system commands and multi-turn dialogue.
 
@@ -244,11 +251,23 @@ In ModelCache, we adopted the main idea of GPTCache, includes core modules: ada
 - Asynchronous log write-back capability for data analysis and statistics.
 - Added model field and data statistics field for feature expansion.
 
-Future Features Under Development:
+## Todo List
+### Adapter
+- [ ] Register adapter for Milvus: based on the "model" parameter in the scope, initialize the corresponding Collection and perform the load operation.
+### Embedding model&inference
+- [ ] Inference optimization: optimize the speed of embedding inference, compatible with inference engines such as FasterTransformer, TurboTransformers, and ByteTransformer.
+### Embedding model&inference
+- [ ] Compatibility with Hugging Face models and ModelScope models, offering more methods for model loading.
+### Scalar Storage
+- [ ] Support MongoDB
+- [ ] Support ElasticSearch
+### Vector Storage
+- [ ] Adapt Faiss storage in multimodal scenarios.
+### Ranking
+- [ ] Add a ranking model to refine the order of data after embedding recall.
+### Service
+- [ ] Support FastAPI.
+- [ ] Add a visual interface to offer a more direct user experience.
 
-- [ ] Data isolation based on hyperparameters.
-- [ ] System prompt partitioning storage capability to enhance accuracy and efficiency of similarity matching.
-- [ ] More versatile embedding models and similarity evaluation algorithms.
 ## Acknowledgements
 This project has referenced the following open-source projects. We would like to express our gratitude to the projects and their developers for their contributions and research.<br />[GPTCache](https://github.com/zilliztech/GPTCache)
 
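A note on exercising the demo service described in the README hunks above: the example scripts touched by this commit (examples/flask/llms_cache/data_query.py and register.py) all post JSON to the same Flask endpoint. The sketch below is a minimal, hypothetical client modeled on those scripts; the endpoint URL, header, and pre-serialized body follow the examples, while the exact field names of a query payload are assumptions and should be checked against the example scripts themselves.

```python
# Hypothetical query client, modeled on examples/flask/llms_cache/data_query.py from
# this commit. The payload field names ('type', 'query', 'role', 'content') are assumptions.
import json
import requests


def query_cache():
    url = 'http://127.0.0.1:5000/modelcache'        # endpoint used by the example scripts
    data = {
        'type': 'query',                            # assumed operation type
        'scope': {"model": "CODEGPT-1117"},         # model scope, as in register.py
        'query': [{"role": "user", "content": "hello"}],  # assumed request shape
    }
    headers = {"Content-Type": "application/json"}
    # the example scripts pass a pre-serialized JSON string to requests' json= argument
    res = requests.post(url, headers=headers, json=json.dumps(data))
    print(res.text)


if __name__ == '__main__':
    query_cache()
```
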
README_CN.md

Lines changed: 30 additions & 12 deletions
@@ -1,6 +1,6 @@
 <div align="center">
 <h1>
-Codefuse-ModelCache
+ModelCache
 </h1>
 </div>
 
@@ -25,6 +25,7 @@ Codefuse-ModelCache
 - [致谢](#致谢)
 - [Contributing](#Contributing)
 ## 新闻
+- 🔥🔥[2024.04.09] 增加了多租户场景中Redis Search存储和检索embedding的能力,可以将Cache和向量数据库的交互耗时降低至10ms内。
 - 🔥🔥[2023.12.10] 增加llmEmb、onnx、paddlenlp、fasttext等LLM embedding框架,并增加timm 图片embedding框架,用于提供更丰富的embedding能力。
 - 🔥🔥[2023.11.20] codefuse-ModelCache增加本地存储能力, 适配了嵌入式数据库sqlite、faiss,方便用户快速启动测试。
 - [2023.10.31] codefuse-ModelCache...
@@ -36,24 +37,29 @@ Codefuse-ModelCache 是一个开源的大模型语义缓存系统,通过缓存
 - flask4modelcache_demo.py 为快速测试服务,内嵌了sqlite和faiss,用户无需关心数据库相关事宜。
 - flask4modelcache.py 为正常服务,需用户具备mysql和milvus等数据库服务。
 ### 环境依赖
-
 - python版本: 3.8及以上
 - 依赖包安装:
 ```shell
-pip install requirements.txt
+pip install -r requirements.txt
 ```
 ### 服务启动
 #### Demo服务启动
 - 离线模型bin文件下载, 参考地址:[https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main),并将下载的bin文件,放到 model/text2vec-base-chinese 文件夹中。
-- 执行flask4modelcache_demo.py脚本即可启动。
+- 执行flask4modelcache_demo.py启动服务。
+```shell
+cd CodeFuse-ModelCache
+```
+```shell
+python flask4modelcache_demo.py
+```
 
 #### 正常服务启动
 在启动服务前,应该进行如下环境配置:
-1. 安装关系数据库 mysql, 导入sql创建数据表,sql文件: reference_doc/create_table.sql
+1. 安装关系数据库 mysql, 导入sql创建数据表,sql文件:```reference_doc/create_table.sql```
 2. 安装向量数据库milvus
 3. 在配置文件中添加数据库访问信息,配置文件为:
-1. modelcache/config/milvus_config.ini
-2. modelcache/config/mysql_config.ini
+1. ```modelcache/config/milvus_config.ini```
+2. ```modelcache/config/mysql_config.ini```
 4. 离线模型bin文件下载, 参考地址:[https://huggingface.co/shibing624/text2vec-base-chinese/tree/main](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main),并将下载的bin文件,放到 model/text2vec-base-chinese 文件夹中
 5. 通过flask4modelcache.py脚本启动后端服务。
 ## 服务访问
@@ -100,7 +106,7 @@ res = requests.post(url, headers=headers, json=json.dumps(data))
 ## 文章
 https://mp.weixin.qq.com/s/ExIRu2o7yvXa6nNLZcCfhQ
 ## 架构大图
-![modelcache modules](docs/modelcache_modules_20231114.png)
+![modelcache modules](docs/modelcache_modules_20240409.png)
 ## 功能对比
 功能方面,为了解决huggingface网络问题并提升推理速度,增加了embedding本地推理能力。鉴于SqlAlchemy框架存在一些限制,我们对关系数据库交互模块进行了重写,以更灵活地实现数据库操作。在实践中,大型模型产品需要与多个用户和多个模型对接,因此在ModelCache中增加了对多租户的支持,同时也初步兼容了系统指令和多轮会话。
 
@@ -244,11 +250,23 @@ https://mp.weixin.qq.com/s/ExIRu2o7yvXa6nNLZcCfhQ
 - 异步日志回写能力,用于数据分析和统计
 - 增加model字段和数据统计字段,用于功能拓展。
 
-未来会持续建设的功能:
+## Todo List
+### Adapter
+- [ ] register adapter for Milvus:根据scope中的model参数,初始化对应Collection 并且执行load操作。
+### Embedding model&inference
+- [ ] inference优化:优化embedding推理速度,适配fastertransformer, TurboTransformers, ByteTransformer等推理引擎。
+- [ ] 兼容huggingface模型和modelscope模型,提供更多模型加载方式。
+### Scalar Storage
+- [ ] Support MongoDB。
+- [ ] Support ElasticSearch。
+### Vector Storage
+- [ ] 在多模态场景中适配faiss存储。
+### Ranking
+- [ ] 增加Rank模型,对embedding召回后的数据,进行精排。
+### Service
+- [ ] 支持fastapi。
+- [ ] 增加前端界面,用于测试。
 
-- [ ] 基于超参数的数据隔离
-- [ ] system promt分区存储能力,以提高相似度匹配的准确度和效率
-- [ ] 更通用的embedding模型和相似度评估算法
 ## 致谢
 本项目参考了以下开源项目,在此对相关项目和研究开发人员表示感谢。<br />[GPTCache](https://github.com/zilliztech/GPTCache)
 

docs/modelcache_modules_20240409.png

494 KB

examples/embedding/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# -*- coding: utf-8 -*-

examples/flask/llms_cache/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# -*- coding: utf-8 -*-

examples/flask/data_insert.py renamed to examples/flask/llms_cache/data_insert.py

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@ def run():
     headers = {"Content-Type": "application/json"}
     res = requests.post(url, headers=headers, json=json.dumps(data))
     res_text = res.text
-    print('res_text: {}'.format(res_text))
 
 
 if __name__ == '__main__':

examples/flask/data_query.py renamed to examples/flask/llms_cache/data_query.py

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@ def run():
     headers = {"Content-Type": "application/json"}
     res = requests.post(url, headers=headers, json=json.dumps(data))
     res_text = res.text
-    print('res_text: {}'.format(res_text))
 
 
 if __name__ == '__main__':

examples/flask/data_query_long.py renamed to examples/flask/llms_cache/data_query_long.py

Lines changed: 0 additions & 1 deletion
@@ -18,7 +18,6 @@ def run():
     headers = {"Content-Type": "application/json"}
     res = requests.post(url, headers=headers, json=json.dumps(data))
     res_text = res.text
-    print('res_text: {}'.format(res_text))
 
 
 if __name__ == '__main__':

examples/flask/llms_cache/register.py

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# -*- coding: utf-8 -*-
+"""
+register index for redis
+"""
+import json
+import requests
+
+
+def run():
+    url = 'http://127.0.0.1:5000/modelcache'
+    type = 'register'
+    scope = {"model": "CODEGPT-1117"}
+    data = {'type': type, 'scope': scope}
+    headers = {"Content-Type": "application/json"}
+    res = requests.post(url, headers=headers, json=json.dumps(data))
+    res_text = res.text
+
+
+if __name__ == '__main__':
+    run()
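
The new register.py script above issues a 'register' request for a model scope, which presumably sets up the Redis Search index announced in the 2024.04.09 news entry before inserts and queries for that model arrive. As written it reads the response into res_text but never surfaces it. A hypothetical variant that parameterizes the model name and returns the reply is sketched below; only the endpoint and payload shape are taken from the script, everything else is illustrative.

```python
# Hypothetical wrapper around the same request register.py sends; the function name
# and parameters are illustrative and are not part of this commit.
import json
import requests


def register_index(model_name: str, url: str = 'http://127.0.0.1:5000/modelcache') -> str:
    data = {'type': 'register', 'scope': {"model": model_name}}
    headers = {"Content-Type": "application/json"}
    # mirrors the script: the body is serialized with json.dumps before posting
    res = requests.post(url, headers=headers, json=json.dumps(data))
    return res.text


if __name__ == '__main__':
    print(register_index("CODEGPT-1117"))
```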
