- Download the offline model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main) and place the downloaded bin file in the `model/text2vec-base-chinese` folder.
4. Download the offline model bin file from [Hugging Face](https://huggingface.co/shibing624/text2vec-base-chinese/tree/main) and place the downloaded bin file in the `model/text2vec-base-chinese` folder.
ModelCache adopts the main idea of GPTCache and includes four core modules: adapter, embedding, similarity, and data_manager. The adapter module handles the business logic of various tasks and connects the embedding, similarity, and data_manager modules. The embedding module converts text into semantic vector representations, transforming user queries into vector form. The rank module sorts the recalled vectors and evaluates their similarity. The data_manager module primarily manages the database. To better support industrial applications, we have made the following architectural and functional upgrades:
We've made several key updates to the repository. We resolved network issues with Hugging Face and improved inference speed by adding local embedding capabilities. Because of limitations in SQLAlchemy, we redesigned the relational database interaction module for more flexible operations. We added multi-tenancy support to ModelCache, since LLM products need to serve multiple users and models. Lastly, we made initial adjustments for better compatibility with system commands and multi-turn dialogues.
This topic describes ModelCache features. In ModelCache, we incorporated the core principles of GPTCache. ModelCache has four modules: adapter, embedding, similarity, and data_manager.
- The adapter module orchestrates the business logic for various tasks, integrating the embedding, similarity, and data_manager modules.
- The embedding module converts text, including user queries, into semantic vector representations.
- The rank module ranks the recalled vectors and evaluates their similarity.
- The data_manager module manages the databases.
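
As a rough illustration of how the modules described above could fit together, here is a minimal, self-contained sketch. It is not ModelCache's actual API: the names `embed`, `cosine`, `DataManager`, and `Adapter`, the bag-of-words "embedding", and the `0.8` threshold are all assumptions chosen to keep the example dependency-free.

```python
import math
import re
from collections import Counter


def embed(text):
    """Embedding step (toy stand-in): map text to a sparse bag-of-words vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Similarity step: cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DataManager:
    """Data-management step (toy stand-in): in-memory store of (vector, answer) rows."""

    def __init__(self):
        self.rows = []

    def insert(self, vector, answer):
        self.rows.append((vector, answer))

    def search(self, vector, top_k=3):
        # Recall every stored row, score it, and return the best top_k.
        scored = [(cosine(vector, v), answer) for v, answer in self.rows]
        return sorted(scored, key=lambda row: row[0], reverse=True)[:top_k]


class Adapter:
    """Adapter step: wires embedding, similarity ranking, and storage together."""

    def __init__(self, threshold=0.8):  # hypothetical threshold value
        self.data_manager = DataManager()
        self.threshold = threshold

    def insert(self, query, answer):
        self.data_manager.insert(embed(query), answer)

    def query(self, query):
        candidates = self.data_manager.search(embed(query))
        if candidates and candidates[0][0] >= self.threshold:
            return candidates[0][1]  # cache hit
        return None  # cache miss: the caller would fall through to the LLM
```

In this sketch a cache miss returns `None`, leaving the decision to call the LLM with the caller, which mirrors the idea of caching without interfering with the LLM call itself.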
To make ModelCache more suitable for industrial use, we made several improvements to its architecture and functionality:
- Embedded into LLM products using a Redis-like caching mode.
- Provided semantic caching without interfering with LLM calls, security audits, and other functions.
- Compatible with all LLM services.
- [x] Multiple model loading:
  - Supported local embedding model loading, and resolved Hugging Face network connectivity issues.
  - Supported loading embedding layers from various pre-trained models.
- [x] Data isolation:
  - Environment isolation: Read different database configurations based on the environment. Isolate development, staging, and production environments.
  - Multi-tenant data isolation: Dynamically create collections based on models for data isolation, addressing data separation issues in multi-model/service scenarios within large language model products.
- [x] Supported system instruction: Adopted a concatenation approach to resolve issues with system instructions in the prompt paradigm.
- [x] Long and short text differentiation: Long texts bring more challenges for similarity assessment. Added differentiation between long and short texts, allowing for separate threshold configurations.
- [x] Milvus performance optimization: Adjusted Milvus consistency level to "Session" level for better performance.
- [x] Data management:
  - One-click cache clearing to enable easy data management after model upgrades.
  - Recall of hit queries for subsequent data analysis and model iteration reference.
  - Asynchronous log write-back for data analysis and statistics.
  - Added model field and data statistics field to enhance features.
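
Three of the items in the list above (system instruction concatenation, long and short text thresholds, and per-model collections) can be sketched in a few lines. This is only an illustration under stated assumptions, not the repository's implementation: the function names, the `modelcache_` prefix, the 256-character cutoff, and both threshold values are hypothetical.

```python
import re

# All names and numeric values here are illustrative assumptions.
LENGTH_CUTOFF = 256          # characters; texts longer than this count as "long"
SHORT_TEXT_THRESHOLD = 0.95  # hypothetical similarity threshold for short texts
LONG_TEXT_THRESHOLD = 0.90   # hypothetical similarity threshold for long texts


def build_cache_text(messages):
    """Concatenate system and user messages into one string to embed."""
    parts = [
        f"{m['role']}: {m['content']}"
        for m in messages
        if m["role"] in ("system", "user")
    ]
    return "\n".join(parts)


def similarity_threshold(text):
    """Pick a separate threshold depending on whether the text is long or short."""
    if len(text) > LENGTH_CUTOFF:
        return LONG_TEXT_THRESHOLD
    return SHORT_TEXT_THRESHOLD


def collection_name(model, env="dev"):
    """Derive one collection name per model and environment for data isolation."""
    safe_model = re.sub(r"[^0-9a-zA-Z_]", "_", model)
    return f"modelcache_{env}_{safe_model}"
```

Because the system instruction is part of the concatenated text, two requests with the same user question but different system instructions produce different cache texts, and because the collection name depends on the model, entries for different models never share a collection.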