Commit 6aaf56e

Merge pull request #145 from gomate-community/pipeline (Pipeline)

2 parents b1ccf2d + b5c552f

7 files changed: +82 −8 lines


README.md

Lines changed: 7 additions & 1 deletion

````diff
@@ -290,11 +290,16 @@ llm_reranker = PairWiseReranker(reranker_config)
 Waiting to implement...
 </details>
 
+<details>
+<summary>TourRank</summary>
+Waiting to implement...
+</details>
+
 <details>
 <summary>SetWise-Rerank</summary>
 We have one setwise method so far:
 
-`setwise likelihood`: LLMs are prompted to judge which document is the most relevant to the given query. Candidate documents are reranked based on the likelihood of generating the label as the most relevant document by LLMs. It is the base rerank method used in (https://arxiv.org/pdf/2310.09497).
+`setwise likelihood`: LLMs are prompted to judge which document is the most relevant to the given query. Candidate documents are reranked based on the likelihood of generating the label as the most relevant document by LLMs. It is the base rerank method used in [A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models](https://arxiv.org/pdf/2310.09497).
 
 ```python
 from trustrag.modules.reranker.llm_reranker import LLMRerankerConfig, SetWiseReranker
@@ -427,6 +432,7 @@ If the group is full or for cooperation and exchange, please contact:
 >This project thanks the following open-source projects for their support and contributions:
 - Document parsing: [infiniflow/ragflow](https://github.com/infiniflow/ragflow/blob/main/deepdoc/README.md)
 - PDF file parsing: [opendatalab/MinerU](https://github.com/opendatalab/MinerU)
+- Document rerank: [ielab/llm-rankers](https://github.com/ielab/llm-rankers)
 
 
 ## 👉 Citation
````
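The `setwise likelihood` method added above can be sketched without a model in the loop: score each candidate by a stand-in for the LLM's likelihood of naming that document's label as most relevant, then sort descending. This is a minimal illustration, not the TrustRAG implementation; `toy_likelihood` (query-term overlap) is a hypothetical proxy for the label log-probability that `SetWiseReranker` reads from an actual LLM.

```python
# Sketch of setwise-likelihood reranking with a stand-in scoring function.
def setwise_rerank(query, docs, label_likelihood):
    """Order docs by descending stand-in label likelihood."""
    scores = [label_likelihood(query, doc) for doc in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]

def toy_likelihood(query, doc):
    # Hypothetical proxy: fraction of query terms found in the document.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

docs = ["stock prices fell", "mice fear cats", "cats chase mice"]
ranked = setwise_rerank("do cats chase mice", docs, toy_likelihood)
```

In the real method, all candidates appear in one prompt with labels ("A", "B", …) and the likelihood of each label token being generated replaces `toy_likelihood`.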

README_zh.md

Lines changed: 70 additions & 1 deletion

````diff
@@ -249,12 +249,81 @@ for result in results:
 ```
 
 ### 5 Reranker models
+<details>
+<summary>Bge-Rerank</summary>
+
+We use [bge-reranker](https://github.com/FlagOpen/FlagEmbedding) as our base rerank model.
 ```python
+from trustrag.modules.reranker.bge_reranker import BgeReranker, BgeRerankerConfig
 reranker_config = BgeRerankerConfig(
-    model_name_or_path=reranker_model_path
+    model_name_or_path='llms/bge-reranker-large'
 )
 bge_reranker = BgeReranker(reranker_config)
 ```
+</details>
+
+<details>
+<summary>PointWise-Rerank</summary>
+We have implemented two pointwise rerank methods so far:
+
+`relevance generation`: LLMs are prompted to judge whether the given query and document are relevant. Candidate documents are reranked based on the likelihood of the LLM generating a "yes" response. This method comes from [Holistic Evaluation of Language Models](https://arxiv.org/pdf/2211.09110).
+
+`query generation`: LLMs are prompted to generate a pseudo query from the given document. Candidate documents are reranked based on the likelihood of the LLM generating the target query. This method comes from [Improving Passage Retrieval with Zero-Shot Question Generation](https://arxiv.org/pdf/2204.07496).
+
+We have implemented [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) as our pointwise rerank model.
+```python
+from trustrag.modules.reranker.llm_reranker import LLMRerankerConfig, PointWiseReranker
+reranker_config = LLMRerankerConfig(
+    model_name_or_path="flan-t5-small"
+)
+llm_reranker = PointWiseReranker(reranker_config)
+```
+</details>
+
+<details>
+<summary>PairWise-Rerank</summary>
+We have implemented two pairwise rerank methods so far:
+
+`all pair`: LLMs are prompted to judge which of two documents is more relevant to the given query. Candidate documents are ranked by the number of comparisons they win. This method comes from [Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting](https://arxiv.org/pdf/2306.17563).
+
+`bubble sort`: LLMs are prompted to judge which of two documents is more relevant to the given query. Candidate documents are reordered with a bubble-sort algorithm. This method comes from [Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting](https://arxiv.org/pdf/2306.17563).
+
+```python
+from trustrag.modules.reranker.llm_reranker import LLMRerankerConfig, PairWiseReranker
+reranker_config = LLMRerankerConfig(
+    model_name_or_path="qwen2-7B-instruct"
+)
+llm_reranker = PairWiseReranker(reranker_config)
+```
+</details>
+
+<details>
+<summary>ListWise-Rerank</summary>
+Waiting to implement...
+</details>
+
+<details>
+<summary>TourRank</summary>
+Waiting to implement...
+</details>
+
+<details>
+<summary>SetWise-Rerank</summary>
+We have implemented one setwise rerank method so far:
+
+`setwise likelihood`: LLMs are prompted to judge which document is the most relevant to the given query. Candidate documents are reranked based on the likelihood of the LLM generating the label of the most relevant document. This method comes from [A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models](https://arxiv.org/pdf/2310.09497).
+
+```python
+from trustrag.modules.reranker.llm_reranker import LLMRerankerConfig, SetWiseReranker
+reranker_config = LLMRerankerConfig(
+    model_name_or_path="qwen2-7B-instruct"
+)
+llm_reranker = SetWiseReranker(reranker_config)
+```
+</details>
+
+For more details, please refer to [reranker inference](./examples/rerankers/).
+
 ### 6 Generator configuration
 ```python
 glm4_chat = GLM4Chat(llm_model_path)
````
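The `bubble sort` pairwise method added above can be sketched with a plain comparator in place of the model: repeatedly ask a judge which of two adjacent documents better matches the query, and bubble winners upward. This is a minimal sketch, not the TrustRAG implementation; `toy_judge` (query-term overlap) is a hypothetical stand-in for the LLM comparison that `PairWiseReranker` performs.

```python
# Sketch of pairwise bubble-sort reranking with a stand-in judge function.
def pairwise_bubble_rerank(query, docs, prefers, k=None):
    """One bubble pass per top position; sorting only the top-k positions
    caps the number of judge (LLM) calls at roughly k * n comparisons."""
    docs = list(docs)
    n = len(docs)
    k = n if k is None else min(k, n)
    for top in range(k):
        # Bubble the preferred document of each adjacent pair toward the front.
        for i in range(n - 1, top, -1):
            if prefers(query, docs[i], docs[i - 1]):
                docs[i - 1], docs[i] = docs[i], docs[i - 1]
    return docs

def toy_judge(query, a, b):
    # Hypothetical proxy for the LLM judgment: more query-term overlap wins.
    q = set(query.lower().split())
    return len(q & set(a.lower().split())) > len(q & set(b.lower().split()))

docs = ["stock prices fell", "mice fear cats", "cats chase mice"]
ranked = pairwise_bubble_rerank("do cats chase mice", docs, toy_judge)
```

Passing a small `k` reflects the efficiency trick from the pairwise ranking paper: only the top of the list needs to be fully ordered for retrieval-augmented generation.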

trustrag/applications/rag.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@
 from trustrag.modules.reranker.bge_reranker import BgeReranker
 from trustrag.modules.retrieval.dense_retriever import DenseRetriever
 from trustrag.modules.document.chunk import TextChunker
-from trustrag.modules.retrieval.embedding import FlagModelEmbedding
+from trustrag.modules.vector.embedding import FlagModelEmbedding
 class ApplicationConfig():
     def __init__(self):
         self.retriever_config = None
```

trustrag/modules/engine/chroma.py

Lines changed: 1 addition & 2 deletions

```diff
@@ -1,8 +1,7 @@
 from typing import List, Dict, Any, Union
 import numpy as np
 import chromadb
-from chromadb.config import Settings
-from trustrag.modules.retrieval.embedding import EmbeddingGenerator
+from trustrag.modules.vector.embedding import EmbeddingGenerator
 
 
 class ChromaEngine:
```

trustrag/modules/engine/milvus.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -2,7 +2,7 @@
 from typing import List, Dict, Any, Optional
 import numpy as np
 from openai import OpenAI
-from trustrag.modules.retrieval.embedding import EmbeddingGenerator
+from trustrag.modules.vector.embedding import EmbeddingGenerator
 from typing import Union
 class MilvusEngine:
     def __init__(
```

trustrag/modules/engine/qdrant.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -4,7 +4,7 @@
 from abc import ABC, abstractmethod
 import numpy as np
 from openai import OpenAI
-from trustrag.modules.retrieval.embedding import EmbeddingGenerator
+from trustrag.modules.vector.embedding import EmbeddingGenerator
 
 
 class QdrantEngine:
```

trustrag/modules/engine/weaviate_cli.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -5,7 +5,7 @@
 from weaviate.collections import Collection
 import weaviate.classes.config as wc
 from weaviate.classes.config import Property, DataType
-from trustrag.modules.retrieval.embedding import EmbeddingGenerator
+from trustrag.modules.vector.embedding import EmbeddingGenerator
 from weaviate.classes.query import MetadataQuery
 
 class WeaviateEngine:
```

0 commit comments
