
Commit 8a6256f

Merge pull request #138 from gomate-community/pipeline
Pipeline
2 parents c16d5c8 + 134514a commit 8a6256f

File tree: 6 files changed (+21, -8 lines)


README.md

Lines changed: 15 additions & 3 deletions
@@ -254,9 +254,9 @@ bge_reranker = BgeReranker(reranker_config)
 <summary>PointWise-Rerank</summary>
 We have two pointwise methods so far:
 
-`relevance generation`: LLMs are prompted to judge whether the given query and document are relevant. Candidate documents are reranked based on the likelihood of generating a "yes" response by LLMs. It is the rerank method used in (https://arxiv.org/pdf/2211.09110).
+`relevance generation`: LLMs are prompted to judge whether the given query and document are relevant. Candidate documents are reranked based on the likelihood of generating a "yes" response by LLMs. It is the rerank method used in [Holistic Evaluation of Language Models](https://arxiv.org/pdf/2211.09110).
 
-`query generation`: LLMs are prompted to generate a pseudo-query based on the given document. Candidate documents are reranked based on the likelihood of generating the target query by LLMs. It is the rerank method used in (https://arxiv.org/pdf/2204.07496).
+`query generation`: LLMs are prompted to generate a pseudo-query based on the given document. Candidate documents are reranked based on the likelihood of generating the target query by LLMs. It is the rerank method used in [Improving Passage Retrieval with Zero-Shot Question Generation](https://arxiv.org/pdf/2204.07496).
 
 We have implemented [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) as our pointwise reranker model.
 ```python
@@ -270,7 +270,19 @@ llm_reranker = PointWiseReranker(reranker_config)
 
 <details>
 <summary>PairWise-Rerank</summary>
-Waiting to implement...
+We have two pairwise methods so far:
+
+`allpair`: LLMs are prompted to judge which of two candidate documents is more relevant to the given query. Candidate documents are reranked based on the number of pairwise comparisons they win. It is one of the rerank methods used in [Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting](https://arxiv.org/pdf/2306.17563).
+
+`bubblesort`: LLMs are prompted to judge which of two candidate documents is more relevant to the given query. Candidate documents are reranked by bubble-sorting them according to these pairwise judgments. It is the other rerank method used in [Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting](https://arxiv.org/pdf/2306.17563).
+
+```python
+from trustrag.modules.reranker.llm_reranker import LLMRerankerConfig, PairWiseReranker
+reranker_config = LLMRerankerConfig(
+    model_name_or_path="qwen2-7B-instruct"
+)
+llm_reranker = PairWiseReranker(reranker_config)
+```
 </details>
 
 <details>
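The PairWiseReranker snippet added above only shows how the reranker is constructed. As a rough, framework-independent sketch of what the `allpair` strategy does (this is not TrustRAG's implementation; the `judge` callable is a hypothetical stand-in for the LLM pairwise comparison), win counting over all document pairs looks roughly like this:

```python
from itertools import combinations

def allpair_rerank(query: str, docs: list[str], judge) -> list[str]:
    """Rerank docs by counting pairwise wins.

    `judge(query, doc_a, doc_b)` is a hypothetical callable that returns True
    when the LLM prefers doc_a over doc_b for the given query.
    """
    wins = {i: 0 for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        if judge(query, docs[i], docs[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Highest win count ranks first
    return [docs[i] for i in sorted(wins, key=wins.get, reverse=True)]
```

The `bubblesort` variant instead drives adjacent-pair swaps with the same kind of judgment, which typically needs fewer LLM calls than comparing every pair when only the top few documents matter.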

config_online.json

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
   },
   "siliconflow": {
     "base_url": "https://api.siliconflow.cn/v1",
-    "api_key": "sk-yfgjndsavpwcnnedlhllyfunxwsckfguirokexokstbvwnjf",
+    "api_key": "sk-xxxx",
     "description": "SiliconFlow API 服务"
   },
   "rerank": {

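The `siliconflow` block above shows the shape of an entry in `config_online.json`: a `base_url`, an `api_key`, and a human-readable `description`. A minimal sketch of reading such an entry (hypothetical loader code, not TrustRAG's own config handling; replace `sk-xxxx` with a real key before use):

```python
import json

# Load the online-service configuration shown in the diff above
with open("config_online.json", "r", encoding="utf-8") as f:
    config = json.load(f)

siliconflow = config["siliconflow"]
print(siliconflow["base_url"])      # https://api.siliconflow.cn/v1
print(siliconflow["description"])   # SiliconFlow API 服务
# siliconflow["api_key"] is the secret; keep real keys out of version control
```
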
docs/dify.md

Whitespace-only changes.

docs/git.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ## Common git commands
 - Merge a branch
 ```bash
-git pull main
+git pull origin main
 git merge main
 ```

docs/xinference.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+docker run --name xinference -d -p 9997:9997 -e XINFERENCE_HOME=/data -v $(pwd):/data --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0
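Assuming the container starts cleanly and that Xinference's OpenAI-compatible HTTP API is served on the mapped port (an assumption, not something stated in the diff), a quick smoke test from Python could look like:

```python
import requests

# Hypothetical check: list the models served by the local Xinference instance
resp = requests.get("http://localhost:9997/v1/models", timeout=10)
resp.raise_for_status()
print(resp.json())
```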

trustrag/modules/chunks/semantic_chunk.py

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
 import re
-import numpy as np
 from sklearn.metrics.pairwise import cosine_similarity
 from trustrag.modules.document import rag_tokenizer
 from trustrag.modules.chunks.base import BaseChunker
 from sentence_transformers import SentenceTransformer
 from langchain.embeddings import OpenAIEmbeddings
+from langchain_experimental.text_splitter import SemanticChunker
 
 class SemanticChunker(BaseChunker):
     """
@@ -136,7 +136,7 @@ def get_chunks(self, paragraphs: list[str]) -> list[str]:
 
         # Determine breakpoints based on the similarity threshold
         breakpoint_indices = [i for i, distance in enumerate(distances) if distance > (1 - self.similarity_threshold)]
-
+        print(breakpoint_indices)
         # Combine sentences into chunks
         chunks = []
         start_index = 0
@@ -181,7 +181,7 @@ def process_text_chunks(self, chunks: list[str]) -> list[str]:
         return processed_chunks
 
 if __name__ == '__main__':
-    with open("../../../data/docs/news.txt", "r", encoding="utf-8") as f:
+    with open("../../../data/docs/伊朗总统罹难事件.txt", "r", encoding="utf-8") as f:
         content = f.read()
 
     # Example 1: Use SentenceTransformer
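For context on the `breakpoint_indices` line touched above, here is a minimal, self-contained sketch of the same idea, independent of TrustRAG's SemanticChunker (the embedding model name and default threshold are placeholders): neighbouring sentences are embedded, the cosine distance between each adjacent pair is computed, and a chunk boundary is placed wherever that distance exceeds `1 - similarity_threshold`.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def semantic_breakpoints(sentences: list[str], similarity_threshold: float = 0.8) -> list[int]:
    """Return the indices after which a new chunk should start."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
    embeddings = model.encode(sentences)
    breakpoints = []
    for i in range(len(sentences) - 1):
        similarity = cosine_similarity([embeddings[i]], [embeddings[i + 1]])[0][0]
        distance = 1 - similarity
        # Split where adjacent sentences are semantically far apart
        if distance > (1 - similarity_threshold):
            breakpoints.append(i)
    return breakpoints
```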
