knowledge parent-child segmentation #24505
-
The current backend logic for parent-child segmentation in Dify uses your chosen delimiters to split the text, and the parent chunk keeps the content of its child segments.
If you want the parent block to exclude child segment content after import, you would need to modify the backend segmentation processor to strip the child ranges out of the parent chunk. This would involve changes to the backend splitting logic.
As a workaround, you can preprocess your input files so that parent and child content are clearly separated before import, or post-process the imported data to remove child content from parent blocks.
The frontend configuration for delimiters and segmentation rules only controls where the backend splits the text, not which content ends up in each chunk. If you want to customize this behavior, focus on the backend splitting algorithm after the initial parent and child chunking steps.
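The post-processing workaround mentioned above could look roughly like the following. This is a minimal sketch, not Dify's own code: the function name `split_parent_and_children` is hypothetical, and it assumes the parent text is available as a plain string with `###` marking the start of each child segment, as in the example from the question.

```python
def split_parent_and_children(
    parent_text: str, delimiter: str = "###"
) -> tuple[str, list[str]]:
    """Separate a parent block from the child segments embedded in it.

    Hypothetical helper: everything before the first delimiter is treated
    as the parent's own content, everything after it as child segments.
    """
    head, *rest = parent_text.split(delimiter)
    # Drop empty pieces caused by leading, trailing, or doubled delimiters.
    children = [part.strip() for part in rest if part.strip()]
    return head.strip(), children


parent, children = split_parent_and_children(
    "Parent block content ### Child segment 1 ### Child segment 2"
)
print(parent)    # Parent block content
print(children)  # ['Child segment 1', 'Child segment 2']
```

The same function can be applied to source files before import or to exported chunk text afterwards; how the cleaned parent text is written back into the knowledge base depends on your setup and is not covered here.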
-
@dosu Will this modification affect recall performance?
-
Yes, modifying the segmentation logic so that parent blocks exclude child segment content will affect recall performance. Currently, parent chunks include all child content, which increases the chance that a query retrieves relevant context from either the parent or the child chunk. If you strip child content from parent blocks, recall may decrease for queries that rely on the broader context provided by the parent, since only the child chunks will contain that information. This makes retrieval more precise but can fragment context, so unless your retrieval strategy always fetches both parent and child chunks as needed, you may lose some recall and coverage for hierarchical documents. For maximum recall, it is generally beneficial for parent chunks to retain child content unless you have a specific use case for strict separation.
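To make the recall trade-off concrete, here is a toy illustration. It is only a sketch: real retrieval in Dify relies on embeddings and/or keyword indexes rather than the strict all-terms match used here, and the function `matches` and the sample texts are invented for the example.

```python
def matches(query: str, chunk: str) -> bool:
    """Toy retrieval rule: a chunk matches only if it contains every query term."""
    text = chunk.lower()
    return all(term in text for term in query.lower().split())


# Hypothetical sample data, following the "parent ### child" layout.
full_parent = "Refund policy ### Digital goods can be returned within 14 days"
stripped_parent = "Refund policy"
child = "Digital goods can be returned within 14 days"

# The query mixes a term from the parent heading with terms from the child.
query = "refund digital goods"

print(matches(query, full_parent))      # True
print(matches(query, stripped_parent))  # False
print(matches(query, child))            # False
```

With the child content stripped out, no single chunk covers the whole query, which is the fragmentation effect described above. A retrieval strategy that returns the parent together with its matching children would be needed to recover that context.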
-
Self Checks
Dify version
1.4.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
In the parent-child segmentation of the knowledge base, ### is used as the delimiter for child segments, but the parent block also contains the child segment content. How should this be handled?
After import, the content of the parent block is similar to:
Parent block content ### Child segment 1 ### Child segment 2
✔️ Expected Behavior
In the parent-child segmentation of the knowledge base, ### is used for child segment segmentation, and the parent block should not contain any child segment content.
After import, the content of the parent block is similar to:
Parent block content
❌ Actual Behavior
After import, the content of the parent block is similar to:
Parent block content ### Child segment 1 ### Child segment 2