[Bug] Unable to perform recursive chunking with PDFKnowledgeBase #2012

mahendra867 · 2025-02-05T09:36:19Z

Description

Briefly describe the issue you’re experiencing or the bug you’ve found.

When I performed recursive chunking with this code:

pdf_knowledge_base = PDFKnowledgeBase(
    # Raw string: in a normal string, "\a" in this Windows path is an escape sequence
    path=r"D:\Projects\agentic_new_rag\pdfs",
    vector_db=PgVector(
        table_name="updated_rag009",
        schema="ai",
        db_url=db_url,
        search_type=SearchType.hybrid,
        vector_index=HNSW(),
        embedder=OpenAIEmbedder(
            api_key=OPENAI_API_KEY,
            id="text-embedding-ada-002",
            dimensions=1536,
            encoding_format="float",
        ),
    ),
    reader=PDFReader(chunk=False),  # Use a default reader
    chunking_strategy=RecursiveChunking(chunk_size=4000, overlap=800),
    documents=3,
)

Steps to Reproduce

List the steps needed to encounter this bug or issue.

Agent Configuration (if applicable)

Provide relevant agent configuration.

I am using an agentic RAG configuration.

Expected Behavior

What did you expect to happen?

Based on the above code, it should perform recursive chunking, but it did not carry the last 800 characters of the preceding content over into each document's content.
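To make the expectation concrete, here is a minimal standalone sketch of character-based chunking with overlap. It does not use agno and is not a description of how RecursiveChunking is implemented; `chunk_with_overlap` is a hypothetical helper, and it assumes that "overlap" means every chunk after the first begins with the last `overlap` characters of the chunk before it.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. Every chunk after the
    first starts with the final `overlap` characters of the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # last chunk reached; do not step back again
        start = end - overlap  # step back so the next chunk repeats the tail
    return chunks


# Synthetic 10,000-character text standing in for extracted PDF content.
sample = "".join(str(i % 10) for i in range(10_000))
chunks = chunk_with_overlap(sample, chunk_size=4000, overlap=800)

# With the same settings as in the report, consecutive chunks share 800 characters.
for prev, curr in zip(chunks, chunks[1:]):
    assert prev[-800:] == curr[:800]
```

Running the same kind of check on the content stored in the vector database (comparing the tail of one document with the head of the next) is one way to confirm whether overlap was actually applied.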

Actual Behavior

What actually happened instead?

Recursive chunking did not happen.

This is the result I got:

Content of page 11 of the PDF:

DoDM 5200.45, January 17, 2025

SECTION 3: OCA 11
(1) Combatant Commands will send all OCA delegation requests to the Chief, Joint Staff
Security Office for endorsement and Joint Staff submission to the DDI(CL&S).
(2) The designated SAOs in the MILDEPs manage the delegation of OCAs within their
respective MILDEPs and provide an updated list of OCA positions to DDI(CL&S) annually.
i. OCA delegation requests will:
(1) Identify the official by position title and classification level requested.
(2) Include a description of why the official requires OCA and why the classification
level requested is necessary and appropriate. If the position’s next-level supervisor has OCA,
requests will include a justification why it is either inappropriate or impractical for that official to
exercise OCA.
(3) Be submitted and endorsed by an official at least one supervisory level above the
official to whom the request seeks delegation of OCA. See Figure 1 for an example of an OCA
delegation request memorandum.
Figure 1. Example of Request for OCA

j.
All DoD Component requests for these changes to OCAs, excluding those from the
MILDEPs, require submission of a memorandum from the Component SAO to the DDI(CL&S)
detailing the reason for the change, which includes:
(1) Move of the position delegated OCA due to reorganization or realignment.
(2) Position title change.

Content of page 12 of the PDF:

DoDM 5200.45, January 17, 2025

SECTION 3: OCA 12
(3) Removal or downgrade of OCA delegation.
k. All DoD Components, including the MILDEPs, must verify the officials by position
delegated OCA to the DDI(CL&S) annually. See Figure 2 for an example of an OCA
verification submission. This verification will include:
(1) A list of OCAs by position title and the classification level delegated.
(2) The date each OCA received the initial or annual refresher training.
Figure 2. OCA Verification Example

TRAINING REQUIREMENTS.
a. Before exercising OCA, and annually thereafter, officials delegated OCA will certify in
writing that they have received the required training. This acknowledgement can be
accomplished through any method determined by the DoD Component (e.g., signing a certificate
or form). Individuals who hold OCA for multiple positions must complete the certification for
each position delegated OCA.
b. Personnel assigned responsibility for creating, reviewing, and managing SCGs and
submitting information to OCAs for original classification decisions require additional
knowledge of the original classification decision process and will take the OCA training annually
in addition to all other security training requirements. Completion of this training will be tracked
by the activity security manager and will not give the individual the authority to make original
classification decisions.
c. The activity security manager, or other designated personnel, will ensure:
(1) OCA training is conducted as required. Training may be completed through
individualized Component-developed training or through the OCA training course available on
the Defense Counterintelligence and Security Agency Center for Development of Security
Excellence Website at https://www.cdse.edu/Training/eLearning/IF102/.
(2) Copies of the training records are maintained in accordance with DoDI 5015.02.
(3) Training records are available when requested by appropriate authorities.
d. At a minimum, the OCA training will address:

Screenshots or Logs (if applicable)

Include any relevant screenshots or error logs that demonstrate the issue.

Environment

OS: (e.g. macOS, Windows 11)
Browser (if relevant): (e.g. Chrome 108, Firefox 107)
Agno Version: (e.g. v1.0.0)
External Dependency Versions: (e.g., yfinance 0.2.52)
Additional Environment Details: (e.g., Python 3.10)

Possible Solutions (optional)

Suggest any ideas you might have to fix or address the issue.

Additional Context

Add any other context or details about the problem here.
Please solve this issue as quickly as possible.

manthanguptaa · 2025-02-06T05:34:57Z

Hey @mahendra867! I found a small detail that might be causing the issue. I see that you set chunk=False in your PDFReader, which makes it skip the chunking process. If you set it back to True or remove it altogether, it should work as expected.

mahendra867 · 2025-02-07T09:11:10Z

Hi @manthanguptaa, thanks for the response, but I am still unable to create the documents.

When I set chunk=True and use RecursiveChunking(chunk_size=4000, overlap=800), reading takes a very long time: it keeps reading a small 500 KB PDF for more than 20 minutes. It also consumes so many resources that my PC eventually shuts down, and it still does not create any documents in the PgVector database, let alone documents with overlapping content.

But when I use the fixed chunking strategy, which is the default provided by the PDFReader params, the documents are created quickly.

Could you please try this on your side? Take a small PDF with text content, use recursive chunking with these two params, RecursiveChunking(chunk_size=4000, overlap=800), set chunk=True in PDFReader, run the code, and tell me the results.

These are the PDFKnowledgeBase params I used:

pdf_knowledge_base = PDFKnowledgeBase(
    # Raw string: in a normal string, "\a" in this Windows path is an escape sequence
    path=r"D:\Projects\agentic_new_rag\pdfs",
    vector_db=PgVector(
        table_name="updated_rag6201",
        schema="aii",
        db_url=db_url,
        search_type=SearchType.hybrid,
        vector_index=HNSW(),
        embedder=OpenAIEmbedder(
            api_key=OPENAI_API_KEY,
            id="text-embedding-ada-002",
            dimensions=1536,
            encoding_format="float",
        ),
    ),
    reader=PDFReader(chunk=True),
    chunking_strategy=RecursiveChunking(chunk_size=4000, overlap=800),
    documents=3,
)
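A long runtime combined with steadily growing memory is the typical signature of a chunking loop that never terminates. One general pitfall worth ruling out in any overlap-based chunker is a loop that steps back by `overlap` after every chunk, including the last one, so that `start` never reaches the end of the text. The sketch below is a standalone simulation under that assumption; it is not taken from agno's source, and `simulate_chunk_loop` is a hypothetical helper with an iteration cap so the stalled case can be shown safely.

```python
def simulate_chunk_loop(text_len, chunk_size, overlap, stop_at_end, max_iters=1000):
    """Count the chunks an overlap loop would produce.

    Hypothetical helper for illustration only. Returns None if the loop
    is still running after max_iters chunks, i.e. it is not terminating.
    """
    start = 0
    produced = 0
    while start < text_len:
        if produced >= max_iters:
            return None  # start never reached text_len: the loop is stuck
        end = min(start + chunk_size, text_len)
        produced += 1
        if stop_at_end and end == text_len:
            break  # guard: the final chunk was emitted, so stop
        start = end - overlap  # without the guard this keeps stepping back
    return produced


# Same settings as in the report, on a 10,000-character text.
naive = simulate_chunk_loop(10_000, 4000, 800, stop_at_end=False)
guarded = simulate_chunk_loop(10_000, 4000, 800, stop_at_end=True)
print(naive, guarded)  # prints "None 3"
```

Without the guard, the last chunk is produced over and over because `end - overlap` is always smaller than the text length; in a real chunker, each repetition would also append another chunk to the result list, which would match the resource usage described above.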

mahendra867 added the bug Something isn't working label Feb 5, 2025
