bullet point content overflowing to next page in the PDF #4240
Unanswered
krish-tech02 asked this question in Looking for help
@JorjMcKie can you please help?
Uploading Arthralgia 01-09-2025-bullet-overflow-to-next-page.pdf…
In the attached PDF, on page 1, the last bullet point starts with "Warmth or redness:". Its content overflows and continues onto the next page. I am using the PyMuPDF library and the `page.get_text("blocks", flags=1+2+8)` method to extract the PDF content and convert it into HTML. I want to wrap each bullet point in an `<li>` tag, but since the last bullet point's content spans two pages, it gets extracted as separate blocks on different pages. Is there a way to identify that the content on the next page belongs to the same bullet point from the previous page? I considered using the x and y coordinates, but they don't seem to change enough to differentiate between a continued bullet point and new paragraph content.

Could someone please help me figure out how to handle this?
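One possible approach, since the coordinates alone are not decisive, is a text-based heuristic: treat the first text block of a page as a continuation when the previous page ended inside a bullet item, the new block does not itself start with a bullet marker, and the previous text looks unfinished (no closing punctuation) or the new text starts with a lowercase letter. The sketch below is only an illustration under those assumptions. The helper names (`is_bullet`, `is_continuation`, `blocks_to_items`, `to_html`) and the set of bullet glyphs are hypothetical, not part of PyMuPDF. It works on plain block tuples in the `(x0, y0, x1, y1, text, block_no, block_type)` layout that `get_text("blocks")` returns, so it can be tried without a PDF.

```python
import re
from html import escape

# Assumed bullet markers: common bullet glyphs, "-", "*", or "1." / "1)".
BULLET_RE = re.compile(r"^\s*(?:[\u2022\u25CF\u25E6\u25AA\u2023\-\*]|\d+[.)])\s+")
# Punctuation that usually closes a complete sentence.
SENTENCE_END = (".", "!", "?")


def is_bullet(text):
    """True if the text starts with one of the assumed bullet markers."""
    return bool(BULLET_RE.match(text))


def is_continuation(prev_text, next_text):
    """Guess whether next_text continues the list item held in prev_text."""
    prev, nxt = prev_text.rstrip(), next_text.lstrip()
    if not prev or not nxt or is_bullet(nxt):
        return False
    unfinished = not prev.endswith(SENTENCE_END)
    return unfinished or nxt[0].islower()


def blocks_to_items(pages):
    """Merge per-page block lists into [is_list_item, text] entries.

    pages: one list of block tuples per page, e.g.
    [page.get_text("blocks", flags=1+2+8) for page in doc]
    """
    items = []
    for page_no, blocks in enumerate(pages):
        first_on_new_page = page_no > 0
        for block in blocks:
            if block[6] != 0:  # block_type 0 is text; skip image blocks
                continue
            text = " ".join(block[4].split())  # collapse line breaks
            if not text:
                continue
            if (first_on_new_page and items and items[-1][0]
                    and is_continuation(items[-1][1], text)):
                items[-1][1] += " " + text  # glue onto the open bullet
            elif is_bullet(text):
                items.append([True, BULLET_RE.sub("", text, count=1)])
            else:
                items.append([False, text])
            first_on_new_page = False
    return items


def to_html(items):
    """Render list items as <li> inside <ul>, everything else as <p>."""
    out, in_list = [], False
    for is_li, text in items:
        if is_li and not in_list:
            out.append("<ul>")
            in_list = True
        elif not is_li and in_list:
            out.append("</ul>")
            in_list = False
        tag = "li" if is_li else "p"
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

This only inspects the page boundary, and it assumes running headers and footers have already been filtered out (for example by a clip rectangle), since otherwise a footer would be the last block of page 1 and a header the first block of page 2. A paragraph that happens to start in lowercase after a finished bullet would be merged incorrectly, so the rule may need tuning for a given document.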