getting IB A©MENT for IB PAYMENT - introduced using repair on open #1262
Replies: 4 comments
-
Without access to the PDF, this will be difficult to debug. But as an initial foray: If you open the PDF in your browser, copy the relevant section of text, and paste it into a text editor, do you see
-
Great suggestion to help narrow down the issue. When I copy/paste I get the expected text:
Details
BALANCE BROUGHT FORWARD
IB PAYMENT TO
SJ BARNES
IB PAYMENT TO
NEDBANK CREDIT CA NEDC/CARD PN IB PAYMENT TO ...
I will put together a minimal script that replicates the issue.
-
In a min script I get what we expect:
In my fully loaded script I get:
... weird. I'm only posting to ask for ideas: could encoding be an issue? Regex filters?
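One way to probe the encoding question is to look at the code points behind the garbled string and check whether Unicode normalization touches it at all. The sketch below uses only the standard library; `describe_chars` is a hypothetical helper for illustration, not part of pdfplumber, and the garbled string is simply the one quoted in this thread.

```python
import unicodedata


def describe_chars(text):
    """Return (char, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


garbled = "IB A©MENT"
for ch, code_point, name in describe_chars(garbled):
    print(ch, code_point, name)

# Check whether any standard normalization form changes the string.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, garbled)
    print(form, repr(normalized), normalized == garbled)
```

If normalization leaves the string untouched, the glyph is more likely coming from how characters are mapped during extraction than from a normalization setting or a downstream regex.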
-
Found it: the problem goes away when I open the PDF with repair=False:
with pdfplumber.open(self.pdf_path, repair=False, unicode_norm="NFC") as pdf:
If this is helpful, I can scrub the PDF of personal information and provide it here.
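For anyone wanting to confirm the same thing on their own file, here is a rough sketch that extracts one page with and without `repair` and lists the lines that differ. `extract_page_text` and `differing_lines` are hypothetical helper names, and the page index is an assumption; the `repair` and `unicode_norm` arguments are the ones used in this thread. As far as I know, `repair=True` relies on Ghostscript being installed.

```python
def extract_page_text(pdf_path, repair, page_index=0):
    """Extract text from one page, opening the PDF with or without repair."""
    # Third-party dependency, imported lazily so differing_lines works without it.
    import pdfplumber

    with pdfplumber.open(pdf_path, repair=repair, unicode_norm="NFC") as pdf:
        return pdf.pages[page_index].extract_text() or ""


def differing_lines(text_a, text_b):
    """Return (line number, line_a, line_b) for each line that differs."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    length = max(len(lines_a), len(lines_b))
    # Pad the shorter list so trailing extra lines are reported too.
    lines_a += [""] * (length - len(lines_a))
    lines_b += [""] * (length - len(lines_b))
    return [
        (number, a, b)
        for number, (a, b) in enumerate(zip(lines_a, lines_b), start=1)
        if a != b
    ]


# Literal strings stand in for the two extractions here.
print(differing_lines("IB PAYMENT TO\nSJ BARNES", "IB A©MENT\nSJ BARNES"))
```

With a real file, the comparison would be something like `differing_lines(extract_page_text(path, repair=False), extract_page_text(path, repair=True))`, where `path` points at the affected statement.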
-
I have been working with pdfplumber for a few months now. I don't yet have a rigorous testing infrastructure up and running. I'm parsing hundreds of different versions of bank statements. Anyway, somewhere along the way, a PDF that was working a few weeks ago started to put out a copyright glyph in some of the text:
IB PAYMENT TO -> IB A©MENT
Any ideas on where to start troubleshooting would be greatly appreciated.
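Until a fuller test setup exists, a cheap check that flags unexpected characters in extracted text can catch this kind of regression early. The sketch below is standard library only; `find_unexpected_chars` is a hypothetical helper, and the allowed set (printable ASCII) is an assumption that would need widening for statements with legitimate non-ASCII characters.

```python
import string

# Assumption: statement text should contain only printable ASCII.
ALLOWED = set(string.printable)


def find_unexpected_chars(text, allowed=ALLOWED):
    """Return (line number, line, unexpected chars) for each suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        unexpected = sorted({ch for ch in line if ch not in allowed})
        if unexpected:
            findings.append((number, line, unexpected))
    return findings


sample = "BALANCE BROUGHT FORWARD\nIB A©MENT\nSJ BARNES"
for number, line, chars in find_unexpected_chars(sample):
    print(f"line {number}: {line!r} contains {chars}")
```

Running this over the text extracted from each statement and failing on any finding gives a rough regression signal without needing golden files.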