WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont= #12

jackyetz · 2019-03-03T17:01:29Z

When extracting text from a PDF (https://www.aanda.org/articles/aa/pdf/2006/02/aa3061-05.pdf), I got a lot of warnings and the extraction failed.

My code is:
import os
import sys
import importlib
importlib.reload(sys)
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LTTextBoxHorizontal, LAParams
from pdfminer.pdfinterp import PDFTextExtractionNotAllowed

def parse(path, target):
    # Start from an empty output file, since text is appended below.
    if os.path.exists(target):
        os.remove(target)
    fp = open(path, 'rb')
    parser = PDFParser(fp)
    doc = PDFDocument()
    # Link the parser and the document to each other.
    parser.set_document(doc)
    doc.set_parser(parser)

    doc.initialize()

    if not doc.is_extractable:
        raise PDFTextExtractionNotAllowed
    else:
        rsrcmgr = PDFResourceManager()
        laparams = LAParams(all_texts=True)
        device = PDFPageAggregator(rsrcmgr, laparams=laparams)
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        for page in doc.get_pages():  # doc.get_pages() returns the list of pages
            interpreter.process_page(page)
            layout = device.get_result()
            for x in layout:
                # Keep only horizontal text boxes and append their text.
                if isinstance(x, LTTextBoxHorizontal):
                    with open(target, 'a', encoding='utf-8') as f:
                        results = x.get_text()
                        # print(results)
                        f.write(results + '\n')

if __name__ == '__main__':
    path = r'./pdf/aa3061-05.pdf'
    parse(path, path.replace('.pdf', '.txt'))

The warnings:
......
WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont='BIBNJI+txsy'>, 5
WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont='BIBNJI+txsy'>, 5
WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont='BICMGG+txex'>, 4
WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont='BIBNJI+txsy'>, 5
WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont='BICMGG+txex'>, 5
WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont='BIBNJI+txsy'>, 5
WARNING:pdfminer.converter:undefined: <PDFType1Font: basefont='BIBNJI+txsy'>, 5
......
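
Editor's note: the `WARNING:pdfminer.converter:` prefix matches the default output format of Python's `logging` module, which suggests the messages come from a logger named `pdfminer.converter`. Assuming that is the case, the sketch below raises that logger's threshold so the `undefined:` lines are no longer printed. It only hides the messages; it does not change what the extractor does with the affected characters, so it is not a fix for a failed extraction. The variable name `converter_log` is illustrative.

```python
import logging

# Logger name taken from the "WARNING:pdfminer.converter:" prefix in the output above.
converter_log = logging.getLogger("pdfminer.converter")

# Drop WARNING records from this logger only; ERROR and above still get through.
converter_log.setLevel(logging.ERROR)
```

Placing these lines before the call to `parse(...)` should be enough, since the level is checked each time a message is logged.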


paulfwb · 2020-05-03T14:35:53Z

I'm getting the same problem.
I'll let you know if I fix it.

rocket2016 · 2021-01-11T17:09:33Z

Could you share your solution, please! I have the same problem.

rocket2016 · 2021-01-11T17:11:19Z

I'm getting the same problem.
I'll let you know if I fix it.

Could you share your solution, please! I have the same problem.
