Open
Labels
bug, enhancement
Description
Hello, this issue seems very similar to #136, but I just can't make it work: the word and line order inside table cells is not preserved when invoking the get_text method.
The attached JSON is the result of running Textract start_document_analysis with parameters [TextractFeatures.TABLES, TextractFeatures.LAYOUT].
When running
import json

import textractor
from textractor.entities.document import Document

# Load the attached Textract response (TABLES + LAYOUT features)
with open('../data/processed/6e2ab4b2a234e0410205db117803203a1be55a3fc766d56083c62512d71e556e.json') as f:
    j = json.load(f)

doc = Document.open(j)
print(doc.tables[1].get_text())  # text of the second table in the document
print(textractor.__version__)
I get output such as:
...
of adolescent and girls
6.1.2.4 the Ensure
...
But the actual lines are "of adolescent girls and" and "6.1.2.4 Ensure the", and the line order is also different.
The blocks seem fine, and the child order in "Relationships" also seems correct.
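For reference, here is a minimal sketch of how the child order can be checked directly on the raw blocks, independently of textractor. It assumes the standard Textract response shape (a "Blocks" list whose entries carry "Id", "BlockType", "Text" and "Relationships" with "CHILD" ids). The helper name cell_words_in_child_order and the sample blocks are made up for illustration and are not taken from the attached file.

```python
def cell_words_in_child_order(blocks):
    """Return the text of every CELL block, joining WORD children in CHILD order."""
    by_id = {b["Id"]: b for b in blocks}
    result = []
    for block in blocks:
        if block.get("BlockType") != "CELL":
            continue
        words = []
        for rel in block.get("Relationships") or []:
            if rel.get("Type") != "CHILD":
                continue
            # Keep the ids exactly in the order listed in the relationship
            for child_id in rel.get("Ids", []):
                child = by_id.get(child_id)
                if child is not None and child.get("BlockType") == "WORD":
                    words.append(child.get("Text", ""))
        result.append(" ".join(words))
    return result


# Hypothetical sample blocks (illustration only, not from the attached JSON)
sample_blocks = [
    {"Id": "c1", "BlockType": "CELL",
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2", "w3", "w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "of"},
    {"Id": "w2", "BlockType": "WORD", "Text": "adolescent"},
    {"Id": "w3", "BlockType": "WORD", "Text": "girls"},
    {"Id": "w4", "BlockType": "WORD", "Text": "and"},
]

print(cell_words_in_child_order(sample_blocks))
```

On the real file this would be called as cell_words_in_child_order(j["Blocks"]), assuming the JSON is a single response dict. Comparing its output with get_text() for the same cell shows whether the reordering happens in the raw response or later in the library.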
What am I doing wrong?
6e2ab4b2a234e0410205db117803203a1be55a3fc766d56083c62512d71e556e.json