Version 1.7.0
What's Changed
- Loosen XlsxWriter version constraints by @mdscruggs in #292
- Rework the linearization heuristic to ensure that no words are missing or duplicated
- Fix KeyValues being assigned twice on overlapping table cells; going forward, KVs inside tables are ignored (table structure takes precedence)
- Harden parser code against missing children in layouts or KeyValues with missing keys
- Fix markdown tables not having header rows when one of the cells is empty
- Add support for Python 3.11 and 3.12 in the GitHub Actions workflows
- Add `textractor.__version__` to allow easier identification of the installed Textractor version in code
- Add `hide_table_layout`
- Remove amazon-textract-response-parser as a dependency, as its use for validating the input schema could add 200+ ms of latency in some cases. Textractor-only parsing takes under 30 ms.
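Because `textractor.__version__` only exists from 1.7.0 onward, code that has to run against older installs may want to fall back gracefully. The following is a minimal sketch, not part of Textractor itself: the helper names (`version_tuple`, `textractor_version`, `is_at_least`) are hypothetical, and the distribution name `amazon-textract-textractor` used for the metadata fallback is an assumption.

```python
from importlib import import_module, metadata
from types import ModuleType
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a plain release string like "1.7.0" into (1, 7, 0) for comparison."""
    # Assumes a simple dotted numeric version; suffixes such as "rc1" raise ValueError.
    return tuple(int(part) for part in version.split("."))


def textractor_version(module: Optional[ModuleType] = None) -> Optional[str]:
    """Return the installed Textractor version, or None if it cannot be determined."""
    if module is None:
        try:
            module = import_module("textractor")
        except ImportError:
            return None
    # textractor.__version__ is available from 1.7.0 onward.
    version = getattr(module, "__version__", None)
    if version is not None:
        return version
    # Older releases lack __version__: fall back to installed package metadata
    # (the distribution name below is an assumption).
    try:
        return metadata.version("amazon-textract-textractor")
    except metadata.PackageNotFoundError:
        return None


def is_at_least(version: Optional[str], minimum: str) -> bool:
    """True if a known version is >= minimum; False when the version is unknown."""
    return version is not None and version_tuple(version) >= version_tuple(minimum)
```

For example, `is_at_least(textractor_version(), "1.7.0")` can gate code paths that rely on options introduced in this release. Comparing integer tuples rather than raw strings avoids the trap where `"1.10.0" < "1.7.0"` lexicographically.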
Breaking changes
- Remove `linearize_table` and `linearize_key_value` from `TextLinearizationConfig`, as both were not used
- Remove the `s3_output_path` parameter from `analyze_expense`, as the API does not support outputting to S3
New Contributors
- @mdscruggs made their first contribution in #292
Full Changelog: v1.6.1...v1.7.0