Version 1.7.0

@Belval Belval released this 31 Jan 00:13

What's Changed

  • Loosen XlsxWriter version constraints by @mdscruggs in #292
  • Rework the linearization heuristic to ensure that no words are missing or duplicated
  • Fix KeyValues being assigned twice on overlapping table cells. Going forward, KVs inside a table are ignored (table structure takes precedence)
  • Harden parser code against missing children in layouts or KeyValues with missing keys
  • Fix markdown tables not having header rows when one of the cells is empty
  • Add support for Python 3.11 and 3.12 in the GitHub Actions workflows
  • Add textractor.__version__ to allow easier identification of the installed Textractor version in code
  • Add hide_table_layout
  • Remove amazon-textract-response-parser as a dependency, as its use for validating the input schema could add 200 ms of latency in some cases. Textractor-only parsing takes <30 ms.
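
As a rough illustration of how the new textractor.__version__ attribute might be used, the sketch below reads it defensively and compares it against 1.7.0. The helper names (parse_version, installed_textractor_version, has_version_attribute_release) are hypothetical and not part of Textractor; the only thing taken from these notes is that textractor.__version__ exists from this release onward.

```python
import importlib


def parse_version(version: str) -> tuple:
    """Turn a string such as '1.7.0' into (1, 7, 0).

    Parsing stops at the first component that does not start with a digit,
    and trailing suffixes such as 'rc1' are dropped from a component.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_textractor_version():
    """Return textractor.__version__, or None when it cannot be read.

    None covers two cases: the package is not installed, or the installed
    release predates 1.7.0 and therefore has no __version__ attribute.
    """
    try:
        module = importlib.import_module("textractor")
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def has_version_attribute_release(version) -> bool:
    """True when the given version string is 1.7.0 or newer."""
    return version is not None and parse_version(version) >= (1, 7, 0)


if __name__ == "__main__":
    found = installed_textractor_version()
    print("Textractor version:", found)
    print("1.7.0 or newer:", has_version_attribute_release(found))
```

Treating a missing attribute the same as a missing package keeps the check usable in environments that still run an older release.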

Breaking changes

  • Remove linearize_table and linearize_key_value from TextLinearizationConfig as both were not used
  • Remove the s3_output_path parameter from analyze_expense as the API does not support outputting to S3
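
For callers that build their arguments as dictionaries, one possible way to migrate is to filter out the removed names before passing them on. This is only a sketch: drop_removed_options and REMOVED_IN_1_7_0 are hypothetical helpers, not part of Textractor, and the option names other than the three listed above are placeholders.

```python
import warnings

# Names removed in 1.7.0, as listed in the breaking changes above.
REMOVED_IN_1_7_0 = {
    "TextLinearizationConfig": {"linearize_table", "linearize_key_value"},
    "analyze_expense": {"s3_output_path"},
}


def drop_removed_options(target: str, options: dict) -> dict:
    """Return a copy of `options` without the names removed from `target`.

    A DeprecationWarning is emitted for every dropped name so that the
    calling code can be cleaned up later.
    """
    removed = REMOVED_IN_1_7_0.get(target, set())
    cleaned = {}
    for name, value in options.items():
        if name in removed:
            warnings.warn(
                f"{name} was removed from {target} in Textractor 1.7.0; ignoring it",
                DeprecationWarning,
                stacklevel=2,
            )
            continue
        cleaned[name] = value
    return cleaned


if __name__ == "__main__":
    # "kept_option" is a placeholder for any option that still exists.
    old_kwargs = {"kept_option": 1, "s3_output_path": "s3://example-bucket/out"}
    print(drop_removed_options("analyze_expense", old_kwargs))
```

The cleaned dictionary can then be unpacked into the real call with `**`. Simply deleting the removed arguments at each call site works just as well when there are only a few of them.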

New Contributors

Full Changelog: v1.6.1...v1.7.0