Releases: VikParuchuri/surya
Minor bugfixes
- Small bugfix after the table recognition release
Table recognition model release!
- Add a new table recognition model that detects rows/columns and cells
- Add benchmarks for accuracy and speed (seems to be very accurate relative to the current state-of-the-art open model)
- Improve memory efficiency of layout and text detection (hopefully no more memory leaks)
- Improve resolution handling for layout, text detection, and OCR, which should improve accuracy quite a bit
OCR v2
A new version of the OCR model with a custom architecture.
- 20% faster
- Automatic language detection, with support for optional language hints
- Better accuracy on old/noisy documents
- Basic English handwriting support (to be improved soon)
Faster text detection + layout
Switched model architecture for the text detection and layout models:
- 30% faster on GPU
- 4x faster on CPU
- 12x faster on MPS (M-series Macs)
Accuracy should be about the same, or slightly better, from my benchmarks.
v0.4.14: Merge pull request #141 from VikParuchuri/dev
A new transformers version added a kwarg to the Donut embeddings. This release accepts and ignores that kwarg, and is slightly future-proofed in case this happens again.
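One common way to tolerate a keyword argument added upstream is to accept `**kwargs` and discard whatever is not needed. The sketch below only illustrates that pattern. The class `PatchEmbeddings`, its `forward` body, and the argument name `some_new_flag` are hypothetical and are not taken from surya or transformers.

```python
class PatchEmbeddings:
    """Hypothetical stand-in for an embeddings module (not surya's actual class)."""

    def forward(self, pixel_values, **kwargs):
        # Any extra keyword arguments from newer callers are accepted and
        # deliberately ignored, so an upstream signature change does not raise.
        return [value * 2 for value in pixel_values]


emb = PatchEmbeddings()
# `some_new_flag` is an assumed name standing in for the newly added kwarg.
result = emb.forward([1, 2, 3], some_new_flag=True)
```

The trade-off is that misspelled or genuinely unsupported arguments are also silently dropped, so this pattern fits best at a boundary with a library whose signature is expected to change.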
Minor bugfixes
- Fix rotation and copy bugs
Fix image bugs
- Fix bugs with RGBA images
- Fix assert bug
- Add back in thumbnail method for resizing
- Slightly optimize segformer code
Change image resize
- Switch image resizing from cv2 to PIL, since cv2 caused benchmark regressions
OCR speedups
- Speed up base OCR model ~15-20%, and reduce memory usage by ~25% (which allows higher batch sizes)
- Add static cache for compilation - torch.compile will result in another 15% speedup
- Other optimizations, like faster image resizing
- Bugfixes, like enabling different-length language inputs for OCR (batching documents with different languages together)
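Batching documents whose language lists differ in length generally requires bringing those lists to a common length first. The following is a minimal sketch of one way to do that with padding. The helper `pad_languages` and the `"<pad>"` placeholder are assumptions made for illustration and do not describe how surya implements it.

```python
def pad_languages(batch_langs, pad_token="<pad>"):
    """Pad each document's language list to the longest list in the batch.

    Hypothetical helper: the name and the pad token are illustrative only.
    """
    max_len = max((len(langs) for langs in batch_langs), default=0)
    return [langs + [pad_token] * (max_len - len(langs)) for langs in batch_langs]


# Two documents in one batch: one with a single language, one with three.
padded = pad_languages([["en"], ["en", "hi", "fr"]])
```

After padding, every row has the same length, so the batch can be turned into a rectangular tensor of language tokens.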
Processor improvements
- Remove unneeded format conversions
- Fix an OCR bug where only one color channel was used, so results should be better now
- Speed up layout/text detection a bit