
@danielclough (Contributor)

Donut, Swin, and BART (models and examples)

This PR adds three interconnected models for document understanding and text generation:

  • BART (candle-transformers/src/models/bart/) - Encoder-decoder transformer for summarization and translation
  • Swin Transformer (candle-transformers/src/models/swin.rs) - Hierarchical vision transformer for image processing
  • Donut (candle-transformers/src/models/donut.rs) - OCR-free document understanding (Swin encoder + BART decoder)

I am submitting these together in a single PR because Donut relies on Swin as its encoder and BART as its decoder.

Each model also comes with examples demonstrating essential features.

Features

| Model | Capabilities |
| ----- | ------------ |
| BART  | Text summarization, mBART translation, beam search decoding |
| Swin  | ImageNet classification (Tiny/Small/Base/Large variants) |
| Donut | Receipt parsing, document VQA, document classification |

Key Implementation Details

  • Beam Search: Both single-sequence and batched beam search with efficient KV cache reordering
  • Weight Tying: Proper handling of tied embeddings between encoder/decoder/lm_head
  • Layer Norm Order: Auto-detection of pre-norm (mBART) vs post-norm (BART) architectures
  • Shifted Window Attention: Full Swin v1 implementation with attention masks
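The KV cache reordering mentioned above can be illustrated with a minimal, framework-free sketch. Plain `Vec`s stand in for a layer's per-beam key/value rows, and `reorder_kv_cache` is a hypothetical helper, not the PR's actual candle code (which would gather along the batch dimension of real tensors):

```rust
/// Select the cache rows for the beams that survived this decoding step.
/// After each step, beam search may keep several copies of one beam and
/// drop others, so the cache must be gathered to match the new beam order.
fn reorder_kv_cache(cache: &[Vec<f32>], beam_indices: &[usize]) -> Vec<Vec<f32>> {
    beam_indices.iter().map(|&i| cache[i].clone()).collect()
}

fn main() {
    // Three beams; this step kept beam 2 twice (it forked) and beam 0 once.
    let cache = vec![vec![1.0], vec![2.0], vec![3.0]];
    let reordered = reorder_kv_cache(&cache, &[2, 2, 0]);
    assert_eq!(reordered, vec![vec![3.0], vec![3.0], vec![1.0]]);
}
```

In the real model the same gather is applied to every layer's key and value tensors at once, which is why doing it efficiently matters.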

  • Full encoder-decoder architecture
  • Beam search decoding with configurable parameters
  • Causal language modeling head
  • Support for BART, mBART, and MBart50 variants
  • Shifted window multi-head self-attention
  • Patch merging for hierarchical feature maps
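As a rough illustration of the shifted-window mechanism listed above: before window partitioning, Swin cyclically shifts the feature map (a `torch.roll` in the reference implementation). The sketch below is a hypothetical stand-in on a flattened integer grid, not the PR's actual tensor code:

```rust
/// Cyclically shift a flattened H x W grid down and right by `shift`,
/// wrapping at the borders. This is the displacement step of Swin's
/// shifted-window attention; the inverse shift restores the layout,
/// and attention masks hide the wrapped-around positions.
fn cyclic_shift(grid: &[u32], h: usize, w: usize, shift: usize) -> Vec<u32> {
    let mut out = vec![0; h * w];
    for r in 0..h {
        for c in 0..w {
            let nr = (r + shift) % h;
            let nc = (c + shift) % w;
            out[nr * w + nc] = grid[r * w + c];
        }
    }
    out
}

fn main() {
    // A 2x2 grid shifted by 1: every element wraps to the opposite corner.
    let shifted = cyclic_shift(&[1, 2, 3, 4], 2, 2, 1);
    assert_eq!(shifted, vec![4, 3, 2, 1]);
}
```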
@ivarflakstad (Member) left a comment


👍

Comment on lines +4 to +6

> mBART models use SentencePiece tokenization which isn't directly supported
> by the Rust tokenizers crate. This script converts the tokenizer to the
> tokenizer.json format that can be loaded by the Rust example.

Is there a specific model you've encountered where they don't provide a tokenizer.json?

@danielclough (Author) replied:

Take a look at: https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt/tree/main

The README.md explains that the SentencePiece tokenizer must be converted first:

```bash
# Step 1: Convert tokenizer
cd candle-examples/examples/bart
pip install transformers sentencepiece
python convert_mbart_tokenizer.py --model-id facebook/mbart-large-50-many-to-many-mmt

# Step 2: Run translation (English to French)
cargo run --example bart --release -- \
    --model-id facebook/mbart-large-50-many-to-many-mmt \
    --prompt "Hello, how are you today?" \
    --source-lang en_XX \
    --target-lang fr_XX \
    --sample-len 50
```

@danielclough (Author)

I noticed I forgot to run clippy with `-D warnings`. 🤦 I'll fix that.
