MT-TexTableX is a lightweight multitask learning framework for table-to-text (T2T) generation, designed to improve content selection, structural alignment, and factual consistency. Built on a T5-base encoder-decoder, MT-TexTableX jointly learns five tasks (a minimal joint-training sketch follows the task list):
- Table-to-Text Generation
- Content Selection
- Table Reconstruction
- Text-to-Table Generation
- Fact Verification
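Conceptually, the joint objective can be sketched as follows. This is a minimal illustration only: the task prefixes, loss weights, and training loop below are assumptions made for the sketch, not the repository's actual code.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative multitask step: every example carries a task prefix so a single
# T5-base encoder-decoder is trained on all five objectives with one seq2seq loss.
# The prefixes and per-task weights below are assumptions, not the repo's values.
TASK_WEIGHTS = {
    "table_to_text": 1.0,
    "content_selection": 0.5,
    "table_reconstruction": 0.5,
    "text_to_table": 0.5,
    "fact_verification": 0.5,
}

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def training_step(batch):
    """One weighted multitask step over (task, source, target) triples."""
    total_loss = 0.0
    for task, source, target in batch:
        inputs = tokenizer(f"{task}: {source}", return_tensors="pt", truncation=True)
        labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
        loss = model(**inputs, labels=labels).loss
        total_loss = total_loss + TASK_WEIGHTS[task] * loss
    total_loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return float(total_loss)
```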
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/mt-textablex.git
  cd mt-textablex
  ```

- Install dependencies. We recommend Python 3.10+ and a virtual environment:

  ```bash
  pip install -r requirements.txt
  ```

- Download a tokenizer. You can use the default `t5-base` tokenizer or fine-tune your own if needed:

  ```python
  from transformers import T5Tokenizer

  tokenizer = T5Tokenizer.from_pretrained('t5-base')
  tokenizer.save_pretrained('mtl_t5_tokenizer')
  ```
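If you save the tokenizer locally as above, you can quickly confirm that the on-disk copy loads before preprocessing (the sample sentence is arbitrary):

```python
from transformers import T5Tokenizer

# Reload the tokenizer saved to `mtl_t5_tokenizer/` and tokenize a sample string.
tokenizer = T5Tokenizer.from_pretrained('mtl_t5_tokenizer')
print(tokenizer.tokenize('The stadium seats 68,456 spectators.'))
```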
Individual scripts are also available under `preprocessing/` for each task if you want fine-grained control (an example driver follows this list):

- `preprocess_totto_table_to_text.py`
- `preprocess_totto_content_selection.py`
- `preprocess_totto_table_reconstruction.py`
- `preprocess_totto_text_to_table.py`
- `preprocess_tabfact.py`
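If you want to run every task-specific preprocessing step in one go, a small driver along these lines could work; any command-line arguments the scripts expect are omitted here, so check each script before relying on it:

```python
import subprocess

# Run each task-specific preprocessing script in sequence.
# Arguments are intentionally omitted; consult the scripts under preprocessing/.
scripts = [
    "preprocessing/preprocess_totto_table_to_text.py",
    "preprocessing/preprocess_totto_content_selection.py",
    "preprocessing/preprocess_totto_table_reconstruction.py",
    "preprocessing/preprocess_totto_text_to_table.py",
    "preprocessing/preprocess_tabfact.py",
]
for script in scripts:
    subprocess.run(["python", script], check=True)
```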
Once preprocessing is complete, start training:

```bash
python model/train.py --batch_size 4 --epochs 5 --accumulation_steps 2 --save_dir checkpoints/
```

You can modify task weights and other hyperparameters directly in `train.py`.
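For reference, `--batch_size 4` with `--accumulation_steps 2` gives an effective batch size of 8. The generic gradient-accumulation pattern looks like the following self-contained PyTorch sketch (with a stand-in model; this is not the code in `train.py`):

```python
import torch
from torch import nn

# Generic gradient accumulation: scale each mini-batch loss by the accumulation
# factor and only call optimizer.step() every `accumulation_steps` mini-batches,
# so batch_size=4 with accumulation_steps=2 behaves like a batch of 8.
accumulation_steps = 2
model = nn.Linear(8, 1)  # stand-in for the T5 model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(6)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```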
We use the official ToTTo evaluation suite to compute BLEU, PARENT, and BLEURT scores; our model outputs follow the format the suite expects.
See the ToTTo GitHub evaluation instructions for more details.
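As a quick local sanity check before running the official suite, corpus BLEU can be computed with sacrebleu (assuming it is installed; this does not replace the official scripts, which also produce PARENT and BLEURT):

```python
import sacrebleu

# Toy example: one system output scored against one reference set.
hypotheses = ["the 2019 final was played at the wanda metropolitano ."]
references = [["The 2019 final was played at the Wanda Metropolitano."]]
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```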
If you use MT-TexTableX or our code in your work, please cite:
@article{mohammadalizadeh2025mttextablex,
title = {MT-TexTableX: a multitask learning approach to table-to-text generation},
author = {Parman Mohammadalizadeh and Leila Safari},
journal = {Expert Systems with Applications},
year = {2025},
issn = {0957-4174},
doi = {10.1016/j.eswa.2025.129060},
url = {https://www.sciencedirect.com/science/article/pii/S0957417425026776}
}
Pre-proof article
- Outperforms GPT-3.5, GPT-4.1, LLaMA-3, and Phi-3 on ToTTo
- Achieves BLEU 40.9, PARENT 54.8, BLEURT 0.1465 on ToTTo test set
- Fully supervised, no prompt tuning, no in-context learning
- Jointly optimized multi-task framework using only T5-base
- Human evaluation confirms improvements in fluency and faithfulness