Reminder
System Info
[Bug] TypeError when saving predictions with HuggingFace Dataset
Description
When running evaluation with a local dataset, saving predictions fails with:
TypeError: argument 'ids': 'list' object cannot be interpreted as an integer
Line 179 in `LLaMA-Factory/src/llamafactory/train/sft/trainer.py`:
decoded_inputs = self.processing_class.batch_decode(dataset["input_ids"], skip_special_tokens=False)
Root Cause
It seems that LLaMA-Factory converts local datasets to HuggingFace Dataset format internally. As a result, dataset["input_ids"] returns a Column object instead of a Python list, while batch_decode() expects a list of token ID lists.
Solution
Convert the Column to a Python list before decoding:
input_ids_list = dataset["input_ids"].to_pylist()
decoded_inputs = self.processing_class.batch_decode(input_ids_list, skip_special_tokens=False)
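The same conversion can be sketched without relying on any particular method of the column object, assuming only that it can be iterated row by row. The names `to_plain_lists` and `LazyColumn` are hypothetical and used for illustration only; `LazyColumn` is a stand-in, not the real datasets class.

```python
from collections.abc import Sequence


def to_plain_lists(column):
    """Materialize an iterable of token-id sequences as a list of lists of ints."""
    return [[int(token_id) for token_id in row] for row in column]


class LazyColumn(Sequence):
    """Hypothetical stand-in for a lazy column object (not the real datasets class)."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, index):
        return self._rows[index]

    def __len__(self):
        return len(self._rows)


# The stand-in is a Sequence but not a list; the helper returns plain lists.
column = LazyColumn([[1, 2, 3], [4, 5]])
plain = to_plain_lists(column)
print(type(plain).__name__, plain)
```

In the trainer, the result of such a helper applied to dataset["input_ids"] would be passed to batch_decode() in place of the raw column.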
Question
Is this a bug? If it's a bug, I'd be happy to submit a fix.
Thanks!
Reproduction
Reproduce
- Use a local JSON dataset
- Run evaluation with:
stage: sft
do_predict: true
finetuning_type: lora
adapter_name_or_path:...
- The error occurs when saving predictions
Others
No response