Very poor results after fine-tuning S1 with swift #24

@Jack-ctrl6

Description

export NPROC_PER_NODE=2
export CUDA_VISIBLE_DEVICES=0,1
export MASTER_PORT=39312
export PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True'
export MAX_NUM=1

GPUS=2
BATCH_SIZE=8
PER_DEVICE_BATCH_SIZE=4
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))
LR=0.00001
EPOCHS=20.0
SAVE_STEPS=250
MAX_LEN=8192
OUTPUR_DIR='outputmini'
DATASET_DIR='train.jsonl'
model_type='interns1'
model_path='Shanghai_AI_Laboratory/Intern-S1-mini'
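dataset=''

# Create the output directory up front: tee cannot create missing parent directories for the log file.
mkdir -p "$OUTPUR_DIR"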

swift sft \
    --model $model_path \
    --train_type lora \
    --dataset $DATASET_DIR \
    --freeze_vit true \
    --freeze_aligner true \
    --freeze_llm false \
    --learning_rate 1e-4 \
    --lora_rank 16 \
    --lora_alpha 32 \
    --torch_dtype bfloat16 \
    --num_train_epochs $EPOCHS \
    --gradient_checkpointing true \
    --per_device_train_batch_size $PER_DEVICE_BATCH_SIZE \
    --per_device_eval_batch_size 1 \
    --learning_rate $LR \
    --gradient_accumulation_steps $GRADIENT_ACC \
    --save_strategy steps \
    --save_steps $SAVE_STEPS \
    --split_dataset_ratio 0.01 \
    --eval_strategy steps \
    --eval_steps $SAVE_STEPS \
    --save_total_limit 100 \
    --logging_steps 1 \
    --max_length $MAX_LEN \
    --output_dir $OUTPUR_DIR \
    --warmup_ratio 0.1 \
    --dataloader_num_workers 32 \
    --dataset_num_proc 16 \
    --deepspeed zero3 \
    --report_to tensorboard \
    --use_liger_kernel true \
    --attn_impl flash_attn \
    --truncation_strategy delete \
    2>&1 | tee ${OUTPUR_DIR}/training_log.txt

After LoRA fine-tuning S1, the results are very poor: the model even gets simple OCR recognition wrong.
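Two details in the script may be worth checking before looking further. The gradient accumulation value comes from integer division, and `--learning_rate` is passed twice (once as `1e-4`, once as `$LR`, which is `0.00001`). The sketch below recomputes the batch arithmetic from the values in the script and lists any flag that appears more than once. The `ARGS` array is a hypothetical, shortened copy of the real argument list, used only for illustration. Which of the two learning rates swift actually applies is an assumption here: argparse-style parsers usually keep the last occurrence, which would mean `1e-5` rather than `1e-4`.

```shell
#!/usr/bin/env bash
# Values copied from the training script.
GPUS=2
BATCH_SIZE=8
PER_DEVICE_BATCH_SIZE=4
LR=0.00001

# Shell arithmetic is integer-only, so this truncates toward zero.
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))
if [ "$GRADIENT_ACC" -lt 1 ]; then
    echo "warning: GRADIENT_ACC is $GRADIENT_ACC; BATCH_SIZE is too small" >&2
fi

# Samples contributing to each optimizer step across all GPUs.
EFFECTIVE_BATCH=$((PER_DEVICE_BATCH_SIZE * GPUS * GRADIENT_ACC))
echo "gradient_accumulation_steps=$GRADIENT_ACC effective_batch=$EFFECTIVE_BATCH"

# Hypothetical, shortened copy of the swift sft arguments (assumption:
# only the flags relevant to this check are listed).
ARGS=(--learning_rate 1e-4 --lora_rank 16 --lora_alpha 32 --learning_rate "$LR")

# Keep only the flag names, then report names that occur more than once.
DUPLICATES=$(printf '%s\n' "${ARGS[@]}" | grep -- '^--' | sort | uniq -d)
echo "duplicated flags: ${DUPLICATES:-none}"
```

With the values above this reports a gradient accumulation of 1, an effective batch size of 8, and `--learning_rate` as the only duplicated flag. Removing one of the two occurrences would make the intended learning rate unambiguous.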
