Add Phi4 #2197

Merged · 51 commits · Feb 11, 2025
Changes from 4 commits

Commits (51)
1a43259
Add Phi4 support
krammnic Dec 21, 2024
3630908
Add Phi4
krammnic Dec 21, 2024
18f8bc5
fix names
krammnic Jan 11, 2025
e69a77c
More fixes. Able to do forward
krammnic Jan 11, 2025
a94b742
Update torchtune/models/phi4/_tokenizer.py
krammnic Jan 14, 2025
1d03294
Update torchtune/_recipe_registry.py
krammnic Jan 14, 2025
bdf478f
Update torchtune/models/phi4/_model_builders.py
krammnic Jan 14, 2025
78cd1e6
more fixes
Feb 2, 2025
d8b2ea3
nit SPM -> TikToken
Feb 2, 2025
3d55e55
fixed tokenizer + fix model loading problem (credits: ebsmothers)
Feb 8, 2025
7ee22b6
remove useless comments
Feb 8, 2025
e515f06
gpt2 tokenizer
Feb 8, 2025
d1cae68
gpt2 tokenizer
Feb 8, 2025
3c1780d
fixed configs
krammnic Feb 8, 2025
18c0033
fix docstring in tokenizer
krammnic Feb 8, 2025
fc1d2db
fix lint and docstrings
krammnic Feb 8, 2025
99a1ce5
fix lint and docstrings
krammnic Feb 8, 2025
ce626a4
cover gpt2 tokenizer with test
krammnic Feb 8, 2025
e3768ee
fix lint
krammnic Feb 8, 2025
c84c74c
fix phi4tokenizer tests
krammnic Feb 8, 2025
cbc5ca1
fix tests
krammnic Feb 8, 2025
dc64290
Update torchtune/models/phi4/_model_builders.py
krammnic Feb 10, 2025
46bede4
Update torchtune/models/phi4/_model_builders.py
krammnic Feb 10, 2025
cc36700
Update torchtune/modules/tokenizers/_gpt2.py
krammnic Feb 10, 2025
c9a483c
fix eval configs
Feb 10, 2025
c1b6394
remove nnodes from configs
Feb 10, 2025
47dd749
naming fixes
Feb 10, 2025
146cac3
fix lint
Feb 10, 2025
6e50261
fixes
Feb 10, 2025
55d7ae0
fix test
Feb 10, 2025
e7b43d6
phi4 -> phi4_14b
Feb 10, 2025
b4de41d
resolve conflict
Feb 10, 2025
4440768
resolve conflict
Feb 10, 2025
d39e717
update __init__
Feb 10, 2025
54d477d
update __init__
Feb 10, 2025
0be4b8e
update __init__
Feb 10, 2025
ad8562e
Merge branch 'main' into main
ebsmothers Feb 10, 2025
518a769
add GPT2BaseTokenizer in transforms/tokenizers/__init__.py + fix lint
Feb 10, 2025
e29aca6
fix imports
Feb 10, 2025
d533355
fix __init__ and namings
Feb 10, 2025
012f433
swap encode decode
Feb 11, 2025
ebcd1d6
correct eval recipe
Feb 11, 2025
d4435b0
fix docstring
Feb 11, 2025
7f5ccd8
remove useless argument
Feb 11, 2025
36eeaa8
nit: unk token
Feb 11, 2025
af5a824
fixes tokenizer
Feb 11, 2025
2002f50
fix gpt2tokenizer test
Feb 11, 2025
01ac202
fix lora config
Feb 11, 2025
6003044
renamings
Feb 11, 2025
7aea0ca
fix phi4 drop eos + test
Feb 11, 2025
4f38c14
recipe registry
Feb 11, 2025
10 changes: 5 additions & 5 deletions recipes/configs/phi3/evaluation.yaml
Contributor:
folder is phi3, but args are phi4

Contributor Author:
Yep, good point

Contributor:
Bumping this comment. Please go through both Phi-3 and Phi-4 eval files to make sure they contain the correct model references

@@ -7,25 +7,25 @@ output_dir: ./ # Not needed

# Model Arguments
model:
-  _component_: torchtune.models.phi3.phi3_mini
+  _component_: torchtune.models.phi4.phi4_mini

# Checkpointer
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
-  checkpoint_dir: /tmp/Phi-3-mini-4k-instruct
+  checkpoint_dir: /tmp/phi-4
Contributor:
Make sure this matches the format of other directories

checkpoint_files: [
model-00001-of-00002.safetensors,
model-00002-of-00002.safetensors
]
recipe_checkpoint: null
output_dir: ${output_dir}
-  model_type: PHI3_MINI
+  model_type: PHI4_MINI
Contributor:
I think this is still not correct? (Same for L28)

resume_from_checkpoint: False

# Tokenizer
tokenizer:
-  _component_: torchtune.models.phi3.phi3_mini_tokenizer
-  path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model
+  _component_: torchtune.models.phi4.phi4_mini_tokenizer
+  path: /tmp/phi-4/tokenizer.model
max_seq_len: null

# Environment
44 changes: 44 additions & 0 deletions recipes/configs/phi4/evaluation.yaml
Contributor:
It seems that you made a copy from phi3, but made the changes in phi3/evaluation, instead of here

Contributor:
Yeah I think these two eval files need to be swapped

@@ -0,0 +1,44 @@
# Config for EleutherEvalRecipe in eleuther_eval.py
#
# To launch, run the following command:
# tune run eleuther_eval --config phi3/evaluation
Contributor:
/phi3/phi4


output_dir: ./ # Not needed

# Model Arguments
model:
_component_: torchtune.models.phi3.phi3_mini

# Checkpointer
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Phi-3-mini-4k-instruct
Contributor:
/Phi-3/Phi-4

checkpoint_files: [
model-00001-of-00002.safetensors,
model-00002-of-00002.safetensors
]
recipe_checkpoint: null
output_dir: ${output_dir}
model_type: PHI3_MINI
Contributor:
/PHI3_MINI/PHI4_MINI

resume_from_checkpoint: False

# Tokenizer
tokenizer:
_component_: torchtune.models.phi3.phi3_mini_tokenizer
path: /tmp/Phi-3-mini-4k-instruct/tokenizer.model
Contributor:
/torchtune.models.phi3.phi3_mini_tokenizer/torchtune.models.phi4.phi4_mini_tokenizer

Contributor:
/Phi-3/Phi-4

max_seq_len: null

# Environment
device: cuda
dtype: bf16
seed: 1234 # It is not recommended to change this seed, b/c it matches EleutherAI's default seed

# EleutherAI specific eval args
tasks: ["truthfulqa_mc2"]
limit: null
max_seq_length: 4096
batch_size: 8
enable_kv_cache: True

# Quantization specific args
quantizer: null
109 changes: 109 additions & 0 deletions recipes/configs/phi4/mini_full.yaml
Contributor:
n00b question: Is "mini" the right nomenclature? Or do they have a family of model sizes like phi4_7b, phi4_13B, etc?

Contributor Author:
It's a debatable point: Phi-4's own description calls it a "mini" model, but in practice it isn't one.

Contributor:
I wonder if we should drop the mini and just stick to model sizes, since it's more informative. @ebsmothers, any thoughts?

Contributor:
Yeah seems like they are mostly using model sizes instead of "mini" in public docs, so maybe let's go with 14B instead of mini?

@@ -0,0 +1,109 @@
# Config for multi-device full finetuning in full_finetune_distributed.py
# using a Phi4 16K Instruct
#
# This config assumes that you've run the following command before launching
# this run:
# tune download microsoft/phi-4 --output-dir /tmp/phi-4 --hf-token <HF_TOKEN>
#
# Run this config on 4 GPUs using the following:
# tune run --nproc_per_node 4 full_finetune_distributed --config phi4/mini_full
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nproc_per_node 4 full_finetune_distributed --config phi4/mini_full checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# Single device full finetuning requires more memory optimizations. It's
# best to use mini_low_memory.yaml for those cases

output_dir: /tmp/torchtune/phi4_mini/full # /tmp may be deleted by your system. Change it to your preference.

# Model arguments
model:
_component_: torchtune.models.phi4.phi4_mini

# Tokenizer
tokenizer:
_component_: torchtune.models.phi4.phi4_mini_tokenizer
path: /tmp/phi-4/tokenizer.model
max_seq_len: null

# Checkpointer
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/phi-4
checkpoint_files: [
model-00001-of-00006.safetensors,
model-00002-of-00006.safetensors,
model-00003-of-00006.safetensors,
model-00004-of-00006.safetensors,
model-00005-of-00006.safetensors,
model-00006-of-00006.safetensors,
]
recipe_checkpoint: null
output_dir: ${output_dir}
model_type: PHI3_MINI
Contributor:
n00b question: Are there any differences between PHI3 and PHI4? Even if there aren't, should we update the model_type for clarity? I believe this is used in the checkpointer to map the HF format to the torchtune format.

Contributor Author:
According to the tech report, the differences are in the tokenizer and in the attention setup, in a way that doesn't affect us here. But some observations I made above might lead us to a different conclusion.
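For context on the tokenizer change being discussed: the commit history above ("nit SPM -> TikToken", "gpt2 tokenizer") switches Phi-4 to a GPT-2-style BPE tokenizer rather than Phi-3's SentencePiece one. A minimal usage sketch, assuming the builder name and tokenizer path as they appear in this revision of the PR (both were renamed/adjusted in later commits such as "phi4 -> phi4_14b"), and assuming the add_bos/add_eos flags common to torchtune tokenizers:

# Minimal sketch using the builder name and path from this revision of the PR;
# later commits rename the builders ("phi4 -> phi4_14b").
# The add_bos/add_eos flags are an assumption based on other torchtune tokenizers.
from torchtune.models.phi4 import phi4_mini_tokenizer

tokenizer = phi4_mini_tokenizer(path="/tmp/phi-4/tokenizer.model")
tokens = tokenizer.encode("Hello, Phi-4!", add_bos=True, add_eos=True)
print(tokens)
print(tokenizer.decode(tokens))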

Contributor:
I am tempted to say that even if PHI3_MINI == PHI4_MINI, every model should have its own nomenclature, so there is less cognitive load for the user. @ebsmothers, what do you think?

Contributor:
For now I would stick with the precedent we've set, which is to only use a new model type when the arch changes. This is what we do for the Llama family, where we have LLAMA3, LLAMA3_2, but not LLAMA3_1 or LLAMA3_3. I do agree with your point though @felipemello1 -- we can consider the renaming in a follow-up (at that time I would also probably drop the MINI from Phi model names too)
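To make the model_type question concrete: the checkpointer uses this string only to pick the HF-to-torchtune weight mapping, so reusing PHI3_MINI works as long as the architecture mapping is unchanged. A purely illustrative sketch of that dispatch pattern; the converter function and dict below are hypothetical, not torchtune's actual internals:

# Hypothetical sketch of model_type-based dispatch in a checkpointer;
# the names below are illustrative and are not torchtune's real implementation.
from typing import Callable, Dict

def phi3_hf_to_tune(hf_state_dict: Dict) -> Dict:
    # Hypothetical HF -> torchtune key remapping shared by Phi-3 and Phi-4,
    # since Phi-4 keeps the Phi-3 architecture.
    return {k.replace("model.", ""): v for k, v in hf_state_dict.items()}

HF_TO_TUNE: Dict[str, Callable[[Dict], Dict]] = {
    "PHI3_MINI": phi3_hf_to_tune,  # the Phi-4 configs in this PR reuse this entry
}

def convert_checkpoint(model_type: str, hf_state_dict: Dict) -> Dict:
    return HF_TO_TUNE[model_type](hf_state_dict)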

resume_from_checkpoint: False

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_cleaned_dataset
packed: False # True increases speed
seed: null
shuffle: True

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 8 # Use to increase effective batch size
optimizer:
_component_: torch.optim.AdamW
fused: True
lr: 5e-6
loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss
compile: False # torch.compile the model + loss, True increases speed + decreases memory
optimizer_in_bwd: False # True saves memory. Requires gradient_accumulation_steps=1

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True # True reduces memory
enable_activation_offloading: False # True reduces memory
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.training.metric_logging.DiskLogger
log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: True


# Profiler (disabled)
profiler:
_component_: torchtune.training.setup_torch_profiler
enabled: False

#Output directory of trace artifacts
output_dir: ${output_dir}/profiling_outputs

#`torch.profiler.ProfilerActivity` types to trace
cpu: True
cuda: True

#trace options passed to `torch.profiler.profile`
profile_memory: False
with_stack: False
record_shapes: True
with_flops: False

# `torch.profiler.schedule` options:
# wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
wait_steps: 5
warmup_steps: 3
active_steps: 2
num_cycles: 1
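One note on the defaults above: with the 4-GPU launch from the header comment, the effective global batch size is the product of per-device batch size, gradient accumulation steps, and GPU count. A quick back-of-the-envelope check (the GPU count is taken from the suggested launch command, not fixed by the config):

# Effective global batch size for the phi4/mini_full.yaml defaults,
# assuming the 4-GPU launch from the config's header comment.
batch_size = 2                   # per-device batch size from the config
gradient_accumulation_steps = 8  # from the config
num_gpus = 4                     # from "tune run --nproc_per_node 4 ..."
effective_batch_size = batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)      # 64 samples per optimizer step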
110 changes: 110 additions & 0 deletions recipes/configs/phi4/mini_full_low_memory.yaml
Contributor:
I think that this was already the naming convention for Phi3, but we should probably add "single_device" to the config name.

Contributor:
Phi3 uses low_memory. Personally I would like to change full_low_memory -> full_single_device across the board, but again would prioritize consistency with Phi3 in this PR.

@@ -0,0 +1,110 @@
# Config for single device full finetuning in full_finetune_single_device.py
# using a Phi4 16K Instruct
#
# This config assumes that you've run the following command before launching
# this run:
# tune download microsoft/phi-4 --output-dir /tmp/phi-4 --hf-token <HF_TOKEN>
#
# The default config uses an optimizer from bitsandbytes. If you do not have it installed,
# you can install it with
# pip install bitsandbytes
#
# To launch on a single device, run the following command from root:
# tune run full_finetune_single_device --config phi4/mini_full_low_memory
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run full_finetune_single_device --config phi4/mini_full_low_memory checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on single device.

output_dir: /tmp/torchtune/phi4_mini/full_low_memory # /tmp may be deleted by your system. Change it to your preference.

# Model arguments
model:
_component_: torchtune.models.phi4.phi4_mini

# Tokenizer
tokenizer:
_component_: torchtune.models.phi4.phi4_mini_tokenizer
path: /tmp/phi-4/tokenizer.model
max_seq_len: null

# Checkpointer
checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/phi-4
checkpoint_files: [
model-00001-of-00006.safetensors,
model-00002-of-00006.safetensors,
model-00003-of-00006.safetensors,
model-00004-of-00006.safetensors,
model-00005-of-00006.safetensors,
model-00006-of-00006.safetensors,
]
recipe_checkpoint: null
output_dir: ${output_dir}
model_type: PHI3_MINI
resume_from_checkpoint: False

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_cleaned_dataset
packed: False # True increases speed
seed: null
shuffle: True

# Fine-tuning arguments
epochs: 1
max_steps_per_epoch: null
batch_size: 2
gradient_accumulation_steps: 1 # Use to increase effective batch size
optimizer:
_component_: bitsandbytes.optim.PagedAdamW
lr: 5e-6
optimizer_in_bwd: True # True saves memory. Requires gradient_accumulation_steps=1
loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss
compile: False # torch.compile the model + loss, True increases speed + decreases memory

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True # True reduces memory
enable_activation_offloading: True # True reduces memory
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.training.metric_logging.DiskLogger
log_dir: ${output_dir}/logs
log_every_n_steps: 1
log_peak_memory_stats: True


# Profiler (disabled)
profiler:
_component_: torchtune.training.setup_torch_profiler
enabled: False

#Output directory of trace artifacts
output_dir: ${output_dir}/profiling_outputs

#`torch.profiler.ProfilerActivity` types to trace
cpu: True
cuda: True

#trace options passed to `torch.profiler.profile`
profile_memory: False
with_stack: False
record_shapes: True
with_flops: False

# `torch.profiler.schedule` options:
# wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
wait_steps: 5
warmup_steps: 3
active_steps: 2
num_cycles: 1
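For reference on the low-memory optimizer block: bitsandbytes' PagedAdamW keeps optimizer state in paged memory that can spill to CPU when GPU memory runs short, which, together with optimizer_in_bwd and activation offloading, is what lets the single-device run fit. A standalone sketch, assuming bitsandbytes is installed (the tiny Linear module is only a stand-in, not the Phi-4 model):

# Standalone sketch of the optimizer block above; requires `pip install bitsandbytes`.
# The small Linear module is only a stand-in for the Phi-4 model.
import bitsandbytes as bnb
import torch
import torch.nn as nn

model = nn.Linear(16, 16).cuda()
optimizer = bnb.optim.PagedAdamW(model.parameters(), lr=5e-6)  # mirrors the config's lr

x = torch.randn(2, 16, device="cuda")
loss = model(x).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()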