Add Phi4 #2197

Merged on Feb 11, 2025 (51 commits)
Showing changes from 1 commit

Commits
1a43259
Add Phi4 support
krammnic Dec 21, 2024
3630908
Add Phi4
krammnic Dec 21, 2024
18f8bc5
fix names
krammnic Jan 11, 2025
e69a77c
More fixes. Able to do forward
krammnic Jan 11, 2025
a94b742
Update torchtune/models/phi4/_tokenizer.py
krammnic Jan 14, 2025
1d03294
Update torchtune/_recipe_registry.py
krammnic Jan 14, 2025
bdf478f
Update torchtune/models/phi4/_model_builders.py
krammnic Jan 14, 2025
78cd1e6
more fixes
Feb 2, 2025
d8b2ea3
nit SPM -> TikToken
Feb 2, 2025
3d55e55
fixed tokenizer + fix model loading problem (credits: ebsmothers)
Feb 8, 2025
7ee22b6
remove useless comments
Feb 8, 2025
e515f06
gpt2 tokenizer
Feb 8, 2025
d1cae68
gpt2 tokenizer
Feb 8, 2025
3c1780d
fixed configs
krammnic Feb 8, 2025
18c0033
fix docstring in tokenizer
krammnic Feb 8, 2025
fc1d2db
fix lint and docstrings
krammnic Feb 8, 2025
99a1ce5
fix lint and docstrings
krammnic Feb 8, 2025
ce626a4
cover gpt2 tokenizer with test
krammnic Feb 8, 2025
e3768ee
fix lint
krammnic Feb 8, 2025
c84c74c
fix phi4tokenizer tests
krammnic Feb 8, 2025
cbc5ca1
fix tests
krammnic Feb 8, 2025
dc64290
Update torchtune/models/phi4/_model_builders.py
krammnic Feb 10, 2025
46bede4
Update torchtune/models/phi4/_model_builders.py
krammnic Feb 10, 2025
cc36700
Update torchtune/modules/tokenizers/_gpt2.py
krammnic Feb 10, 2025
c9a483c
fix eval configs
Feb 10, 2025
c1b6394
remove nnodes from configs
Feb 10, 2025
47dd749
naming fixes
Feb 10, 2025
146cac3
fix lint
Feb 10, 2025
6e50261
fixes
Feb 10, 2025
55d7ae0
fix test
Feb 10, 2025
e7b43d6
phi4 -> phi4_14b
Feb 10, 2025
b4de41d
resolve conflict
Feb 10, 2025
4440768
resolve conflict
Feb 10, 2025
d39e717
update __init__
Feb 10, 2025
54d477d
update __init__
Feb 10, 2025
0be4b8e
update __init__
Feb 10, 2025
ad8562e
Merge branch 'main' into main
ebsmothers Feb 10, 2025
518a769
add GPT2BaseTokenizer in transforms/tokenizers/__init__.py + fix lint
Feb 10, 2025
e29aca6
fix imports
Feb 10, 2025
d533355
fix __init__ and namings
Feb 10, 2025
012f433
swap encode decode
Feb 11, 2025
ebcd1d6
correct eval recipe
Feb 11, 2025
d4435b0
fix docstring
Feb 11, 2025
7f5ccd8
remove useless argument
Feb 11, 2025
36eeaa8
nit: unk token
Feb 11, 2025
af5a824
fixes tokenizer
Feb 11, 2025
2002f50
fix gpt2tokenizer test
Feb 11, 2025
01ac202
fix lora config
Feb 11, 2025
6003044
renamings
Feb 11, 2025
7aea0ca
fix phi4 drop eos + test
Feb 11, 2025
4f38c14
recipe registry
Feb 11, 2025
resolve conflict
Mark Obozov authored and Mark Obozov committed Feb 10, 2025
commit 4440768d013dfb7447bf0812d557fbae2f6c978c
18 changes: 16 additions & 2 deletions torchtune/modules/transforms/tokenizers/__init__.py
@@ -4,6 +4,20 @@
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.

-from torchtune.modules.transforms.tokenizers._gpt2 import GPT2BaseTokenizer
+from ._sentencepiece import SentencePieceBaseTokenizer
+from ._tiktoken import TikTokenBaseTokenizer
+from ._utils import (
+    BaseTokenizer,
+    ModelTokenizer,
+    parse_hf_tokenizer_json,
+    tokenize_messages_no_special_tokens,
+)

-__all__ = ["GPT2BaseTokenizer"]
+__all__ = [
+    "SentencePieceBaseTokenizer",
+    "TikTokenBaseTokenizer",
+    "ModelTokenizer",
+    "BaseTokenizer",
+    "tokenize_messages_no_special_tokens",
+    "parse_hf_tokenizer_json",
+]
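
Review note: the file header reports 16 additions and 2 deletions. Assuming the two deleted lines are the `GPT2BaseTokenizer` import and the one-element `__all__`, this commit on its own leaves `GPT2BaseTokenizer` unexported from the package (a later commit in the list, 518a769, adds it back to this `__init__.py`). Below is a minimal sketch of a check that could catch this kind of conflict-resolution slip. The module name `demo_tokenizers`, the helpers `build_module` and `missing_exports`, and the stand-in classes are all hypothetical and not part of torchtune.

```python
import types


def build_module(name: str, source: str) -> types.ModuleType:
    """Build an in-memory module from source text (hypothetical test helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def missing_exports(module: types.ModuleType, expected: list[str]) -> list[str]:
    """Return the expected names that are not listed in __all__ or not defined."""
    exported = set(getattr(module, "__all__", ()))
    return [
        name
        for name in expected
        if name not in exported or not hasattr(module, name)
    ]


# Hypothetical stand-in for the package __init__ as it looks after this commit:
# the GPT2 tokenizer is neither imported nor listed in __all__.
after_commit = build_module(
    "demo_tokenizers",
    """
class SentencePieceBaseTokenizer: ...
class TikTokenBaseTokenizer: ...
__all__ = ["SentencePieceBaseTokenizer", "TikTokenBaseTokenizer"]
""",
)

result = missing_exports(
    after_commit, ["TikTokenBaseTokenizer", "GPT2BaseTokenizer"]
)
print(result)
```

A check like this could live in a unit test that lists the tokenizer classes the package is expected to expose, so that a merge dropping one of them fails fast.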