I have a question about training stability. I downloaded the complete RedPajama v1 dataset from Hugging Face and followed the settings from the LLaMA 1 paper for the data mixture and training hyperparameters. I trained two model sizes, 1.8B and 7B. Unfortunately, the 7B model's loss began to rise after 300 billion tokens, and the 1.8B model showed a similar increase after 250 billion tokens. How can I address this training instability?
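For reference, here is a minimal sketch of the configuration I am following, with the sampling proportions and optimizer settings as reported in the LLaMA paper; the exact values in my runs may differ slightly.

```python
# Sketch of the setup described above, using the values reported in the LLaMA paper
# (Touvron et al., 2023). My actual runs may deviate slightly from these numbers.

# RedPajama v1 / LLaMA sampling proportions (Table 1 of the LLaMA paper)
data_mixture = {
    "common_crawl": 0.670,
    "c4": 0.150,
    "github": 0.045,
    "wikipedia": 0.045,
    "books": 0.045,
    "arxiv": 0.025,
    "stackexchange": 0.020,
}
assert abs(sum(data_mixture.values()) - 1.0) < 1e-6

# Optimizer and schedule settings for the ~7B model (AdamW with cosine decay)
hparams_7b = {
    "peak_lr": 3.0e-4,          # cosine-decayed to 10% of the peak value
    "warmup_steps": 2000,
    "adam_beta1": 0.9,
    "adam_beta2": 0.95,
    "weight_decay": 0.1,
    "grad_clip_norm": 1.0,
    "batch_size_tokens": 4_000_000,
}
```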