v0.6.0
Pre-release
In this update I lay the foundation for better compatibility with the existing LLM finetuning stack by changing the data schema to be more compliant with the OpenAI API. You can now easily export data (new **Export as OpenAI dataset** option) and use it as-is with existing training pipelines.
It also includes various QoL improvements.
## Breaking changes
- The underlying data format has been massively overhauled. This means that if you have been collecting data using LIMA-GUI, you won't be able to load it with the newer version. To update your data, use the `python -m lima_gui.update_data` script. It takes a path to a target input file (or a folder with multiple files) and a path to a target output file (or folder). Note that the script can't handle function calling data; if you need that, open a corresponding issue and I can update the script.
- When using the completion API, the chat is formatted in ChatML. That means you can use `completion` mode to generate (and steer) partial answers of ChatML-compliant models, such as `cognitivecomputations/dolphin-2.6-mistral-7b-dpo`.
- The `transformers` library is removed from dependencies; `tokenizers` is used instead. Sorry for that stupid mistake.
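For reference, ChatML wraps each message in `<|im_start|>` / `<|im_end|>` markers. A minimal sketch of the format (the helper name `to_chatml` is illustrative and not part of LIMA-GUI's API):

```python
def to_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

chat = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(to_chatml(chat))
```

Dropping the final `<|im_end|>` of an assistant turn is what lets a ChatML-compliant model continue a partial answer in completion mode.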
## Major changes
- You can now export your dataset in an OpenAI finetuning API compliant format (a `jsonl` file with one `{"messages": [...]}` object per line). Click **File -> Export as OpenAI dataset**.
- `Ctrl + S` now saves into the last opened file and no longer opens a file selection window.
- LIMA-GUI will track changes and:
  - Ask you to save the data if you haven't done so and are trying to close the program.
  - Ask you to save the data if you haven't done so and are trying to open another file.
- All of the prints are replaced with the `loguru` library. All of the calls are logged. As of now, the `DEBUG` level is set by default.
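The exported file layout can be sketched as follows; the example conversation is illustrative, but each line of the `.jsonl` is one self-contained training example in the OpenAI finetuning format:

```python
import json

# One training example per line: {"messages": [{"role": ..., "content": ...}, ...]}
examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ]},
]

with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

A file in this shape can be uploaded directly to the OpenAI finetuning API or fed to any pipeline that expects chat-style `messages` records.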
## Fixes
- LIMA-GUI now works with the latest version of the `openai` library.