
v0.6.0

Pre-release
@oKatanaaa oKatanaaa released this 02 Mar 21:45
· 11 commits to master since this release
5ca663b

In this update I lay the foundation for better compatibility with the existing LLM finetuning stack by changing the data schema to be more compliant with the OpenAI API. You can now easily export data (via the new Export as OpenAI dataset option) and use it as-is with existing training pipelines.
The release also includes various QoL improvements.

Breaking changes

  • The underlying data format has been massively overhauled, meaning that data collected with an older version of LIMA-GUI can no longer be loaded directly. To update your data, use the python -m lima_gui.update_data script. It takes a path to a target input file (or a folder with multiple files) and a path to a target output file (or folder). Note that the script cannot handle function calling data; if you need that, I can update the script (just open a corresponding issue).
  • When using the completion API, the chat is formatted in ChatML. This means you can use completion mode to generate (and steer) partial answers of ChatML-compliant models, such as cognitivecomputations/dolphin-2.6-mistral-7b-dpo.
  • The transformers library has been removed from the dependencies; tokenizers is used instead. Sorry for that mistake.
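For reference, the ChatML rendering used by completion mode can be sketched as follows. This is an illustrative example of the ChatML layout, not LIMA-GUI's actual code; the function name and messages are made up:

```python
# Illustrative sketch of ChatML formatting (not LIMA-GUI's actual implementation).
# Each message is wrapped in <|im_start|>role ... <|im_end|> delimiters.

def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leaving the final assistant turn open is what lets a completion
        # endpoint continue (and thus steer) a partial answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
print(to_chatml(chat))
```

Feeding a string like this to a completion endpoint of a ChatML-compliant model continues the open assistant turn.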

Major changes

  • You can now export your dataset in an OpenAI finetuning API compliant format (a jsonl file where each line is a {"messages": [...]} object). Click File -> Export as OpenAI dataset.
  • Ctrl + S now saves to the last opened file instead of opening a file selection window.
  • LIMA-GUI now tracks unsaved changes and will:
    • Ask you to save your data before closing the program.
    • Ask you to save your data before opening another file.
  • All prints have been replaced with the loguru library, and all calls are now logged. For now, the DEBUG level is set by default.
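The exported jsonl layout can be sketched as follows — each line of the file is a standalone JSON object with a messages list, as expected by OpenAI's finetuning API. The file name and chat contents here are made up for illustration:

```python
import json

# Illustrative example of the OpenAI finetuning jsonl layout:
# one {"messages": [...]} object per line (contents are made up).
dataset = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4."},
    ]},
]

# Write: one JSON object per line.
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for example in dataset:
        f.write(json.dumps(example) + "\n")

# Read it back: parse each line independently.
with open("dataset.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(len(examples))  # number of chats in the dataset
```

A file in this shape can be uploaded directly to an OpenAI-style finetuning pipeline without further conversion.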
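As a quick illustration of what loguru-based logging with a default DEBUG level looks like — this is generic loguru usage, not LIMA-GUI's actual code, and the log messages are invented:

```python
from loguru import logger

# Generic loguru sketch (not LIMA-GUI's actual code).
# Replace the default stderr sink with one we control, at DEBUG level,
# mirroring the release's "DEBUG by default" setting.
records = []
logger.remove()
logger.add(records.append, level="DEBUG", format="{level}: {message}")

logger.debug("Opened file {}", "chat.json")  # captured: DEBUG sink sees everything
logger.info("Saved {} chats", 3)

print(records)
```

Because the sink level is DEBUG, every call down to logger.debug is recorded; raising the level to INFO would silence the debug line without touching the call sites.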

Fixes

  • LIMA-GUI now works with the latest version of the openai library.