
feat: improve default ollama configuration #32

Closed
wants to merge 8 commits

Conversation

olimorris
Owner

Hey @cleong14

I'm really keen to improve the base Ollama configuration in the plugin.

Have you found any particular models that excel at coding? Would be great to add an Ollama section to the Readme to help fellow users.

@cleong14

Hey @olimorris,

Apologies, I just noticed your comment.

First off, I just want to say thank you for the great plugin you've developed here. Also, I'm a fan of a few other projects you've developed such as tmux-pomodoro-plus. Thanks for building such great OSS projects!

Currently, I'm still figuring out my own workflow/preferences, but I can definitely provide some context regarding some of the craziness you may have noticed in my fork.

Finding a model that really excels at coding has proven to be more challenging than I expected. I've tried models that claim to excel at coding or have been fine-tuned for coding tasks, but the results have almost always been poor. This could be due to a few different reasons though, such as incorrect template formatting, a bad choice of parameter values, or just a poor prompt on my end.

I've been pretty happy with the dolphin-mistral model though. Happy enough to make it my base model for most, if not all, of my modelfiles.

A few reasons why I like using dolphin-mistral:

  1. It is an uncensored model
  2. Results for both code and non-code related tasks have been pretty good; better than any other model I've tried so far anyway
  3. It's been the best fit and most performant model given my limited resources

Additionally, I knew I wanted to use danielmiessler's tool fabric in my workflow. Specifically, I wanted to leverage the System prompts from the patterns found in fabric. I've found that the dolphin-mistral model and the style of System prompts from fabric's patterns work pretty well together, which is another reason why I've chosen dolphin-mistral as my base model for most of my modelfiles.
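For context, the rough shape of one of those modelfiles looks something like this (the custom model name is made up and the SYSTEM body is a placeholder rather than an actual fabric pattern):

```sh
# Build a custom model on top of dolphin-mistral, using one of fabric's
# pattern system prompts as the SYSTEM block.
cat > Modelfile <<'EOF'
FROM dolphin-mistral

SYSTEM """
<paste the contents of a fabric pattern's system.md here>
"""

PARAMETER temperature 0.7
EOF

ollama create my-fabric-model -f Modelfile
ollama run my-fabric-model
```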

Hopefully this helps answer your question. I'm still pretty green and very much still learning what works best, but I'm happy to answer any additional questions you might have or share more of my own lessons learned thus far.

@mrjones2014
Contributor

In my experience dolphin-mixtral is much much better than dolphin-mistral. If I understand correctly, dolphin-mistral isn't actually fully uncensored but dolphin-mixtral is.

@cleong14

In my experience dolphin-mixtral is much much better than dolphin-mistral. If I understand correctly, dolphin-mistral isn't actually fully uncensored but dolphin-mixtral is.

dolphin-mixtral is actually the model I wanted to use but due to a lack of resources needed to run a model of that size, I ended up going with dolphin-mistral.

@olimorris
Owner Author

That would explain why dolphin-mixtral was sooooo slow on my M1 Mac.

I'm conscious that because I don't use Ollama, the defaults may not be that useful for anyone that's trying out the plugin, so I'm completely open to any change and a section in the README about getting started with Ollama and Code Companion.
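At a minimum, I'd imagine such a section would walk through something like the following (the model here is just an example taken from this thread):

```sh
# Bare-bones "getting started with Ollama" steps for the README
ollama pull dolphin-mistral            # download a model
ollama serve                           # start the local server (listens on localhost:11434 by default)
curl http://localhost:11434/api/tags   # sanity check: the server is up and the model is listed
```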

@mrjones2014
Contributor

First usage can be quite slow since dolphin-mixtral is 26 GB lol but once downloaded it shouldn't really be that slow.

@cleong14

@olimorris

That would explain why dolphin-mixtral was sooooo slow on my M1 Mac.

lol ya I had a similar reaction too when I first tried dolphin-mixtral on my M2 Mac.

I'm conscious that because I don't use Ollama, the defaults may not be that useful for anyone that's trying out the plugin, so I'm completely open to any change and a section in the README about getting started with Ollama and Code Companion.

I'd say definitely do not use the changes I've been pushing to my forked main branch. Initially, I intended to fork just to fix the issues I had on my end. I figured these issues were specific to me and a result of trying to do things that were just outside of the expected default plugin behavior.

That being said, if you're looking to simply improve the Ollama adapter defaults to something more sensible, I don't mind making those updates and then pushing the changes either here or to a clean branch in a new PR.


@mrjones2014

First usage can be quite slow since dolphin-mixtral is 26 GB lol but once downloaded it shouldn't really be that slow.

Do you typically start Ollama then leave the same instance running? What are you running Ollama on and what type of impact on your resources are you seeing when you run dolphin-mixtral?

I agree that the performance differences are quite noticeable between cold vs warm vs hot starts. I find myself typically moving between cold/warm/hot starts somewhat fluidly depending on what I'm doing, which also affects how I choose to start Ollama. E.g. sometimes I'll start and run Ollama directly, sometimes I'll start it from within Neovim or a different tool, etc.
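Concretely, keeping things warm mostly comes down to leaving one server running and letting requests reuse the loaded model; a rough sketch (the keep_alive option only exists in newer Ollama versions, so treat that part as optional):

```sh
# Keep a long-running server in one terminal...
ollama serve

# ...and hit it from anywhere else (Neovim, curl, etc.). The first request after
# a cold start pays the model-load cost; subsequent requests reuse the warm model.
curl http://localhost:11434/api/generate \
  -d '{"model": "dolphin-mistral", "prompt": "hello", "keep_alive": "30m"}'
```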

I do want to find a place for dolphin-mixtral in my workflow though. I'll probably give it another go and see if I can figure out a way to dampen the impact on performance so it at least doesn't "feel" slow, or like I'm running a 26 GB model locally. lol

@cleong14

I just tried to run dolphin-mixtral again and it was painfully slow. My resource usage immediately spikes and I can barely get past the first run.

Alternatively, I did try dolphin-mistral:7b-v2.6-dpo-laser-q5_K_M (Q5_K_M's stated use case is 'large, very low quality loss - recommended') and it appears to yield better results than dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M (Q4_K_M's stated use case is 'medium, balanced quality - recommended').
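For anyone wanting to compare the two, they're just different tags of the same model:

```sh
# Pull both quantizations and compare size/behaviour side by side
ollama pull dolphin-mistral:7b-v2.6-dpo-laser-q5_K_M
ollama pull dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M
ollama list   # shows the on-disk size of each tag
```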

@olimorris
Owner Author

olimorris commented Mar 30, 2024

Are there any Ollama models that have been fine-tuned for specific languages or use cases?

I'm thinking a way to programmatically switch models based on a buffer type or set of conditions could be useful. So if I'm working in a Ruby file it would load a Ruby/Rails specific model.

Edit: I've started exploring this with OpenAI Assistants. It's been cool to send it specific knowledge for Neovim, Tree-sitter etc. But...the implementation is so unique that I can't work out how to implement it in the product neatly.

@mrjones2014
Contributor

Do you typically start Ollama then leave the same instance running? What are you running Ollama on and what type of impact on your resources are you seeing when you run dolphin-mixtral?

I run it as a background service on NixOS so it's always running. If you're not using it, it doesn't impact performance. I haven't noticed too much usage when running dolphin-mixtral, but maybe that's just my hardware. I'm running it on an RTX 2080 Ti, and an M2 Max when I'm on macOS.

But actually I plan to run Ollama as a service on my home server where I run Jellyfin and connect to it remotely (I just haven't gotten around to it yet). It's quite nice with open-webui, which provides a ChatGPT-like web UI for Ollama.
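For the record, the open-webui side is basically a single container; the flags below are a sketch based on their docs, so double-check the current README before copying (the Ollama host is a placeholder):

```sh
# Run open-webui in Docker and point it at wherever Ollama is listening
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://your-ollama-host:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```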

@lazymaniac
Contributor

For Ollama models I like deepseek-coder (7B) and nous-hermes2 (10B), which produce very decent results for me compared to others, or opencodeinterpreter (7B, 13B, and 33B), a fine-tuned deepseek-coder that's still very fresh and under testing.

@lazymaniac
Contributor

Sorry for double post, but take a look: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard

@olimorris olimorris closed this Apr 1, 2024
@olimorris
Owner Author

Thanks all. I'll move this to #38
