feat: improve default ollama configuration #32
Conversation
Signed-off-by: Chaz Leong <[email protected]>
Hey @olimorris, apologies, I just noticed your comment. First off, I just want to say thank you for the great plugin you've developed here. I'm also a fan of a few other projects of yours, such as tmux-pomodoro-plus. Thanks for building such great OSS projects!

Currently, I'm still figuring out my own workflow/preferences, but I can definitely provide some context regarding some of the craziness you may have noticed in my fork. Finding a model that really excels at coding has proven to be more challenging than I expected. I've tried models that claim to excel at coding or have been fine-tuned for coding tasks, but the results have almost always been poor. That could be due to a few different reasons, though, such as incorrect template formatting, a bad choice of parameter values, or just a poor prompt on my end.

I've been pretty happy with the dolphin-mistral model, though. Happy enough to make it my base model for most, if not all, of my modelfiles. A few reasons why I like using dolphin-mistral:
Additionally, I knew I wanted to use danielmiessler's tool fabric in my workflow. Specifically, I wanted to leverage the System prompts from the patterns found in fabric. I've found that the dolphin-mistral model and the style of System prompts from fabric's patterns work pretty well together, which is another reason I've chosen dolphin-mistral as my base model for most of my modelfiles. Hopefully this helps answer your question. I'm still pretty green and very much still learning what works best, but I'm happy to answer additional questions you might have or share more of my own lessons learned thus far.
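For reference, the modelfiles I've been building are all roughly the shape sketched below. The model name `my-coding-model` and the system prompt text are just placeholders; the real prompts come from fabric's patterns.

```sh
# Write a Modelfile that layers a fabric-style system prompt on dolphin-mistral.
cat > Modelfile <<'EOF'
FROM dolphin-mistral
PARAMETER temperature 0.2
SYSTEM """
You are an expert developer. Analyze the code or question you are given and
respond with clear, well-structured output.
"""
EOF

# Build a named model from it and try it out.
ollama create my-coding-model -f Modelfile
ollama run my-coding-model
```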
In my experience dolphin-mixtral is much, much better than dolphin-mistral. If I understand correctly, dolphin-mistral isn't actually fully uncensored, but dolphin-mixtral is.
dolphin-mixtral is actually the model I wanted to use, but due to a lack of the resources needed to run a model of that size, I ended up going with dolphin-mistral.
That would explain why dolphin-mixtral was sooooo slow on my M1 Mac. I'm conscious that, because I don't use Ollama, the defaults may not be that useful for anyone who's trying out the plugin, so I'm completely open to any change, and to a section in the README about getting started with Ollama and CodeCompanion.
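For that README section, the minimal getting-started flow would presumably be something like the sketch below (assuming a stock install with the server on its default port, 11434):

```sh
# Pull a model, check what's installed locally, and confirm the HTTP API is up.
ollama pull dolphin-mistral
ollama list
curl http://localhost:11434/api/tags
```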
First usage can be quite slow since dolphin-mixtral is 26 GB lol but once downloaded it shouldn't really be that slow. |
lol ya I had a similar reaction too when I first tried dolphin-mixtral on my M2 Mac.
I'd say definitely do not use the changes I've been pushing to my forked main branch. Initially, I intended to fork just to fix the issues I had on my end; I figured those issues were specific to me and a result of trying to do things outside the expected default plugin behavior. That being said, if you're looking to simply improve the Ollama adapter defaults to something more sensible, I don't mind making those updates and then pushing the changes either here or to a clean branch in a new PR.
Do you typically start Ollama and then leave the same instance running? What are you running Ollama on, and what kind of impact on your resources are you seeing when you run dolphin-mixtral? I agree that the performance differences between cold, warm, and hot starts are quite noticeable. I find myself moving between cold/warm/hot starts somewhat fluidly depending on what I'm doing, which also affects how I choose to start Ollama. E.g. sometimes I'll start and run Ollama directly, sometimes I'll start it from within Neovim or a different tool, etc. I do want to find a place for dolphin-mixtral in my workflow, though. I'll probably give it another go and see if I can figure out a way to dampen the impact on performance so it at least doesn't "feel" slow, or like I'm running a 26 GB model locally. lol
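One thing worth trying for the cold-start pain: keep the server running and pre-load the model before opening Neovim, so the plugin always hits a warm instance. A rough sketch (per Ollama's API docs, a generate request with no prompt should just load the model into memory):

```sh
# Start the server in the background (skip if Ollama already runs as a service).
ollama serve &

# Pre-load the model so the first real completion doesn't pay the load cost.
curl http://localhost:11434/api/generate -d '{"model": "dolphin-mixtral"}'
```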
I just tried to run dolphin-mixtral again and it was painfully slow. My resources immediately spike and I can barely get past the first run. Alternatively, I did try dolphin-mistral:7b-v2.6-dpo-laser-q5_K_M (Q5_K_M's stated use case is 'large, very low quality loss - recommended') and it appears to be yielding better results compared to dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M (Q4_K_M's stated use case is 'medium, balanced quality - recommended').
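For anyone who wants to compare the two quantizations themselves, these are the exact tags I used (assuming they're still published in the Ollama library):

```sh
ollama pull dolphin-mistral:7b-v2.6-dpo-laser-q4_K_M
ollama pull dolphin-mistral:7b-v2.6-dpo-laser-q5_K_M

# Run the Q5_K_M variant, which has been giving me the better results.
ollama run dolphin-mistral:7b-v2.6-dpo-laser-q5_K_M
```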
Are there any Ollama models that have been fine-tuned for specific languages or use cases? I'm thinking a way to programmatically switch models based on a buffer type or set of conditions could be useful. So if I'm working in a Ruby file it would load a Ruby/Rails-specific model. Edit: I've started exploring this with OpenAI Assistants. It's been cool to send it specific knowledge for Neovim, Tree-sitter, etc. But the implementation is so unique that I can't work out how to implement it in the product neatly.
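Just to sketch the buffer-type idea: at its simplest it's a filetype-to-model lookup in Lua. Everything below (the table contents and the `get_model` helper) is a hypothetical placeholder, not part of CodeCompanion's API; how the chosen name gets wired into the adapter is the open question.

```lua
-- Hypothetical sketch: choose an Ollama model name from the current buffer's
-- filetype. The model names here only illustrate the lookup.
local models_by_filetype = {
  ruby   = "dolphin-mistral", -- swap in a Ruby/Rails-tuned model if one exists
  python = "deepseek-coder",
  lua    = "nous-hermes2",
}

local function get_model()
  -- vim.bo.filetype is the filetype of the current buffer.
  return models_by_filetype[vim.bo.filetype] or "dolphin-mistral"
end

print(get_model())
```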
I run it as a background service on NixOS, so it's always running. If you're not using it, it doesn't impact performance. I haven't noticed too much usage when running dolphin-mixtral, but maybe that's just my hardware. I'm running it on an RTX 2080 Ti, and on an M2 Max when I'm on macOS. I do plan (just haven't gotten around to it yet) to run Ollama as a service on my home server where I run Jellyfin and connect to it remotely. It's quite nice with open-webui, which provides a ChatGPT-like web UI for Ollama.
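For the remote home-server setup, the ollama CLI reads the OLLAMA_HOST environment variable to decide which instance to talk to, so pointing a laptop at the server could look roughly like this (the hostname is a placeholder):

```sh
OLLAMA_HOST=homeserver.local:11434 ollama run dolphin-mixtral
```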
For Ollama models, I like deepseek-coder (7B) and nous-hermes2 (10B), which are producing very decent results for me compared to others, and opencodeinterpreter (7B, 13B, and 33B), a fine-tune of deepseek-coder that is very fresh and still under test.
Sorry for the double post, but take a look: https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
Thanks all. I'll move this to #38
Hey @cleong14
I'm really keen to improve the base Ollama configuration in the plugin.
Have you found any particular models that excel at coding? It would be great to add an Ollama section to the README to help fellow users.