Commit 8ffb595

fix: by default, try it in ollama
1 parent 77f3755 commit 8ffb595

2 files changed (+12, -18 lines)

README.md (+8, -8)

@@ -76,28 +76,28 @@ Use Claude 3 with Vision to see how it stacks up to GPT-4-Vision at operating a
 operate -m claude-3
 ```
 
-#### Try LLaVa Hosted Through Ollama `-m llava`
-If you wish to experiment with the Self-Operating Computer Framework using LLaVA on your own machine, you can with Ollama!
+#### Try a model Hosted Through Ollama `-m llama3.2-vision`
+If you wish to experiment with the Self-Operating Computer Framework using e.g. LLaVA on your own machine, you can with Ollama!
 *Note: Ollama currently only supports MacOS and Linux. Windows now in Preview*
 
 First, install Ollama on your machine from https://ollama.ai/download.
 
-Once Ollama is installed, pull the LLaVA model:
+Once Ollama is installed, pull the vision model:
 ```
-ollama pull llava
+ollama pull llama3.2-vision
 ```
 This will download the model on your machine which takes approximately 5 GB of storage.
 
-When Ollama has finished pulling LLaVA, start the server:
+When Ollama has finished pulling llama3.2-vision, start the server:
 ```
 ollama serve
 ```
 
-That's it! Now start `operate` and select the LLaVA model:
+That's it! Now start `operate` and select the model:
 ```
-operate -m llava
+operate -m llama3.2-vision
 ```
-**Important:** Error rates when using LLaVA are very high. This is simply intended to be a base to build off of as local multimodal models improve over time.
+**Important:** Error rates when using self-hosted models are very high. This is simply intended to be a base to build off of as local multimodal models improve over time.
 
 Learn more about Ollama at its [GitHub Repository](https://www.github.com/ollama/ollama)
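
The README's flow is pull, then serve, then run `operate`. As an optional sanity check between the `ollama serve` and `operate -m llama3.2-vision` steps, the local server can be queried from Python; a minimal sketch, assuming the `ollama` Python client is installed (`pip install ollama`) and the server is running:

```python
import ollama

# Ask the local Ollama server (started with `ollama serve`) which models are
# available; the pulled vision model, e.g. llama3.2-vision, should appear.
# Raises a connection error if the server is not running.
print(ollama.list())
```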

operate/models/apis.py (+4, -10)

@@ -50,14 +50,11 @@ async def get_next_action(model, messages, objective, session_id):
         return "coming soon"
     if model == "gemini-pro-vision":
         return call_gemini_pro_vision(messages, objective), None
-    if "llava" in model:
-        operation = call_ollama_llava(messages, model)
-        return operation, None
     if model == "claude-3":
         operation = await call_claude_3_with_ocr(messages, objective, model)
         return operation, None
-    raise ModelNotRecognizedException(model)
-
+    operation = call_ollama_llava(model, messages)
+    return operation, None
 
 def call_gpt_4o(messages):
     if config.verbose:
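
The practical effect of this hunk: `get_next_action` no longer raises `ModelNotRecognizedException` for unrecognized model names; any name without an explicit branch is handed to the Ollama path. A minimal, self-contained sketch of that fallback pattern, using illustrative stand-in functions rather than the real implementations:

```python
# Illustrative stand-ins only: route_model mirrors the shape of the new
# get_next_action dispatch, not the real implementation.
def call_claude_3(messages):
    return {"backend": "anthropic"}

def call_ollama(model, messages):
    return {"backend": "ollama", "model": model}

def route_model(model, messages):
    if model == "claude-3":
        return call_claude_3(messages)
    # Previously this was: raise ModelNotRecognizedException(model).
    # Now any unmatched name (e.g. "llama3.2-vision", "llava") goes to Ollama.
    return call_ollama(model, messages)

print(route_model("claude-3", []))         # {'backend': 'anthropic'}
print(route_model("llama3.2-vision", []))  # {'backend': 'ollama', 'model': 'llama3.2-vision'}
```
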
@@ -557,10 +554,7 @@ async def call_gpt_4o_labeled(messages, objective, model):
             traceback.print_exc()
         return call_gpt_4o(messages)
 
-
-def call_ollama_llava(messages, model):
-    if model == "":
-        model = "llava"
+def call_ollama_llava(model, messages):
     if config.verbose:
         print(f"[call_ollama_llava] model {model}")
     time.sleep(1)
@@ -635,7 +629,7 @@ def call_ollama_llava(messages, model):
         )
         if config.verbose:
             traceback.print_exc()
-        return call_ollama_llava(messages, model)
+        return call_ollama_llava(model, messages)
 
 
 async def call_claude_3_with_ocr(messages, objective, model):
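
The body of `call_ollama_llava` is not shown in this diff; only its reordered signature `call_ollama_llava(model, messages)` appears. For orientation, a hedged sketch of a local vision call through the `ollama` Python client; the helper `ask_vision_model` and its arguments are illustrative, not taken from the repository:

```python
import ollama

def ask_vision_model(model, prompt, screenshot_path):
    # Send one user turn with an attached screenshot to the local Ollama
    # server (requires `ollama serve` and a pulled vision model).
    response = ollama.chat(
        model=model,  # e.g. "llama3.2-vision" or "llava"
        messages=[{"role": "user", "content": prompt, "images": [screenshot_path]}],
    )
    return response["message"]["content"]

# Example:
# print(ask_vision_model("llama3.2-vision", "Describe this screen.", "screenshot.png"))
```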
