Commit 21c3b00

committed
2 parents 18d8474 + a4b0e93 commit 21c3b00

1 file changed: +13 -14 lines changed

README.md

Lines changed: 13 additions & 14 deletions
@@ -1,10 +1,11 @@
+ome
 <h1 align="center">Self-Operating Computer Framework</h1>
 
 <p align="center">
 <strong>A framework to enable multimodal models to operate a computer.</strong>
 </p>
 <p align="center">
-Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
+Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Self-Operating Computer was the first project to use a VLM to operate a computer.
 </p>
 
 <div align="center">
@@ -19,19 +20,10 @@
 
 ## Key Features
 - **Compatibility**: Designed for various multimodal models.
-- **Integration**: Currently integrated with **GPT-4o, Gemini Pro Vision, Claude 3 and LLaVa.**
+- **Integration**: Currently integrated with **GPT-4o, o1, Gemini Pro Vision, Claude 3 and LLaVa.**
 - **Future Plans**: Support for additional models.
 
-## Ongoing Development
-At [HyperwriteAI](https://www.hyperwriteai.com/), we are developing Agent-1-Vision a multimodal model with more accurate click location predictions.
-
-## Agent-1-Vision Model API Access
-We will soon be offering API access to our Agent-1-Vision model.
-
-If you're interested in gaining access to this API, sign up [here](https://othersideai.typeform.com/to/FszaJ1k8?typeform-source=www.hyperwriteai.com).
-
 ## Demo
-
 https://github.com/OthersideAI/self-operating-computer/assets/42594239/9e8abc96-c76a-46fb-9b13-03678b3c67e0
 
 
@@ -60,10 +52,17 @@ operate
 
 ## Using `operate` Modes
 
-### Multimodal Models `-m`
-An additional model is now compatible with the Self Operating Computer Framework. Try Google's `gemini-pro-vision` by following the instructions below.
+#### OpenAI models
+
+The default model for the project is gpt-4o which you can use by simply typing `operate`. To try running OpenAI's new `o1` model, use the command below.
 
-Start `operate` with the Gemini model
+```
+operate -m o1-with-ocr
+```
+
+
+### Multimodal Models `-m`
+Try Google's `gemini-pro-vision` by following the instructions below. Start `operate` with the Gemini model
 ```
 operate -m gemini-pro-vision
 ```
