Commit 0d9796c

chore: formatted README
Gets rid of a bunch of linting errors
1 parent e21e451 commit 0d9796c

1 file changed: README.md (+41 -14 lines)
<strong>A framework to enable multimodal models to operate a computer.</strong>
</p>
<p align="center">
Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
</p>

<div align="center">
**This model is currently experiencing an outage so the self-operating computer may not work as expected.**
-->

## Key Features

- **Compatibility**: Designed for various multimodal models.
- **Integration**: Currently integrated with **GPT-4v** as the default model, with extended support for Gemini Pro Vision.
- **Future Plans**: Support for additional models.

## Ongoing Development

At [HyperwriteAI](https://www.hyperwriteai.com/), we are developing Agent-1-Vision, a multimodal model with more accurate click location predictions.

## Agent-1-Vision Model API Access

We will soon be offering API access to our Agent-1-Vision model.

If you're interested in gaining access to this API, sign up [here](https://othersideai.typeform.com/to/FszaJ1k8?typeform-source=www.hyperwriteai.com).

## Demo

<https://github.com/OthersideAI/self-operating-computer/assets/42594239/9e8abc96-c76a-46fb-9b13-03678b3c67e0>

## Run `Self-Operating Computer`

1. **Install the project**

```
pip install self-operating-computer
```

2. **Run the project**

```
operate
```

3. **Enter your OpenAI Key**: If you don't have one, you can obtain an OpenAI key [here](https://platform.openai.com/account/api-keys)
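
Once entered, the key is reused on later runs. As a rough illustration only (this is not the project's actual code, and the `OPENAI_API_KEY` variable name is an assumption), resolving a key from the environment with an interactive fallback might look like:

```python
import os


def resolve_openai_key() -> str:
    """Return an OpenAI API key, preferring the environment.

    Hypothetical sketch: the real framework may store and load
    the key differently.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        # Fall back to an interactive prompt when no variable is set.
        key = input("Enter your OpenAI API key: ").strip()
    return key
```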

<div align="center">

### Alternative installation with `.sh`

1. **Clone the repo** to a directory on your computer:

```
git clone https://github.com/OthersideAI/self-operating-computer.git
```

2. **Cd into the directory**:

```
cd self-operating-computer
```

3. **Run the installation script**:

```
./run.sh
```

## Using `operate` Modes

### Multimodal Models `-m`

An additional model is now compatible with the Self-Operating Computer Framework. Try Google's `gemini-pro-vision` by following the instructions below.

Start `operate` with the Gemini model:

```
operate -m gemini-pro-vision
```

**Enter your Google AI Studio API key when the terminal prompts you for it.** If you don't have one, you can obtain a key [here](https://makersuite.google.com/app/apikey) after setting up your Google AI Studio account. You may also need to [authorize credentials for a desktop application](https://ai.google.dev/palm_docs/oauth_quickstart). It took me a bit of time to get it working; if anyone knows a simpler way, please make a PR.

### Optical Character Recognition Mode `-m gpt-4-with-ocr`

The Self-Operating Computer Framework now integrates Optical Character Recognition (OCR) capabilities with the `gpt-4-with-ocr` mode. This mode gives GPT-4 a hash map of clickable elements by coordinates. GPT-4 can decide to `click` an element by its text, and the code then references the hash map to get the coordinates of the element GPT-4 chose.

Based on recent tests, OCR performs better than `som` and vanilla GPT-4, so we made it the default for the project. Since it is the default mode, either `operate` or `operate -m gpt-4-with-ocr` will work.
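
The hash-map idea described above can be pictured with a minimal sketch (names and shapes are assumptions; the framework's real data structures may differ). OCR yields a mapping from on-screen text to screen coordinates, and a click-by-text decision is resolved through that map:

```python
# Hypothetical sketch of the OCR lookup described above; the real
# framework's data structures and names may differ.
ocr_elements = {
    "Sign in": (412, 305),  # text found by OCR -> (x, y) screen coordinates
    "Search": (640, 88),
}


def resolve_click(label: str) -> tuple:
    """Translate the model's click-by-text decision into coordinates."""
    if label not in ocr_elements:
        raise KeyError(f"OCR did not find element: {label!r}")
    return ocr_elements[label]
```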

### Set-of-Mark Prompting `-m gpt-4-with-som`

The Self-Operating Computer Framework now supports Set-of-Mark (SoM) Prompting with the `gpt-4-with-som` command. This new visual prompting method enhances the visual grounding capabilities of large multimodal models.

Learn more about SoM Prompting in the detailed arXiv paper [here](https://arxiv.org/abs/2310.11441).

```
operate -m gpt-4-with-som
```
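
Conceptually, SoM overlays numbered marks on detected UI regions so the model can answer with a mark number instead of raw pixel coordinates. A toy sketch of resolving a mark number to a click point (hypothetical names; not the project's actual code):

```python
# Toy Set-of-Mark resolution sketch; names and shapes are assumptions.
# Each mark number maps to a bounding box (left, top, right, bottom).
marks = {
    1: (100, 40, 220, 80),
    2: (300, 200, 420, 260),
}


def mark_center(mark_id: int) -> tuple:
    """Click target: the center of the marked bounding box."""
    left, top, right, bottom = marks[mark_id]
    return ((left + right) // 2, (top + bottom) // 2)
```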

### Voice Mode `--voice`

The framework supports voice inputs for the objective. Try voice by following the instructions below.

**Clone the repo** to a directory on your computer:

```
git clone https://github.com/OthersideAI/self-operating-computer.git
```

**Cd into the directory**:

```
cd self-operating-computer
```

Install the additional `requirements-audio.txt`:

```
pip install -r requirements-audio.txt
```

**Install device requirements**

For Mac users:

```
brew install portaudio
```

For Linux users:

```
sudo apt install portaudio19-dev python3-pyaudio
```

Run with voice mode:

```
operate --voice
```

## Contributions Are Welcome

If you want to contribute yourself, see [CONTRIBUTING.md](https://github.com/OthersideAI/self-operating-computer/blob/main/CONTRIBUTING.md).

## Feedback

For any input on improving this project, feel free to reach out to [Josh](https://twitter.com/josh_bickett) on Twitter.

## Join Our Discord Community

For real-time discussions and community support, join our Discord server.

- If you're already a member, join the discussion in [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157).
- If you're new, first [join our Discord Server](https://discord.gg/YqaKtyBEzM) and then navigate to the [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157) channel.

## Follow HyperWriteAI for More Updates

Stay updated with the latest developments:

- Follow HyperWriteAI on [Twitter](https://twitter.com/HyperWriteAI).
- Follow HyperWriteAI on [LinkedIn](https://www.linkedin.com/company/othersideai/).

## Compatibility

- This project is compatible with Mac OS, Windows, and Linux (with X server installed).

## OpenAI Rate Limiting Note

The `gpt-4-vision-preview` model is required. To unlock access to this model, your account needs to have spent at least $5 in API credits. Pre-paying for these credits will unlock access if you haven't already spent the minimum $5.

Learn more **[here](https://platform.openai.com/docs/guides/rate-limits?context=tier-one)**.
