Commit edc2cfb

Add multimodal to possible tests (#1382)
* Update multimodal.md: complete markup for testing
* Update run-docs: add ability to run on docs/multimodal.md
* Update run-readme-pr.yml
1 parent 826c0c6 commit edc2cfb

File tree

3 files changed: +70 −1 lines changed


.ci/scripts/run-docs

+20
@@ -91,3 +91,23 @@ if [ "$1" == "evaluation" ]; then
   echo "*******************************************"
   bash -x ./run-evaluation.sh
 fi
+
+if [ "$1" == "multimodal" ]; then
+
+  # Expect that this test might fail as-is, because it is the first
+  # on-PR test depending on GitHub secrets for access with an HF token
+
+  echo "::group::Create script to run multimodal"
+  python3 torchchat/utils/scripts/updown.py --file docs/multimodal.md > ./run-multimodal.sh
+  # For good measure: if something happened to the updown processor
+  # and it did not error out, fail with an exit 1
+  echo "exit 1" >> ./run-multimodal.sh
+  echo "::endgroup::"
+
+  echo "::group::Run multimodal"
+  echo "*******************************************"
+  cat ./run-multimodal.sh
+  echo "*******************************************"
+  bash -x ./run-multimodal.sh
+  echo "::endgroup::"
+fi
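The hunk above turns docs/multimodal.md into an executable script and then appends `exit 1` as a failsafe, so a generated script that silently produces nothing (or falls off its end) counts as a failure. A minimal sketch of that failsafe pattern, with a stand-in generator rather than the real updown.py:

```shell
#!/bin/sh
# Sketch of the failsafe pattern: generate a script, then append "exit 1"
# so that reaching the end of the generated script counts as failure.
# The heredoc below is a stand-in for the real updown.py extraction step.
cat > ./run-demo.sh <<'EOF'
echo "running extracted commands"
exit 0
EOF

# Failsafe: only reached if the generated script does not exit on its own.
echo "exit 1" >> ./run-demo.sh

sh ./run-demo.sh
echo "script exit code: $?"   # prints: script exit code: 0
```

Because the generated script ends with an explicit `exit 0` before the appended `exit 1`, a healthy run still succeeds; an empty or truncated script would fall through to the `exit 1`.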

.github/workflows/run-readme-pr.yml

+44 −1
@@ -243,4 +243,47 @@ jobs:
         echo "::group::Completion"
         echo "tests complete"
         echo "*******************************************"
-        echo "::endgroup::"
+        echo "::endgroup::"
+
+  test-multimodal-any:
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.1"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+
+        echo "::group::Install newer objcopy that supports --set-section-alignment"
+        yum install -y devtoolset-10-binutils
+        export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
+        echo "::endgroup::"
+
+        .ci/scripts/run-docs multimodal
+
+        echo "::group::Completion"
+        echo "tests complete"
+        echo "*******************************************"
+        echo "::endgroup::"
+
+  test-multimodal-cpu:
+    uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.1"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+
+        echo "::group::Install newer objcopy that supports --set-section-alignment"
+        yum install -y devtoolset-10-binutils
+        export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
+        echo "::endgroup::"
+
+        TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs multimodal
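Both jobs lean on the `::group::` / `::endgroup::` workflow commands, which GitHub Actions interprets as markers for a collapsible section in the job log; run locally, they are just lines printed to stdout. A minimal demonstration:

```shell
#!/bin/sh
# GitHub Actions folds everything between ::group:: and ::endgroup:: into a
# collapsible log section; outside of Actions the markers print verbatim.
echo "::group::Print machine info"
uname -a
echo "::endgroup::"
```

Wrapping each noisy step (installs, script dumps, test runs) this way keeps the CI log skimmable without hiding any output.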

docs/multimodal.md

+6
@@ -14,9 +14,11 @@ This page goes over the different commands you can run with LLama 3.2 11B Vision
 
 While we strongly encourage you to use the Hugging Face checkpoint (which is the default for torchchat when utilizing the commands with the argument `llama3.2-11B`), we also provide support for manually providing the checkpoint. This can be done by replacing the `llama3.2-11B` argument in the commands below with the following:
 
+[skip default]: begin
 ```
 --checkpoint-path <file.pth> --tokenizer-path <tokenizer.model> --params-path torchchat/model_params/Llama-3.2-11B-Vision.json
 ```
+[skip default]: end
 
 ## Generation
 This generates text output based on a text prompt and (optional) image prompt.
@@ -48,6 +50,7 @@ Setting `stream` to "true" in the request emits a response in chunks. If `stream
 
 **Example Input + Output**
 
+[skip default]: begin
 ```
 curl http://127.0.0.1:5000/v1/chat/completions \
 -H "Content-Type: application/json" \
@@ -75,6 +78,7 @@ curl http://127.0.0.1:5000/v1/chat/completions \
 ```
 {"id": "chatcmpl-cb7b39af-a22e-4f71-94a8-17753fa0d00c", "choices": [{"message": {"role": "assistant", "content": "The image depicts a simple black and white cartoon-style drawing of an animal face. It features a profile view, complete with two ears, expressive eyes, and a partial snout. The animal looks to the left, with its eye and mouth implied, suggesting that the drawn face might belong to a rabbit, dog, or pig. The graphic face has a bold black outline and a smaller, solid black nose. A small circle, forming part of the face, has a white background with two black quirkly short and long curved lines forming an outline of what was likely a mouth, complete with two teeth. The presence of the curve lines give the impression that the animal is smiling or speaking. Grey and black shadows behind the right ear and mouth suggest that this face is looking left and upwards. Given the prominent outline of the head and the outline of the nose, it appears that the depicted face is most likely from the side profile of a pig, although the ears make it seem like a dog and the shape of the nose makes it seem like a rabbit. Overall, it seems that this image, possibly part of a character illustration, is conveying a playful or expressive mood through its design and positioning."}, "finish_reason": "stop"}], "created": 1727487574, "model": "llama3.2", "system_fingerprint": "cpu_torch.float16", "object": "chat.completion"}%
 ```
+[skip default]: end
 
 </details>
 
@@ -90,6 +94,8 @@ First, follow the steps in the Server section above to start a local server. The
 streamlit run torchchat/usages/browser.py
 ```
 
+[skip default]: end
+
 ---
 
 # Future Work
