diff --git a/ai-quick-actions/multimodel-deployment-tips.md b/ai-quick-actions/multimodel-deployment-tips.md
index 08d5d677..eeaea1e1 100644
--- a/ai-quick-actions/multimodel-deployment-tips.md
+++ b/ai-quick-actions/multimodel-deployment-tips.md
@@ -360,8 +360,13 @@ ads aqua deployment create [OPTIONS]
 
 `--models [str]`
 
-The String representation of a JSON array, where each object defines a model’s OCID and the number of GPUs assigned to it. The gpu count should always be a **power of two (e.g., 1, 2, 4, 8)**.
-Example: `'[{"model_id":"", "gpu_count":1},{"model_id":"", "gpu_count":1}]'` for `VM.GPU.A10.2` shape.
+The String representation of a JSON array, where each object defines a model’s OCID and the number of GPUs assigned to it. The GPU count should always be a **power of two (e.g., 1, 2, 4, 8)**.
+
+Example: `'[{"model_id":"", "gpu_count":1},{"model_id":"", "gpu_count":1}]'` for `VM.GPU.A10.2` shape.
+
+When deploying an embedding model, `model_task` must be specified; as a best practice, supply `model_task` for every model. (Supported tasks: `text_generation`, `image_text_to_text`, `code_synthesis`, `text_embedding`)
+
+Example: `'[{"model_id":"", "gpu_count":1, "model_task": "text_embedding"},{"model_id":"", "gpu_count":1, "model_task": "image_text_to_text"}]'` for `VM.GPU.A10.2` shape.
 
 `--instance_shape [str]`
 
@@ -439,7 +444,8 @@ ads aqua deployment create \
   --container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
   --models '[{"model_id":"ocid1.log.oc1.iad.", "gpu_count":1}, {"model_id":"ocid1.log.oc1.iad.", "gpu_count":1}]' \
   --instance_shape "VM.GPU.A10.2" \
-  --display_name "modelDeployment_multmodel_model1_model2"
+  --display_name "modelDeployment_multmodel_model1_model2" \
+  --env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions"}'
 ```
 
 
@@ -499,7 +505,8 @@ ads aqua deployment create \
   --models '[{"model_id":"ocid1.log.oc1.iad.", "gpu_count":1}, {"model_id":"ocid1.log.oc1.iad.", "gpu_count":1}]' \
   --env-var '{"MODEL_DEPLOY_PREDICT_ENDPOINT":"/v1/chat/completions"}' \
   --instance_shape "VM.GPU.A10.2" \
-  --display_name "modelDeployment_multmodel_model1_model2"
+  --display_name "modelDeployment_multmodel_model1_model2" \
+  --env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions"}'
 ```
 
 
@@ -550,7 +557,23 @@ ads aqua deployment create \
   "MULTI_MODEL_CONFIG": "{\"models\": [{\"params\": \"--served-model-name mistralai/Mistral-7B-v0.1 --seed 42 --tensor-parallel-size 1 --max-model-len 4096\", \"model_path\": \"service_models/Mistral-7B-v0.1/78814a9/artifact\"}, {\"params\": \"--served-model-name tiiuae/falcon-7b --seed 42 --tensor-parallel-size 1 --trust-remote-code\", \"model_path\": \"service_models/falcon-7b/f779652/artifact\"}]}",
   "MODEL_DEPLOY_ENABLE_STREAMING": "true",
 ```
+
+#### Create MultiModel (1 Embedding Model, 1 LLM) deployment with `/v1/completions`
+
+Note: you will need to pass `{"route": "/v1/embeddings"}` as a header on all inference requests sent to the embedding model:
+
+```
+headers={'route':'/v1/embeddings','Content-Type':'application/json'}
+```
+- For `/v1/chat/completions`, modify `MODEL_DEPLOY_PREDICT_ENDPOINT` accordingly.
+```bash
+ads aqua deployment create \
+  --container_image_uri "dsmc://odsc-vllm-serving:0.6.4.post1.2" \
+  --models '[{"model_id":"ocid1.log.oc1.iad.", "gpu_count":1, "model_task": "text_embedding"}, {"model_id":"ocid1.log.oc1.iad.", "gpu_count":1, "model_task": "text_generation"}]' \
+  --instance_shape "VM.GPU.A10.2" \
+  --display_name "modelDeployment_multmodel_model1_model2" \
+  --env_var '{"MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions"}'
+```
 
 ## Manage MultiModel Deployments
 
 
@@ -1185,4 +1208,4 @@ For other operations related to **Evaluation**, such as listing evaluations and
 | mistralai/Mistral-7B-v0.1 | BM.GPU.L40S-NC.4 | 1 | --max-model-len 4096 |
 | mistralai/Mistral-7B-v0.1 | BM.GPU.L40S-NC.4 | 2 | |
 | tiiuae/falcon-7b | VM.GPU.A10.2 | 1 | --trust-remote-code |
-| tiiuae/falcon-7b | BM.GPU.A10.4 | 1 | --trust-remote-code |
\ No newline at end of file
+| tiiuae/falcon-7b | BM.GPU.A10.4 | 1 | --trust-remote-code |
diff --git a/ai-quick-actions/troubleshooting-tips.md b/ai-quick-actions/troubleshooting-tips.md
index 78743502..a8914f7b 100644
--- a/ai-quick-actions/troubleshooting-tips.md
+++ b/ai-quick-actions/troubleshooting-tips.md
@@ -40,7 +40,7 @@ To successfully debug an issue, always select logging while creating model deplo
 
 Once the model deployment is initiated, you can monitor the logs by running the following in your notebook terminal:
 
-`ads watch --auth resource_principal`
+`ads opctl watch --auth resource_principal`
 
 To fetch the model deployment OCID:
 1. Go to model deployments tab on AI Quick Actions
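The `--models` constraints described in the first hunk above (a `model_id` per entry, a power-of-two `gpu_count`, and a `model_task` from the supported list) can be checked before calling the CLI. The helper below is an illustrative sketch, not part of the `ads` CLI; the function name and checks are this editor's assumptions based solely on the rules stated in the documentation.

```python
import json

# Tasks listed as supported in the documentation above.
SUPPORTED_TASKS = {"text_generation", "image_text_to_text", "code_synthesis", "text_embedding"}

def validate_models_arg(models_json: str) -> list:
    """Parse and sanity-check the JSON string passed to `--models`.

    Hypothetical helper: verifies each entry has a model_id, that gpu_count
    is a power of two, and that model_task (when present) is supported.
    """
    models = json.loads(models_json)
    for m in models:
        if "model_id" not in m:
            raise ValueError("each model entry needs a model_id")
        gpus = m.get("gpu_count", 0)
        # A positive integer is a power of two iff exactly one bit is set.
        if not (isinstance(gpus, int) and gpus > 0 and gpus & (gpus - 1) == 0):
            raise ValueError(f"gpu_count must be a power of two, got {gpus!r}")
        task = m.get("model_task")
        if task is not None and task not in SUPPORTED_TASKS:
            raise ValueError(f"unsupported model_task: {task!r}")
    return models

# Two single-GPU models fit on a VM.GPU.A10.2 (2 GPUs total).
models = validate_models_arg(
    '[{"model_id":"", "gpu_count":1, "model_task": "text_embedding"},'
    ' {"model_id":"", "gpu_count":1, "model_task": "text_generation"}]'
)
print(len(models))  # → 2
```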
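The embedding-model note above says every inference request to the embedding model must carry a `route` header. A minimal sketch of assembling such a request follows; the endpoint URL and served-model name are placeholders invented for illustration, and the `/predict` suffix and resource-principal signing shown in the comment follow the usual OCI Model Deployment invocation pattern rather than anything stated in this document.

```python
import json

def build_inference_request(endpoint_base: str, model_name: str, route: str, body: dict):
    """Assemble (url, headers, payload) for one model in a multimodel deployment.

    The 'route' header steers the request to the right endpoint inside the
    deployment (e.g. '/v1/embeddings' for the embedding model), per the note above.
    The 'model' field selects the model by its served model name.
    """
    headers = {"route": route, "Content-Type": "application/json"}
    payload = dict(body, model=model_name)
    return endpoint_base + "/predict", headers, json.dumps(payload)

# Placeholder endpoint and served-model name, for illustration only.
url, headers, payload = build_inference_request(
    "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.xxx",
    "BAAI/bge-small-en-v1.5",
    "/v1/embeddings",
    {"input": ["hello world"]},
)
# To actually invoke the deployment, sign the call, e.g. with resource principals:
#   import requests, oci
#   signer = oci.auth.signers.get_resource_principals_signer()
#   requests.post(url, data=payload, headers=headers, auth=signer)
print(headers["route"])  # → /v1/embeddings
```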