
Make healthcheck dependent on models loaded #1027

@wojciechrauk-plutoflume

Description

Good morning,

I'm using the MMS server to host a single model in AWS SageMaker. The model is loaded on MMS startup via sagemaker_inference.model_server.start_model_server with a custom handler_service.
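
For reference, the startup looks roughly like this (a minimal sketch; the handler path is the same one that shows up as the --handler argument in the logs below):

# serve.py -- minimal sketch of how the server is started
from sagemaker_inference import model_server

if __name__ == "__main__":
    # custom handler_service, same path:function as in the logs below
    model_server.start_model_server(
        handler_service="/tmp/models/test/model_handler.py:handle"
    )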

The problem is that loading the model can take a significant amount of time (up to 20 minutes). While the model is being loaded, the MMS server's /ping endpoint returns 200 OK responses, causing the SageMaker endpoint to assume the model is already loaded.
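
To illustrate, even a trivial probe like this sketch (standard library only; port 8080 is the inference address from my config) gets 200 back immediately after startup, well before the model has finished loading:

# probe_ping.py -- what /ping returns while the model is still loading
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8080/ping", timeout=5) as resp:
    # prints 200 even while the workers are still inside initialize()
    print(resp.status, resp.read().decode())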

Furthermore, I tested MMS locally to see how the server behaves during the model loading stage, and it looks like the model details endpoint reports the worker status as READY before the worker has finished loading the model.

In the logs below you can see that workers start and immediately there's:

2024-04-23T11:01:33,313 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_MODEL_LOADED

after that there's

2024-04-23T11:01:33,317 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED
...
2024-04-23T11:01:33,322 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED

and only two minutes later:

2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED
...
2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED

In the meantime /ping returns 200 OK and localhost:8081/models/test returns:

{
  "modelName": "test",
  "modelUrl": "test.mar",
  "runtime": "python3",
  "minWorkers": 2,
  "maxWorkers": 2,
  "batchSize": 1,
  "maxBatchDelay": 100,
  "loadedAtStartup": true,
  "workers": [
    {
      "id": "9000-57e7cf49",
      "startTime": "2024-04-23T11:01:33.316Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 13099008
    },
    {
      "id": "9000-be19b2af",
      "startTime": "2024-04-23T11:01:33.317Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 13434880
    }
  ]
}
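
The same can be checked programmatically; a sketch along these lines (model name "test" and management port 8081, as in my setup) prints READY for every worker throughout the whole two-minute load:

# check_workers.py -- query the management API for worker status
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8081/models/test", timeout=5) as resp:
    desc = json.load(resp)

for worker in desc["workers"]:
    # reports READY even though the model is still loading
    print(worker["id"], worker["status"])
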
All logs:

Removing orphan pid file.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-04-23T11:01:33,126 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /home/ec2-user/SageMaker/notebooks/environment/user/user-env-base-3-10/user-env-base-3-10/pex_venv/lib/python3.10/site-packages
Current directory: /tmp/models/test
Temp directory: /tmp
Number of GPUs: 0
Number of CPUs: 2
Max heap size: 978 M
Python executable: /home/ec2-user/SageMaker/notebooks/environment/user/user-env-base-3-10/user-env-base-3-10/pex_venv/bin/python3.10
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Model Store: /tmp/models/test/export
Initial Models: test.mar
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 2
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2024-04-23T11:01:33,133 [INFO ] main com.amazonaws.ml.mms.ModelServer - Loading initial models: test.mar preload_model: false
2024-04-23T11:01:33,189 [INFO ] main com.amazonaws.ml.mms.archive.ModelArchive - model folder already exists: f05f8ac8472b74dad6310273915eb6a5de447740
2024-04-23T11:01:33,217 [WARN ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-test
2024-04-23T11:01:33,309 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /tmp/.mms.sock.9000 --handler /tmp/models/test/model_handler.py:handle --model-path /tmp/models/f05f8ac8472b74dad6310273915eb6a5de447740 --model-name test --preload-model false --tmp-dir /tmp
2024-04-23T11:01:33,311 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /tmp/.mms.sock.9000
2024-04-23T11:01:33,311 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 30549
2024-04-23T11:01:33,312 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2024-04-23T11:01:33,313 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.10.12
2024-04-23T11:01:33,313 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_MODEL_LOADED
2024-04-23T11:01:33,314 [INFO ] W-9000-test MMS_METRICS - W-9000-test.ms:106|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,314 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model test loaded.
2024-04-23T11:01:33,315 [DEBUG] main com.amazonaws.ml.mms.wlm.ModelManager - updateModel: test, count: 2
2024-04-23T11:01:33,317 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED
2024-04-23T11:01:33,320 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-04-23T11:01:33,322 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED
2024-04-23T11:01:33,334 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000
2024-04-23T11:01:33,333 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000
2024-04-23T11:01:33,415 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://127.0.0.1:8080
2024-04-23T11:01:33,416 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2024-04-23T11:01:33,426 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
2024-04-23T11:01:33,429 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
2024-04-23T11:01:33,441 [INFO ] main com.amazonaws.ml.mms.ModelServer - Management API bind to: http://127.0.0.1:8081
Model server started.
2024-04-23T11:01:33,480 [WARN ] pool-3-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2024-04-23T11:01:33,553 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model initialising, waiting for 2 minutes
2024-04-23T11:01:33,554 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model initialising, waiting for 2 minutes
2024-04-23T11:01:33,592 [INFO ] pool-3-thread-1 MMS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,598 [INFO ] pool-3-thread-1 MMS_METRICS - DiskAvailable.Gigabytes:48.51470947265625|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,599 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUsage.Gigabytes:86.47355270385742|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,600 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUtilization.Percent:64.1|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,600 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryAvailable.Megabytes:2041.51171875|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,600 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUsed.Megabytes:1586.08984375|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,602 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUtilization.Percent:47.8|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:02:33,459 [WARN ] pool-3-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2024-04-23T11:02:33,528 [INFO ] pool-3-thread-1 MMS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,529 [INFO ] pool-3-thread-1 MMS_METRICS - DiskAvailable.Gigabytes:48.51460266113281|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,529 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUsage.Gigabytes:86.47365951538086|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,530 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUtilization.Percent:64.1|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,530 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryAvailable.Megabytes:2014.546875|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,530 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUsed.Megabytes:1613.0546875|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,531 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUtilization.Percent:48.5|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:03:33,445 [WARN ] pool-3-thread-2 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2024-04-23T11:03:33,515 [INFO ] pool-3-thread-2 MMS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,518 [INFO ] pool-3-thread-2 MMS_METRICS - DiskAvailable.Gigabytes:48.51456069946289|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,519 [INFO ] pool-3-thread-2 MMS_METRICS - DiskUsage.Gigabytes:86.47370147705078|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,520 [INFO ] pool-3-thread-2 MMS_METRICS - DiskUtilization.Percent:64.1|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,521 [INFO ] pool-3-thread-2 MMS_METRICS - MemoryAvailable.Megabytes:2015.05078125|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,523 [INFO ] pool-3-thread-2 MMS_METRICS - MemoryUsed.Megabytes:1612.55078125|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,523 [INFO ] pool-3-thread-2 MMS_METRICS - MemoryUtilization.Percent:48.4|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,653 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model finished waiting
2024-04-23T11:03:33,655 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model test loaded io_fd=0242b6fffe39090c-00007712-00000001-85c1e7725a6e3c25-be19b2af
2024-04-23T11:03:33,656 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model finished waiting
2024-04-23T11:03:33,655 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 120099
2024-04-23T11:03:33,656 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model test loaded io_fd=0242b6fffe39090c-00007712-00000002-b0c9e7725a6e3c25-57e7cf49
2024-04-23T11:03:33,655 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 120104
2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-04-23T11:03:33,658 [INFO ] W-9000-test MMS_METRICS - W-9000-test.ms:120341|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,658 [WARN ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-test-2
2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-04-23T11:03:33,659 [INFO ] W-9000-test MMS_METRICS - W-9000-test.ms:120343|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213

Questions:

  1. Is that the intended behaviour?
  2. Is it possible for the /ping endpoint to return something other than 200 OK while workers are still loading the model?
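
In the meantime I'm considering working around this with a readiness probe that sends a real request to /predictions/test and only treats the container as ready once it answers, since an actual invocation cannot complete until a worker has finished loading. A rough sketch (the payload is a placeholder for whatever the model accepts):

# ready_probe.py -- rough workaround sketch: readiness via a real invocation
import urllib.request

# placeholder payload; substitute whatever the model accepts
req = urllib.request.Request(
    "http://127.0.0.1:8080/predictions/test",
    data=b"{}",
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        ready = resp.status == 200
except Exception:
    # errors and timeouts while the model is still loading count as not ready
    ready = False

print("ready" if ready else "not ready")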

thank you! :)
