
Make healthcheck dependent on models loaded #1027

@wojciechrauk-plutoflume

Description

Good morning,

I'm using the MMS server to host a single model in AWS SageMaker. The model is loaded on MMS startup via sagemaker_inference.model_server.start_model_server with a custom handler_service.
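
For reference, the startup looks roughly like this (a minimal sketch; the handler path is the same one that shows up as the --handler argument in the logs below):

# serve.py -- minimal sketch of how the server is started
from sagemaker_inference import model_server

if __name__ == "__main__":
    # custom handler_service, same path:function as in the logs below
    model_server.start_model_server(
        handler_service="/tmp/models/test/model_handler.py:handle"
    )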

The problem is that loading the model can take a significant amount of time (up to 20 minutes). While the model is being loaded, the MMS server's /ping endpoint returns 200 OK responses, causing the SageMaker endpoint to assume the model is already loaded.
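
To illustrate, even a trivial probe like this sketch (standard library only; port 8080 is the inference address from my config) gets 200 back immediately after startup, well before the model has finished loading:

# probe_ping.py -- what /ping returns while the model is still loading
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8080/ping", timeout=5) as resp:
    # prints 200 even while the workers are still inside initialize()
    print(resp.status, resp.read().decode())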

Furthermore, I tested MMS locally to see how the server behaves during the model loading stage, and it looks like the model details endpoint reports the worker status as READY before the worker has finished loading the model.

In the logs below you can see that workers start and immediately there's:

2024-04-23T11:01:33,313 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_MODEL_LOADED

after that there's

2024-04-23T11:01:33,317 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED
...
2024-04-23T11:01:33,322 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED

and only two minutes later:

2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED
...
2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED

In the meantime /ping returns 200 OK and localhost:8081/models/test returns:

{
  "modelName": "test",
  "modelUrl": "test.mar",
  "runtime": "python3",
  "minWorkers": 2,
  "maxWorkers": 2,
  "batchSize": 1,
  "maxBatchDelay": 100,
  "loadedAtStartup": true,
  "workers": [
    {
      "id": "9000-57e7cf49",
      "startTime": "2024-04-23T11:01:33.316Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 13099008
    },
    {
      "id": "9000-be19b2af",
      "startTime": "2024-04-23T11:01:33.317Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 13434880
    }
  ]
}
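
The same can be checked programmatically; a sketch along these lines (model name "test" and management port 8081, as in my setup) prints READY for every worker throughout the whole two-minute load:

# check_workers.py -- query the management API for worker status
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8081/models/test", timeout=5) as resp:
    desc = json.load(resp)

for worker in desc["workers"]:
    # reports READY even though the model is still loading
    print(worker["id"], worker["status"])
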
All logs:

Removing orphan pid file.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-04-23T11:01:33,126 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /home/ec2-user/SageMaker/notebooks/environment/user/user-env-base-3-10/user-env-base-3-10/pex_venv/lib/python3.10/site-packages
Current directory: /tmp/models/test
Temp directory: /tmp
Number of GPUs: 0
Number of CPUs: 2
Max heap size: 978 M
Python executable: /home/ec2-user/SageMaker/notebooks/environment/user/user-env-base-3-10/user-env-base-3-10/pex_venv/bin/python3.10
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Model Store: /tmp/models/test/export
Initial Models: test.mar
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 2
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2024-04-23T11:01:33,133 [INFO ] main com.amazonaws.ml.mms.ModelServer - Loading initial models: test.mar preload_model: false
2024-04-23T11:01:33,189 [INFO ] main com.amazonaws.ml.mms.archive.ModelArchive - model folder already exists: f05f8ac8472b74dad6310273915eb6a5de447740
2024-04-23T11:01:33,217 [WARN ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-test
2024-04-23T11:01:33,309 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /tmp/.mms.sock.9000 --handler /tmp/models/test/model_handler.py:handle --model-path /tmp/models/f05f8ac8472b74dad6310273915eb6a5de447740 --model-name test --preload-model false --tmp-dir /tmp
2024-04-23T11:01:33,311 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /tmp/.mms.sock.9000
2024-04-23T11:01:33,311 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 30549
2024-04-23T11:01:33,312 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2024-04-23T11:01:33,313 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.10.12
2024-04-23T11:01:33,313 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_MODEL_LOADED
2024-04-23T11:01:33,314 [INFO ] W-9000-test MMS_METRICS - W-9000-test.ms:106|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,314 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model test loaded.
2024-04-23T11:01:33,315 [DEBUG] main com.amazonaws.ml.mms.wlm.ModelManager - updateModel: test, count: 2
2024-04-23T11:01:33,317 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED
2024-04-23T11:01:33,320 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-04-23T11:01:33,322 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change null -> WORKER_STARTED
2024-04-23T11:01:33,334 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000
2024-04-23T11:01:33,333 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000
2024-04-23T11:01:33,415 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://127.0.0.1:8080
2024-04-23T11:01:33,416 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2024-04-23T11:01:33,426 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
2024-04-23T11:01:33,429 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
2024-04-23T11:01:33,441 [INFO ] main com.amazonaws.ml.mms.ModelServer - Management API bind to: http://127.0.0.1:8081
Model server started.
2024-04-23T11:01:33,480 [WARN ] pool-3-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2024-04-23T11:01:33,553 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model initialising, waiting for 2 minutes
2024-04-23T11:01:33,554 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model initialising, waiting for 2 minutes
2024-04-23T11:01:33,592 [INFO ] pool-3-thread-1 MMS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,598 [INFO ] pool-3-thread-1 MMS_METRICS - DiskAvailable.Gigabytes:48.51470947265625|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,599 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUsage.Gigabytes:86.47355270385742|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,600 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUtilization.Percent:64.1|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,600 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryAvailable.Megabytes:2041.51171875|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,600 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUsed.Megabytes:1586.08984375|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:01:33,602 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUtilization.Percent:47.8|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870093
2024-04-23T11:02:33,459 [WARN ] pool-3-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2024-04-23T11:02:33,528 [INFO ] pool-3-thread-1 MMS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,529 [INFO ] pool-3-thread-1 MMS_METRICS - DiskAvailable.Gigabytes:48.51460266113281|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,529 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUsage.Gigabytes:86.47365951538086|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,530 [INFO ] pool-3-thread-1 MMS_METRICS - DiskUtilization.Percent:64.1|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,530 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryAvailable.Megabytes:2014.546875|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,530 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUsed.Megabytes:1613.0546875|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:02:33,531 [INFO ] pool-3-thread-1 MMS_METRICS - MemoryUtilization.Percent:48.5|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870153
2024-04-23T11:03:33,445 [WARN ] pool-3-thread-2 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2024-04-23T11:03:33,515 [INFO ] pool-3-thread-2 MMS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,518 [INFO ] pool-3-thread-2 MMS_METRICS - DiskAvailable.Gigabytes:48.51456069946289|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,519 [INFO ] pool-3-thread-2 MMS_METRICS - DiskUsage.Gigabytes:86.47370147705078|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,520 [INFO ] pool-3-thread-2 MMS_METRICS - DiskUtilization.Percent:64.1|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,521 [INFO ] pool-3-thread-2 MMS_METRICS - MemoryAvailable.Megabytes:2015.05078125|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,523 [INFO ] pool-3-thread-2 MMS_METRICS - MemoryUsed.Megabytes:1612.55078125|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,523 [INFO ] pool-3-thread-2 MMS_METRICS - MemoryUtilization.Percent:48.4|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,653 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model finished waiting
2024-04-23T11:03:33,655 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model test loaded io_fd=0242b6fffe39090c-00007712-00000001-85c1e7725a6e3c25-be19b2af
2024-04-23T11:03:33,656 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model finished waiting
2024-04-23T11:03:33,655 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 120099
2024-04-23T11:03:33,656 [INFO ] W-9000-test-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model test loaded io_fd=0242b6fffe39090c-00007712-00000002-b0c9e7725a6e3c25-57e7cf49
2024-04-23T11:03:33,655 [INFO ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 120104
2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-04-23T11:03:33,658 [INFO ] W-9000-test MMS_METRICS - W-9000-test.ms:120341|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213
2024-04-23T11:03:33,658 [WARN ] W-9000-test com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-test-2
2024-04-23T11:03:33,657 [DEBUG] W-9000-test com.amazonaws.ml.mms.wlm.WorkerThread - W-9000-test State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-04-23T11:03:33,659 [INFO ] W-9000-test MMS_METRICS - W-9000-test.ms:120343|#Level:Host|#hostname:ip-10-10-24-132.eu-west-1.compute.internal,timestamp:1713870213

Questions:

  1. Is that the intended behaviour?
  2. Is it possible for the /ping endpoint to return something other than 200 OK while workers are still loading the model?
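
In the meantime I'm considering working around this with a readiness probe that sends a real request to /predictions/test and only treats the container as ready once it answers, since an actual invocation cannot complete until a worker has finished loading. A rough sketch (the payload is a placeholder for whatever the model accepts):

# ready_probe.py -- rough workaround sketch: readiness via a real invocation
import urllib.request

# placeholder payload; substitute whatever the model accepts
req = urllib.request.Request(
    "http://127.0.0.1:8080/predictions/test",
    data=b"{}",
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        ready = resp.status == 200
except Exception:
    # errors and timeouts while the model is still loading count as not ready
    ready = False

print("ready" if ready else "not ready")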

thank you! :)
