Description
PR #2040 introduced a great feature allowing users to define custom inference pools per model, instead of sharing a single pool across models with different loads.
However, there's a small bug in mlserver/parallel/registry.py in the code path taken when no environment tarball is provided:
```python
if not env_tarball:
    return (
        self._pools.setdefault(
            inference_pool_gid,
            InferencePool(self._settings, on_worker_stop=self._on_worker_stop),
        )
        if inference_pool_gid
        else self._default_pool
    )
```

If inference_pool_gid already exists in self._pools, a new InferencePool instance is still created (and thus spawns N new worker processes), because the default argument is evaluated before setdefault checks for an existing key.
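A possible fix (a sketch only, not a tested patch against registry.py) is to stop passing a freshly constructed InferencePool to setdefault, and instead construct it only when the gid is actually missing:

```python
if not env_tarball:
    if not inference_pool_gid:
        return self._default_pool
    # Only construct an InferencePool (and spawn its workers)
    # when no pool exists for this gid yet.
    if inference_pool_gid not in self._pools:
        self._pools[inference_pool_gid] = InferencePool(
            self._settings, on_worker_stop=self._on_worker_stop
        )
    return self._pools[inference_pool_gid]
```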
From the InferencePool constructor:

```python
def __init__(
    self,
    settings: Settings,
    env: Optional[Environment] = None,
    on_worker_stop: List[InferencePoolHook] = [],
):
    configure_inference_pool(settings)
    ...
    for _ in range(self._settings.parallel_workers):  # spawning Python processes
        worker = _spawn_worker(self._settings, self._responses, self._env)
        self._workers[worker.pid] = worker  # type: ignore
```

This leads to redundant process creation even when the pool already exists.
Steps to reproduce
- Create 2 ML models with the same inference_pool_gid, e.g. in model-settings.json:

```json
{
  "name": "foo",
  "implementation": "...",
  "parameters": {
    "inference_pool_gid": "bar"
  }
}
```
- Set parallel_workers in settings.json to 2:

```json
{
  "debug": true,
  "use_structured_logging": true,
  "parallel_workers": 2
}
```

- Start MLServer and check the number of worker processes:
```shell
ps -ef | grep spawn_main | grep python | wc -l
```

Expected: 4 processes (2 for the default pool + 2 for the custom bar pool).
Observed: 6 processes, as InferencePool is instantiated twice.
This can also be demonstrated in a Python shell:

```python
>>> class Foo:
...     def __init__(self):
...         print("hello")
...
>>> bar = {}
>>> bar.setdefault("1", Foo())
hello
<__main__.Foo object at 0x104916da0>
>>> bar.setdefault("1", Foo())
hello
<__main__.Foo object at 0x104916da0>
```

setdefault() still calls Foo() twice, because the default argument is evaluated before checking if the key exists.
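The same toy example can be reworked to avoid the eager construction, mirroring the fix needed in the registry (a sketch; Pool and get_pool are illustrative names, not MLServer APIs):

```python
class Pool:
    """Stand-in for InferencePool: construction is expensive."""

    constructions = 0

    def __init__(self):
        Pool.constructions += 1  # each construction would spawn workers


pools = {}


def get_pool(gid):
    # Look up first; construct only on a miss. Unlike
    # pools.setdefault(gid, Pool()), this never builds a Pool
    # that is immediately discarded.
    if gid not in pools:
        pools[gid] = Pool()
    return pools[gid]


a = get_pool("bar")
b = get_pool("bar")
assert a is b
assert Pool.constructions == 1  # constructed exactly once
```

With this pattern, repeated lookups for the same gid return the cached pool without ever running the constructor a second time.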
Impact
- Redundant worker processes are spawned unnecessarily.
- These processes are never used for inference but remain alive.
- This can lead to high memory usage and degraded performance in production environments with multiple models or high worker counts.