
Models with the same inference_pool_gid still create a new InferencePool and spawn N parallel workers #2274

@dinispeixoto

Description


PR #2040 introduced a great feature allowing users to define custom inference pools per model, instead of sharing a single pool across models with different loads.

However, there's a small bug in mlserver/parallel/registry.py when the environment is not provided (i.e., no environment tarball is available):

if not env_tarball:
    return (
        self._pools.setdefault(
            inference_pool_gid,
            InferencePool(self._settings, on_worker_stop=self._on_worker_stop),
        )
        if inference_pool_gid
        else self._default_pool
    )

If inference_pool_gid already exists in self._pools, a new InferencePool instance is still created (and thus spawns N new worker processes) before setdefault checks for an existing key.

From the InferencePool constructor:

def __init__(
    self,
    settings: Settings,
    env: Optional[Environment] = None,
    on_worker_stop: List[InferencePoolHook] = [],
):
    configure_inference_pool(settings)

    ...
    for _ in range(self._settings.parallel_workers): # spawning Python processes
        worker = _spawn_worker(self._settings, self._responses, self._env)
        self._workers[worker.pid] = worker  # type: ignore

This leads to redundant process creation even when the pool already exists.
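
One possible fix, sketched minimally using the names from the snippets above, is to look the gid up before constructing the pool, so that InferencePool is only instantiated (and workers only spawned) when the gid is new:

if not env_tarball:
    if not inference_pool_gid:
        return self._default_pool
    # Construct the pool, and spawn its workers, only when the gid
    # has not been registered yet.
    if inference_pool_gid not in self._pools:
        self._pools[inference_pool_gid] = InferencePool(
            self._settings, on_worker_stop=self._on_worker_stop
        )
    return self._pools[inference_pool_gid]

This returns the same values as the original expression but never evaluates InferencePool(...) for a gid that already has a pool.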

Steps to reproduce

  1. Create 2 ML models with the same inference_pool_gid, e.g. in model-settings.json
{
    "name": "foo",
    "implementation": "...",
    "parameters": {
        "inference_pool_gid": "bar"
    }
}
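A second model sharing the same pool (the name baz here is just illustrative) only needs to repeat the parameter:
{
    "name": "baz",
    "implementation": "...",
    "parameters": {
        "inference_pool_gid": "bar"
    }
}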
  2. Set parallel_workers in settings.json to 2
{
    "debug": true,
    "use_structured_logging": true,
    "parallel_workers": 2
}
  3. Start MLServer and check the number of worker processes:
ps -ef | grep spawn_main | grep python | wc -l

Expected: 4 processes (2 for the default pool + 2 for the custom bar pool)
Observed: 6 processes, because InferencePool is instantiated twice for the bar gid: setdefault discards the second instance, but the 2 workers it spawned stay alive.

This can also be demonstrated in a Python shell:

>>> class Foo:
...     def __init__(self):
...             print("hello")
... 
>>> bar = {}
>>> bar.setdefault("1", Foo())
hello
<__main__.Foo object at 0x104916da0>
>>> bar.setdefault("1", Foo())
hello
<__main__.Foo object at 0x104916da0>
>>> 

setdefault() still calls Foo() twice because the argument is evaluated before checking if the key exists.
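
For comparison, a plain membership check never evaluates the constructor when the key already exists:

>>> if "1" not in bar:
...     bar["1"] = Foo()
... 
>>> 

Nothing is printed here because Foo() is only called for a missing key.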

Impact

  • Orphaned worker processes are spawned unnecessarily.
  • These processes are never used for inference but remain alive.
  • This can lead to high memory usage and degraded performance in production environments with multiple models or high worker counts.
