Enable asynchronous mode when serving inference pipeline #587

cariveroco · 2024-08-29T13:07:04Z

Description

The pipeline_ml_factory allows the isolation of an inference pipeline that would be run during model serving. The run sequentially loads the I/O per node, and there could be potential performance gains if asynchronous mode can be enabled instead, like when the kedro run --async command is used (reference).

Context

We have a MLflow model that uses the pipeline_ml_factory and is hosted by a platform which enforces an API response timeout. We already optimized our code base, and are hoping that the processing speed could still be significantly reduced if the many I/O to our inference pipeline's nodes could be loaded/saved asynchronously.

The platform serves the model similar to how mlflow models serve does, where only the MLflow model itself is accessed. Within the docker container deployed by the hosting platform, our entrypoint script only has access to the MLflow model and cannot access the Kedro project path, so we cannot load any configurations set in the project's /conf directory. Thus, we are hoping that enabling the asynchronous mode could be somehow "encoded" within the MLflow model itself.

Possible Implementation/Alternatives

Unfortunately I have no suggestions on how this could be implemented, and actually unsure whether this feature is already available.

The text was updated successfully, but these errors were encountered:

Calychas · 2024-10-15T10:11:01Z

Hey @cariveroco! It can be enabled and "encoded" within the MLflow model itself. Use kpm_kwargs argument when using the pipeline_ml_factory like this:

from kedro.runner import SequentialRunner
from kedro_mlflow.pipeline import pipeline_ml_factory

pipeline_ml_factory(
        training=training_pipeline
        inference=inference_pipeline,
        kpm_kwargs={"runner": SequentialRunner(is_async=True)},
		...
)

Galileo-Galilei · 2024-10-15T20:02:56Z

First of all, sincere apologies @cariveroco for the long delay, I though I answered this question :/

Good news is, this is possible at training time as @Calychas demonstrates above (thanks!). This does not feel right and it may become something you specify at inference time once #580 is solved.

If you want more customization and control over running the pipeline programatically (including a custom mlflow model inside, but not necessarily limited to it), you can have a look at kedro-boot.

cariveroco · 2024-10-16T05:38:02Z

@Calychas Thank you so much! I missed that the Runner classes could be configured. Your sample code is very clear and it worked perfectly for what I want to achieve. This is sufficient for now, but I also agree with the #580 discussion that it's better to have this as a configurable parameter at runtime.

@Galileo-Galilei No worries! I couldn't thank you enough for the work and dedication you are putting into this project. Indeed, kedro-mlflow (specifically pipeline_ml_factory) was a godsend to me last year when I was trying to figure out how to deploy my Kedro pipelines as an API. Thank you also for introducing kedro-boot; I'll look into it as from first impression they seem to deal with APIs as well.

My question was resolved so I'll now close this issue. Looking forward to the new release that incorporates #580. Thank you!

Galileo-Galilei mentioned this issue Oct 1, 2024

Handle inference parameters #580

Closed

6 tasks

cariveroco closed this as completed Oct 16, 2024

Galileo-Galilei moved this to ✅ Done in kedro-mlflow roadmap Oct 29, 2024

Galileo-Galilei added this to kedro-mlflow roadmap Oct 29, 2024

Galileo-Galilei mentioned this issue Nov 1, 2024

Add inference parameters to KedroPipelineModel #445

Closed

Galileo-Galilei mentioned this issue Nov 30, 2024

✨ Enable to override params at predict time in KedroPipelineModel #612

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable asynchronous mode when serving inference pipeline #587

Enable asynchronous mode when serving inference pipeline #587

cariveroco commented Aug 29, 2024

Calychas commented Oct 15, 2024

Galileo-Galilei commented Oct 15, 2024

cariveroco commented Oct 16, 2024

Enable asynchronous mode when serving inference pipeline #587

Enable asynchronous mode when serving inference pipeline #587

Comments

cariveroco commented Aug 29, 2024

Description

Context

Possible Implementation/Alternatives

Calychas commented Oct 15, 2024

Galileo-Galilei commented Oct 15, 2024

cariveroco commented Oct 16, 2024