
Conversation

@phoeenniixx (Member) commented Aug 7, 2025

A proposal for the predict functionality of pytorch-forecasting v2
(Copied from the hackmd: https://hackmd.io/@Pm5-sJBvSfeR6I59oCaLOA/BJIqEgYDlg/edit)

@phoeenniixx (Member Author):

FYI @fkiraly @agobbifbk @PranavBhatP

@fkiraly (Contributor) left a comment:

Thanks, great summary!

May I request some basic elements be added to this STEP:

  • IMPORTANT: ensure you also have usage vignettes for the new design. Put these at the top of the "new design" sections
  • IMPORTANT: also at the top, discuss the requirements and design principles you are using. Do not start with the solution (this is the wrong place to start, both in writing and in thinking); start by describing the aim and the problems.
  • add an introduction, motivation, and a high-level summary of what this does and how
  • in the "code snippets" sections, make clear whether they are designs or status quo, and whether they are vignettes or internal code

@fkiraly (Contributor) commented Aug 7, 2025

Some design comments about the content:

  • I think predict has too many arguments. Can we reduce their number?
  • I think predict should be on the level of D1. So that users will never have to deal with the particular architecture in predict.

Regarding STEPs, should this not be in a single STEP together, with the scope being the ptf v2 API?

@phoeenniixx (Member Author) commented Aug 7, 2025

  • I think predict should be on the level of D1. So that users will never have to deal with the particular architecture in predict.

Can you please elaborate on what you are thinking? I am not quite sure how to move forward with this.

  • I think predict has too many arguments. Can we reduce their number?

Yeah, we can use a list (like the dicts in __getitem__ of D1) to group similar args - e.g. a returns param which is a list containing the args you want returned: index, x, etc.
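
Roughly something like this (just an illustration, the names are placeholders):

# v1-style: one boolean flag per extra output
out = model.predict(data, return_index=True, return_x=True, return_y=False)

# v2 idea: a single list collecting everything the user wants returned
out = model.predict(data, return_info=["index", "x"])  # "y", "decoder_lengths", ... also possible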

Regarding STEPs, should this not be in a single STEP together, with the scope being the ptf v2 API?

Well, I did not have edit access to #39, so I raised a new one :)

@fkiraly (Contributor) commented Aug 7, 2025

Can you please elaborate on what you are thinking? I am not quite sure how to move forward with this.

I am thinking: much closer to the sktime interface, so that the v2 interface is completely independent of the model architecture, except through __init__ of the model or package.

In particular, it means that predict can have args that relate to where to forecast, or to taking exogenous data, but must not relate to model specifics such as decoder/encoder length.

I think predict has too many arguments. Can we reduce their number?
Yeah we can use list

Good idea, or dict.

Well, I did not have edit access to #39, so I raised a new one :)

I see, @pranavvp16, can you give edit access? We can also leave different parts in different PRs to work on them, but ultimately imo we want to have a single doc.

@fkiraly (Contributor) commented Aug 7, 2025

I noticed #39 was me. I have now given you and @PranavBhatP write access so that you can directly edit. I am also happy to use this PR instead and copy stuff over from #39, as you prefer. Perhaps for the start it is even better to keep the two PRs separate?

@agobbifbk (Collaborator):

Probably here we need a distinction between two things. First, the predict method of the model class (usually the forward loop, though in some cases it can be model specific); here I imagine there are not so many parameters (for example, if I trained the model using a distribution loss, I can ask for a single sample or for multiple samples, returning e.g. the point mean and the standard deviation). Second, the predict of the D1/D2 layer, which processes the tensors returned by the model predict, using also the time and groups from the dataloader, and gives the user a more usable prediction output (csv, pandas dataframe, xarray, ...).

There is something I faced when we developed DSIPTS, related to real-time prediction. In this case we don't have the future target nor some of the known variables (e.g. the hour or the month). If we reuse the D2 as it is now, the sample generation process will probably NOT produce the samples we need (we are discarding targets with NaNs, I suppose). Somehow we need to think about putting this logic into the sample generation: what we do in DSIPTS is extend the input data with additional timestamps, with NaNs on the target (or generally on the unknown variables), before extracting the temporal features and creating the sample(s). It is not relevant at this point of the discussion, but we need to remember it :-)
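
Just for reference, a rough pandas sketch of the kind of extension we do in DSIPTS (column names and freq are only illustrative):

import pandas as pd

def extend_for_realtime(df, group_col, time_col, target_col, horizon, freq="h"):
    """Append `horizon` future timestamps per group, with NaN targets.

    Known calendar features (hour, month, ...) can still be derived from the new
    timestamps; the unknown target stays NaN and marks the slots to be predicted.
    """
    parts = []
    for group, g in df.groupby(group_col):
        future_idx = pd.date_range(g[time_col].max(), periods=horizon + 1, freq=freq)[1:]
        future = pd.DataFrame(
            {time_col: future_idx, group_col: group, target_col: float("nan")}
        )
        parts.append(pd.concat([g, future], ignore_index=True))
    return pd.concat(parts, ignore_index=True)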

@phoeenniixx phoeenniixx requested a review from fkiraly August 10, 2025 09:24
@phoeenniixx (Member Author):

Hi @fkiraly, @agobbifbk, I've made some additions to the design doc, please review.
Some points:

  • .predict() accepts only the D2 layer (and not the D1 layer or a dataframe). As @agobbifbk feared, accepting the others could lead to coupling, since we would be creating an instance of the D2 layer (in the D1 case) or of both D1/D2 layers (in the raw-dataframe case) inside the BaseModel (where .predict() will reside), and I agree with him.
  • Maybe we can move to_dataframe from an independent util into the PredictCallBack itself, because I think the data from return_info should be enough to create a dataframe? return_info will return index, x, y etc., and I think this should suffice (rough sketch below). Although I still need to try it out locally, so I am not sure if I am thinking right here.
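
A rough sketch of what I mean by the second point (everything here is hypothetical, just to illustrate the idea):

import pandas as pd

class PredictCallBack:  # hypothetical sketch, not the actual implementation
    def _to_dataframe(self, output, index):
        # `index` is a DataFrame carrying time_idx and group ids per prediction window,
        # `output["prediction"]` is a tensor of shape (N, prediction_length)
        preds = output["prediction"].detach().cpu().numpy()
        df = index.copy()
        for step in range(preds.shape[1]):
            df[f"pred_{step}"] = preds[:, step]
        return df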

@jgyasu jgyasu moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Aug 11, 2025
@fkiraly (Contributor) left a comment:

Great start! I have two big comments.

  1. I think it is important to design this with at least two different examples for D2 in mind. Suggestion for how to proceed:
  • find at least one model whose D2 will not be EncoderDecoderDataModule
  • write down the vignette
  • given both vignettes, compare and write one "generalized" vignette
  2. It should be possible to access predict without D2, using only D1 and specification syntax
  • have a look at the M layer design: sktime/pytorch-forecasting#1870 - this would also work for predict
  • the interesting question is, should this be additional to, or instead of, the direct use of the D2 and T layers?

@fkiraly fkiraly moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Aug 14, 2025
@phoeenniixx phoeenniixx requested a review from fkiraly August 15, 2025 12:33
@jgyasu jgyasu moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Aug 15, 2025
@agobbifbk (Collaborator):

It should be possible to access predict without D2, using only D1 and specification syntax
The D2 layer produces the dataloaders for the training procedure. We can think of using only D1 plus some information for rebuilding the dataloader correctly, but we need to store information such as context/prediction length and scalers. If we force the user to pass through the D2 layer, we are sure all the information is in the correct place. I understand this is an overkill procedure; do you have any ideas to make it lighter?

@phoeenniixx (Member Author):

It should be possible to access predict without D2, using only D1 and specification syntax

Yeah, if you look at the recent changes, I've introduced a layered approach where the D1 layer object creates a D2 layer object inside the _pkg class (and not in the actual model class), and the D2 layer is then passed to the model class by the pkg class (actually just the dataloaders: the D2 layer creates the dataloaders inside the _pkg class).
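
In pseudocode, the flow is roughly this (method names are only indicative):

class Model_pkg:  # package / wrapper layer
    def predict(self, d1_dataset, **kwargs):
        # D1 -> D2 happens here, inside the _pkg class, never inside the model class
        d2 = EncoderDecoderDataModule(d1_dataset, **self.datamodule_cfg)
        dataloader = d2.predict_dataloader()
        # the model layer only ever sees plain dataloaders
        return self.model.predict(dataloader, **kwargs)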

@phoeenniixx (Member Author):

If you want, we can have a discussion on the new approach, where I could explain exactly the idea I am thinking of (based on the suggestion made by @fkiraly).

@phoeenniixx phoeenniixx moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Aug 19, 2025
@phoeenniixx phoeenniixx moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Aug 21, 2025
@phoeenniixx (Member Author):

Hi @fkiraly, @agobbifbk, I have added the docstrings to the model package class idea, including the "side effects" of using ckpt_path etc. Please have a look at it :)

@agobbifbk (Collaborator):

Can you point out the new lines? It will be easier for me, thx!

@phoeenniixx (Member Author) commented Aug 21, 2025

The main docstrings lie in this class:


and the proposal starts from here

@phoeenniixx (Member Author):

I would really appreciate some comments on the docstrings of the package class (starting from line 666)

@agobbifbk (Collaborator):

Ok thx, I was looking at the HackMD file :-(

@agobbifbk (Collaborator):

You wrote "The .predict() method signature for all models in v2":

def predict(
    data,
    mode: str = "prediction",  # "prediction", "quantiles", "raw"
    return_info: list[str] | None = None,  # e.g. ["index", "x", "y", "decoder_lengths"]
    write_interval,  # when to write to the disk?
    output_dir: str | None = None,  # if provided, stream to disk
    **trainer_kwargs,
) -> dict[str, torch.Tensor]:

Do you really mean model, or are you referring to the base class or a model wrapper class? It is hard to think that each model implements such a function, right?

pkg = DeepAR_pkg(trainer_cfg=trainer_cfg, ckpt_path="checkpoints/last.ckpt")
prediction_output = pkg.predict(
    data_module,  
    mode="quantiles",
    return_info=["index"]  # return index to get time_idx and groups
)

The DataModule prepares the dataloaders; what happens if you pass data_module to predict? Probably here you want to get the prediction from the test set, right? So you need the test_dataloader? I see you do it later; my question is: do we want the predict method to also accept a DataModule object?

Here I see a critical point (you already mentioned it :-) ). If a checkpoint is passed, the datamodule_cfg must be loaded from the correct place:

  • advantages: less burden for the user, we are sure we load the correct stuff
  • disadvantages: none
    - If ``ckpt_pth`` is NOT None:
        - The ``datamodule_cfg`` can either be ``dict`` or ``path``.
            - If ``dict``, the datamodule is directly configured using the dict, but this
              is dangerous as the configurations should be exactly the same, otherwise the
              model pipeline will not behave as intended

Why optionally save checkpoints?

 Provide ``model_cfg`` + ``trainer_cfg`` + ``datamodule_cfg`` (as dict). 
      Call ``pkg.fit(dataset, ...)`` to train and optionally save checkpoints.

What do you think about this: I trained my model for 1000 epochs over 2 days, and I accidentally re-run the same script. I would like to be stopped if there is already a trained model for that configuration. Do you think a simple boolean overwrite parameter could fit in the design?

Another point here:

        preds = self.model.predict(dataloader, mode, return_info, write_interval,
                                   output_dir, trainer_kwargs,
                                   **anyother_param_and_kwargs)
        return preds

This can be critical: suppose we have a D1 layer that reads from a large collection of csv files, meaning that the preds cannot be stored in memory. As I understand from the rest of the document, self.model.predict takes care of eventually saving the result to disk, BUT I don't see how the model can invert the values when, for example, a scaler has been applied to the target. The scalers are probably saved in the D2 object (are we sure that all D2 have a scaler?).
What about something like:

        for batch in dataloader:
            res = self.model.predict(batch)
            # other logic for saving the results here, so that we have access to all the D1/D2 info
            res_manipulated = d2.process_output(res, **some_params)
            ?? = d1.save_prediction(res_manipulated, **some_other_params)
              

Hope this helps and does not confuse you :-) Thx for the enormous work done so far!

@phoeenniixx (Member Author):

Do you really mean model, or are you referring to the base class or a model wrapper class? It is hard to think that each model implements such a function, right?

I mean the predict of the BaseClass (see here)
Each model doesn't have to implement this, as the basic logic will lie inside the base class and, like in the current API, most models won't even require a .predict() implementation of their own. However, to pass these params to the BaseClass, we'd need the user to pass them to the wrapper as well, so a similar signature will be in the wrapper class too.

The DataModule prepares the dataloaders; what happens if you pass data_module to predict? Probably here you want to get the prediction from the test set, right? So you need the test_dataloader? I see you do it later; my question is: do we want the predict method to also accept a DataModule object?

Well, the user will pass the data_module to the predict of the wrapper, which will create the dataloaders and then pass them to the model layer. We could have predict accept the data_module, but ONLY if we want to give the user a way to not use this wrapper at all and instead follow the same flow as the wrapper manually, passing the data_module to the model layer themselves. (Although I think that if the user is going to do everything manually, they'd want the loading process to be done manually as well, and if they want the data loading done manually, then they can pass those manually created dataloaders directly to the model layer..?)

Why optionally save checkpoints?

Well, I thought that if someone is just trying out different models and following the wrapper flow, every time they call fit it would always save the checkpoints. This could be frustrating if they were only trying out things and didn't want to save those ckpts; if the user actually wants to save the ckpt, they can set save_ckpt=True. What do you think?

What do you think about this: I trained my model for 1000 epochs over 2 days, and I accidentally re-run the same script. I would like to be stopped if there is already a trained model for that configuration. Do you think a simple boolean overwrite parameter could fit in the design?

Are you saying for the saved checkpoints? Yes, we can have it. Or an even better idea could be to always save the checkpoints in a new folder inside the checkpoints folder, named like ckpt_date_time. That way, even if they run multiple times, they could easily go back to the correct folder (much like GH commits)?
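
Something along these lines (just a sketch):

from datetime import datetime
from pathlib import Path

def new_ckpt_dir(root="checkpoints"):
    # e.g. checkpoints/ckpt_2025-08-21_14-30-05: every run gets its own folder,
    # so an accidental re-run never overwrites an earlier training
    path = Path(root) / datetime.now().strftime("ckpt_%Y-%m-%d_%H-%M-%S")
    path.mkdir(parents=True, exist_ok=False)
    return path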

This can be critical: suppose we have a D1 layer that reads from a large collection of csv files, meaning that the preds cannot be stored in memory. As I understand from the rest of the document, self.model.predict takes care of eventually saving the result to disk, BUT I don't see how the model can invert the values when, for example, a scaler has been applied to the target. The scalers are probably saved in the D2 object (are we sure that all D2 have a scaler?).

Sorry, I am not able to follow; can you please elaborate, or we could discuss it in the meet today. But for the last question: from my current understanding, the D2 layer should always have scalers. Why would a D2 layer not have scalers?

@jgyasu jgyasu moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Aug 26, 2025
## Aim


The current beta version of `ptf-v2` does not have any functionality to do predictions, and this design document aims to provide some possible ideas to implement the prediction pipeline
Contributor:

I would recommend being clearer - e.g., "pipeline" is not exactly what happens here and might lead to confusion with sklearn pipelines.

Contributor:

pipeline -> vignette?

"does not have dedicated predict mode"

#### Output Types:
The output is a `Prediction` class type object, which has different keys depending upon the `mode` and other params (like `return_x` etc).
Here `N` is the size of the validation data
* "prediction" -> tensor of shape `(N, prediction_length)`
Contributor:

lack of information: where in the Prediction class are these?

Member Author:

Sure, I'll add more info - exactly explaining what the class looks like


In v2, we should try to design `.predict()` to be more general, composable, and predictable while retaining ease of use.

### Requirements
@fkiraly (Contributor), Sep 19, 2025:

very nice and well thought out

* **Memory safety:** Large predictions can be streamed to disk without exhausting RAM.

### High-Level Summary
The proposed `.predict()` system for v2:
Contributor:

I think there are some open questions:

  • we have two layers, so which of the two layers (pkg or model) has predict-like functions?
  • considering alternatives, e.g., having predict, predict_quantiles, and predict_raw, as opposed to a mode argument. This is how sktime is doing it, and we did actually consider having a mode-like arg (without taking ptf as an inspiration) and actively decided against it. The reason was that the function would just have ended up as large if/else blocks, and dispatching to each other would be as unpleasant as the methods to_quantiles etc. currently are
    • not saying that this is how we need to do this, and the two layers can even handle it differently - only that this alternative should be discussed. Why is it worse for D2 if it looked better for sktime? This should be a conscious decision.

Member Author:

  • we have two layers, so which of the two layers (pkg or model) have predict-like functions?

if we go by naming, both classes would have a function named predict, but they would work a little differently:

  • For pkg: it would be a wrapper, calling predict() of the model layer. See here for a basic idea of how it would look.
  • For the model layer: it would be the actual predict() that wraps trainer.predict() and the callbacks.

Now, looking back at the high-level summary in the EP, I think this is not clear enough; I will add clearer pointers :)

  • considering alternatives, e.g., having predict, predict_quantiles, and predict_raw, as opposed to a mode argument.

Hmm, that is a good suggestion, and I agree it would be a mess of if/else blocks. From here I think we could merge both ideas? For the wrapper, we would keep the modes, but for each mode we call a different predict inside the pkg layer - this would make the code cleaner at the model layer, and at the pkg layer it would just be an if/else block where we call a different type of predict for each mode.
Something like this:

Vignette would remain the same

prediction_output = pkg.predict(
    data_module, 
    mode="quantiles",
    return_info=["index"]  # return index to get time_idx and groups
)

pkg layer predict

def predict(self, dataset, mode, return_info, write_interval, output_dir, to_dataframe,
            trainer_kwargs, **anyother_param_and_kwargs):
    predict_dm = self._build_datamodule(dataset)
    dataloader = self._create_dataloaders(predict_dm)
    if mode == "prediction":
        return self.model.predict(...)
    elif mode == "raw":
        return self.model.predict_raw(...)
    elif mode == "quantiles":
        return self.model.predict_quantiles(...)
    else:
        raise ValueError(f"unknown mode: {mode}")

I think this would be useful for the user: just change the mode, and the rest is handled by the pkg layer. If there were different predicts at the pkg layer, the user would have to change the whole function call (and maybe even the signature, based on the requirements of the func), which is otherwise handled by the wrapper.

What do you think?

return_info: list[str] | None = None, # e.g. ["index", "x", "y", "decoder_lengths"]
write_interval, # when to write to the disk?
output_dir: str | None = None, # if provided, stream to disk
**trainer_kwargs,
Contributor:

question/idea: in D2, should trainer_kwargs move to __init__?

@phoeenniixx (Member Author), Sep 19, 2025:

These trainer_kwargs are used for the trainer initialization (to add some customizations), and as this trainer is used only during predict, I am not sure we should move them to __init__. I think a user may want to run fit with a different kind of trainer (e.g. on CPU, not GPU) than predict (run on GPU); this provides more flexibility? Also, it may be that we are using a pre-trained model, in which case trainer_kwargs would be used only by predict and nowhere else (at least I can't think of any other place).

That's why they are not kept in __init__. We keep trainer_cfg for fit in __init__, but if the user wants a slightly different predict, they can pass trainer_kwargs.
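
So the usage I have in mind is roughly this (illustrative only):

# trainer used for fit is fixed at construction time
pkg = DeepAR_pkg(model_cfg, trainer_cfg={"accelerator": "cpu", "max_epochs": 10},
                 datamodule_cfg=datamodule_cfg)
pkg.fit(dataset)

# predict can use a different trainer, e.g. on GPU, without touching __init__
preds = pkg.predict(data_module, mode="quantiles", accelerator="gpu", devices=1)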

What are your thoughts on this?

..., # other params like target_normalizer, num_workers etc
)
# init package
pkg = model_pkg(model_cfg, trainer_cfg=trainer_cfg, datamodule_cfg=datamodule_cfg)
Contributor:

interesting. Question regarding alternatives: why not **trainer_cfg, **datamodule_cfg? Have you explicitly considered both options?

Member Author:

Well, model_pkg(model_cfg, trainer_cfg=trainer_cfg, datamodule_cfg=datamodule_cfg) keeps the cfgs separate; I am not sure how **trainer_cfg, **datamodule_cfg would be used..
Are you saying something like this?
model_pkg(model_cfg, **trainer_cfg, **datamodule_cfg) - I think this would be harder to parse?
From my understanding, something like this:

pkg = model_pkg(model_cfg, **trainer_cfg, **datamodule_cfg)

would unpack trainer_cfg and datamodule_cfg and pass their keys as args to the class. So the class would need to have all the parameters of the trainer and datamodule, which would make __init__ very complex, with most of its params only used to initialise the datamodule and trainer. So why don't we keep them separate as dicts?
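
For comparison, the two options side by side (sketch, parameter names only illustrative):

# proposed: configs stay grouped, __init__ stays small
pkg = model_pkg(
    model_cfg,
    trainer_cfg={"max_epochs": 100, "accelerator": "gpu"},
    datamodule_cfg={"batch_size": 64, "num_workers": 4},
)

# alternative with unpacking: every trainer/datamodule parameter would have to
# become an explicit __init__ argument of the package class
pkg = model_pkg(model_cfg, max_epochs=100, accelerator="gpu",
                batch_size=64, num_workers=4)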

@fkiraly (Contributor) left a comment:

Really nice design, I think - some questions above.
