[STEP] Design Document for Prediction pipeline of ptf-v2 #46
Conversation
Thanks, great summary!
May I request some basic elements for this STEP:
- IMPORTANT: ensure you also have usage vignettes for the new design. Put these at the top of the "new design" sections
- IMPORTANT: also at the top, discuss requirements and the design principles you are using. Do not start with the solution (this is the wrong place to start, in writing and in thinking), but with describing the aim and problems.
- introduction, motivation, high-level summary of what this does and how
- in sections "code snippets", make clear whether they are designs or status quo; vignettes or internal code
Some design comments about the content:
Regarding STEPs, should this not be in a single STEP together, with the scope being the ptf v2 API?
Can you please elaborate on what you are thinking? I am not quite sure how to move forward with this.
Yeah, we can use a list (like dicts in …
Well, I did not have edit access to #39, so I raised a new one :)
I am thinking: much closer to the … In particular, it means that …
Good idea, or dict.
I see, @pranavvp16, can you give edit access? We can also leave different parts in different PRs to work on them, but ultimately imo we want to have a single doc.
I noticed, #39 was me. I have now given you and @PranavBhatP write access so that you can directly edit. Also happy to use this PR instead and copy stuff over from #39, as you prefer. Perhaps for the start it is even better to keep the two PRs separate?
Probably here we need a distinction between the … There is something that I faced when we developed DSIPTS related to real-time prediction. In this case we don't have the future target and some known variables (e.g. the hour or the month). If we reuse D2 as it is now, the sample generation process will probably NOT produce the samples we need (we are discarding targets with NaNs, I suppose). Somehow we need to think about putting this logic into the sample generation: what we do in DSIPTS is extend the input data with additional timestamps with NaNs on the target (or generally on the unknown variables) before extracting the temporal features and creating the sample(s). It is not relevant at this point of the discussion but we need to remember it :-)
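A minimal sketch of the DSIPTS-style extension described above (hypothetical helper and column names, not actual DSIPTS code): the raw frame is extended with future timestamps before sample generation, so known calendar covariates can be filled in while the unknown target stays NaN.

```python
import numpy as np
import pandas as pd

def extend_for_inference(df: pd.DataFrame, horizon: int, freq: str = "h") -> pd.DataFrame:
    """Append `horizon` future timestamps with NaN targets (hypothetical helper).

    Known variables (here: hour, month) are recomputed from the timestamp;
    unknown variables stay NaN and must not be discarded by sample generation.
    """
    last = df["time"].max()
    future = pd.DataFrame({
        "time": pd.date_range(last, periods=horizon + 1, freq=freq)[1:],
        "target": np.nan,  # unknown at prediction time
    })
    out = pd.concat([df, future], ignore_index=True)
    out["hour"] = out["time"].dt.hour
    out["month"] = out["time"].dt.month
    return out
```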
Hi @fkiraly, @agobbifbk, I've made some additions to the design doc, please review.
Great start! I have two big comments.

- I think it is important to design this with at least two different examples for D2 in mind. Suggestion how to proceed:
  - find at least one model that will not be `EncoderDecoderDataModule` for D2 - write down the vignette
  - given both vignettes, compare and write one "generalized" vignette
- It should be possible to access `predict` without D2, using only D1 and specification syntax
  - have a look at the M layer design: sktime/pytorch-forecasting#1870 - this would also work for `predict`
  - the interesting question is, should this be additional to, or instead of, the direct use of D2 and T layer?
Yeah, if you look at the recent changes, I've introduced a layered approach, where the D1 layer object creates a D2 layer object inside the …
If you want, we can have a discussion on the new approach, where I could explain exactly the idea I am thinking of (based on the suggestion made by @fkiraly).
Hi @fkiraly, @agobbifbk, I have added the docstrings to the model package class idea, adding the "side effects" of using the …
Can you point out the new lines? It will be easier for me, thx!
The main docstrings lie in this class, and the proposal starts from here.
I would really appreciate some comments on the docstrings of the package class (starting from line 666).
Ok thx, I was looking at the HMD file :-(
You wrote:

```python
def predict(
    data,
    mode: str = "prediction",  # "prediction", "quantiles", "raw"
    return_info: list[str] | None = None,  # e.g. ["index", "x", "y", "decoder_lengths"]
    write_interval,  # when to write to the disk?
    output_dir: str | None = None,  # if provided, stream to disk
    **trainer_kwargs,
) -> dict[str, torch.Tensor]:
```

do you really mean:

```python
pkg = DeepAR_pkg(trainer_cfg=trainer_cfg, ckpt_path="checkpoints/last.ckpt")
prediction_output = pkg.predict(
    data_module,
    mode="quantiles",
    return_info=["index"]  # return index to get time_idx and groups
)
```

The DataModule prepares the dataloader; what happens if you pass …?

Here I see a critical point (you already mentioned it :-) ). If a checkpoint is passed, the `datamodule_cfg` must be loaded from the correct place:

> - If ``ckpt_pth`` is NOT None:
>   - The ``datamodule_cfg`` can either be ``dict`` or ``path``.
>   - If ``dict``, the datamodule is directly configured using the dict, but this is dangerous as the configurations should be exactly the same, otherwise the model pipeline will not behave as intended.

Why?

> Provide ``model_cfg`` + ``trainer_cfg`` + ``datamodule_cfg`` (as dict). Call ``pkg.fit(dataset, ...)`` to train and optionally save checkpoints.

What do you think about this: I trained my model for 1000 epochs in 2 days, and I accidentally rerun the same script. I would like to be stopped in the case where we see that there is already a trained model for that configuration. Do you think a simple boolean …?

Another point here:

```python
preds = self.model.predict(dataloader, mode, return_info, write_interval,
                           output_dir, trainer_kwargs,
                           **anyother_param_and_kwargs)
return preds
```

this can be critical: suppose we have a D1 layer that reads from a large collection of CSVs, meaning that the …

```python
for batch in dataloader:
    res = self.model.predict(batch)
    # other logic for saving the results here so that we have access to all the D1/D2 info
    res_manipulated = d2.process_output(res, **some_params)
    ?? = d1.save_prediction(res_manipulated, **some_other_params)
```

Hope this helps and does not confuse you :-) Thx for the enormous work done so far!
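A slightly more concrete version of the batch-wise flow suggested above (the hooks `d2.process_output` and `d1.save_prediction` are the hypothetical names from the comment; `predict_step` is assumed Lightning-style):

```python
import torch

def predict_streaming(model, dataloader, d1, d2, output_dir, flush_every=10):
    """Hypothetical batch-wise predict loop that bounds memory by flushing to disk."""
    buffer = []
    model.eval()
    with torch.no_grad():
        for i, batch in enumerate(dataloader):
            res = model.predict_step(batch, i)      # raw model output for one batch
            buffer.append(d2.process_output(res))   # e.g. inverse scaling via D2
            if (i + 1) % flush_every == 0:
                d1.save_prediction(buffer, output_dir)  # persist via D1
                buffer.clear()
    if buffer:  # flush the remainder
        d1.save_prediction(buffer, output_dir)
```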
I mean the …

Well, the user will pass the …

Well, I thought if someone is just trying out different models and following the wrapper flow, every time they'd call …

Are you saying for the saved checkpoints? Yes, we can have it, or rather a better idea could be that we always save the checkpoints in a new folder inside the …

Sorry, I am not able to follow, can you please elaborate, or we could discuss in the meet today. But for the last question, from my current understanding the D2 layer should always have scalers - why would a D2 layer not have scalers?
## Aim

Current beta version of `ptf-v2` does not have any functionality to do predictions, and this design document aims to provide some possible ideas to implement the prediction pipeline.
I would recommend being clearer - e.g., "pipeline" is not exactly what happens here and might lead to confusion with sklearn pipelines.
pipeline -> vignette?
"does not have dedicated predict mode"
#### Output Types:
The output is a `Prediction` class type object which has different keys depending upon the `mode` and other params (like `return_x` etc.).
Here `N` is the size of the validation data.
* "prediction" -> tensor of shape `(N, prediction_length)`
lack of information: where in the `Prediction` class are these?
Sure, I'll add more info - exactly explaining what the class looks like.
In v2, we should try to design `.predict()` to be more general, composable, and predictable while retaining ease of use.

### Requirements
very nice and well thought out
* **Memory safety:** Large predictions can be streamed to disk without exhausting RAM.

### High-Level Summary
The proposed `.predict()` system for v2: |
I think there are some open questions:

- we have two layers, so which of the two layers (pkg or model) have `predict`-like functions?
- considering alternatives, e.g., having `predict`, `predict_quantiles`, and `predict_raw`, as opposed to a `mode` argument. This is how `sktime` is doing it, and we did actually consider (without considering ptf as an inspiration) to have a `mode`-like arg and actively decided against it. The reason was, the function would just have ended up as large if/else blocks, and dispatching to each other would be as unpleasant as the methods `to_quantiles` etc currently are.
  - not saying that this is how we need to do this, and the two layers can even handle it differently - only that this alternative should be discussed. Why is it worse for D2 if it looked better for `sktime`? This should be a conscious decision.
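For concreteness, a sketch of the two API shapes under discussion (hypothetical signatures, using the `pkg` object from the vignettes below):

```python
# Alternative A: single entry point with a mode argument (current proposal)
preds = pkg.predict(data_module, mode="quantiles")

# Alternative B: one method per output type (sktime style)
preds = pkg.predict(data_module)                    # point predictions
quantile_preds = pkg.predict_quantiles(data_module)
raw_preds = pkg.predict_raw(data_module)
```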
> - we have two layers, so which of the two layers (pkg or model) have `predict`-like functions?

if we go by naming - both classes would have a function named `predict`. But their working would be a little different:

- For `pkg`: it would be a wrapper, calling `predict()` of the model layer. See here for a basic idea of how it would look.
- For the `model` layer: it would be the actual `predict()` that wraps `trainer.predict()` and the callbacks.

Now looking back at the high level summary in the EP, I think this is not clear enough, I would add clearer pointers :)

> - considering alternatives, e.g., having `predict`, `predict_quantiles`, and `predict_raw`, as opposed to a `mode` argument.

Hmm, that is a good suggestion, and I agree it would be a mess of if/else blocks. From here I think we could merge both ideas? Like for the wrapper, we would keep the modes, but for each mode, we call a different `predict` inside the `pkg` layer - this would make the code cleaner at the model layer, and at the `pkg` layer it would just be an if/else block where we call a different type of predict for each mode.

Something like this:

Vignette would remain the same:

```python
prediction_output = pkg.predict(
    data_module,
    mode="quantiles",
    return_info=["index"]  # return index to get time_idx and groups
)
```

`pkg` layer `predict`:

```python
def predict(self, dataset, mode, return_info, write_interval, output_dir, to_dataframe,
            trainer_kwargs, **anyother_param_and_kwargs):
    predict_dm = self._build_datamodule(dataset)
    dataloader = self._create_dataloaders(predict_dm)
    if mode == "predict":
        self.model.predict(...)
    elif mode == "raw":
        self.model.predict_raw(...)
    elif mode == "quantile":
        self.model.predict_quantile(...)
    else:
        raise ValueError(f"unknown mode: {mode}")
```

I think this would be useful for the user? Just change the mode, and the rest is handled by the `pkg` layer. If there were different `predict`s at the `pkg` layer, the user would have to change the whole function call (and maybe even the signature, based on the requirements of the func), which would otherwise be handled by the wrapper.

What do you think?
```python
    return_info: list[str] | None = None,  # e.g. ["index", "x", "y", "decoder_lengths"]
    write_interval,  # when to write to the disk?
    output_dir: str | None = None,  # if provided, stream to disk
    **trainer_kwargs,
```
question/idea: in D2, should `trainer_kwargs` move to `__init__`?
These `trainer_kwargs` are used for the `trainer` initialization (to add some customizations), and as the `trainer` is used only while `predict`, I am not sure if we should move it to `__init__`. I think a user may want to run `fit` on some other kind of `trainer` (like run on cpu, not gpu) than `predict` (run on gpu). This provides more flexibility? Also, it may be a possibility that we are using a pre-trained model; at that time `trainer_kwargs` would be used only by `predict` and not anywhere else (at least I can't think of any other place).

That's why it is not kept at `__init__`. We keep `trainer_cfg` for the `fit` in `__init__`, but if the user wants a slightly different `predict`, they could pass `trainer_kwargs`.

What are your thoughts on this?
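An illustrative sketch of the split described above (hypothetical values; `model_pkg` as in the vignettes): `trainer_cfg` at construction time drives `fit`, while `trainer_kwargs` at call time customize the trainer only for that `predict` call.

```python
pkg = model_pkg(
    model_cfg,
    trainer_cfg={"max_epochs": 100, "accelerator": "cpu"},  # used by fit
    datamodule_cfg=datamodule_cfg,
)
pkg.fit(dataset)

preds = pkg.predict(
    data_module,
    mode="prediction",
    accelerator="gpu",  # trainer_kwargs: predict on GPU even though fit ran on CPU
)
```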
```python
    ...,  # other params like target_normalizer, num_workers etc
)
# init package
pkg = model_pkg(model_cfg, trainer_cfg=trainer_cfg, datamodule_cfg=datamodule_cfg)
```
interesting. Question regarding alternatives: why not `**trainer_cfg, **datamodule_cfg`? Have you explicitly considered both options?
well, `model_pkg(model_cfg, trainer_cfg=trainer_cfg, datamodule_cfg=datamodule_cfg)` keeps the `cfgs` separate; I am not sure how `**trainer_cfg`, `**datamodule_cfg` could be used..

Are you saying something like this?

`model_pkg(model_cfg, **trainer_cfg, **datamodule_cfg)` - I think this would be harder to parse?

From my understanding, something like this:

`pkg = model_pkg(model_cfg, **trainer_cfg, **datamodule_cfg)`

would unpack `trainer_cfg` and `datamodule_cfg` and pass the keys as args to the class. So the class would need to have all the parameters of the `trainer` and the `datamodule`, which would make `__init__` very complex, where most of its params would only be used to initialise the datamodule and trainer. So why don't we keep them separate as `dicts`?
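A side-by-side illustration of the two constructor shapes (hypothetical parameter values, not the final API):

```python
# Option 1: nested config dicts - __init__ stays small, configs stay grouped
pkg = model_pkg(
    model_cfg,
    trainer_cfg={"max_epochs": 10, "accelerator": "cpu"},
    datamodule_cfg={"batch_size": 64, "num_workers": 4},
)

# Option 2: flattened kwargs - every trainer/datamodule param becomes a
# top-level __init__ parameter, and the two namespaces could collide
pkg = model_pkg(model_cfg, max_epochs=10, accelerator="cpu",
                batch_size=64, num_workers=4)
```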
Really nice design, I think - some questions above.
A proposal for the `predict` of `pytorch-forecasting` v2 (copied from the hackmd: https://hackmd.io/@Pm5-sJBvSfeR6I59oCaLOA/BJIqEgYDlg/edit)