Can Orion handle training a 2TB dataset? #567
Description

In my case, the training data is very large and cannot be loaded into memory all at once. It seems that `time_segments_aggregate`, `SimpleImputer`, `MinMaxScaler`, and `rolling_window_sequences` in the pipeline all require the data to be stored in memory. Can Orion handle training a 2-10 TB dataset?

Comments

Hi @bigmisspanda – thank you for your question! You are right, all of the preprocessing primitives require the data to be in memory. One workaround is to replace these primitives with your own scalable functions and then start the Orion pipeline directly from the modeling primitive. Another is to chunk up your training data and train the pipeline on each chunk.

Yes, thank you for your help. I understand what you mean. My plan is to use …

Is my approach feasible? Can …

Your plan looks logical to me! I'm not too familiar with what …

The concept of …
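Regarding the workarounds suggested in the first reply, below is a minimal sketch of how the heavy aggregation and imputation steps could be done out of core, chunk by chunk, before handing a much smaller series to a standard Orion pipeline. The file path, column names, aggregation interval, chunk size, and pipeline name are illustrative assumptions, not details confirmed in this thread.

```python
# Sketch of out-of-core preprocessing, not an official Orion feature.
# Assumptions: the raw data is a CSV with `timestamp` (unix seconds) and
# `value` columns, and a 10-minute aggregation interval is acceptable.
import pandas as pd
from orion import Orion

RAW_CSV = "training_data.csv"   # hypothetical path to the multi-TB file
INTERVAL = 600                  # aggregation interval in seconds
CHUNK_ROWS = 5_000_000          # rows held in memory at a time

aggregated = []

# Stream the raw file and replace the in-memory time_segments_aggregate /
# SimpleImputer steps with chunk-wise equivalents.
for chunk in pd.read_csv(RAW_CSV, chunksize=CHUNK_ROWS):
    chunk["value"] = chunk["value"].interpolate().ffill().bfill()  # simple imputation
    bucket = (chunk["timestamp"] // INTERVAL) * INTERVAL           # time-segment id
    aggregated.append(chunk.groupby(bucket)["value"].mean().reset_index())

# The aggregated series is far smaller than the raw data, so it can usually be
# concatenated in memory. Buckets split across chunk boundaries are merged with
# a mean of means, which is approximate but adequate for a sketch.
train = pd.concat(aggregated, ignore_index=True)
train = train.groupby("timestamp", as_index=False)["value"].mean()

# Hand the reduced frame to a standard Orion pipeline, which can now scale and
# window it in memory. The pipeline name is only an example.
orion = Orion(pipeline="lstm_dynamic_threshold")
orion.fit(train)
```

If the aggregated series is still too large to fit in memory, the same loop could write each chunk's result to disk and the pipeline could then be trained on each piece in turn, along the lines of the second suggestion above.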