
Large data multivariate time series #571

Open
jafa2222 opened this issue Oct 13, 2024 · 3 comments
Labels
question Further information is requested

Comments

@jafa2222

Description

I have a very large dataset, around 500 gigabytes, consisting of various folders and subfolders that contain multiple Excel files. Due to limited RAM, it is not possible to load these files all at once (especially for data normalization). Is there a solution or an example for this situation? The folders are structured as follows, where each Excel file is a multivariate time series.

location1/
  PP17-01 2023-01-02/
    1.xlsx
  PP17-01 2023-01-03/
    2.xlsx
location2/
  PP18-01 2024-01-02/
    1.xlsx
  PP18-01 2024-01-03/
    2.xlsx
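One common workaround for normalizing data that does not fit in RAM is a two-pass, out-of-core approach: stream the files once to accumulate global statistics, then stream them again to normalize each file independently. The sketch below illustrates the idea with small in-memory arrays standing in for chunks read from the Excel files (e.g. via `pandas.read_excel`, one file at a time); the function and variable names are illustrative, not part of any library.

```python
# Sketch: out-of-core z-score normalization in two passes. The `chunks`
# list is a stand-in for iterating over the Excel files one at a time.
import numpy as np

def streaming_mean_std(chunks):
    """Accumulate count, sum, and sum of squares across chunks,
    then derive the global mean and standard deviation."""
    n = 0
    s = 0.0
    ss = 0.0
    for chunk in chunks:
        arr = np.asarray(chunk, dtype=float)
        n += arr.size
        s += arr.sum()
        ss += (arr ** 2).sum()
    mean = s / n
    std = np.sqrt(ss / n - mean ** 2)
    return mean, std

# Pass 1: compute global statistics without holding all data in memory.
chunks = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
mean, std = streaming_mean_std(chunks)

# Pass 2: normalize each chunk (file) independently using the
# global statistics, so only one file is in memory at a time.
normalized = [(c - mean) / std for c in chunks]
```

In practice you would apply this per sensor column rather than to the flattened data, but the accumulation pattern is the same.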

@sarahmish
Collaborator

Hi @jafa2222 – thank you for opening this issue!

With Orion, you should model each entity separately. For example, if PP17 is a different entity than PP18, then you should create two models, one for each.

For scalability, please refer to issue #567 where I suggest some solutions to loading the data in-memory.
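To model each entity separately, the files first need to be grouped by entity. A minimal sketch of that grouping, assuming the `PP17-01 2023-01-02` folder naming shown in the issue (the paths here are illustrative):

```python
# Sketch: group file paths by entity prefix (e.g. "PP17" vs. "PP18"),
# so that one model can be trained per entity.
from collections import defaultdict
from pathlib import PurePosixPath

paths = [
    "location1/PP17-01 2023-01-02/1.xlsx",
    "location1/PP17-01 2023-01-03/2.xlsx",
    "location2/PP18-01 2024-01-02/1.xlsx",
]

by_entity = defaultdict(list)
for p in paths:
    folder = PurePosixPath(p).parent.name  # e.g. "PP17-01 2023-01-02"
    entity = folder.split("-")[0]          # e.g. "PP17"
    by_entity[entity].append(p)

# One model would then be fit per key of by_entity.
```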

@sarahmish sarahmish added the question Further information is requested label Oct 14, 2024
@jafa2222
Author

> Hi @jafa2222 – thank you for opening this issue!
>
> With Orion, you should model each entity separately. For example, if PP17 is a different entity than PP18, then you should create two models, one for each.
>
> For scalability, please refer to issue #567 where I suggest some solutions to loading the data in-memory.

Thank you for the clarification and the helpful reference. These files actually correspond to different locations, but the same sensors are present in all files. Our goal is to train the network on this normal operational data and then test it on faulty data. Are you suggesting that we need to develop a separate model for each sensor, rather than using an approach like an LSTM AE, where multiple sensors are fed to the model and the reconstruction error is measured?

@sarahmish
Collaborator

Yes @jafa2222, the current models in Orion support single-sensor detection rather than multivariate detection. This is because most models learn the pattern of a single signal, which makes them better at learning what the expected pattern should look like.

The first blog post highlighted in our readme describes the general framework these models follow, if you'd like to read more about this topic!
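Given single-signal models, a multivariate table is typically split into one univariate (timestamp, value) series per sensor, each of which gets its own model and reconstruction error. A small sketch, assuming illustrative column names (`sensor_a`, `sensor_b`):

```python
# Sketch: split a multivariate frame into the univariate
# (timestamp, value) frames that single-signal pipelines expect.
import pandas as pd

df = pd.DataFrame({
    "timestamp": [0, 1, 2],
    "sensor_a": [0.1, 0.2, 0.3],
    "sensor_b": [1.0, 1.1, 1.2],
})

signals = {
    col: df[["timestamp", col]].rename(columns={col: "value"})
    for col in df.columns if col != "timestamp"
}
# Each signals[col] can then be fed to its own single-signal model.
```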
