-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Csvhandlerpolar #30
Csvhandlerpolar #30
Conversation
…fferently based if that is passed as an argument to allow for later on loading different datasets based on the split
…defined in CsvHandler would return an error
Please do not delete branch as it requires testing of some functionalities. |
branch can be deleted |
The metadata now is the belonging to one of the sets. But is the split function (selected by user) going to assign this value modifying the csv in input? |
Yes, the split:meta:int column is special, it will be added or modified by the CsvParser but can also be passed by the user in the input csv |
ok i like it but the skip of split when user passes it should be handled correctly. Can you open a issue for it? |
How do you see this happening ? Currently, we throw an error if there is no split:meta:int column, I think that even if a split:meta:int column exists, if a splitter is called, this wil be overwritten, if a splitter isn't called, it will stay as is. |
the first error is more than fine because it caver cases in which the user did not specify neither manually neither explicitly a splitter scheme. I would like to have another error in case that column already exist and the split mthod is called on such a csv. Going something like "a split meta column is already present but a splitter function has been called. remove the split meta column if you want to call a split function." |
There are now two classes for handling csv, CsvParser (for parsing the data, doing noising etc.) and CsvLoader (for loading the data into models). Additionally both now inherit from the same CsvHandler class.
Then, Polars has replaced Pandas as it is much faster.