Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Csvhandlerpolar #30

Merged
merged 10 commits into from
Mar 13, 2024
Merged

Csvhandlerpolar #30

merged 10 commits into from
Mar 13, 2024

Conversation

mathysgrapotte
Copy link
Contributor

There are now two classes for handling csv, CsvParser (for parsing the data, doing noising etc.) and CsvLoader (for loading the data into models). Additionally both now inherit from the same CsvHandler class.

Then, Polars has replaced Pandas as it is much faster.

@mathysgrapotte
Copy link
Contributor Author

Please do not delete branch as it requires testing of some functionalities.

@mathysgrapotte mathysgrapotte marked this pull request as draft March 13, 2024 08:47
@mathysgrapotte mathysgrapotte marked this pull request as ready for review March 13, 2024 09:08
@mathysgrapotte mathysgrapotte self-assigned this Mar 13, 2024
@mathysgrapotte
Copy link
Contributor Author

branch can be deleted

@suzannejin suzannejin merged commit c182af8 into main Mar 13, 2024
1 check passed
@mathysgrapotte mathysgrapotte deleted the csvhandlerpolar branch March 13, 2024 10:55
@alessiovignoli
Copy link
Contributor

The metadata now is the belonging to one of the sets. But is the split function (selected by user) going to assign this value modifying the csv in input?

@mathysgrapotte
Copy link
Contributor Author

mathysgrapotte commented Mar 13, 2024

Yes, the split:meta:int column is special, it will be added or modified by the CsvParser but can also be passed by the user in the input csv

@alessiovignoli
Copy link
Contributor

ok i like it but the skip of split when user passes it should be handled correctly. Can you open a issue for it?

@mathysgrapotte
Copy link
Contributor Author

How do you see this happening ? Currently, we throw an error if there is no split:meta:int column, I think that even if a split:meta:int column exists, if a splitter is called, this wil be overwritten, if a splitter isn't called, it will stay as is.

@alessiovignoli
Copy link
Contributor

the first error is more than fine because it caver cases in which the user did not specify neither manually neither explicitly a splitter scheme. I would like to have another error in case that column already exist and the split mthod is called on such a csv. Going something like "a split meta column is already present but a splitter function has been called. remove the split meta column if you want to call a split function."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants