Name columns #70
Comments
I like this. We can make the Do you believe this is something you could implement? I can help review the PR.
Here is some code I wrote to keep track of how many output columns each feature gets expanded to: https://github.com/paulgb/sklearn-pandas/pull/56/files There is also some discussion of this in #54.
OK, here's a PR for the functionality. Let me know if you want me to make any changes, or feel free to edit it however you like.
Great! Go right ahead - and thanks.
A pleasure. :) Closing as I understand this feature is included in the fact that pandas dataframes have named columns.
I have a custom Transformer, where In this case I think that the following lines will break the transformation:
    if hasattr(t, 'classes_') and (len(t.classes_) > 2):
        return [c + '_' + o for o in t.classes_]
Because in my case the number of classes is not equal to the number of columns. Might I be using
@arnau126 The code to generate the column names is just inferring them because some sklearn transformers use this internal attribute as the unique class names, and generate I believe the best solution here is to check that the number of columns of the transformer output is equal to @arnau126 do you believe you can create a PR for this? Tests are run using
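A minimal sketch of the kind of check suggested above, not the library's actual code. It assumes the comparison is between the number of output columns and `len(t.classes_)` (the original comment is cut off at that point). The helper name `infer_names`, its arguments, and the numbered fallback are hypothetical.

```python
from types import SimpleNamespace


def infer_names(column, transformer, n_output_columns):
    """Hypothetical helper: name the output columns of one transformer.

    Class-based names are used only when the transformer exposes
    `classes_` and its length matches the number of output columns
    (assumed reading of the proposed check).
    """
    classes = getattr(transformer, 'classes_', None)
    if (classes is not None and len(classes) > 2
            and len(classes) == n_output_columns):
        return [column + '_' + str(o) for o in classes]
    if n_output_columns == 1:
        return [column]
    # Assumed fallback: number the outputs when the classes cannot be trusted.
    return [column + '_' + str(i) for i in range(n_output_columns)]


# Stand-ins for fitted transformers; only the `classes_` attribute matters here.
binarizer_like = SimpleNamespace(classes_=['cat', 'dog', 'fish'])
custom = SimpleNamespace(classes_=['cat', 'dog', 'fish'])

names_ok = infer_names('pet', binarizer_like, 3)   # classes match the columns
names_fallback = infer_names('pet', custom, 5)     # mismatch, so numbered names
```

With a guard like this, a custom transformer whose `classes_` does not line up with its output width would get generic names instead of raising or mislabeling columns.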
I came across this discussion looking for column names to interpret the feature importance of a classifier. Am I correct to assume that using the dataframe output is currently the only way to get the column names? For what I want to do I'm actually quite happy with matrix output, but I still need the column names. (I'm using a bunch of LabelBinarizers, which is why feature importance is hard to make sense of without names.)
We can for sure find a way to generate and output the names without outputting a dataframe, yes. Do you think this is something you can contribute?
Yes, I started looking at the naming function anyway. I'll give it a try, but it may take a little while due to my workload.
No hurry, let me know if I can help with pointers about how to do the testing or whatever.
Let's continue talking about this in #78
It would be nice to have a way to specify the names of the columns created by a transform, so that later on you could pass mapper.names (or similar) to any function that expects a list of column names (e.g. variable importance), or use them in charts where you want to label the columns by name.
This could default to the name of the pandas column that created it (if there's only one input and one output), to the input columns joined with '_' if there are multiple inputs, and to the name concatenated with '_1', '_2', etc. if there are multiple outputs.
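The default naming rule described above could be sketched roughly as follows. This only illustrates the proposal; `default_names` and its parameters are hypothetical names, not part of sklearn-pandas.

```python
def default_names(input_columns, n_outputs):
    """Hypothetical helper: default names for the outputs of one transform.

    input_columns: a single column name or a list of column names.
    n_outputs: number of columns the transform produces.
    """
    if isinstance(input_columns, str):
        input_columns = [input_columns]
    # Multiple inputs are joined with '_' to form the base name.
    base = '_'.join(input_columns)
    if n_outputs == 1:
        return [base]
    # Multiple outputs get '_1', '_2', ... appended to the base name.
    return [base + '_' + str(i) for i in range(1, n_outputs + 1)]


single = default_names('age', 1)
joined = default_names(['height', 'weight'], 1)
expanded = default_names('pet', 3)
```

A list built this way would have one entry per output column, so it could be passed wherever a list of feature names is expected.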