ENH: implement drop_levels argument in loc #35418

Closed
arw2019 opened this issue Jul 27, 2020 · 7 comments
Labels: Closing Candidate, Enhancement, Indexing

Comments

@arw2019
Member

arw2019 commented Jul 27, 2020

xref #6249, #18262, #35301

Following up on #35301. This is an outstanding piece of DataFrame.xs functionality that's currently not available in .loc. Since xs is slated for deprecation, I'd like to implement this in .loc.

Example

Let's define a MultiIndexed DataFrame (example from the xs docs):

In [1]: import pandas as pd
   ...:
   ...: d = {
   ...:     'num_legs': [4, 4, 2, 2],
   ...:     'num_wings': [0, 0, 2, 2],
   ...:     'class': ['mammal', 'mammal', 'mammal', 'bird'],
   ...:     'animal': ['cat', 'dog', 'bat', 'penguin'],
   ...:     'locomotion': ['walks', 'walks', 'flies', 'walks'],
   ...: }
   ...:
   ...: df = pd.DataFrame(data=d)
   ...: df.set_index(['class', 'animal', 'locomotion'], inplace=True)

In [2]: df                                                                                                            
Out[2]: 
                           num_legs  num_wings
class  animal  locomotion                     
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2

Both xs and .loc let us take a cross-section through a MultiIndex:

In [3]: import pandas._testing as tm 
    ...: 
    ...: res_xs = df.xs(('mammal', slice(None))) 
    ...: res_loc  = df.loc[('mammal', slice(None))] 
    ...: tm.assert_frame_equal(res_xs, res_loc)                                                                       

But with xs we can choose to keep the index levels we're selecting on, using the drop_level argument:

In [4]: df.xs(('mammal', slice(None), 'flies'), drop_level=False)                                                    
Out[4]: 
                          num_legs  num_wings
class  animal locomotion                     
mammal bat    flies              2          2
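For readers less familiar with drop_level, here is a minimal sketch of the difference on the same frame. It uses the documented `level=` form of `DataFrame.xs` rather than the tuple-with-slice key from the session above; the variable names `dropped` and `kept` are only illustrative.

```python
import pandas as pd

# Same frame as in the example above, built without inplace mutation.
df = pd.DataFrame(
    {
        "num_legs": [4, 4, 2, 2],
        "num_wings": [0, 0, 2, 2],
        "class": ["mammal", "mammal", "mammal", "bird"],
        "animal": ["cat", "dog", "bat", "penguin"],
        "locomotion": ["walks", "walks", "flies", "walks"],
    }
).set_index(["class", "animal", "locomotion"])

# Default (drop_level=True): the level we selected on is removed from the index.
dropped = df.xs("mammal", level="class")

# drop_level=False: the selected level stays in the result's index.
kept = df.xs("mammal", level="class", drop_level=False)

print(list(dropped.index.names))
print(list(kept.index.names))
```

Both results contain the same three mammal rows; only the index differs.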

New API

Looking for ideas on what's appropriate here. Based off of suggestions in #6249 would something like this work?

In [ ]: df.loc(drop_levels=False)[('mammal', slice(None), 'flies')]
Out[ ]:
                          num_legs  num_wings
class  animal locomotion                     
mammal bat    flies              2          2
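Note that `df.loc(drop_levels=False)` above is a proposal, not an existing pandas call. As a rough sketch of what is already possible, and assuming that a row key combined with an explicit column slice keeps every index level, the same frame can be obtained with today's `.loc`. The comparison uses the documented `level=` form of `xs`.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame(
    {
        "num_legs": [4, 4, 2, 2],
        "num_wings": [0, 0, 2, 2],
        "class": ["mammal", "mammal", "mammal", "bird"],
        "animal": ["cat", "dog", "bat", "penguin"],
        "locomotion": ["walks", "walks", "flies", "walks"],
    }
).set_index(["class", "animal", "locomotion"])

# Row key plus an explicit column slice: .loc keeps all three index levels.
res_loc = df.loc[("mammal", slice(None), "flies"), :]

# The xs equivalent, selecting on two named levels and keeping them.
res_xs = df.xs(
    ("mammal", "flies"), level=["class", "locomotion"], drop_level=False
)

# Raises AssertionError if the two results differ.
assert_frame_equal(res_loc, res_xs)
print(res_loc)
```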
                 
arw2019 added the Enhancement and Needs Triage labels Jul 27, 2020
@simonjayhawkins
Member

Thanks @arw2019 for writing this up.

> Looking for ideas on what's appropriate here. Based off of suggestions in #6249 would something like this work?

I think that would make sense.

We already have an axis parameter:

>>> df.loc(axis=1)["num_legs"]
class   animal   locomotion
mammal  cat      walks         4
        dog      walks         4
        bat      flies         2
bird    penguin  walks         2
Name: num_legs, dtype: int64
>>>

If axis is specified and drop_levels=False, we would not drop levels from the given axis.

However, if axis is not specified and we have a 2D indexer, then to allow the behaviour to be controlled for both axes we may need drop_levels to accept one of {True, False, 'index', 'columns'}.
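To make the four-valued option concrete, here is a small sketch of how such a setting could be normalised into one flag per axis. Everything in it is hypothetical: `normalize_drop_levels` is not a pandas function, and the reading that `'index'` means "drop levels on the index only" (and likewise for `'columns'`) is an assumption, since the comment does not spell it out.

```python
from typing import Tuple, Union


def normalize_drop_levels(drop_levels: Union[bool, str]) -> Tuple[bool, bool]:
    """Return (drop_on_index, drop_on_columns) for a hypothetical option.

    Assumed meaning: True/False apply to both axes, while 'index' and
    'columns' drop levels on that axis only.
    """
    # Identity checks keep 1 and 0 from being accepted as True and False.
    if drop_levels is True:
        return True, True
    if drop_levels is False:
        return False, False
    if drop_levels == "index":
        return True, False
    if drop_levels == "columns":
        return False, True
    raise ValueError(
        "drop_levels must be True, False, 'index' or 'columns', "
        f"got {drop_levels!r}"
    )


print(normalize_drop_levels("index"))
```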

simonjayhawkins added the API Design and Indexing labels and removed the Needs Triage label Jul 27, 2020
@simonjayhawkins
Member

There may be API-breaking implications with adding a drop_levels parameter.

Currently:

>>> df.loc[(slice(None), "cat", slice(None))]
                   num_legs  num_wings
class  locomotion
mammal walks              4          0
>>>
>>> df.loc[(slice(None), "cat"), :]
                          num_legs  num_wings
class  animal locomotion
mammal cat    walks              4          0
>>>
>>> df.loc[(slice(None), "cat", slice(None)), :]
                          num_legs  num_wings
class  animal locomotion
mammal cat    walks              4          0
>>>

So I don't think that the default could be True.
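Since the result depends on how the key is written, one way to make the outcome explicit is to select with the level-preserving form and then drop levels by name. This is only a sketch using the existing `DataFrame.droplevel` method, not a proposal for how `.loc` should behave.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "num_legs": [4, 4, 2, 2],
        "num_wings": [0, 0, 2, 2],
        "class": ["mammal", "mammal", "mammal", "bird"],
        "animal": ["cat", "dog", "bat", "penguin"],
        "locomotion": ["walks", "walks", "flies", "walks"],
    }
).set_index(["class", "animal", "locomotion"])

# Row key plus column slice: all three index levels are kept.
kept = df.loc[(slice(None), "cat", slice(None)), :]

# Drop the scalar-indexed level explicitly instead of relying on the key form.
dropped = kept.droplevel("animal")

print(list(kept.index.names))
print(list(dropped.index.names))
```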

@TomAugspurger
Contributor

Haven't read too closely yet, but I don't think we should treat xs being able to do something loc can't as an argument for making loc more complicated. Rather, that's an argument for not deprecating xs.

@jbrockmendel
Member

I agree with Tom

@jbrockmendel
Member

Where does it say xs is slated for deprecation?

@simonjayhawkins
Member

It's listed under potential at #18262, with a couple of -1's in the discussion, and at #6249 (comment).

@mroeschke
Member

There seems to be little enthusiasm for this feature even if xs doesn't get deprecated, so closing due to lack of support. Can reopen if another use case supporting this feature is raised.
