pivoting #28
Reshaping Theory
Reshaping has been done through various packages, the best known being {stats} (in base R),
There are 2 main types of reshaping: reshaping to wide and reshaping to long.
reshaping wide
In our source table we have 4 types of columns :
The new columns will require names; in all existing solutions they are built from
Finally, we might have a way to aggregate the values when the sets of ids and key columns
Because of this large number of parameters, all approaches use default values to reduce
Other differences are :
pivot to wider
reshaping long
When reshaping long, the key and value columns don't exist yet
pivot to longer
for |
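For reference, here is a minimal base R sketch of the two directions using `stats::reshape()`. The data frame and its column names (`id`, `key`, `value`) are made up for illustration and are not taken from the examples in this thread.

```r
# Hypothetical long table: one row per (id, key) pair.
long <- data.frame(
  id    = c(1, 1, 2, 2),
  key   = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)

# Long -> wide: `idvar` is the id column, `timevar` the key column,
# `v.names` the value column.
wide <- reshape(long, direction = "wide",
                idvar = "id", timevar = "key", v.names = "value")
print(names(wide))

# Wide -> long: the key and value columns don't exist yet, so we have to
# name them (`timevar`, `v.names`) and list the varying columns.
back <- reshape(wide, direction = "long",
                idvar = "id", varying = c("value.a", "value.b"),
                v.names = "value", timevar = "key", times = c("a", "b"))
print(back)
```

Here `reshape()` builds the new wide column names from the value column name and the key values, joined by its `sep` argument (`"."` by default), which is one example of the defaults discussed above.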
Pivoting to wider
so, taking the simplest example from
We can write :
A few observations :
Let's see how we would handle the multi spread :
Our proposal is less general than
Final example :
Here our solution is very straightforward :
With {tidyr}, I think we cannot do the following without first copying the breaks column:
I looked at all the examples from
I don't have a clear idea of how to fill in NAs, and it's a useful feature, so I must keep thinking. Here's a crazy idea, using a pseudo op
Margins are not specific to reshaping but to aggregation. The tidyverse won't work on this because margins are not tidy (one row is one observation), but they're still sometimes useful. To be treated in another issue.
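To make the aggregation, fill and margin points concrete, here is a small base R sketch. It is not the syntax proposed in this issue; the `sales` data frame and its columns are hypothetical, and it only uses `tapply()`, `xtabs()` and `addmargins()` from base R.

```r
# Hypothetical data: the (id = "x", key = "a") pair occurs twice,
# and the (id = "y", key = "b") pair never occurs.
sales <- data.frame(
  id    = c("x", "x", "x", "y"),
  key   = c("a", "a", "b", "a"),
  value = c(1, 2, 3, 4)
)

# Pivot to wide with an aggregation: one row per id, one column per key.
# Combinations that never occur come out as NA by default.
wide <- tapply(sales$value, sales[c("id", "key")], sum)

# The `default` argument of tapply() (R >= 3.4.0) acts as a fill value.
filled <- tapply(sales$value, sales[c("id", "key")], sum, default = 0)

# Margins are computed after the aggregation, here on a contingency table.
with_margins <- addmargins(xtabs(value ~ id + key, data = sales))
print(with_margins)
```

The margins step works on the aggregated table, not on the reshaping itself, which is why it fits better in a separate discussion.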
Pivoting to longer
We need: id columns, key columns, a value column and varying columns. That's one more than to pivot to wider, and it makes the challenge much harder.
The simplest way is to use stack (if we want to stay in base R) :
becomes :
And then we rename our columns, because stack doesn't support custom names.
Note that we can summarize to more than one row per group,
This is not satisfactory because
Let's try to rewrite the above with a candidate syntax :
We solve the problem of having one more type of column by using 2 on the lhs. I believe the order is intuitive: first the labels, then the values.
Next example :
The issue here is that we need to spot a column and rename it. We can get close with the following, using the previous syntax, but we'll have to rename downstream and filter out. The output is still arguably more readable, though I would have liked to do the renaming in the gathering operation.
We can use regex and capture groups; we could support named capture groups in case of ambiguity :
A difference is that all columns that match the regex are to be kept, while in principle,
The next example :
Here we are going to create several columns; it's quite straightforward following our last example :
Last example :
This one is tricky because we want several value columns,
We can borrow the idea and do :
I believe we're better off requiring that the last name on the lhs be the value column(s). This means we have to use named capture to solve this one, but we don't have to name all groups: it can work like named arguments to functions, where we go by position after matching names.
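As a point of comparison for the discussion above, here is a base R sketch of the `stack()` route followed by a regex with capture groups to split the old column names. The `wide` data frame, its column names and the pattern are hypothetical; this is not the candidate syntax from this comment.

```r
# Hypothetical wide table: one id column and two varying columns whose
# names encode a label and a year.
wide <- data.frame(
  id     = 1:2,
  x_2019 = c(1, 2),
  x_2020 = c(3, 4)
)

# stack() only takes the varying columns and always names its output
# `values` and `ind`, so the id column is re-attached and the columns
# are renamed by hand.
stacked <- stack(wide[c("x_2019", "x_2020")])
long <- data.frame(
  id    = rep(wide$id, times = 2),
  key   = as.character(stacked$ind),
  value = stacked$values
)

# Split the key with capture groups: element 1 of each match is the full
# match, elements 2 and 3 are the first and second groups.
pattern <- "^(.*)_([0-9]{4})$"
parts <- regmatches(long$key, regexec(pattern, long$key))
long$name <- vapply(parts, `[`, character(1), 2)
long$year <- as.integer(vapply(parts, `[`, character(1), 3))
print(long)
```

The groups here are matched by position only, which is the case where naming some of them would help resolve ambiguity.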
Some ambiguity remains
This looks like a summarizing call, and I'll have to inspect the symbols to know it's a pivot op :
Although strings are highlighted, so it would look special; maybe not that bad.
And this is weird too :
Here it's not ambiguous because the lhs is not a symbol, and not a call to
Now, would we understand right away that the above are pivot operations? It would be nice to tag them in some way.
We often click on [+] to expand and on [-] to collapse, so maybe we could use those ?
maybe better:
Or we can use a different symbol to signal we're pivoting, but we're a bit short on those. This doesn't look so good:
Or some infix, used for syntax only, or containing the actual logic :
They don't look really good though, and I don't like using
A couple of months later, I think these are fine to pivot wider and longer respectively :
I still think
This has the fill value on the lhs, where we expect only names, so it's not very intuitive.
or
It uses the mutate by syntax, under a summarizing call, which is confusing.
or
This looks clunky for sure, but is readable enough. Can be shortened :
we can use
To be compared with
More on the "fill" syntax.
I think reusing
We could also have a special pseudo operator, or consider
There's also an option to use
A crazy idea that would solve the ambiguity: we use
We'd use
It's part of #29 .
I want to find a syntax that sees pivoting as a grouped operation and supports easier multi-spread or multi-gather ops.
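To illustrate what "pivoting as a grouped operation" can mean, here is a base R sketch in which each id group is summarized into a single wide row. The data and column names are hypothetical, and the sketch assumes every group has the same set of keys; it is only meant to show the idea, not the syntax being looked for.

```r
# Hypothetical long table: one row per (id, key) pair.
long <- data.frame(
  id    = c(1, 1, 2, 2),
  key   = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)

# Group by id, then turn each group's key/value pairs into named columns.
rows <- lapply(split(long, long$id), function(g) {
  data.frame(id = g$id[1], as.list(setNames(g$value, g$key)))
})

# Bind the one-row results back together (requires identical keys per group).
wide <- do.call(rbind, rows)
print(wide)
```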