there doesn't seem to be an elegant way to combine groupBy and pivoting . am I missing something? #76
-
in Arquero.js, there is a nice and elegant chain syntax for combining groupby and pivoting.... data.groupby().pivot(). Yet, for the life of me, I can't figure out an equivalent in Tidy. neither Claude 3.5 and ChatGPT both hallucinate on this issue and suggest things that look right, but don't actually work. Just looking for any tips on whether there is some trick I'm missing, or confirmation that it is not possible without nesting tidy functions. the problem with Arquero is that its meant for big datasets and so convert the json (and thereby converts column types like dates to string) which becomes a pain when you want to go back to standard json. |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 3 replies
-
You can pass as many tidy functions to the second argument of groupBy as you'd like: const data = [
{ group: 'a', type: 'one', place: 'canada', val: 1 },
{ group: 'a', type: 'one', place: 'usa', val: 10 },
{ group: 'b', type: 'one', place: 'campbell', val: 4 },
{ group: 'b', type: 'one', place: 'brampton', val: 8 },
{ group: 'a', type: 'two', place: 'brampton', val: 7 },
{ group: 'b', type: 'two', place: 'boston', val: 3 },
{ group: 'b', type: 'two', place: 'usa', val: 11 },
{ group: 'a', type: 'three', place: 'canada', val: 20 },
];
output = tidy(
data,
groupBy(['group'], [
pivotWider({
namesFrom: 'place',
valuesFrom: 'val',
}),
summarize({
types: items => Array.from(new Set(items.map(d => d.type))),
canada: sum('canada'),
usa: sum('usa'),
brampton: sum('brampton'),
campbell: sum('campbell'),
boston: sum('boston')
})
])
);
// outputs:
[
{
"group": "a",
"types": ["one", "two", "three"],
"canada": 21,
"usa": 10,
"brampton": 7,
"campbell": 0,
"boston": 0
},
{
"group": "b",
"types": ["one", "two"],
"canada": 0,
"usa": 11,
"brampton": 8,
"campbell": 4,
"boston": 3
}
] |
Beta Was this translation helpful? Give feedback.
-
I think you are accidentally calling const data = [
{"date":"2006-04-01T00:00:00.000Z","value":8940,"type":"quarterly","year":2006,"month":3,"quarter":"Q1"},
{"date":"2006-07-01T00:00:00.000Z","value":8009,"type":"quarterly","year":2006,"month":6,"quarter":"Q2"},
{"date":"2006-09-30T00:00:00.000Z","value":8739,"type":"quarterly","year":2006,"month":8,"quarter":"Q3"},
{"date":"2006-12-30T00:00:00.000Z","value":9694,"type":"quarterly","year":2006,"month":11,"quarter":"Q4"},
{"date":"2007-03-31T00:00:00.000Z","value":8852,"type":"quarterly","year":2007,"month":2,"quarter":"Q1"},
{"date":"2007-06-30T00:00:00.000Z","value":8680,"type":"quarterly","year":2007,"month":5,"quarter":"Q2"},
{"date":"2007-09-29T00:00:00.000Z","value":10090,"type":"quarterly","year":2007,"month":8,"quarter":"Q3"}
]
output = tidy(
data,
select(['-date', '-month']),
pivotWider({
namesFrom: 'quarter',
valuesFrom: 'value',
})
)
// outputs:
[
{"type": "quarterly", "year": 2006, "Q1": 8940, "Q2": 8009, "Q3": 8739, "Q4": 9694},
{"type": "quarterly", "year": 2007, "Q1": 8852, "Q2": 8680, "Q3": 10090}
] |
Beta Was this translation helpful? Give feedback.
-
Generally, the API is modeled after pivot_wider from tidyr. It just uses all columns that aren't being pivoted to determine the unique groups to make wider (i.e. it preserves the values in non-pivoted columns). If you don't drop them in select, you end up with only 1 of Q1,Q2,Q3,Q4 populated per item since the date and month column need to be maintained. |
Beta Was this translation helpful? Give feedback.
You can pass as many tidy functions to the second argument of groupBy as you'd like: