there doesn't seem to be an elegant way to combine groupBy and pivoting . am I missing something? #76

disleyland · 2025-01-23T12:52:06Z

disleyland
Jan 23, 2025

in Arquero.js, there is a nice and elegant chain syntax for combining groupby and pivoting.... data.groupby().pivot(). Yet, for the life of me, I can't figure out an equivalent in Tidy.

neither
Tidy(data, groupBy('key',pivotwider(...))
nor
Tidy(data, groupBy('key').pivotwider(...))
works.

Claude 3.5 and ChatGPT both hallucinate on this issue and suggest things that look right, but don't actually work. Just looking for any tips on whether there is some trick I'm missing, or confirmation that it is not possible without nesting tidy functions.

the problem with Arquero is that its meant for big datasets and so convert the json (and thereby converts column types like dates to string) which becomes a pain when you want to go back to standard json.

Answered by pbeshai

Jan 24, 2025

You can pass as many tidy functions to the second argument of groupBy as you'd like:

const data = [
  { group: 'a', type: 'one',   place: 'canada',   val: 1 },
  { group: 'a', type: 'one',   place: 'usa',      val: 10 },
  { group: 'b', type: 'one',   place: 'campbell', val: 4 },
  { group: 'b', type: 'one',   place: 'brampton', val: 8 },
  { group: 'a', type: 'two',   place: 'brampton', val: 7 },
  { group: 'b', type: 'two',   place: 'boston',   val: 3 },
  { group: 'b', type: 'two',   place: 'usa',      val: 11 },
  { group: 'a', type: 'three', place: 'canada',   val: 20 },
];

output = tidy(
  data,
  groupBy(['group'], [
    pivotWider({
      namesFrom: 'place',
      valuesFrom: 'val',

View full answer

pbeshai · 2025-01-24T16:44:32Z

pbeshai
Jan 24, 2025
Maintainer

You can pass as many tidy functions to the second argument of groupBy as you'd like:

const data = [
  { group: 'a', type: 'one',   place: 'canada',   val: 1 },
  { group: 'a', type: 'one',   place: 'usa',      val: 10 },
  { group: 'b', type: 'one',   place: 'campbell', val: 4 },
  { group: 'b', type: 'one',   place: 'brampton', val: 8 },
  { group: 'a', type: 'two',   place: 'brampton', val: 7 },
  { group: 'b', type: 'two',   place: 'boston',   val: 3 },
  { group: 'b', type: 'two',   place: 'usa',      val: 11 },
  { group: 'a', type: 'three', place: 'canada',   val: 20 },
];

output = tidy(
  data,
  groupBy(['group'], [
    pivotWider({
      namesFrom: 'place',
      valuesFrom: 'val',
    }),
    summarize({
      types: items => Array.from(new Set(items.map(d => d.type))),
      canada: sum('canada'),
      usa: sum('usa'),
      brampton: sum('brampton'),
      campbell: sum('campbell'),
      boston: sum('boston')
    })
  ])
);

// outputs:
[
  {
    "group": "a",
    "types": ["one", "two", "three"],
    "canada": 21,
    "usa": 10,
    "brampton": 7,
    "campbell": 0,
    "boston": 0
  },
  {
    "group": "b",
    "types": ["one", "two"],
    "canada": 0,
    "usa": 11,
    "brampton": 8,
    "campbell": 4,
    "boston": 3
  }
]

1 reply

disleyland Jan 24, 2025
Author

hmm. seems like two different issues then I was facing.

I was using https://unpkg.com/@tidyjs/[email protected]/dist/umd/tidy.min.js

Even this version wasn't giving me your outputs, only returned a single array item.

So I went to your observableHQ page, there your example worked as expected, but it still didn't work with my data table.

here's my example

data = [{"date":"2006-04-01T00:00:00.000Z","value":8940,"type":"quarterly","year":2006,"month":3,"quarter":"Q1"},{"date":"2006-07-01T00:00:00.000Z","value":8009,"type":"quarterly","year":2006,"month":6,"quarter":"Q2"},{"date":"2006-09-30T00:00:00.000Z","value":8739,"type":"quarterly","year":2006,"month":8,"quarter":"Q3"},{"date":"2006-12-30T00:00:00.000Z","value":9694,"type":"quarterly","year":2006,"month":11,"quarter":"Q4"},{"date":"2007-03-31T00:00:00.000Z","value":8852,"type":"quarterly","year":2007,"month":2,"quarter":"Q1"},{"date":"2007-06-30T00:00:00.000Z","value":8680,"type":"quarterly","year":2007,"month":5,"quarter":"Q2"},{"date":"2007-09-29T00:00:00.000Z","value":10090,"type":"quarterly","year":2007,"month":8,"quarter":"Q3"}]

now this returns an empty array even in observable.

tidy(
data,
groupBy(['year'], [
pivotLonger({
namesFrom: 'quarter',
valuesFrom: 'value',
})
])
)

?

pbeshai · 2025-01-24T18:00:42Z

pbeshai
Jan 24, 2025
Maintainer

I think you are accidentally calling pivotLonger when you want pivotWider. Additionally, since you have some columns like date and month, it will perhaps make the pivotWider not do what you want if I'm understanding correctly. I also don't think the groupBy is necessary. Consider this approach:

const data = [
  {"date":"2006-04-01T00:00:00.000Z","value":8940,"type":"quarterly","year":2006,"month":3,"quarter":"Q1"},
  {"date":"2006-07-01T00:00:00.000Z","value":8009,"type":"quarterly","year":2006,"month":6,"quarter":"Q2"},
  {"date":"2006-09-30T00:00:00.000Z","value":8739,"type":"quarterly","year":2006,"month":8,"quarter":"Q3"},
  {"date":"2006-12-30T00:00:00.000Z","value":9694,"type":"quarterly","year":2006,"month":11,"quarter":"Q4"},
  {"date":"2007-03-31T00:00:00.000Z","value":8852,"type":"quarterly","year":2007,"month":2,"quarter":"Q1"},
  {"date":"2007-06-30T00:00:00.000Z","value":8680,"type":"quarterly","year":2007,"month":5,"quarter":"Q2"},
  {"date":"2007-09-29T00:00:00.000Z","value":10090,"type":"quarterly","year":2007,"month":8,"quarter":"Q3"}
]

output = tidy(
  data,
  select(['-date', '-month']),
  pivotWider({
    namesFrom: 'quarter',
    valuesFrom: 'value',
  })
)

// outputs:
[
  {"type": "quarterly", "year": 2006, "Q1": 8940, "Q2": 8009, "Q3": 8739, "Q4": 9694},
  {"type": "quarterly", "year": 2007, "Q1": 8852, "Q2": 8680, "Q3": 10090}
]

1 reply

disleyland Jan 24, 2025
Author

ah yes wider. :)

ok. so tidy is more sensitive to what other keys are in the dataset then I'm use to coming from Arquero. arquero just ignores the other keys not specified in the pivot command. interestingly, Arquero pivot logic actually needs me to use groupby before the pivot to generate the same output above.

So what is the logic in how tidy pivot, without an explicit groupBy command, figures out itself if it should group by one of the input columns? In your example, your removed two columns in order to "trigger" the right groupby operation by pivotWider. just wondering how you knew that was the right way to solve the problem ... :)

pbeshai · 2025-01-24T20:25:39Z

pbeshai
Jan 24, 2025
Maintainer

Generally, the API is modeled after pivot_wider from tidyr. It just uses all columns that aren't being pivoted to determine the unique groups to make wider (i.e. it preserves the values in non-pivoted columns). If you don't drop them in select, you end up with only 1 of Q1,Q2,Q3,Q4 populated per item since the date and month column need to be maintained.

1 reply

disleyland Jan 24, 2025
Author

gotcha. many thanks for the help and your time!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

there doesn't seem to be an elegant way to combine groupBy and pivoting . am I missing something? #76

{{title}}

Replies: 3 comments 3 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

there doesn't seem to be an elegant way to combine groupBy and pivoting . am I missing something? #76

disleyland Jan 23, 2025

Replies: 3 comments · 3 replies

pbeshai Jan 24, 2025 Maintainer

disleyland Jan 24, 2025 Author

pbeshai Jan 24, 2025 Maintainer

disleyland Jan 24, 2025 Author

pbeshai Jan 24, 2025 Maintainer

disleyland Jan 24, 2025 Author

disleyland
Jan 23, 2025

Replies: 3 comments 3 replies

pbeshai
Jan 24, 2025
Maintainer

disleyland Jan 24, 2025
Author

pbeshai
Jan 24, 2025
Maintainer

disleyland Jan 24, 2025
Author

pbeshai
Jan 24, 2025
Maintainer

disleyland Jan 24, 2025
Author