You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For ease of interaction in a spreadsheet, people will often "join" together a parent dataset and a child dataset together in the same tab or sheet in a way that doesn't repeat parent data over and over. For example, let's say you have a parent dataset (e.g., companies) and a child dataset (e.g., contacts at those companies).
The Panhandled Dataset
Parent Dataset
A
B
valueA1
valueB1
valueA2
valueB2
Child Dataset
A
C
D
valueA1
valueC1
valueD1
valueA2
valueC2
valueD2
valueA1
valueC3
valueD3
valueA1
valueC4
valueD4
valueA2
valueC5
valueD5
valueA2
valueC6
valueD6
Panhandling Them Together
Join together with A as the common column
A
B
C
D
valueA1
valueB1
valueC1
valueD1
valueA1
valueB1
valueC3
valueD3
valueA1
valueB1
valueC4
valueD4
valueA2
valueB2
valueC2
valueD2
valueA2
valueB2
valueC5
valueD5
valueA2
valueB2
valueC6
valueD6
Drop redundant "parent" data
A
B
C
D
valueA1
valueB1
valueC1
valueD1
valueC3
valueD3
valueC4
valueD4
valueA2
valueB2
valueC2
valueD2
valueC5
valueD5
valueC6
valueD6
My Motivation
I needed to import into Salesforce the data from a panhandled spreadsheet with company and contact information mixed together. For this, I needed a properly formed database (tidy) structure to import into the Account object and a separate one to import into the Contact object. I looked through the tidyr package, but didn't see anything that would do what I needed done.
Untangling the Dataset
This is what the script does.
Files
Input
data/original.xlsx
Output
data/parent.csv
data/child.csv
Assumptions for Input File
The different "pans" (parent with children) can be separated by blank lines or not.
All information except the "pans" has been deleted.
The first column contains the keys for the parent and child datasets
Process
Read in the spreadsheet as our original table to work with
In the original table, clear any blank rows used to separate parent/child groupings
In the original table, collect together the rows with first columns that aren't blank and store in the parent table
From the original table, fill the blanks cells in column 1 with the last non-blank value in that column (working down)
Save just column 1 into a one-column (key) table we will use later
From the original table, create a temporary table consisting of only the rows that are blank in the first column
Using this temporary table, get the name of the first column in this table to figure out which number column is the beginning of the child columns in the original (where the panhandle meets the pan)
Use this information to drop the columns that aren' t child columns and bind the remaining columns with the single key column created above to make a child table
Return to the parent table to drop the columns that aren't parent columns