Commit 1aaffe1

Merge pull request #132 from aunderwo/master
Improve speed of adding extra rows before calling get_dummies
2 parents e201bf2 + 09c13a7 commit 1aaffe1

1 file changed, +2 -3 lines changed
pangolin/scripts/pangolearn.py

Lines changed: 2 additions & 3 deletions
@@ -139,9 +139,8 @@ def readInAndFormatData(sequencesFile, indiciesToKeep, blockSize=1000):
 
     # add extra rows to ensure all of the categories are represented, as otherwise
     # not enough columns will be created when we call get_dummies
-    for i in categories:
-        line = [i] * len(indiciesToKeep)
-        df.loc[len(df)] = line
+    extra_rows = [[i] * len(indiciesToKeep) for i in categories]
+    df = pd.concat([df, pd.DataFrame(extra_rows, columns = indiciesToKeep)], ignore_index=True)
 
     # get one-hot encoding
     df = pd.get_dummies(df, columns=indiciesToKeep)
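
For reviewers who want to check that the two versions are interchangeable, here is a minimal sketch comparing them on toy data. The values of `indiciesToKeep` and `categories`, the `base` frame, and the helper names `add_rows_loop` and `add_rows_concat` are hypothetical stand-ins, not taken from pangolearn.py. The likely source of the speedup is that the old loop enlarges the DataFrame one row at a time, while the new code builds all extra rows first and appends them with a single `pd.concat` call.

```python
import pandas as pd

# Hypothetical stand-ins for the variables used in readInAndFormatData
indiciesToKeep = [0, 1, 2]
categories = ["A", "C", "G", "T", "-"]
base = pd.DataFrame([["A", "C", "G"], ["A", "A", "G"]], columns=indiciesToKeep)


def add_rows_loop(df):
    """Old approach: enlarge the frame one row per category."""
    df = df.copy()
    for i in categories:
        df.loc[len(df)] = [i] * len(indiciesToKeep)
    return df


def add_rows_concat(df):
    """New approach: build every extra row, then append them in one concat."""
    extra_rows = [[i] * len(indiciesToKeep) for i in categories]
    extra = pd.DataFrame(extra_rows, columns=indiciesToKeep)
    return pd.concat([df, extra], ignore_index=True)


looped = add_rows_loop(base)
concatenated = add_rows_concat(base)

# Both frames should hold the same cell values in the same order
same_values = looped.values.tolist() == concatenated.values.tolist()

# One-hot encode; every category now appears in every kept column
encoded = pd.get_dummies(concatenated, columns=indiciesToKeep)
print(same_values, encoded.shape)
```

With every category present in each kept column, `get_dummies` produces one indicator column per (position, category) pair, which is the purpose of the extra rows described in the comment above the change.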
