We are getting annoying file diff triggers when reprocessing the pipeline, even if nothing changes in the file. This is important to fix so that we can isolate the actual changes that result from reprocessing output data.
As @shntnu notes in #48, the gzip files trigger positive diffs because of an added timestamp.
Fortunately, it looks like pandas-dev/pandas#33398 has added the ability to include args to pandas gzip compression. This improvement will be included in pandas version 1.1, which is scheduled for an Aug 1 release.
Three Options
pandas v1.1 option (assuming that it solves this problem!)
The way to remove the timestamp from the file is to pass a `--no-name` (`-n`) flag to the gzip command. See http://linuxcommand.org/lc3_man_pages/gzip1.html
For the pandas or python option, the solution should ideally live in `pycytominer`. I've created a stub for this at cytomining/pycytominer#83
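For the python option, a minimal sketch of what a timestamp-free write could look like, using only the standard library `gzip` module. The function name `write_gzip_reproducible` is hypothetical and is not part of pycytominer. The idea is to mirror `gzip -n` by fixing the header `mtime` to 0 and leaving the stored filename empty, so that identical input should produce identical bytes.

```python
import gzip


def write_gzip_reproducible(data: bytes, path: str) -> None:
    """Write `data` to `path` as gzip without a timestamp or stored filename.

    Hypothetical helper for illustration only (not a pycytominer function).
    """
    with open(path, "wb") as raw:
        # filename="" keeps the original file name out of the gzip header;
        # mtime=0 replaces the current time with a fixed value, similar in
        # effect to `gzip --no-name`.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
            gz.write(data)
```

A CSV produced by pandas could be passed in as bytes, for example `write_gzip_reproducible(df.to_csv(index=False).encode(), "out.csv.gz")`, assuming `df` is a DataFrame. Writing the same content twice should then leave git with nothing to diff.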