
Outputs of CellRangerCount seem to contain duplicated files in an archive #11

Open
@prioux

Description

CellRangerCount appears to create an output folder containing a number of files.

Inside that folder there is a .tgz archive that duplicates files already present in the folder. The archive contains most, but not all, of the files in the output folder.

This should be cleaned up. Either:

  1. the tar file is completely redundant and should be removed before saving the output folder or
  2. the tar file is supposed to be the final product (with a few fewer files) and we shouldn't even attempt to save the output folder.
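Option 1 is only safe if the archived copies are byte-identical to the loose files, not merely named the same. A minimal sketch for checking that, assuming GNU or BSD tar and that archive member paths are relative to the output folder; `check_redundant` is a hypothetical helper name, not part of CBRAIN:

```shell
# Sketch (hypothetical helper): succeed only if every file stored in ARCHIVE has a
# byte-identical copy at the same relative path under OUTDIR.
# Assumption: archive member paths are relative to OUTDIR (a leading "./" is fine).
check_redundant() {
  outdir=$1
  archive=$2
  scratch=$(mktemp -d)
  mkdir "$scratch/x"

  # Unpack into a scratch directory so the output folder is left untouched.
  if ! tar -xzf "$archive" -C "$scratch/x"; then
    rm -rf "$scratch"
    return 2
  fi

  # Print one line per archived file that is missing from, or differs from, OUTDIR.
  (cd "$scratch/x" && find . -type f) | sed 's|^\./||' |
    while IFS= read -r f; do
      if [ ! -f "$outdir/$f" ]; then
        echo "missing: $f"
      elif ! cmp -s "$scratch/x/$f" "$outdir/$f"; then
        echo "differs: $f"
      fi
    done > "$scratch/report"

  cat "$scratch/report"
  # Non-empty report means the archive is NOT fully redundant.
  if [ -s "$scratch/report" ]; then
    rc=1
  else
    rc=0
  fi
  rm -rf "$scratch"
  return "$rc"
}
```

For example, `check_redundant cellranger_count_res cellranger_count_res/cellranger_count_res.mri.tgz` prints nothing and returns 0 when every archived file is duplicated verbatim in the folder. It extracts and uses `cmp` rather than `tar --compare`, since the latter may also report metadata differences such as modification times.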

How to check as a CBRAIN developer:

a) cd to the root of a finished CellRangerCount task
b) create the list of files in the output dir:

find cellranger_count_res -type f | sort >/tmp/listfiles  # adjust name of folder

c) create the list of files in the .tgz archive

tar -tzf cellranger_count_res/cellranger_count_res.mri.tgz | grep -v '/$' | sort >/tmp/archfiles  # skip directory entries

d) Compare the two lists with diff (or csdiff):

diff /tmp/listfiles /tmp/archfiles
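The steps above can be combined into one function. A minimal sketch, assuming the archive sits directly inside the output folder and its member paths are relative to that folder; `compare_lists` is a hypothetical name. It drops directory entries and a leading `./` from the tar listing and excludes the archive itself from the folder listing, so only real differences remain:

```shell
# Sketch (hypothetical helper): list files found only in OUTDIR or only in ARCHIVE.
# Assumptions: ARCHIVE sits directly inside OUTDIR, and its member paths are
# relative to OUTDIR (a leading "./" is stripped).
compare_lists() {
  outdir=$1
  archive=$2
  scratch=$(mktemp -d)

  # Regular files on disk, relative to OUTDIR, minus the archive itself.
  (cd "$outdir" && find . -type f) | sed 's|^\./||' |
    grep -vxF "$(basename "$archive")" | LC_ALL=C sort > "$scratch/listfiles"

  # Archive members; entries ending in "/" are directories, which find -type f skips.
  tar -tzf "$archive" | grep -v '/$' | sed 's|^\./||' |
    LC_ALL=C sort > "$scratch/archfiles"

  echo "Only in folder:"
  LC_ALL=C comm -23 "$scratch/listfiles" "$scratch/archfiles"
  echo "Only in archive:"
  LC_ALL=C comm -13 "$scratch/listfiles" "$scratch/archfiles"

  rm -rf "$scratch"
}
```

Usage: `compare_lists cellranger_count_res cellranger_count_res/cellranger_count_res.mri.tgz`. `comm` requires both inputs sorted with the same collation, hence `LC_ALL=C` on both `sort` and `comm`.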

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests