csc fetch-data in MacOSX arm vs. Ubuntu X86, data size is 71.35 GB vs. 1.3T? #91

Open
FrankieMAN-YU opened this issue Feb 2, 2025 · 7 comments
Labels
Partipant Issue A code issue/bug encountered by challenge participant(s).

Comments

@FrankieMAN-YU

I'm a beginner joining this competition. I wonder where the train_crop_manifest.csv is, the file that lists the metadata of the raw FIB-SEM image corresponding to each labeled crop, that is "voxel_size", "translation", "shape", etc.

Another question: is the s0 resolution 2 nm for all datasets?

Looking forward to your answers!

@FrankieMAN-YU
Author

New question: I ran the csc fetch-data command on both MacOSX arm and Ubuntu X86, and the downloaded data size is 71.35 GB on the former and 1.3T on the latter. Why is there such a big gap? What is the size of the annotated data?

@rhoadesScholar rhoadesScholar added the Partipant Issue A code issue/bug encountered by challenge participant(s). label Feb 6, 2025
@rhoadesScholar rhoadesScholar changed the title from "Where is the train_crop_manifest.csv?" to "csc fetch-data in MacOSX arm vs. Ubuntu X86, data size is 71.35 GB vs. 1.3T?" Feb 6, 2025
@rhoadesScholar
Member

rhoadesScholar commented Feb 6, 2025

Hi @FrankieMAN-YU! For your first two questions, I am hoping to get these Q&A type questions in the repo Discussions, so everyone will continue to easily see past questions/answers as the competition goes on. :)

As such I have answered your questions here:

That said, this data size difference seems to potentially be more of a bug, so let's keep it as an issue.

  1. Just to verify: did you run csc fetch-data with exactly the same arguments (and what were they?) and the same version of the repo?
  2. Have you tried running csc visualize to look at the data and visually compare whether the same chunks/resolutions are there?
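
If it helps narrow this down, a per-dataset size listing from both machines would show whether whole datasets are missing on one side or every dataset is simply smaller. Below is a minimal sketch using only the Python standard library. It assumes each dataset sits in its own top-level folder under the download directory; the `data` default is a placeholder, not necessarily the path csc fetch-data writes to, and the function names are illustrative only.

```python
import os
import sys


def dir_size_bytes(path):
    """Sum the sizes of all regular files below `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def per_dataset_sizes(data_root):
    """Map each top-level folder in `data_root` to its total size in bytes."""
    return {
        entry.name: dir_size_bytes(entry.path)
        for entry in os.scandir(data_root)
        if entry.is_dir()
    }


if __name__ == "__main__":
    # Placeholder default; pass the real download folder as the first argument.
    data_root = sys.argv[1] if len(sys.argv) > 1 else "data"
    if os.path.isdir(data_root):
        for name, size in sorted(per_dataset_sizes(data_root).items()):
            print(f"{name}\t{size / 1e9:.2f} GB")
    else:
        print(f"Not a directory: {data_root}")
```

The script adds up file sizes rather than allocated disk blocks, so the numbers should be comparable between the two machines even if their filesystems differ.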

@FrankieMAN-YU
Author

Hi @rhoadesScholar !
For question 1: I ran csc fetch-data with exactly the same arguments, and the library version was the same.
For question 2: I'm going to try this command soon and will report back as soon as possible.

@rhoadesScholar
Member

rhoadesScholar commented Feb 7, 2025

> For question 1: I ran csc fetch-data with exactly the same arguments, and the library version was the same.

@FrankieMAN-YU
Do you happen to know the approximate dates of each of the downloads?

@FrankieMAN-YU
Author

For MacOSX:
jrc_cos7-1a: 7.59GB
jrc_cos7-1b: 1.53GB
jrc_ctl-id8-1: 171MB
jrc_fly-mb-1a: 355.4MB
jrc_fly-vnc-1: 30.72GB
jrc_hela-2: 1.25GB
jrc_hela-3: 922.1MB
jrc_jurkat-1: 564.5MB
jrc_macrophage-2: 551.9MB
jrc_mus-heart-1: 222MB
jrc_mus-kidney: 1.21GB
jrc_mus-kidney-3: 53.8MB
jrc_mus-kidney-glomerulus: 236.6MB
jrc_mus-liver: 1.14GB
jrc_mus-liver-3: 55.5MB
jrc_mus-liver-zon-1: 9.56GB
jrc_mus-liver-zon-2: 10.24GB
jrc_mus-nacc-1: 153.8MB
jrc_sum159-1: 223.8MB
jrc_sum159-4: 686.7MB
jrc_ut21-1413-003: 923.2MB
jrc_zf-cardiac-1: 509.7MB
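
As a sanity check, the sizes listed above can be totaled and compared with the 71.35 GB figure. This is a minimal sketch, assuming decimal units (1 GB = 1000 MB) and that a bare "G" suffix means GB; the helper name `parse_sizes` is illustrative only.

```python
import re

# Per-dataset sizes as reported above for the MacOSX download.
REPORTED = """
jrc_cos7-1a: 7.59GB
jrc_cos7-1b: 1.53GB
jrc_ctl-id8-1: 171MB
jrc_fly-mb-1a: 355.4MB
jrc_fly-vnc-1: 30.72GB
jrc_hela-2: 1.25GB
jrc_hela-3: 922.1MB
jrc_jurkat-1: 564.5MB
jrc_macrophage-2: 551.9MB
jrc_mus-heart-1: 222MB
jrc_mus-kidney: 1.21GB
jrc_mus-kidney-3: 53.8MB
jrc_mus-kidney-glomerulus: 236.6MB
jrc_mus-liver: 1.14GB
jrc_mus-liver-3: 55.5MB
jrc_mus-liver-zon-1: 9.56GB
jrc_mus-liver-zon-2: 10.24GB
jrc_mus-nacc-1: 153.8MB
jrc_sum159-1: 223.8MB
jrc_sum159-4: 686.7MB
jrc_ut21-1413-003: 923.2MB
jrc_zf-cardiac-1: 509.7MB
"""

# Assumption: decimal units, and "G" is shorthand for "GB".
UNIT_TO_GB = {"GB": 1.0, "G": 1.0, "MB": 0.001}


def parse_sizes(text):
    """Return {dataset_name: size_in_GB} from lines like 'name: 1.25GB'."""
    sizes = {}
    for line in text.strip().splitlines():
        match = re.fullmatch(r"\s*(\S+):\s*([\d.]+)\s*(GB|G|MB)\s*", line)
        if match is None:
            continue  # skip anything that is not a 'name: size' line
        name, value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_GB[unit]
    return sizes


sizes = parse_sizes(REPORTED)
total_gb = sum(sizes.values())
print(f"{len(sizes)} datasets, {total_gb:.2f} GB in total")

# Show the three largest datasets and their share of the total.
for name, gb in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}: {gb:.2f} GB ({100 * gb / total_gb:.0f}% of total)")
```

Under these assumptions the 22 listed datasets add up to roughly 68.9 GB, somewhat below the 71.35 GB reported for the whole download, and jrc_fly-vnc-1 alone accounts for a large share of it.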

I also found that some of the datasets are relatively small (perhaps abnormally so, e.g. jrc_mus-liver-3 is only 55.5MB), while others are relatively large (more than 1GB, which may be more normal). I only used the csc fetch-data CLI command; does it download every dataset at full resolution? (jrc_cos7-1a is nearly 8GB.)
@rhoadesScholar Looking forward to your reply!

@FrankieMAN-YU
Author

Hi @rhoadesScholar, I wrote a script that directly downloads the groundtruth and the corresponding raw images, and I checked the images and groundtruth, but I have a few questions:

  1. Each dataset, such as jrc_mus-kidney-3 and jrc_mus-liver-zon-1, has a different number of categories in its groundtruth folder. The all folder represents the aggregation of all categories, but for jrc_mus-kidney-3/crop472 I found that only the indices [0, 37] are present, and for the chlor category only index [0] is present. Is this normal?

  2. Across the all groundtruth files of every downloaded crop, does each unique index represent the same category?

  3. Is the directly downloaded data a semantic segmentation? How are the labels for instance segmentation obtained?

  4. I found some abnormal crop labels, such as jrc_hela-3/crop60, where the annotated groundtruth is all 0 and the raw image appears to be random noise.

@paperplane03

Yes, I also find that the latest csc fetch-data downloads 1.3T of data on an Ubuntu server.
