-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
demos for cosmosCatalog don't apply any cuts #812
Comments
Actually, the default "marginal" settings do impose cuts on a number of aspects of the fits to avoid obviously bad parametric fits, including cuts on size / Sersic n / flux. My suggestion is that (a) we can update the comments to mention this, (b) include a single cut on size or flux to demonstrate that one can do additional cuts in the constructor, with comments explaining that one can apply multiple cuts in this way. Sound OK?
Sorry - I don't entirely understand this comment. If you have a flux limit, and you have a size-flux relation that's a power law with some scatter, then as a rule you DO expect the size distribution to flatten out / roll off... or at least I do...? |
I think on the first point, it would be helpful to showcase the min_hlr, min_flux, etc. options to COSMOSCatalog to point out that these are available (and often desired). The cgc.yaml example does use some of them, but that may be a little too hidden for most users. |
On 10/14/16, Rachel Mandelbaum [email protected] wrote:
I ran a sim with the defaults and got galaxies that were smaller and fainter
Yes, of course, in a measurement. But a simulation should not build that in If the goal is to intentionally not produce such a catalog, but rather just |
I agree it would be nice if we could make a sim without any cuts on the input/parent population at all, but we cannot do that unless we deliberately choose to extrapolate beyond what we observe. That's a legitimate thing to do if you want to (see issue #808), but it's not what this catalog was ever claimed to be. It is a COSMOS-based catalog with a flux limit imposed on it. Once you impose a flux limit, there's an implicit size limit as well. Unless I am misunderstanding what you are asking about? I feel we are talking past each other here... Re: the defaults - one thing to check is the flux normalization compared to any observations you are trying to mimic. That's been a sticking in point in some of my sims. Another issue that has come up when making HSC sims is that COSMOS catalog has had deblending done. But in a ground-based observation, there will be ambiguous blends, so a subset of the galaxies will appear brighter and larger as a result. |
Erin said:
Rachel said:
I think some numbers will help get you two on the same page. I think this is the kind of thing that Erin is having trouble with: import numpy
import galsim
cat = galsim.COSMOSCatalog() # Use default exclusion level == 'marginal' and no other cuts
print 'Total objects passing marginal exclusion =',cat.getNTot()
# Get the raw data about the parametric fits
param = cat.param_cat
flux_radius = param['flux_radius'] # The SExtractor radius
mag_auto = param['mag_auto'] # The SExtractor magnitude
hlr = param['hlr'] # The fitted half light radius from the Sersic or B+D fit
flux = param['flux'] # The fitted flux from the Sersic or B+D fit
good_sersic = (param['fit_status'][:,4] > 0) & (param['fit_status'][:,4] < 5)
good_bulge = (param['fit_status'][:,0] > 0) & (param['fit_status'][:,0] < 5)
print 'Total objects with good Sersic fit =',numpy.sum(good_sersic)
print 'Total objexts with good bulge fit =',numpy.sum(good_bulge)
print 'Range of flux_radius is',min(flux_radius),max(flux_radius)
print 'Range of max_auto is',min(mag_auto),max(mag_auto)
print 'Range of hlr for Sersic fits is',min(hlr[:,0][good_sersic]),max(hlr[:,0][good_sersic])
print 'Range of flux for Sersic fits is',min(flux[:,0][good_sersic]),max(flux[:,0][good_sersic])
print 'Range of hlr for bulge+disk fits is',min(hlr[:,1][good_bulge]),max(hlr[:,1][good_bulge])
print 'Range of flux for bulge+disk fits is',min(flux[:,1][good_bulge]),max(flux[:,1][good_bulge])
Both those zeros and the extremely huge values for flux and hlr are I think the issue. It seems like they should probably have been excluded by the 'marginal' (i.e. most conservative) exclusion level. |
The number 87798 is the total number in the catalog, and not all objects pass the marginal cuts. I think you mean to be using It's definitely the case that when you make Sersic fits to galaxies down to 25.2, you will get some nonsense/crap. The marginal cuts are meant to exclude those, but you are not using the marginal cuts in what's above. But what I'm not sure about is this: are you and Erin concerned about some of the weird fits you're seeing? If so, you should make sure to use the "marginal" cuts, which will help (but may not remove all bad fits, just a few classes of them). I originally thought Erin's concern was about weird fit parameters, but his later comments seemed to indicate that he does not want any cuts applied before making sims. And if that's the case, you have to deal with some bad fits. I don't see any way around this when doing parametric fits to noisy data. |
To be clear, you should be using |
Sorry, you're right. I wasn't applying that properly. When I do, they all list as having good fits for both Sersic and B+D. But it looks like there's still a bunch of garbage objects left even when I do apply that. (Unless I'm still doing it wrong.) import numpy
import galsim
cat = galsim.COSMOSCatalog() # Use default exclusion level == 'marginal' and no other cuts
print 'Total objects passing marginal exclusion =',cat.getNObjects()
# Get the raw data about the parametric fits
orig_index = cat.orig_index # Only the objects that the catalog will use
param = cat.param_cat[orig_index]
flux_radius = param['flux_radius'] # The SExtractor radius
mag_auto = param['mag_auto'] # The SExtractor magnitude
hlr = param['hlr'] # The fitted half light radius from the Sersic or B+D fit
flux = param['flux'] # The fitted flux from the Sersic or B+D fit
use_bulge = param['use_bulgefit'] # Whether the B+D fit is preferred
good_sersic = (param['fit_status'][:,4] > 0) & (param['fit_status'][:,4] < 5)
good_bulge = (param['fit_status'][:,0] > 0) & (param['fit_status'][:,0] < 5)
print 'Total objects with good Sersic fit =',numpy.sum(good_sersic)
print 'Total objexts with good bulge fit =',numpy.sum(good_bulge)
print 'Number with Sersic hlr == 0 is',numpy.sum(hlr[:,0][good_sersic]==0.)
print 'Number with Sersic flux == 0 is',numpy.sum(flux[:,0][good_sersic]==0.)
print 'Number with bulge+disk hlr == 0 is',numpy.sum(hlr[:,1][good_bulge]==0.)
print 'Number with bulge+disk flux == 0 is',numpy.sum(flux[:,1][good_bulge]==0.)
print 'Number with Sersic hlr < 0.01 is',numpy.sum(hlr[:,0][good_sersic]<0.01)
print 'Number with Sersic flux < 0.01 is',numpy.sum(flux[:,0][good_sersic]<0.01)
print 'Number with bulge+disk hlr < 0.01 is',numpy.sum(hlr[:,1][good_bulge]<0.01)
print 'Number with bulge+disk flux < 0.01 is',numpy.sum(flux[:,1][good_bulge]<0.01)
I'm probably still doing something wrong with the B+D, since most objects claim to have a good bulgefit, but over half of them have zero hlr,flux. |
BTW, when I run the same script with |
There's a flag in there ( |
OK, that makes sense. I should only look at the hlr or flux for the version that COSMOSCatalog will actually use to build the parametric object. Here's some more analysis, where I do that:
So it looks like the fluxes should all be reasonable. (1 to several thousand seems like the right dynamic range.) But there are still some objects with half-light radii of only a few milliarcseconds. That's 1/10 of the HST pixel scale, so those still seem like indications of bad fits to me. @esheldon Does this seem to match up with the weirdnesses you were seeing? I thought you said you were getting some objects with crazy small hlr, like 1.e-8 or so. But maybe that was from the ngmix fits of the drawn objects. Which is possible if all the flux is in a single pixel, the fit may dive relentlessly to very small hlr values. |
On 10/15/16, Mike Jarvis [email protected] wrote:
Yes, looking back it is really the smallness of the objects, not the low flux Rachel, I was actually arguing the opposite, sorry I wasn't clear. I would like a more curated catalog , one that shows fewer obvious signs of For example, I mentioned the default cuts result in a "rolloff" of size but in I can give my use case: I would like to be able to generate a catalog that So I'd like to generate a version of this catalog that to first order doesn't The fine details aren't incredibly important, but it is important, for |
Yes, that's true. If you use So basically, yes, I don't believe those fit values in detail, but I don't think it matters too much either. BTW part of my thinking here is that people who care in great detail what the galaxies will look like won't want Sersic fits, and will use the images themselves. There are many other things wrong with using Sersic profiles, besides these tiny hlr values for 1% of the objects. |
Erin:
Thanks very much for this explanation. I think I understand what you are getting at much better now, and in fact it's something that comes up for HSC as well. Our data are deep enough that this flux limit in the catalog means some galaxies are not represented, so we have to either (a) extrapolate the catalog to represent all galaxies past our flux limit or (b) do science with a brighter subset of galaxies that are well-represented by this catalog (and that can be obtained by simulating with this catalog, then applying cuts). Ultimately, I think the answer is that COSMOS is not a sufficient parent catalog to use on its own for simulations for DES or HSC; we need a deeper HST dataset for that purpose, or we have to do some kind of clever tricks on COSMOS. So, I do see a use case for what you are suggesting. I don't have a great answer for how to do it with the existing catalog. If you have suggestions for how to achieve this, I would be happy to help make it easier to do what you want using GalSim (e.g., a setting in COSMOSCatalog() constructor or whatever). I should note that in #808, the idea is to define the joint distribution of Sersic parameters using the part of the catalog that isn't too close to the flux limit, so as to avoid the realm where selection effects are important. Then we use those joint distributions for the extrapolation, so I think that the extrapolated catalog might be just what you are requesting? Francois has not implemented this in GalSim yet, but he has working code for it that he has been playing with, so in principle this could end up in GalSim before long. If you post on there requesting it, that might help. :) Unless you think it's not what you want, after all. |
On 10/16/16, Rachel Mandelbaum [email protected] wrote:
Yes, that might be what I need. Looking at the joint distribution of the fit sersic parameters, it is not Erin Scott Sheldon |
Yes, surface brightness is definitely relevant here. Let me send you some more info offline. |
Is this conversation resolved to everyone's satisfaction? Just wondering if the issue can be closed or if there is some action item still to be done here? |
I think sims using this catalog will be central going forward. It would be good to converge on a set of reliable cuts. |
I think we can see how things go with #808 |
The demos that use
cosmosCatalog
do not apply any cuts on the size or flux of the galaxies.I think it would be useful to include cuts in the demos to instruct end users.
In the deeper catalog, there is clear evidence of measurement issues that result in oddities such as zero size and zero flux. These would not be present in an actual sample at the intended depth.
Also, the size distribution "rolls off" at the small end rather than continuing the power law trend one would expect in a real sample. Essentially this is an additional selection effect that the end user is probably not expecting.
Mike pointed out that these cuts can be applied in the constructor.
The text was updated successfully, but these errors were encountered: