Add new filter for faint trails #384
|
Hello @supernova-trackers Thanks for contributing to Fink! Could you explain the science case behind this filter in a bit more detail? I ran a larger test on 10M alerts (almost 6 months in 2026):
blocks: []
catalog_filename: null
content:
- diaSource
- ssSource
- diaObject
dates:
  startdate: '2026-01-01'
  stopdate: '2026-05-19'
extra_cond:
- diaSource.trailLength > 2
- NOT diaSource.pixelFlags_cr
- 31.4 - 2.5 * LOG10(diaSource.scienceFlux) < 21
- 31.4 - 2.5 * LOG10(diaSource.scienceFlux) > 18
filters: []
For the next 7 days, you can consume the data of my run:
fink_datatransfer \
-survey lsst \
-topic ftransfer_lsst_2026-05-22_110066 \
-outdir ftransfer_lsst_2026-05-22_110066 \
--dump_schemas \
--verbose
Reading the data:
import pandas as pd
# read the Data Transfer output
pdf = pd.read_parquet('ftransfer_lsst_2026-05-22_110066')
I get 224,789 matching alerts, or about 2%. I would say this is still too many, by at least an order of magnitude, to be usable for human inspection and follow-up operations. I tried adding a few reasonable filters, described below.
Good quality data
Let's keep only data deemed good based on the quality flags provided by the project:
from fink_filters.rubin.blocks import b_good_quality
mask_good_quality = b_good_quality(pdf['diaSource'].apply(pd.Series))
Positive fluxes
Negative fluxes are a sign of noisy data or of an underlying fitting problem:
# Define a mask to keep only sources with positive PSF flux
mask_flux_pos = pdf['diaSource'].apply(lambda x: x['psfFlux'] > 0)
# Define a mask to keep only sources with positive trail flux
mask_trailflux_pos = pdf['diaSource'].apply(lambda x: x['trailFlux'] > 0)
All filters
Applying all filters, I get:
pdf_sub = pdf[mask_flux_pos & mask_trailflux_pos & mask_good_quality]
len(pdf_sub) # 36451
This is not quite an order of magnitude less, but if I inspect the percentage of trailed alerts per night (quick and dirty):
count f:night percent
0 9068 20260219 0.044758
1 5127 20260223 0.025306
2 2415 20260108 0.011920
3 1562 20260109 0.007710
4 1120 20260117 0.005528
5 855 20260110 0.004220
6 701 20260112 0.003460
7 465 20260225 0.002295
8 451 20260226 0.002226
9 441 20260116 0.002177
10 415 20260224 0.002048
11 278 20260119 0.001372
12 263 20260113 0.001298
13 191 20260107 0.000943
14 189 20260118 0.000933
15 177 20260106 0.000874
16 164 20260104 0.000809
17 141 20260111 0.000696
18 113 20260103 0.000558
19 94 20260101 0.000464
20 89 20260102 0.000439
21 82 20260410 0.000405
22 76 20260105 0.000375
23 70 20260227 0.000346
24 58 20260307 0.000286
25 57 20260128 0.000281
26 34 20260406 0.000168
27 31 20260405 0.000153
28 28 20260412 0.000138
29 27 20260413 0.000133
30 26 20260414 0.000128
31 24 20260309 0.000118
32 23 20260411 0.000114
33 18 20260306 0.000089
34 11 20260308 0.000054
35 11 20260301 0.000054
36 10 20260302 0.000049
37 9 20260303 0.000044
38 9 20260517 0.000044
39 8 20260228 0.000039
40 8 20260501 0.000039
41 7 20260512 0.000035
42 7 20260430 0.000035
43 6 20260421 0.000030
44 4 20260415 0.000020
45 2 20260409 0.000010
46 1 20260408 0.000005
47 1 20260407 0.000005
We can see that the busiest nights were before March, and the counts are much smaller afterwards (the real/bogus model changed during this period). So that would make it more tractable :-) What do you think of adding these extra criteria? |
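For readers who want to try the cuts above without downloading the run, here is a minimal sketch on a toy table. It is not the Fink implementation: the four rows and their flux values are made up, the way the `f:night` column is filled is an assumption, and the fluxes are assumed to be in nJy (which is what the 31.4 zero point in the query implies). The `b_good_quality` block is left out because it needs the real `fink_filters` package.

```python
import numpy as np
import pandas as pd

# Toy alerts: each row carries a `diaSource` dict, as in the parquet output.
# All values are invented for illustration.
pdf = pd.DataFrame({
    "diaSource": [
        {"psfFlux": 5.0e4, "trailFlux": 6.0e4, "scienceFlux": 5.0e4},
        {"psfFlux": -1.0e3, "trailFlux": 2.0e4, "scienceFlux": 2.0e4},
        {"psfFlux": 3.0e4, "trailFlux": -5.0e2, "scienceFlux": 3.0e4},
        {"psfFlux": 8.0e4, "trailFlux": 9.0e4, "scienceFlux": 8.0e4},
    ],
    "f:night": [20260219, 20260219, 20260223, 20260223],  # assumed column
})


def flux_to_mag(flux_njy):
    """AB magnitude for a flux in nJy: m = 31.4 - 2.5 * log10(flux)."""
    return 31.4 - 2.5 * np.log10(flux_njy)


# Expand the dicts into columns once, then build boolean masks
dia = pdf["diaSource"].apply(pd.Series)
mag = flux_to_mag(dia["scienceFlux"])
mask_faint = (mag > 18) & (mag < 21)
mask_flux_pos = dia["psfFlux"] > 0
mask_trailflux_pos = dia["trailFlux"] > 0

pdf_sub = pdf[mask_faint & mask_flux_pos & mask_trailflux_pos]

# Fraction of alerts kept per night
per_night = pdf_sub.groupby("f:night").size() / pdf.groupby("f:night").size()
print(per_night)
```

Expanding `diaSource` once and reusing the resulting columns avoids calling a Python lambda per row for each mask, which may matter on a table with hundreds of thousands of alerts.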
Hello Julien, Thanks for having a look at this filter. The goal behind the filter is to get trails that are not already associated with a known solar system object. In any case, I thought it would be good, as a first experiment, to also receive the known objects. This will allow us to validate that we are able to find the object again from the alert. |
About adding the b_good_quality filter, I wonder if that is a good idea. |
|
Hello @supernova-trackers
This information is on the stream -- actually DataTransfer and livestream access exactly the same data content. There is a column in the alert packet
def faint_trails(diaSource: pd.DataFrame, is_sso) -> pd.Series:
    """ """
    f_not_sso = ~is_sso
    ...
The |
|
On another note - I've just heard that So that will bring extra information (e.g. you could use |
|
Hello Julien,
For initial validation purposes, could we please keep the known SSO for now? I also experimented with computing a "smooth elongation" from the cutout science image, in my parquet file post-processing, in order to filter out galaxies and stars that are identified as trails. That seemed to work quite well, but I am not sure whether such a computation can be integrated into the filter. Here is the code:
import numpy as np

def compute_elongation_from_image(data):
    """
    Robust elongation using thresholded significant pixels.
    """
    try:
        data = np.nan_to_num(data).astype(float)
        # Estimate noise using MAD
        median = np.median(data)
        mad = np.median(np.abs(data - median))
        sigma = 1.4826 * mad if mad > 0 else np.std(data)
        # Threshold: keep only significant pixels
        threshold = median + 3 * sigma
        mask = data > threshold
        # If too few pixels → skip
        if np.sum(mask) < 5:
            return np.nan
        # Use ONLY significant pixels
        y, x = np.where(mask)
        weights = data[mask]
        # Flux-weighted centroid
        total = np.sum(weights)
        x_c = np.sum(x * weights) / total
        y_c = np.sum(y * weights) / total
        dx = x - x_c
        dy = y - y_c
        # Flux-weighted second moments
        ixx = np.sum(weights * dx * dx) / total
        iyy = np.sum(weights * dy * dy) / total
        ixy = np.sum(weights * dx * dy) / total
        M = np.array([[ixx, ixy],
                      [ixy, iyy]])
        # M is symmetric, so eigvalsh applies and returns real eigenvalues
        eigvals = np.linalg.eigvalsh(M)
        l1, l2 = np.max(eigvals), np.min(eigvals)
        if l2 <= 0:
            return np.nan
        # Ratio of the principal axes
        elongation = np.sqrt(l1 / l2)
        return float(elongation)
    except Exception as e:
        print(f"Elongation computation failed: {e}")
        return np.nan

# (inside the loop over detections)
elongation = compute_elongation_from_image(data)
# Skip detections with elongation < 2
if elongation < 2.0:
    continue |
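One way to sanity-check the moment-based elongation is to feed it synthetic images whose axis ratio is known. The sketch below is only an illustration: `elongation` is a compact restatement of the function above (using `eigvalsh`, since the moment matrix is symmetric), `gaussian_blob` is a hypothetical helper, and the images are noise-free Gaussians rather than real cutouts. For a Gaussian the second moments equal the squared widths, so the result should come out close to `sigma_x / sigma_y`.

```python
import numpy as np


def elongation(data):
    """Compact restatement of the moment-based elongation, for testing only."""
    data = np.nan_to_num(data).astype(float)
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    sigma = 1.4826 * mad if mad > 0 else np.std(data)
    mask = data > median + 3 * sigma
    if mask.sum() < 5:
        return np.nan
    y, x = np.where(mask)
    w = data[mask]
    total = w.sum()
    dx = x - (x * w).sum() / total
    dy = y - (y * w).sum() / total
    ixx = (w * dx * dx).sum() / total
    iyy = (w * dy * dy).sum() / total
    ixy = (w * dx * dy).sum() / total
    # eigvalsh returns real eigenvalues in ascending order
    l2, l1 = np.linalg.eigvalsh(np.array([[ixx, ixy], [ixy, iyy]]))
    return float(np.sqrt(l1 / l2)) if l2 > 0 else np.nan


def gaussian_blob(sigma_x, sigma_y, size=64):
    """Hypothetical helper: noise-free 2D Gaussian centred on the image."""
    yy, xx = np.mgrid[0:size, 0:size]
    c = size / 2
    return np.exp(-((xx - c) ** 2 / (2 * sigma_x**2)
                    + (yy - c) ** 2 / (2 * sigma_y**2)))


e_streak = elongation(gaussian_blob(6.0, 1.5))  # axis ratio 4: trail-like
e_round = elongation(gaussian_blob(2.0, 2.0))   # axis ratio 1: star-like
print(e_streak, e_round)
```

With the threshold of 2 used in the thread, the streak-like blob would be kept and the round one skipped. Real cutouts carry noise and neighbouring sources, so this check says nothing about how the cut performs on actual alerts.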
|
Hello,
Sure!
You can simply add this snippet to your filter. Just add, for example, a Python module and import the function from it:
from fink_filters.rubin.livestream.filter_faint_trails.utils import compute_elongation_from_image

def faint_trails(diaSource: pd.DataFrame, cutoutScience: pd.Series) -> pd.Series:  # Assuming you work on the science image
    ...
    mag = fu.flux_to_apparent_mag(diaSource.psfFlux)
    f_faint = (mag > 18) & (mag < 21)
    f_long_trail = diaSource.trailLength > 2
    f_not_cosmic_ray = ~diaSource.pixelFlags_cr
    f_intermediate = f_long_trail & f_faint & f_not_cosmic_ray
    # apply image processing only to the rows surviving f_intermediate, to run faster
    f_elong = pd.Series(False, index=cutoutScience.index)
    # keep elongation >= 2, i.e. skip detections with elongation < 2
    f_elong.loc[f_intermediate] = cutoutScience.loc[f_intermediate].apply(
        lambda image: compute_elongation_from_image(image) >= 2.0
    ).astype(bool)
    return f_intermediate & f_elong
I haven't tested it. |
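The two-stage pattern in the snippet above (cheap column cuts first, image processing only on the survivors) can be tried in isolation. This is a minimal sketch on toy data, not the filter itself: `toy_elongation` is a hypothetical stand-in that just returns the ratio of the array sides, the "cutouts" are nested lists of zeros, and the cut keeps elongation >= 2, following the "skip detections with elongation < 2" rule stated earlier in the thread.

```python
import numpy as np
import pandas as pd

calls = {"n": 0}  # counts how often the expensive step runs


def toy_elongation(image):
    """Hypothetical stand-in for the image-based elongation: ratio of the sides."""
    calls["n"] += 1
    h, w = np.asarray(image).shape
    return max(h, w) / min(h, w)


# Result of the cheap cuts (made-up values)
f_intermediate = pd.Series([True, False, True, False])

# Toy "cutouts" as nested lists; only their shapes matter here
wide = [[0.0] * 8 for _ in range(2)]    # 2 x 8, ratio 4
square = [[0.0] * 4 for _ in range(4)]  # 4 x 4, ratio 1
cutouts = pd.Series([wide, wide, square, square])

# Start from all-False and fill in only the rows that passed the cheap cuts
f_elong = pd.Series(False, index=cutouts.index)
f_elong.loc[f_intermediate] = (
    cutouts.loc[f_intermediate]
    .apply(lambda image: toy_elongation(image) >= 2.0)
    .astype(bool)
)

result = f_intermediate & f_elong
print(result.tolist(), calls["n"])
```

Rows rejected by the cheap cuts never reach the image step, so the cost of the elongation computation scales with the number of survivors rather than with the full batch.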
Linked to issue(s): Closes #383
What changes were proposed in this pull request?
New filter named faint_trails:
How was this patch tested?
Extracted data with the Fink Data Transfer tool for the date 23 Feb 2026.
Copied a parquet file containing 2 matching alerts
Checked the count and the diaSourceId value in the test.