Add new filter for faint trails #384

Open
supernova-trackers wants to merge 1 commit into astrolabsoftware:master from supernova-trackers:new-filter-faint-trails

Conversation

@supernova-trackers

Linked to issue(s): Closes #383

What changes were proposed in this pull request?

New filter named faint_trails:

  • magnitude between 18 and 21 (faint, but still detectable with a moderately sized telescope)
  • trail longer than 2 arcsec (fast moving object)
  • not a cosmic ray
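
As a minimal sketch of the three criteria above (not the code in this pull request): the conversion `31.4 - 2.5 * log10(flux)` is the one used in the query later in this conversation and assumes fluxes in nJy, `psfFlux` is the column used in the filter snippet further down, and the helper names and toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def flux_to_mag(flux_njy: pd.Series) -> pd.Series:
    """AB magnitude from a flux in nJy (assumed unit); NaN for non-positive flux."""
    flux = flux_njy.astype(float).where(flux_njy > 0)
    return 31.4 - 2.5 * np.log10(flux)


def faint_trails_sketch(dia_source: pd.DataFrame) -> pd.Series:
    """Boolean mask: faint (18 < mag < 21), trail > 2 arcsec, not a cosmic ray."""
    mag = flux_to_mag(dia_source["psfFlux"])
    f_faint = (mag > 18) & (mag < 21)
    f_long_trail = dia_source["trailLength"] > 2
    f_not_cosmic_ray = ~dia_source["pixelFlags_cr"].astype(bool)
    return f_faint & f_long_trail & f_not_cosmic_ray


# Toy alerts (made-up values): only the first row passes all three cuts
toy = pd.DataFrame({
    "psfFlux": [50_000.0, 50_000.0, 1_000.0, 50_000.0],
    "trailLength": [3.5, 1.0, 3.5, 3.5],
    "pixelFlags_cr": [False, False, False, True],
})
mask = faint_trails_sketch(toy)
```

Rows with a non-positive flux get a NaN magnitude, which compares as False and is therefore rejected by the magnitude cut.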

How was this patch tested?

Extracted data with the Fink Data Transfer tool for 23 February 2026.
Copied a parquet file containing 2 matching alerts.
The test checks the alert count and the diaSourceId values.

@JulienPeloton

Member

Hello @supernova-trackers

Thanks for contributing to Fink! Could you explain a bit more what the science case behind this filter is?

I ran a larger test on 10M alerts (from 2026-01-01 to 2026-05-19), with this configuration:

blocks: []
catalog_filename: null
content:
- diaSource
- ssSource
- diaObject
dates:
  startdate: '2026-01-01'
  stopdate: '2026-05-19'
extra_cond:
- diaSource.trailLength > 2
- NOT diaSource.pixelFlags_cr
- 31.4 - 2.5 * LOG10(diaSource.scienceFlux) < 21
- 31.4 - 2.5 * LOG10(diaSource.scienceFlux) > 18
filters: []
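
The two magnitude conditions in `extra_cond` are equivalent to a window in flux. A small sketch of that inversion, assuming the zero point 31.4 used in the query (fluxes in nJy); the function name is hypothetical.

```python
def mag_to_flux(mag: float, zero_point: float = 31.4) -> float:
    """Invert mag = zero_point - 2.5 * log10(flux)."""
    return 10 ** ((zero_point - mag) / 2.5)


flux_min = mag_to_flux(21.0)  # faint limit gives the lower flux bound
flux_max = mag_to_flux(18.0)  # bright limit gives the upper flux bound
print(f"{flux_min:.0f} < scienceFlux < {flux_max:.0f}")
```

So the window 18 < mag < 21 corresponds to a flux between roughly 1.4e4 and 2.3e5 in those units.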

For the next 7 days, you can consume the data of my run:

fink_datatransfer \
    -survey lsst \
    -topic ftransfer_lsst_2026-05-22_110066 \
    -outdir ftransfer_lsst_2026-05-22_110066 \
    --dump_schemas \
    --verbose

Reading the data:

import pandas as pd

# read the Data Transfer output
pdf = pd.read_parquet('ftransfer_lsst_2026-05-22_110066')

I get 224,789 matching alerts, or about 2% of the total. I would say this is still too many, by at least an order of magnitude, to be usable for human inspection and follow-up operations. I tried adding a few reasonable filters, described below.

Good quality data

Let's keep only data deemed good based on quality flags provided by the project:

from fink_filters.rubin.blocks import b_good_quality

mask_good_quality = b_good_quality(pdf['diaSource'].apply(pd.Series))

Positive fluxes

Negative fluxes are a sign of noisy data or of an underlying fitting problem:

# Define a mask to keep sources with only positive flux
mask_flux_pos = pdf['diaSource'].apply(lambda x: x['psfFlux'] > 0)

# Define a mask to keep sources with only positive trail flux
mask_trailflux_pos = pdf['diaSource'].apply(lambda x: x['trailFlux'] > 0)

All filters

Applying all filters, I get:

pdf_sub = pdf[mask_flux_pos & mask_trailflux_pos & mask_good_quality]
len(pdf_sub)  # 36451

This is not quite an order of magnitude fewer, but if I inspect the percentage of trailed alerts per night (quick and dirty):

    count   f:night   percent
0    9068  20260219  0.044758
1    5127  20260223  0.025306
2    2415  20260108  0.011920
3    1562  20260109  0.007710
4    1120  20260117  0.005528
5     855  20260110  0.004220
6     701  20260112  0.003460
7     465  20260225  0.002295
8     451  20260226  0.002226
9     441  20260116  0.002177
10    415  20260224  0.002048
11    278  20260119  0.001372
12    263  20260113  0.001298
13    191  20260107  0.000943
14    189  20260118  0.000933
15    177  20260106  0.000874
16    164  20260104  0.000809
17    141  20260111  0.000696
18    113  20260103  0.000558
19     94  20260101  0.000464
20     89  20260102  0.000439
21     82  20260410  0.000405
22     76  20260105  0.000375
23     70  20260227  0.000346
24     58  20260307  0.000286
25     57  20260128  0.000281
26     34  20260406  0.000168
27     31  20260405  0.000153
28     28  20260412  0.000138
29     27  20260413  0.000133
30     26  20260414  0.000128
31     24  20260309  0.000118
32     23  20260411  0.000114
33     18  20260306  0.000089
34     11  20260308  0.000054
35     11  20260301  0.000054
36     10  20260302  0.000049
37      9  20260303  0.000044
38      9  20260517  0.000044
39      8  20260228  0.000039
40      8  20260501  0.000039
41      7  20260512  0.000035
42      7  20260430  0.000035
43      6  20260421  0.000030
44      4  20260415  0.000020
45      2  20260409  0.000010
46      1  20260408  0.000005
47      1  20260407  0.000005
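
A per-night table of this kind can be produced with a groupby. The sketch below assumes a column named `f:night`, as in the table header. The normalization behind the `percent` column is not stated in the comment, so here each count is simply divided by a caller-supplied total (`n_total`, a hypothetical parameter).

```python
import pandas as pd


def count_per_night(pdf: pd.DataFrame, n_total: int) -> pd.DataFrame:
    """Count alerts per night, sorted by decreasing count."""
    out = (
        pdf.groupby("f:night")
        .size()
        .sort_values(ascending=False)
        .reset_index(name="count")
    )
    # Fraction relative to a total chosen by the caller (assumption)
    out["fraction"] = out["count"] / n_total
    return out


# Toy input with made-up rows
toy = pd.DataFrame({"f:night": [20260219, 20260219, 20260223, 20260219]})
per_night = count_per_night(toy, n_total=8)
```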

We can see that the busiest nights were before March; after that, the numbers are much smaller (the real/bogus model changed during this period). So that would make it more tractable :-)

What do you think of adding these extra criteria?

@supernova-trackers

Author

> What do you think of adding these extra criteria?

Hello Julien,

Thanks for having a look at this filter.
The idea of considering only the positive fluxes is indeed good.

The goal of the filter is to find trails that are not already associated with a known solar system object.
So, we could also filter on the ssObjectId (and that would reduce the number of alerts even more).
However, I am not sure that the live alert stream contains this data (yet).
When I do a data transfer, I see it, but it may be because that's not the real time stream and the data is already processed?
As a last resort, I could implement a local routine which compares the position and angle with the known objects database (that would allow filtering on my end).
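
A rough sketch of what such a local routine could look like, restricted to the position check (the trail angle comparison is left out). Everything here is an assumption for illustration: the function names, the 5 arcsec matching radius, and the idea that predicted positions of known objects at the alert epoch are already available as arrays in degrees.

```python
import numpy as np


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation (arcsec) between positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable at small separations
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(a))) * 3600.0


def is_known_object(ra, dec, known_ra, known_dec, radius_arcsec=5.0):
    """True if any known position lies within radius_arcsec of (ra, dec)."""
    sep = angular_separation_arcsec(
        ra, dec, np.asarray(known_ra), np.asarray(known_dec)
    )
    return bool(np.any(sep < radius_arcsec))


# Hypothetical predicted positions of known objects at the alert epoch
known_ra = [150.0, 200.0]
known_dec = [2.0005, -10.0]

near = is_known_object(150.0, 2.0, known_ra, known_dec)  # 1.8 arcsec from the first entry
far = is_known_object(10.0, 10.0, known_ra, known_dec)
```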

In any case, I thought it would be good, as a first experiment, to also receive the known objects. This would let us validate that we are able to find the object again from the alert.

@supernova-trackers

Author

> Good quality data
>
> Let's keep only data deemed good based on quality flags provided by the project:
>
> from fink_filters.rubin.blocks import b_good_quality
>
> mask_good_quality = b_good_quality(pdf['diaSource'].apply(pd.Series))

About adding the b_good_quality filter, I wonder if that is a good idea.
Isn't this more suitable for point sources?
Couldn't some trails be mistakenly flagged as artifacts or poor-quality data?

@JulienPeloton

Member

Hello @supernova-trackers

> So, we could also filter on the ssObjectId (and that would reduce the number of alerts even more).
> However, I am not sure that the live alert stream contains this data (yet).

This information is on the stream -- actually DataTransfer and livestream access exactly the same data content. There is a column in the alert packet pred.is_sso that can be used. So we could add an extra bit:

def faint_trails(diaSource: pd.DataFrame, is_sso: pd.Series) -> pd.Series:
    """..."""
    f_not_sso = ~is_sso
    ...

> About adding the b_good_quality filter, I wonder if that is a good idea.
> Isn't this more suitable for point sources?
> Couldn't some trails be mistakenly flagged as artifacts or poor-quality data?

The b_good_quality filter mainly uses processing flags (bad subtraction, saturation, centroid misestimation, etc.) but not yet the reliability score (which could indeed be biased towards point-like sources). Although some trailed sources could be missed, the goal at this early stage of the stream is really to remove the massive amount of bogus detections rather than to save the (very) few good sources that this filter would discard. Once the data quality stabilizes, then yes, refinements could be added.

@JulienPeloton

Member

On another note - I've just heard that trailLengthErr is now populated in alerts, and other empty fields associated with trail measurements will be populated soon 🎉

So that will bring extra information (e.g. you could use trailFluxErr to quantify the SNR of the trail). I'll keep you posted on the progress.
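
For illustration, once `trailFluxErr` is populated, a trail SNR cut could look like the sketch below. The function name, the toy values and the threshold of 5 are assumptions, not part of the filter.

```python
import numpy as np
import pandas as pd


def trail_snr(dia_source: pd.DataFrame) -> pd.Series:
    """trailFlux / trailFluxErr, NaN where the error is missing or not positive."""
    err = dia_source["trailFluxErr"].astype(float)
    err = err.where(err > 0)
    return dia_source["trailFlux"].astype(float) / err


# Toy values (made up); the last row mimics a field that is still empty
toy = pd.DataFrame({
    "trailFlux": [5000.0, 300.0, 4000.0],
    "trailFluxErr": [500.0, 200.0, np.nan],
})
snr = trail_snr(toy)
mask_snr = snr > 5.0  # hypothetical threshold; NaN compares as False
```

With this choice, alerts whose error field is still empty are rejected rather than silently kept.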

@supernova-trackers

supernova-trackers commented May 28, 2026

Author

Hello Julien,

> This information is on the stream -- actually DataTransfer and livestream access exactly the same data content. There is a column in the alert packet pred.is_sso that can be used. So we could add an extra bit:

For initial validation purposes, could we please keep the known SSOs for now?

I also experimented with computing a "smooth elongation" from the science cutout image, in my parquet file post-processing, in order to filter out galaxies and stars that are identified as trails. That seemed to work quite well, but I am not sure whether such a computation can be integrated into the filter.

Here is the code:

import numpy as np


def compute_elongation_from_image(data):
    """
    Robust elongation using thresholded significant pixels.
    """
    try:
        data = np.nan_to_num(data).astype(float)

        # Estimate noise using MAD
        median = np.median(data)
        mad = np.median(np.abs(data - median))
        sigma = 1.4826 * mad if mad > 0 else np.std(data)

        # Threshold: keep only significant pixels
        threshold = median + 3 * sigma

        mask = data > threshold

        # If too few pixels → skip
        if np.sum(mask) < 5:
            return np.nan

        # Use ONLY significant pixels
        y, x = np.where(mask)
        weights = data[mask]

        # Centroid
        total = np.sum(weights)
        x_c = np.sum(x * weights) / total
        y_c = np.sum(y * weights) / total

        dx = x - x_c
        dy = y - y_c

        ixx = np.sum(weights * dx * dx) / total
        iyy = np.sum(weights * dy * dy) / total
        ixy = np.sum(weights * dx * dy) / total

        M = np.array([[ixx, ixy],
                      [ixy, iyy]])

        # M is symmetric, so eigvalsh applies and always returns real eigenvalues
        eigvals = np.linalg.eigvalsh(M)

        l1, l2 = np.max(eigvals), np.min(eigvals)

        if l2 <= 0:
            return np.nan

        elongation = np.sqrt(l1 / l2)

        return float(elongation)

    except Exception as e:
        print(f"Elongation computation failed: {e}")
        return np.nan
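
# Usage: the lines below run inside a loop over alerts,
# where `data` is the science cutout as a 2D array
elongation = compute_elongation_from_image(data)

# Skip detections with elongation < 2
if elongation < 2.0:
    continue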
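
A quick sanity check of this kind of moment-based estimate on synthetic images may help. The sketch below uses a condensed re-implementation of the same logic (named `elongation` here, hypothetical) and noise-free Gaussian blobs, which are an idealized assumption: real cutouts have noise and neighbouring sources.

```python
import numpy as np


def elongation(data: np.ndarray, nsigma: float = 3.0) -> float:
    """Condensed version of the moment-based estimate above (same logic)."""
    data = np.nan_to_num(data).astype(float)
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    sigma = 1.4826 * mad if mad > 0 else np.std(data)
    mask = data > median + nsigma * sigma
    if mask.sum() < 5:
        return float("nan")
    y, x = np.where(mask)
    w = data[mask]
    # Offsets from the flux-weighted centroid
    dx = x - np.average(x, weights=w)
    dy = y - np.average(y, weights=w)
    cov = np.array([
        [np.average(dx * dx, weights=w), np.average(dx * dy, weights=w)],
        [np.average(dx * dy, weights=w), np.average(dy * dy, weights=w)],
    ])
    l2, l1 = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return float(np.sqrt(l1 / l2)) if l2 > 0 else float("nan")


def gaussian_blob(sigma_x: float, sigma_y: float, size: int = 51) -> np.ndarray:
    """Noise-free 2D Gaussian centred in a size x size image."""
    yy, xx = np.mgrid[:size, :size] - size // 2
    return 100.0 * np.exp(-0.5 * ((xx / sigma_x) ** 2 + (yy / sigma_y) ** 2))


e_streak = elongation(gaussian_blob(6.0, 1.5))  # axis ratio 4 by construction
e_round = elongation(gaussian_blob(3.0, 3.0))   # axis ratio 1 by construction
```

The streak lands well above the cut at 2 and the round blob well below it, which is the separation the post-processing relies on.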

@JulienPeloton

Member

Hello,

> For initial validation purposes, could we please keep the known SSOs for now?

Sure!

> I also experimented with computing a "smooth elongation" from the science cutout image, in my parquet file post-processing, in order to filter out galaxies and stars that are identified as trails. That seemed to work quite well, but I am not sure whether such a computation can be integrated into the filter.

You can simply add this snippet to your filter. For example, put your code in a Python module utils.py alongside filter.py, and then slightly modify the filter:

import pandas as pd

from fink_filters.rubin.livestream.filter_faint_trails.utils import compute_elongation_from_image

def faint_trails(diaSource: pd.DataFrame, cutoutScience: pd.Series) -> pd.Series:  # Assuming you work on the science image
    ...
    mag = fu.flux_to_apparent_mag(diaSource.psfFlux)
    f_faint = (mag > 18) & (mag < 21)
    f_long_trail = diaSource.trailLength > 2
    f_not_cosmic_ray = ~diaSource.pixelFlags_cr

    f_intermediate = f_long_trail & f_faint & f_not_cosmic_ray

    # apply image processing only on the ones surviving f_intermediate, for speed
    # keep elongated sources (elongation >= 2), matching the post-processing cut above
    f_elong = pd.Series(False, index=cutoutScience.index)
    f_elong.loc[f_intermediate] = cutoutScience.loc[f_intermediate].apply(
        lambda image: compute_elongation_from_image(image) >= 2.0
    ).astype(bool)

    return f_intermediate & f_elong

I haven't tested it.
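
Since the snippet above is untested, here is a small sketch that exercises only the "run the expensive cut on survivors" pattern. `expensive_score` is a hypothetical stand-in for `compute_elongation_from_image`, and the cutouts are assumed to be already decoded 2D arrays, which may not match the format in which `cutoutScience` reaches the filter.

```python
import numpy as np
import pandas as pd


def expensive_score(image: np.ndarray) -> float:
    """Stand-in for the elongation computation (hypothetical): width / height."""
    return image.shape[1] / image.shape[0]


def apply_on_survivors(images: pd.Series, f_intermediate: pd.Series) -> pd.Series:
    """Run the expensive cut only on rows that passed the cheap cuts."""
    f_elong = pd.Series(False, index=images.index)
    survivors = images.loc[f_intermediate]
    if len(survivors) > 0:
        f_elong.loc[f_intermediate] = survivors.apply(
            lambda image: expensive_score(image) >= 2.0
        ).astype(bool)
    return f_intermediate & f_elong


# Toy cutouts stored in an object array so each cell holds one 2D image
cutouts = np.empty(3, dtype=object)
for i, shape in enumerate([(2, 6), (4, 4), (1, 8)]):
    cutouts[i] = np.zeros(shape)
images = pd.Series(cutouts)

f_intermediate = pd.Series([True, True, False])  # result of the cheap cuts
result = apply_on_survivors(images, f_intermediate)
```

The third image would pass the score cut but is never evaluated, because it was already rejected by the cheap cuts.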


Development

Successfully merging this pull request may close these issues.

[Rubin] Create new filter for faint trails
