
Implement RF-based backward feature selection via MDA for sits models #1367

@ctotti

Describe the new API function requested
While sits_reduce() handles temporal combinations, it would be valuable to also include a model-driven feature selection method that:

  1. Complements existing reduction by selecting optimal bands/features for specific classification tasks
  2. Uses Random Forest's variable importance (e.g., Mean Decrease in Accuracy and Gini importance) to:
    - Rank features by predictive importance
    - Iteratively remove least important features (backward elimination)
  3. Preserves sits workflow by returning a modified sits tibble

Associated sits API function

sits_feature_selection(
  samples,               # sits tibble (time series)
  bands = NULL,          # Bands to evaluate (NULL = all)
  importance_metric = "mda",  # or "gini"
  n_iter = 20,           # Max iterations
  accuracy_loss = 0.02,  # Allowed accuracy drop (2%)
  rf_params = list(),    # Custom RF params (num_trees, etc)
  multicores = 1         # Parallel processing
)

The returned tibble can then be used to generate a reduced sampled cube, continuing the standard pipeline with sits_train(), sits_classify(), etc.
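The backward-elimination loop described above could look roughly like the base R sketch below. It is only an illustration of one possible reading of `n_iter` and `accuracy_loss`, not an existing sits function: `backward_select()`, `score_fn`, `toy_weights`, and `toy_score()` are hypothetical names, and the toy scorer is a deterministic stand-in for fitting a Random Forest and reading its MDA or Gini importance.

```r
# Hypothetical helper (not part of sits): backward elimination driven by an
# importance ranking. `score_fn(features)` must return a list with
#   accuracy   - numeric scalar for a model fitted on `features`
#   importance - named numeric vector, one entry per feature
backward_select <- function(features, score_fn,
                            n_iter = 20, accuracy_loss = 0.02) {
  baseline <- score_fn(features)   # reference accuracy with all features
  current  <- baseline
  kept     <- features
  for (i in seq_len(n_iter)) {
    if (length(kept) <= 1) break
    # Least important feature according to the current model
    worst     <- names(which.min(current$importance[kept]))
    candidate <- setdiff(kept, worst)
    trial     <- score_fn(candidate)
    # Stop as soon as the drop from the baseline exceeds the tolerance
    if (baseline$accuracy - trial$accuracy > accuracy_loss) break
    kept    <- candidate
    current <- trial
  }
  list(features = kept, accuracy = current$accuracy)
}

# Toy stand-in for "train RF, return accuracy and importance" (assumption:
# each feature contributes a fixed amount to accuracy).
toy_weights <- c(NDVI = 0.50, EVI = 0.30, B08 = 0.15, B04 = 0.01, B02 = 0.00)
toy_score <- function(features) {
  list(accuracy   = sum(toy_weights[features]),
       importance = toy_weights[features])
}

result <- backward_select(names(toy_weights), toy_score)
print(result$features)  # "NDVI" "EVI" "B08"
```

In a real implementation, `score_fn` would presumably train the Random Forest on the samples restricted to the candidate bands and estimate accuracy by cross-validation or out-of-bag error. Comparing against the baseline accuracy, rather than the previous iteration, keeps the total loss within `accuracy_loss`.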

Additional context
This function would be especially useful for reducing high-dimensional inputs built from multiple indices and texture metrics, which should improve performance and reduce overfitting. It focuses on band-level optimization, complementing the temporal reduction approach of sits_reduce().

Note: I’m a user and not familiar with the internal feasibility of this integration.

References
[1] Mahmood et al. (2025) - Demonstrated 80% feature reduction with <2% accuracy loss
