Allow me to opt out of replacing missing values (`missing_value_replacement=None`)
#938
Labels
feature request
Request for a new feature
Problem Description
A number of the RDT transformers have a `missing_value_replacement` parameter that describes how to replace missing values in the forward transform. There are a variety of strategies, including replacement with a random value (`'random'`), replacement with the mean of the data (`'mean'`), etc. This was done under the assumption that the downstream software does not accept missing values.
However, I may have some use cases where my downstream software does handle missing values. In this case, I do not want the RDTs to do anything to the missing values. In such cases, we should offer `missing_value_replacement=None` as an option.
Note that we used to have this option; it is just listed as deprecated for all the transformers. We should un-deprecate it because there are valid use cases that need it.
Expected behavior
Reinstate the `None` option for the `missing_value_replacement` parameter, wherever it exists. If this is passed in, then during the forward transform: do not replace the missing values with anything. Just pass the missing values along.
The default value of `missing_value_replacement` should not change; in most cases, it is `'random'`. The user would need to explicitly pass in `missing_value_replacement=None` to access this new functionality.
For the reverse transform, things can be a bit trickier. Ideally, the `missing_value_generation` parameter is supposed to tell us how to recreate missing values. But if the passed-in data already contains missing values, then it can mess up generation. Here's what we can do:
If `missing_value_generation` is not `None` and the passed-in data contains some null values, then don't do anything to the missing values. Instead, show the user a warning.
Additional context
This change should probably be done to a base class. It would ultimately affect the following transformers:
ClusterBasedNormalizer, FloatFormatter, GaussianNormalizer, XGaussianNormalizer
BinaryEncoder
OptimizedTimestampEncoder, UnixTimestampEncoder
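To make the requested behavior concrete, here is a rough sketch of the forward and reverse logic described under Expected behavior. It uses plain pandas and is not RDT code: the function names `forward_transform` and `reverse_transform`, the `seed` argument, the warning text, and the way `'random'` is implemented (uniform draws between the observed min and max) are all assumptions made for illustration.

```python
import warnings

import numpy as np
import pandas as pd


def forward_transform(column, missing_value_replacement='random', seed=None):
    """Hypothetical helper: replace missing values unless the strategy is None."""
    if missing_value_replacement is None:
        # Requested opt-out: pass the missing values along untouched.
        return column.copy()

    missing = column.isna()
    observed = column.dropna()
    result = column.copy()

    if missing_value_replacement == 'mean':
        result[missing] = observed.mean()
    elif missing_value_replacement == 'random':
        # Assumption: 'random' means uniform draws within the observed range.
        rng = np.random.default_rng(seed)
        result[missing] = rng.uniform(
            observed.min(), observed.max(), size=int(missing.sum())
        )
    else:
        raise ValueError(f'Unsupported strategy: {missing_value_replacement!r}')

    return result


def reverse_transform(column, missing_value_generation='random'):
    """Hypothetical helper: leave existing nulls alone and warn about them."""
    if missing_value_generation is not None and column.isna().any():
        # The data already contains nulls, so do not touch them; warn instead.
        warnings.warn(
            'The data already contains missing values; '
            'missing value generation was skipped.'
        )
        return column.copy()

    # The usual missing value generation would run here (omitted in this sketch).
    return column.copy()
```

With this sketch, `forward_transform(pd.Series([1.0, np.nan, 3.0]), missing_value_replacement=None)` keeps the null in place, and passing that result to `reverse_transform` emits a warning instead of altering the nulls. Putting the check in one shared place mirrors the suggestion that the change live in a base class.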