Environment Details
Please indicate the following details about the environment in which you found the bug:

Error Description
Generated synthetic data contains only null values when using `OptimizedTimestampEncoder` or `UnixTimestampEncoder` with the following parameters:
- `enforce_min_max_values` is `True`
- `missing_value_generation` is `'from_column'`

Workaround
Coming soon

Steps to reproduce
Link to internal Colab notebook
Sample code for just OptimizedTimestampEncoder:
```python
import numpy as np
import pandas as pd
from sdv.metadata import Metadata
from sdv.single_table import GaussianCopulaSynthesizer
from rdt.transformers.datetime import OptimizedTimestampEncoder

# Generate data with missing date values
start_date = pd.Timestamp("2023-01-01")
dates = np.array(
    [start_date + pd.Timedelta(days=j) for j in range(100)],
    dtype="datetime64[ns]"
)
num_missing = int(100 * 0.2)
missing_indices = np.random.choice(100, num_missing, replace=False)
dates[missing_indices] = np.datetime64("NaT")
data = pd.DataFrame({'date': dates})

metadata = Metadata.detect_from_dataframe(data)
metadata.update_column(
    column_name='date',
    sdtype='datetime',
    table_name='table'
)

# Update transformers
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.auto_assign_transformers(data)
transformer = OptimizedTimestampEncoder(
    missing_value_replacement='mean',
    missing_value_generation='from_column',
    enforce_min_max_values=True)
synthesizer.update_transformers(
    column_name_to_transformer={
        'date': transformer
    }
)

synthesizer.fit(data)
synthetic_data = synthesizer.sample(100)
synthetic_data.isnull().sum()
```
The last line reports 100 null values for the `date` column, meaning every sampled value is null.
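To make the symptom easier to assert on, the null count can be turned into a per-column fraction. The sketch below is only an illustration: `null_fraction` is a hypothetical helper, and both DataFrames are stand-ins built with pandas alone (one shaped like the input data with 20% missing dates, one shaped like the all-null output described above), so it runs without SDV installed.

```python
import numpy as np
import pandas as pd


def null_fraction(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of null values in each column (0.0 to 1.0)."""
    return frame.isnull().mean()


# Hypothetical stand-in for the input data: 100 daily dates, 20 of them missing.
rng = np.random.default_rng(0)
dates = pd.Series(pd.date_range("2023-01-01", periods=100, freq="D"))
dates.iloc[rng.choice(100, 20, replace=False)] = pd.NaT
real_like = pd.DataFrame({"date": dates})

# Hypothetical stand-in for the sampled output described in this report.
all_null = pd.DataFrame({"date": pd.Series([pd.NaT] * 100, dtype="datetime64[ns]")})

real_fraction = null_fraction(real_like)["date"]
synthetic_fraction = null_fraction(all_null)["date"]
print(f"input null fraction: {real_fraction:.2f}")
print(f"sampled null fraction: {synthetic_fraction:.2f}")
```

With `missing_value_generation='from_column'`, one would presumably expect the sampled fraction to be close to the input fraction rather than 1.0; applying the same helper to the real `synthetic_data` gives a direct comparison.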