Commit 0bc2a9b

make release-tag: Merge branch 'main' into stable
2 parents fd67bd3 + ee9a8bb commit 0bc2a9b

60 files changed (+4950, -733 lines changed)

.github/workflows/integration.yml

Lines changed: 6 additions & 1 deletion
@@ -11,7 +11,12 @@ jobs:
     strategy:
       matrix:
         python-version: [ '3.8', '3.9', '3.10', '3.11', '3.12']
-        os: [ubuntu-latest, macos-latest, windows-latest]
+        os: [ubuntu-latest, windows-latest]
+        include:
+          - os: macos-latest
+            python-version: '3.8'
+          - os: macos-latest
+            python-version: '3.12'
     steps:
     - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
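
To make the effect of this matrix change concrete, here is a minimal Python sketch that counts the jobs before and after. `expand_matrix` is a hypothetical helper, not part of GitHub Actions or SDV; it assumes the documented `include` behaviour in simplified form: an entry extends existing combinations only if it overwrites no axis value, and otherwise becomes a new combination.

```python
from itertools import product


def expand_matrix(axes, include=()):
    """Expand a job matrix, applying a simplified version of the `include` rule."""
    keys = list(axes)
    jobs = [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]
    originals = [dict(job) for job in jobs]  # snapshot of the pure axis combinations
    for entry in include:
        matched = False
        for job, original in zip(jobs, originals):
            # An entry extends a job only if it would not overwrite an axis value.
            if all(original[k] == v for k, v in entry.items() if k in original):
                job.update(entry)
                matched = True
        if not matched:
            # Otherwise the entry becomes a new, standalone combination.
            jobs.append(dict(entry))
    return jobs


PYTHONS = ['3.8', '3.9', '3.10', '3.11', '3.12']

before = expand_matrix({
    'python-version': PYTHONS,
    'os': ['ubuntu-latest', 'macos-latest', 'windows-latest'],
})
after = expand_matrix(
    {'python-version': PYTHONS, 'os': ['ubuntu-latest', 'windows-latest']},
    include=[
        {'os': 'macos-latest', 'python-version': '3.8'},
        {'os': 'macos-latest', 'python-version': '3.12'},
    ],
)
macos_versions = [job['python-version'] for job in after if job['os'] == 'macos-latest']
print(len(before), len(after), macos_versions)  # 15 12 ['3.8', '3.12']
```

Under these assumptions the change drops three macOS jobs per workflow, keeping only the oldest and newest Python versions on that platform.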

.github/workflows/minimum.yml

Lines changed: 6 additions & 1 deletion
@@ -11,7 +11,12 @@ jobs:
     strategy:
       matrix:
         python-version: [ '3.8', '3.9', '3.10', '3.11', '3.12']
-        os: [ubuntu-latest, macos-latest, windows-latest]
+        os: [ubuntu-latest, windows-latest]
+        include:
+          - os: macos-latest
+            python-version: '3.8'
+          - os: macos-latest
+            python-version: '3.12'
     steps:
     - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}

.github/workflows/unit.yml

Lines changed: 6 additions & 1 deletion
@@ -11,7 +11,12 @@ jobs:
     strategy:
       matrix:
         python-version: [ '3.8', '3.9', '3.10', '3.11', '3.12']
-        os: [ubuntu-latest, macos-latest, windows-latest]
+        os: [ubuntu-latest, windows-latest]
+        include:
+          - os: macos-latest
+            python-version: '3.8'
+          - os: macos-latest
+            python-version: '3.12'
     steps:
     - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}

HISTORY.md

Lines changed: 36 additions & 0 deletions
@@ -1,5 +1,41 @@
 # Release Notes

+## 1.13.0 - 2024-05-15
+
+This release adds a utility function called `get_random_subset` that helps users get a subset of their multi-table data so that modeling can be done more quickly. Given a dictionary of table names mapped to DataFrames, metadata, a main table and a desired number of rows to use for the main table, it will subsample the data in a way that maintains referential integrity.
+
+This release also adds two new local file handlers: the `CSVHandler` and the `ExcelHandler`. This enables users to easily load from and save synthetic data to these file types. These handlers return data and metadata in the multi-table format, so we also added the function `get_table_metadata` to get a `SingleTableMetadata` object from a `MultiTableMetadata` object.
+
+Finally, this release fixes some bugs that prevented synthesizers from working with data that had numerical column names.
+
+### New Features
+
+* Add `get_random_subset` poc utility function - Issue [#1877](https://github.com/sdv-dev/SDV/issues/1877) by @R-Palazzo
+* Add usage logging - Issue [#1903](https://github.com/sdv-dev/SDV/issues/1903) by @pvk-developer
+* Move function `drop_unknown_references` from `poc` to be directly under `utils` - Issue [#1947](https://github.com/sdv-dev/SDV/issues/1947) by @R-Palazzo
+* Add CSVHandler - Issue [#1949](https://github.com/sdv-dev/SDV/issues/1949) by @pvk-developer
+* Add ExcelHandler - Issue [#1950](https://github.com/sdv-dev/SDV/issues/1950) by @pvk-developer
+* Add get_table_metadata function - Issue [#1951](https://github.com/sdv-dev/SDV/issues/1951) by @R-Palazzo
+* Save usage log file as a csv - Issue [#1974](https://github.com/sdv-dev/SDV/issues/1974) by @frances-h
+* Split out metadata creation from data import in the local files handlers - Issue [#1975](https://github.com/sdv-dev/SDV/issues/1975) by @pvk-developer
+* Improve error message when trying to sample before fitting (single table) - Issue [#1978](https://github.com/sdv-dev/SDV/issues/1978) by @R-Palazzo
+
+### Bugs Fixed
+
+* Metadata detection crashes when the column names are integers (`AttributeError: 'int' object has no attribute 'lower'`) - Issue [#1933](https://github.com/sdv-dev/SDV/issues/1933) by @lajohn4747
+* Synthesizers crash when column names are integers (`TypeError: unsupported operand`) - Issue [#1935](https://github.com/sdv-dev/SDV/issues/1935) by @lajohn4747
+* Switch parameter order in drop_unknown_references - Issue [#1944](https://github.com/sdv-dev/SDV/issues/1944) by @R-Palazzo
+* Unexpected NaN values in sequence_index when dataframe isn't reset - Issue [#1973](https://github.com/sdv-dev/SDV/issues/1973) by @fealho
+* Fix pandas DtypeWarning in download_demo - Issue [#1980](https://github.com/sdv-dev/SDV/issues/1980) by @fealho
+
+### Maintenance
+
+* Only run unit and integration tests on oldest and latest python versions for macos - Issue [#1948](https://github.com/sdv-dev/SDV/issues/1948) by @frances-h
+
+### Internal
+
+* Update code to remove `FutureWarning` related to 'enforce_uniqueness' parameter - Issue [#1995](https://github.com/sdv-dev/SDV/issues/1995) by @pvk-developer
+
 ## 1.12.1 - 2024-04-19

 This release makes a number of changes to how id columns are generated. By default, id columns with a regex will now have their values scrambled in the output. Id columns without a regex that are numeric will be created randomly. If they're not numeric, they will have a random suffix.
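
The 1.13.0 notes describe subsampling multi-table data while maintaining referential integrity. The following is a minimal sketch of that idea for a single parent/child relationship only. It is not the implementation of `get_random_subset`, whose signature and internals are not shown in this diff; the function name `subsample_parent_child`, the row-dictionary representation, and the sample data are all assumptions made for illustration.

```python
import random


def subsample_parent_child(parent_rows, child_rows, primary_key, foreign_key,
                           num_rows, seed=0):
    """Sample parent rows, then keep only the child rows that still reference them."""
    rng = random.Random(seed)
    kept_parents = rng.sample(parent_rows, min(num_rows, len(parent_rows)))
    kept_keys = {row[primary_key] for row in kept_parents}
    # Dropping children whose parent was not sampled keeps every foreign key valid.
    kept_children = [row for row in child_rows if row[foreign_key] in kept_keys]
    return kept_parents, kept_children


# Made-up example data: 10 users, each with 2 sessions.
users = [{'user_id': i} for i in range(10)]
sessions = [{'session_id': 2 * i + j, 'user_id': i} for i in range(10) for j in range(2)]

sub_users, sub_sessions = subsample_parent_child(
    users, sessions, primary_key='user_id', foreign_key='user_id', num_rows=3
)
print(len(sub_users), len(sub_sessions))  # 3 6
```

A real multi-table version would also have to follow relationships upward to ancestors and through several levels of descendants, which this sketch deliberately leaves out.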

Makefile

Lines changed: 6 additions & 6 deletions
@@ -123,12 +123,8 @@ test-integration: ## run tests quickly with the default Python
 test-readme: ## run the readme snippets
 	invoke readme

-.PHONY: test-tutorials
-test-tutorials: ## run the tutorial notebooks
-	invoke tutorials
-
 .PHONY: test
-test: test-unit test-integration test-readme test-tutorials ## test everything that needs test dependencies
+test: test-unit test-integration test-readme ## test everything that needs test dependencies

 .PHONY: test-all
 test-all: ## run tests on every Python version with tox
@@ -239,6 +235,10 @@ ifeq ($(CHANGELOG_LINES),0)
 	$(error Please insert the release notes in HISTORY.md before releasing)
 endif

+.PHONY: git-push
+git-push: ## Simply push the repository to github
+	git push
+
 .PHONY: check-release
 check-release: check-clean check-main check-history ## Check if the release can be made
 	@echo "A new release can be made"
@@ -265,5 +265,5 @@ release-major: check-release bumpversion-major release

 .PHONY: check-deps
 check-deps:
-	$(eval allow_list='cloudpickle=|graphviz=|numpy=|pandas=|tqdm=|copulas=|ctgan=|deepecho=|rdt=|sdmetrics=')
+	$(eval allow_list='cloudpickle=|graphviz=|numpy=|pandas=|tqdm=|copulas=|ctgan=|deepecho=|rdt=|sdmetrics=|platformdirs=')
 	pip freeze | grep -v "SDV.git" | grep -E $(allow_list) | sort > $(OUTPUT_FILEPATH)
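
As a rough illustration of what the `check-deps` pipeline does with the extended allow list, the sketch below applies the same two filters in Python. `filter_freeze` is a hypothetical helper and the sample `pip freeze` lines are made up; Python's `sorted` may also order lines differently from the shell `sort`, depending on locale.

```python
import re

# Same alternation as the updated allow_list in the Makefile.
ALLOW_LIST = (
    'cloudpickle=|graphviz=|numpy=|pandas=|tqdm=|copulas=|ctgan=|'
    'deepecho=|rdt=|sdmetrics=|platformdirs='
)


def filter_freeze(lines):
    """Mimic: grep -v "SDV.git" | grep -E <allow_list> | sort."""
    pattern = re.compile(ALLOW_LIST)
    kept = [line for line in lines if 'SDV.git' not in line and pattern.search(line)]
    return sorted(kept)


sample = [
    'tqdm==4.66.4',
    'requests==2.31.0',  # not on the allow list, so it is dropped
    'platformdirs==4.2.1',
    'sdv @ git+https://github.com/sdv-dev/SDV.git@main',  # excluded by the first filter
]
print(filter_freeze(sample))  # ['platformdirs==4.2.1', 'tqdm==4.66.4']
```

Because the pattern is unanchored, as with `grep -E`, any line containing one of the alternatives anywhere would match.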

latest_requirements.txt

Lines changed: 3 additions & 2 deletions
@@ -5,6 +5,7 @@ deepecho==0.6.0
 graphviz==0.20.3
 numpy==1.26.4
 pandas==2.2.2
-rdt==1.11.1
+platformdirs==4.2.1
+rdt==1.12.1
 sdmetrics==0.14.0
-tqdm==4.66.2
+tqdm==4.66.4

pyproject.toml

Lines changed: 8 additions & 5 deletions
@@ -25,11 +25,10 @@ dependencies = [
     'botocore>=1.31',
     'cloudpickle>=2.1.0',
     'graphviz>=0.13.2',
-    "numpy>=1.20.0;python_version<'3.10'",
+    "numpy>=1.21.0;python_version<'3.10'",
     "numpy>=1.23.3,<2;python_version>='3.10' and python_version<'3.12'",
     "numpy>=1.26.0,<2;python_version>='3.12'",
-    "pandas>=1.1.3;python_version<'3.10'",
-    "pandas>=1.3.4;python_version>='3.10' and python_version<'3.11'",
+    "pandas>=1.4.0;python_version<'3.11'",
     "pandas>=1.5.0;python_version>='3.11' and python_version<'3.12'",
     "pandas>=2.1.1;python_version>='3.12'",
     'tqdm>=4.29',
@@ -38,6 +37,7 @@ dependencies = [
     'deepecho>=0.6.0',
     'rdt>=1.12.0',
     'sdmetrics>=0.14.0',
+    'platformdirs>=4.0',
 ]

 [project.urls]
@@ -51,7 +51,9 @@ dependencies = [
 sdv = { main = 'sdv.cli.__main__:main' }

 [project.optional-dependencies]
+excel = ['pandas[excel]']
 test = [
+    'sdv[excel]',
     'pytest>=3.4.2',
     'pytest-cov>=2.6.0',
     'pytest-rerunfailures>=10.3,<15',
@@ -140,7 +142,8 @@ namespaces = false
     'make.bat',
     '*.jpg',
     '*.png',
-    '*.gif'
+    '*.gif',
+    'sdv_logger_config.yml'
 ]

 [tool.setuptools.exclude-package-data]
@@ -154,7 +157,7 @@ namespaces = false
 version = {attr = 'sdv.__version__'}

 [tool.bumpversion]
-current_version = "1.12.1"
+current_version = "1.13.0.dev1"
 parse = '(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\.(?P<release>[a-z]+)(?P<candidate>\d+))?'
 serialize = [
     '{major}.{minor}.{patch}.{release}{candidate}',
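
To see how the `parse` pattern above splits the old and new `current_version` values, here is a small sketch using Python's `re` module. The pattern is copied from the diff; `parse_version` is a hypothetical helper, and the use of `re.fullmatch` is an assumption of this sketch rather than a statement about how the bump tool applies the pattern.

```python
import re

# Pattern copied verbatim from [tool.bumpversion] in pyproject.toml.
PARSE = (
    r'(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)'
    r'(\.(?P<release>[a-z]+)(?P<candidate>\d+))?'
)


def parse_version(version):
    """Return the named parts of a version string, or None if it does not match."""
    match = re.fullmatch(PARSE, version)
    return match.groupdict() if match else None


dev = parse_version('1.13.0.dev1')
final = parse_version('1.12.1')
print(dev)
# {'major': '1', 'minor': '13', 'patch': '0', 'release': 'dev', 'candidate': '1'}
print(final)
# {'major': '1', 'minor': '12', 'patch': '1', 'release': None, 'candidate': None}
```

The optional trailing group is what lets the same pattern accept both a final release and a `.dev1` pre-release.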

sdv/__init__.py

Lines changed: 5 additions & 4 deletions
@@ -6,7 +6,7 @@

 __author__ = 'DataCebo, Inc.'
 __email__ = '[email protected]'
-__version__ = '1.12.1'
+__version__ = '1.13.0.dev1'


 import sys
@@ -16,8 +16,8 @@
 from types import ModuleType

 from sdv import (
-    constraints, data_processing, datasets, evaluation, io, lite, metadata, metrics, multi_table,
-    sampling, sequential, single_table, version)
+    constraints, data_processing, datasets, evaluation, io, lite, logging, metadata, metrics,
+    multi_table, sampling, sequential, single_table, version)

 __all__ = [
     'constraints',
@@ -26,6 +26,7 @@
     'evaluation',
     'io',
     'lite',
+    'logging',
     'metadata',
     'metrics',
     'multi_table',
@@ -94,7 +95,7 @@ def _find_addons():
             addon = entry_point.load()
         except Exception as e:  # pylint: disable=broad-exception-caught
             msg = (
-                f'Failed to load "{entry_point.name}" from "{entry_point.version}" '
+                f'Failed to load "{entry_point.name}" from "{entry_point.value}" '
                 f'with error:\n{e}'
             )
             warnings.warn(msg)
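
The last hunk swaps `entry_point.version` for `entry_point.value` in the warning message. The sketch below shows why `value` is the useful field: a standard-library `importlib.metadata.EntryPoint` carries `name`, `value` and `group`. The entry point name, target and group used here are hypothetical, and `describe_failure` is an illustrative helper, not SDV code.

```python
from importlib.metadata import EntryPoint


def describe_failure(entry_point, error):
    """Build a warning message in the same shape as the one in _find_addons."""
    return (
        f'Failed to load "{entry_point.name}" from "{entry_point.value}" '
        f'with error:\n{error}'
    )


# Hypothetical entry point; the names are placeholders for illustration only.
entry_point = EntryPoint(
    name='example_addon',
    value='example_package.addons:module',
    group='example_group',
)
message = describe_failure(entry_point, ImportError('No module named example_package'))
print(message)
```

With `value`, the message names the `module:attribute` target that failed to import, which is usually what is needed to track down a broken add-on.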

sdv/_utils.py

Lines changed: 2 additions & 0 deletions
@@ -214,6 +214,8 @@ def _validate_foreign_keys_not_null(metadata, data):
     invalid_tables = defaultdict(list)
     for table_name, table_data in data.items():
         for foreign_key in metadata._get_all_foreign_keys(table_name):
+            if foreign_key not in table_data and int(foreign_key) in table_data:
+                foreign_key = int(foreign_key)
             if table_data[foreign_key].isna().any():
                 invalid_tables[table_name].append(foreign_key)
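
The two added lines fall back to an integer column name when the foreign key, presumably held as a string in the metadata, is not found in the table. A minimal standalone sketch of that lookup follows. `resolve_column_name` is a hypothetical helper, and the `try`/`except` guard around `int()` is an addition of this sketch, not part of the commit.

```python
def resolve_column_name(name, columns):
    """Return the column label that actually exists, trying an int version of `name`."""
    if name in columns:
        return name
    try:
        as_int = int(name)
    except (TypeError, ValueError):
        # Not numeric: nothing else to try, so hand the name back unchanged.
        return name
    return as_int if as_int in columns else name


print(resolve_column_name('1', [0, 1, 2]))                   # 1
print(resolve_column_name('user_id', ['user_id', 'age']))    # user_id
print(resolve_column_name('missing', [0, 1]))                # missing
```

The guard only matters when a key is both absent from the table and non-numeric; in that case this sketch returns the original name and leaves the error to the later lookup.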

sdv/data_processing/data_processor.py

Lines changed: 14 additions & 12 deletions
@@ -412,7 +412,7 @@ def _update_transformers_by_sdtypes(self, sdtype, transformer):
         self._transformers_by_sdtype[sdtype] = transformer

     @staticmethod
-    def create_anonymized_transformer(sdtype, column_metadata, enforce_uniqueness,
+    def create_anonymized_transformer(sdtype, column_metadata, cardinality_rule,
                                       locales=['en_US']):
         """Create an instance of an ``AnonymizedFaker``.

@@ -424,24 +424,26 @@ def create_anonymized_transformer(sdtype, column_metadata, enforce_uniqueness,
                 Sematic data type or a ``Faker`` function name.
             column_metadata (dict):
                 A dictionary representing the rest of the metadata for the given ``sdtype``.
-            enforce_uniqueness (bool):
-                If ``True`` overwrite ``enforce_uniqueness`` with ``True`` to ensure unique
-                generation for primary keys.
+            cardinality_rule (str):
+                If ``'unique'`` enforce that every created value is unique.
+                If ``'match'`` match the cardinality of the data seen during fit.
+                If ``None`` do not consider cardinality.
+                Defaults to ``None``.
             locales (str or list):
                 Locale or list of locales to use for the AnonymizedFaker transfomer.
                 Defaults to ['en_US'].

         Returns:
             Instance of ``rdt.transformers.pii.AnonymizedFaker``.
         """
-        kwargs = {'locales': locales}
+        kwargs = {
+            'locales': locales,
+            'cardinality_rule': cardinality_rule
+        }
         for key, value in column_metadata.items():
             if key not in ['pii', 'sdtype']:
                 kwargs[key] = value

-        if enforce_uniqueness:
-            kwargs['enforce_uniqueness'] = True
-
         try:
             transformer = get_anonymized_transformer(sdtype, kwargs)
         except AttributeError as error:
@@ -494,7 +496,7 @@ def _get_transformer_instance(self, sdtype, column_metadata):
             is_baseprovider = transformer.provider_name == 'BaseProvider'
             if is_lexify and is_baseprovider:  # Default settings
                 return self.create_anonymized_transformer(
-                    sdtype, column_metadata, False, self._locales
+                    sdtype, column_metadata, None, self._locales
                 )

         kwargs = {
@@ -598,11 +600,11 @@ def _create_config(self, data, columns_created_by_constraints):

                 elif pii:
                     sdtypes[column] = 'pii'
-                    enforce_uniqueness = bool(column in self._keys)
+                    cardinality_rule = 'unique' if bool(column in self._keys) else None
                     transformers[column] = self.create_anonymized_transformer(
                         sdtype,
                         column_metadata,
-                        enforce_uniqueness,
+                        cardinality_rule,
                         self._locales
                     )

@@ -614,7 +616,7 @@ def _create_config(self, data, columns_created_by_constraints):
                     transformers[column] = self.create_anonymized_transformer(
                         sdtype=sdtype,
                         column_metadata=column_metadata,
-                        enforce_uniqueness=True,
+                        cardinality_rule='unique',
                         locales=self._locales
                     )
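
The hunks above replace the boolean `enforce_uniqueness` with a `cardinality_rule` that is `'unique'` for key columns and `None` otherwise. The sketch below isolates that mapping and the keyword-argument construction from the diff so they can be run without SDV or RDT installed. `cardinality_rule_for` and `build_kwargs` are hypothetical helper names and the sample metadata is made up.

```python
def cardinality_rule_for(column, keys):
    """Key columns must be unique; other columns get no cardinality constraint."""
    return 'unique' if column in keys else None


def build_kwargs(column_metadata, cardinality_rule, locales=('en_US',)):
    """Assemble transformer kwargs the way the updated method does."""
    kwargs = {
        'locales': list(locales),
        'cardinality_rule': cardinality_rule,
    }
    for key, value in column_metadata.items():
        # 'pii' and 'sdtype' describe the column and are not transformer options.
        if key not in ('pii', 'sdtype'):
            kwargs[key] = value
    return kwargs


keys = {'user_id'}
metadata = {'sdtype': 'email', 'pii': True, 'missing_value_generation': None}
key_kwargs = build_kwargs(metadata, cardinality_rule_for('user_id', keys))
other_kwargs = build_kwargs(metadata, cardinality_rule_for('contact', keys))
print(key_kwargs['cardinality_rule'], other_kwargs['cardinality_rule'])  # unique None
```

One visible difference from the old code is that `cardinality_rule` is now always passed, even when it is `None`, whereas `enforce_uniqueness` was only added to the kwargs when it was true.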
