Commit 33f2643

make release-tag: Merge branch 'main' into stable

2 parents: c428c2b + b3e4375

File tree

27 files changed (+904, -195 lines)


Diff for: .github/workflows/release_notes.yml

+52 (new file)

```yaml
name: Release Notes Generator

on:
  workflow_dispatch:
    inputs:
      branch:
        description: 'Branch to merge release notes into.'
        required: true
        default: 'main'
      version:
        description: 'Version to use for the release. Must be in format: X.Y.Z.'
      date:
        description: 'Date of the release. Must be in format YYYY-MM-DD.'

jobs:
  releasenotesgeneration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install requests==2.31.0

      - name: Generate release notes
        env:
          GH_ACCESS_TOKEN: ${{ secrets.GH_ACCESS_TOKEN }}
        run: >
          python scripts/release_notes_generator.py
          -v ${{ inputs.version }}
          -d ${{ inputs.date }}

      - name: Create pull request
        id: cpr
        uses: peter-evans/create-pull-request@v4
        with:
          token: ${{ secrets.GH_ACCESS_TOKEN }}
          commit-message: Release notes for v${{ inputs.version }}
          author: "github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>"
          committer: "github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>"
          title: v${{ inputs.version }} Release Notes
          body: "This is an auto-generated PR to update the release notes."
          branch: release-notes
          branch-suffix: short-commit-hash
          base: ${{ inputs.branch }}
```
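Besides the Actions UI, a `workflow_dispatch` workflow like this one can also be triggered through GitHub's REST API. The sketch below only builds the request body rather than sending it; the endpoint path and token handling are assumptions based on the standard GitHub API ("create a workflow dispatch event"), not part of this commit. The `inputs` keys must match the `workflow_dispatch` inputs declared above.

```python
import json

# Endpoint (assumption): POST /repos/sdv-dev/SDV/actions/workflows/release_notes.yml/dispatches
url = ('https://api.github.com/repos/sdv-dev/SDV'
       '/actions/workflows/release_notes.yml/dispatches')

# 'ref' picks the branch holding the workflow file; 'inputs' mirrors the
# branch/version/date inputs declared in the workflow above.
payload = {
    'ref': 'main',
    'inputs': {'branch': 'main', 'version': '1.14.0', 'date': '2024-06-13'},
}
body = json.dumps(payload)
print(body)
```

Sending it would be a `requests.post(url, headers={'Authorization': f'Bearer {token}'}, data=body)`, mirroring the Bearer-token style the generator script itself uses.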

Diff for: HISTORY.md

+34 -1

```diff
@@ -1,6 +1,39 @@
 # Release Notes
 
-## 1.13.1 - 2024-05-16
+### v1.14.0 - 2024-06-13
+
+This release provides a number of new features. A big one is that it adds the ability to fit the `HMASynthesizer` on disconnected schemas! It also enables the `PARSynthesizer` to work with constraints in certain conditions. More specifically, the `PARSynthesizer` can now handle constraints as long as the columns involved in the constraints are either exclusively all context columns or exclusively all non-context columns.
+
+Additionally, a `verbose` parameter was added to the `TVAESynthesizer` to get a more detailed progress bar. Also, a bug was corrected that renamed the `file_path` parameter in the `ExcelHandler.read()` method to `filepath` as specified in the official [SDV docs](https://docs.sdv.dev/sdv/multi-table-data/data-preparation/loading-data/excel#read).
+
+### Internal
+
+* Add workflow to generate release notes - Issue [#2050](https://github.com/sdv-dev/SDV/issues/2050) by @amontanez24
+
+### Bugs Fixed
+
+* PARSynthesizer: Duplicate sequence index values when `sequence_length` is higher than real data - Issue [#2031](https://github.com/sdv-dev/SDV/issues/2031) by @lajohn4747
+* PARSynthesizer model won't fit if sequence_index is missing - Issue [#1972](https://github.com/sdv-dev/SDV/issues/1972) by @lajohn4747
+* `DataProcessor` never gets assigned a `table_name`. - Issue [#1964](https://github.com/sdv-dev/SDV/issues/1964) by @fealho
+
+### New Features
+
+* Rename `file_path` to `filepath` parameter in ExcelHandler - Issue [#2055](https://github.com/sdv-dev/SDV/issues/2055) by @amontanez24
+* Enable the ability to run multi table synthesizers on disjointed table schemas - Issue [#2047](https://github.com/sdv-dev/SDV/issues/2047) by @lajohn4747
+* Add header to log.csv file - Issue [#2046](https://github.com/sdv-dev/SDV/issues/2046) by @lajohn4747
+* If no filepath is provided, do not create a file during `sample` - Issue [#2042](https://github.com/sdv-dev/SDV/issues/2042) by @lajohn4747
+* Add verbosity to `TVAESynthesizer` - Issue [#1990](https://github.com/sdv-dev/SDV/issues/1990) by @fealho
+* Allow constraints in PARSynthesizer (for all context cols, or all non-context columns) - Issue [#1936](https://github.com/sdv-dev/SDV/issues/1936) by @lajohn4747
+* Improve error message when sampling on a non-CPU device - Issue [#1819](https://github.com/sdv-dev/SDV/issues/1819) by @fealho
+* Better data validation message for `auto_assign_transformers` - Issue [#1509](https://github.com/sdv-dev/SDV/issues/1509) by @lajohn4747
+
+### Miscellaneous
+
+* Do not enforce min/max on sequence index column - Issue [#2043](https://github.com/sdv-dev/SDV/pull/2043)
+* Include validation check for single table auto_assign_transformers - Issue [#2021](https://github.com/sdv-dev/SDV/pull/2021)
+* Add the dummy context column to metadata and not to extra_context_column - Issue [#2019](https://github.com/sdv-dev/SDV/pull/2019)
+
+# 1.13.1 - 2024-05-16
 
 This release fixes the `ModuleNotFoundError` error that was causing the 1.13.0 release to fail.
 
```

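One detail worth noting in this hunk: the old `## 1.13.1` heading comes back as `# 1.13.1`, and the new title lands as `### v1.14.0` even though the generator builds it as `## v{version}`. This appears to come from the `+ 1` in the `split_index` computation of `update_release_notes` in the `scripts/release_notes_generator.py` file added by this same commit, which splits `HISTORY.md` one character past the header. A minimal standalone reproduction of that splice:

```python
# Reproduce the heading shift using the same splice logic as
# update_release_notes() in scripts/release_notes_generator.py.
history = '# Release Notes\n\n## 1.13.1 - 2024-05-16\n'
release_notes = '## v1.14.0 - 2024-06-13\n\n'

token = '# Release Notes\n\n'
split_index = history.find(token) + len(token) + 1  # the '+ 1' overshoots by one char
header = history[:split_index]  # ends with the first '#' of '## 1.13.1'
result = f'{header}{release_notes}{history[split_index:]}'
print(result)
# The stolen '#' turns '## v1.14.0' into '### v1.14.0' and
# leaves the old heading as '# 1.13.1', matching the diff above.
```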
Diff for: README.md

+12 -8

````diff
@@ -94,12 +94,12 @@ column and the primary key (`guest_email`).
 ## Synthesizing Data
 Next, we can create an **SDV synthesizer**, an object that you can use to create synthetic data.
 It learns patterns from the real data and replicates them to generate synthetic data. Let's use
-the `FAST_ML` preset synthesizer, which is optimized for performance.
+the [GaussianCopulaSynthesizer](https://docs.sdv.dev/sdv/single-table-data/modeling/synthesizers/gaussiancopulasynthesizer).
 
 ```python
-from sdv.lite import SingleTablePreset
+from sdv.single_table import GaussianCopulaSynthesizer
 
-synthesizer = SingleTablePreset(metadata, name='FAST_ML')
+synthesizer = GaussianCopulaSynthesizer(metadata)
 synthesizer.fit(data=real_data)
 ```
@@ -131,11 +131,15 @@ quality_report = evaluate_quality(
 ```
 
 ```
-Creating report: 100%|██████████| 4/4 [00:00<00:00, 19.30it/s]
-Overall Quality Score: 89.12%
-Properties:
-Column Shapes: 90.27%
-Column Pair Trends: 87.97%
+Generating report ...
+
+(1/2) Evaluating Column Shapes: |████████████████| 9/9 [00:00<00:00, 1133.09it/s]|
+Column Shapes Score: 89.11%
+
+(2/2) Evaluating Column Pair Trends: |██████████████████████████████████████████| 36/36 [00:00<00:00, 502.88it/s]|
+Column Pair Trends Score: 88.3%
+
+Overall Score (Average): 88.7%
 ```
 
 This object computes an overall quality score on a scale of 0 to 100% (100 being the best) as well
````

Diff for: latest_requirements.txt

+3 -3

```diff
@@ -1,11 +1,11 @@
 cloudpickle==3.0.0
 copulas==0.11.0
-ctgan==0.10.0
+ctgan==0.10.1
 deepecho==0.6.0
 graphviz==0.20.3
 numpy==1.26.4
 pandas==2.2.2
-platformdirs==4.2.1
+platformdirs==4.2.2
 rdt==1.12.1
-sdmetrics==0.14.0
+sdmetrics==0.14.1
 tqdm==4.66.4
```

Diff for: pyproject.toml

+1 -1

```diff
@@ -158,7 +158,7 @@ namespaces = false
 version = {attr = 'sdv.__version__'}
 
 [tool.bumpversion]
-current_version = "1.13.1"
+current_version = "1.14.0.dev1"
 parse = '(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\.(?P<release>[a-z]+)(?P<candidate>\d+))?'
 serialize = [
     '{major}.{minor}.{patch}.{release}{candidate}',
```
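As a quick sanity check (a sketch, not part of the commit), the new `current_version` string can be matched against the `parse` pattern from the same `[tool.bumpversion]` table, confirming that `.dev1` is captured by the optional release/candidate group:

```python
import re

# The parse pattern copied from [tool.bumpversion] in pyproject.toml.
pattern = (r'(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)'
           r'(\.(?P<release>[a-z]+)(?P<candidate>\d+))?')

match = re.fullmatch(pattern, '1.14.0.dev1')
print(match.groupdict())
# {'major': '1', 'minor': '14', 'patch': '0', 'release': 'dev', 'candidate': '1'}
```

A plain release like `1.14.0` also matches, with `release` and `candidate` left as `None`, which is why the serialize list needs both shapes.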

Diff for: scripts/release_notes_generator.py

+153 (new file)

```python
"""Script to generate release notes."""

import argparse
import os
from collections import defaultdict

import requests

LABEL_TO_HEADER = {
    'feature request': 'New Features',
    'bug': 'Bugs Fixed',
    'internal': 'Internal',
    'maintenance': 'Maintenance',
    'customer success': 'Customer Success',
    'documentation': 'Documentation',
    'misc': 'Miscellaneous'
}
ISSUE_LABELS = [
    'documentation',
    'maintenance',
    'internal',
    'bug',
    'feature request',
    'customer success'
]
NEW_LINE = '\n'
GITHUB_URL = 'https://api.github.com/repos/sdv-dev/sdv'
GITHUB_TOKEN = os.getenv('GH_ACCESS_TOKEN')


def _get_milestone_number(milestone_title):
    url = f'{GITHUB_URL}/milestones'
    headers = {
        'Authorization': f'Bearer {GITHUB_TOKEN}'
    }
    query_params = {
        'milestone': milestone_title,
        'state': 'all',
        'per_page': 100
    }
    response = requests.get(url, headers=headers, params=query_params)
    body = response.json()
    if response.status_code != 200:
        raise Exception(str(body))

    milestones = body
    for milestone in milestones:
        if milestone.get('title') == milestone_title:
            return milestone.get('number')

    raise ValueError(f'Milestone {milestone_title} not found in past 100 milestones.')


def _get_issues_by_milestone(milestone):
    headers = {
        'Authorization': f'Bearer {GITHUB_TOKEN}'
    }
    # get milestone number
    milestone_number = _get_milestone_number(milestone)
    url = f'{GITHUB_URL}/issues'
    page = 1
    query_params = {
        'milestone': milestone_number,
        'state': 'all'
    }
    issues = []
    while True:
        query_params['page'] = page
        response = requests.get(url, headers=headers, params=query_params)
        body = response.json()
        if response.status_code != 200:
            raise Exception(str(body))

        issues_on_page = body
        if not issues_on_page:
            break

        issues.extend(issues_on_page)
        page += 1

    return issues


def _get_issues_by_category(release_issues):
    category_to_issues = defaultdict(list)

    for issue in release_issues:
        issue_title = issue['title']
        issue_number = issue['number']
        issue_url = issue['html_url']
        line = f'* {issue_title} - Issue [#{issue_number}]({issue_url})'
        assignee = issue.get('assignee')
        if assignee:
            login = assignee['login']
            line += f' by @{login}'

        # Check if any known label is marked on the issue
        labels = [label['name'] for label in issue['labels']]
        found_category = False
        for category in ISSUE_LABELS:
            if category in labels:
                category_to_issues[category].append(line)
                found_category = True
                break

        if not found_category:
            category_to_issues['misc'].append(line)

    return category_to_issues


def _create_release_notes(issues_by_category, version, date):
    title = f'## v{version} - {date}'
    release_notes = f'{title}{NEW_LINE}{NEW_LINE}'

    for category in ISSUE_LABELS + ['misc']:
        issues = issues_by_category.get(category)
        if issues:
            section_text = (
                f'### {LABEL_TO_HEADER[category]}{NEW_LINE}{NEW_LINE}'
                f'{NEW_LINE.join(issues)}{NEW_LINE}{NEW_LINE}'
            )

            release_notes += section_text

    return release_notes


def update_release_notes(release_notes):
    """Add the release notes for the new release to the ``HISTORY.md``."""
    file_path = 'HISTORY.md'
    with open(file_path, 'r') as history_file:
        history = history_file.read()

    token = '# Release Notes\n\n'
    split_index = history.find(token) + len(token) + 1
    header = history[:split_index]
    new_notes = f'{header}{release_notes}{history[split_index:]}'

    with open(file_path, 'w') as new_history_file:
        new_history_file.write(new_notes)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v', '--version', type=str, help='Release version number (ie. v1.0.1)')
    parser.add_argument('-d', '--date', type=str, help='Date of release in format YYYY-MM-DD')
    args = parser.parse_args()
    release_number = args.version
    release_issues = _get_issues_by_milestone(release_number)
    issues_by_category = _get_issues_by_category(release_issues)
    release_notes = _create_release_notes(issues_by_category, release_number, args.date)
    update_release_notes(release_notes)
```
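The categorization step in `_get_issues_by_category` can be checked offline with mocked issue payloads in the shape the GitHub issues API returns (the titles, numbers, URLs, and logins below are hypothetical, not real SDV issues):

```python
from collections import defaultdict

# Labels recognized by the script, in priority order (copied from above).
ISSUE_LABELS = [
    'documentation', 'maintenance', 'internal',
    'bug', 'feature request', 'customer success',
]

# Mocked GitHub API issue payloads (hypothetical data).
issues = [
    {'title': 'Add a verbose flag', 'number': 10,
     'html_url': 'https://github.com/example/repo/issues/10',
     'assignee': {'login': 'alice'},
     'labels': [{'name': 'feature request'}]},
    {'title': 'Untriaged report', 'number': 11,
     'html_url': 'https://github.com/example/repo/issues/11',
     'assignee': None, 'labels': []},
]

category_to_issues = defaultdict(list)
for issue in issues:
    line = f"* {issue['title']} - Issue [#{issue['number']}]({issue['html_url']})"
    if issue.get('assignee'):
        line += f" by @{issue['assignee']['login']}"
    labels = [label['name'] for label in issue['labels']]
    # First known label wins; anything without a known label falls back to 'misc'.
    category = next((c for c in ISSUE_LABELS if c in labels), 'misc')
    category_to_issues[category].append(line)

print(category_to_issues['feature request'][0])
# * Add a verbose flag - Issue [#10](https://github.com/example/repo/issues/10) by @alice
print(category_to_issues['misc'][0])
# * Untriaged report - Issue [#11](https://github.com/example/repo/issues/11)
```

This fallback to `misc` is why the HISTORY.md hunk above has a "Miscellaneous" section: those entries are milestone PRs that carried none of the known labels.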

Diff for: sdv/__init__.py

+1 -1

```diff
@@ -6,7 +6,7 @@
 
 __author__ = 'DataCebo, Inc.'
 __email__ = '[email protected]'
-__version__ = '1.13.1'
+__version__ = '1.14.0.dev1'
 
 
 import sys
```

Diff for: sdv/io/local/local.py

+6 -6

```diff
@@ -192,16 +192,16 @@ def write(self, synthetic_data, folder_name, file_name_suffix=None, mode='x'):
 class ExcelHandler(BaseLocalHandler):
     """A class for handling Excel files."""
 
-    def _read_excel(self, file_path, sheet_names=None):
+    def _read_excel(self, filepath, sheet_names=None):
         """Read data from Excel File and return just the data as a dictionary."""
         data = {}
         if sheet_names is None:
-            xl_file = pd.ExcelFile(file_path)
+            xl_file = pd.ExcelFile(filepath)
             sheet_names = xl_file.sheet_names
 
         for sheet_name in sheet_names:
             data[sheet_name] = pd.read_excel(
-                file_path,
+                filepath,
                 sheet_name=sheet_name,
                 parse_dates=False,
                 decimal=self.decimal,
@@ -210,11 +210,11 @@ def _read_excel(self, file_path, sheet_names=None):
 
         return data
 
-    def read(self, file_path, sheet_names=None):
+    def read(self, filepath, sheet_names=None):
         """Read data from Excel files and return it along with metadata.
 
         Args:
-            file_path (str):
+            filepath (str):
                 The path to the Excel file to read.
             sheet_names (list of str, optional):
                 The names of sheets to read. If None, all sheets are read.
@@ -226,7 +226,7 @@ def read(self, file_path, sheet_names=None):
         if sheet_names is not None and not isinstance(sheet_names, list):
             raise ValueError("'sheet_names' must be None or a list of strings.")
 
-        return self._read_excel(file_path, sheet_names)
+        return self._read_excel(filepath, sheet_names)
 
     def write(self, synthetic_data, file_name, sheet_name_suffix=None, mode='w'):
         """Write synthetic data to an Excel File.
```

Diff for: sdv/lite/single_table.py

+2 -4

```diff
@@ -136,8 +136,7 @@ def sample_from_conditions(self, conditions, max_tries_per_batch=100,
                 The batch size to use per attempt at sampling. Defaults to 10 times
                 the number of rows.
             output_file_path (str or None):
-                The file to periodically write sampled rows to. Defaults to
-                a temporary file, if None.
+                The file to periodically write sampled rows to. Defaults to None.
 
         Returns:
             pandas.DataFrame:
@@ -168,8 +167,7 @@ def sample_remaining_columns(self, known_columns, max_tries_per_batch=100,
                 The batch size to use per attempt at sampling. Defaults to 10 times
                 the number of rows.
             output_file_path (str or None):
-                The file to periodically write sampled rows to. Defaults to
-                a temporary file, if None.
+                The file to periodically write sampled rows to. Defaults to None.
 
         Returns:
             pandas.DataFrame:
```
