Commit 1fc8561

make release-tag: Merge branch 'master' into stable
2 parents 1ff4fad + 56b9cd0 commit 1fc8561

30 files changed (+432 -46 lines)

.github/workflows/integration.yml

+1-1
@@ -9,7 +9,7 @@ jobs:
 runs-on: ${{ matrix.os }}
 strategy:
 matrix:
-python-version: [3.6, 3.7, 3.8]
+python-version: [3.6, 3.7, 3.8, 3.9]
 os: [ubuntu-latest, macos-10.15, windows-latest]
 steps:
 - uses: actions/checkout@v1

.github/workflows/minimum.yml

+1-1
@@ -9,7 +9,7 @@ jobs:
 runs-on: ${{ matrix.os }}
 strategy:
 matrix:
-python-version: [3.6, 3.7, 3.8]
+python-version: [3.6, 3.7, 3.8, 3.9]
 os: [ubuntu-latest, macos-10.15, windows-latest]
 steps:
 - uses: actions/checkout@v1

.github/workflows/readme.yml

+1-1
@@ -9,7 +9,7 @@ jobs:
 runs-on: ${{ matrix.os }}
 strategy:
 matrix:
-python-version: [3.6, 3.7, 3.8]
+python-version: [3.6, 3.7, 3.8, 3.9]
 os: [ubuntu-latest, macos-10.15] # skip windows bc rundoc fails
 steps:
 - uses: actions/checkout@v1

.github/workflows/tutorials.yml

+2-2
@@ -9,8 +9,8 @@ jobs:
 runs-on: ${{ matrix.os }}
 strategy:
 matrix:
-python-version: [3.6, 3.7, 3.8]
-os: [ubuntu-latest, macos-latest, windows-latest]
+python-version: [3.6, 3.7, 3.8, 3.9]
+os: [ubuntu-latest, macos-10.15, windows-latest]
 steps:
 - uses: actions/checkout@v1
 - name: Set up Python ${{ matrix.python-version }}

.github/workflows/unit.yml

+1-1
@@ -9,7 +9,7 @@ jobs:
 runs-on: ${{ matrix.os }}
 strategy:
 matrix:
-python-version: [3.6, 3.7, 3.8]
+python-version: [3.6, 3.7, 3.8, 3.9]
 os: [ubuntu-latest, macos-10.15, windows-latest]
 steps:
 - uses: actions/checkout@v1
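All five workflows now test Python 3.9. A plain GitHub Actions matrix (no `include` or `exclude` keys, as in these files) runs one job per combination of its values, so the job count is the product of the list lengths. The minimal Python sketch below only counts those combinations using the values from this commit; it does not model anything else GitHub Actions does, and the versions are written as strings purely for readability.

```python
from itertools import product

# Values copied from the updated matrices in this commit.
PYTHON_VERSIONS = ['3.6', '3.7', '3.8', '3.9']
ALL_OSES = ['ubuntu-latest', 'macos-10.15', 'windows-latest']
README_OSES = ['ubuntu-latest', 'macos-10.15']  # readme.yml skips Windows


def expand_matrix(python_versions, oses):
    """Return one (python-version, os) pair per job, like a matrix with no include/exclude."""
    return list(product(python_versions, oses))


old_jobs = expand_matrix(PYTHON_VERSIONS[:3], ALL_OSES)  # before: 3.6-3.8 only
jobs = expand_matrix(PYTHON_VERSIONS, ALL_OSES)
readme_jobs = expand_matrix(PYTHON_VERSIONS, README_OSES)

print(len(old_jobs), len(jobs), len(readme_jobs))  # 9 12 8
```

So each three-OS workflow grows from 9 to 12 jobs per run, and `readme.yml` runs 8.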

.gitignore

-1
@@ -109,7 +109,6 @@ ENV/
 sdv/data/
 docs/**/*.pkl
 docs/**/*metadata.json
-docs/images
 docs/savefig
 tutorials/**/*.pkl
 tutorials/**/*metadata.json

HISTORY.md

+20
@@ -1,5 +1,25 @@
 # Release Notes
 
+## 0.13.1 - 2021-12-22
+
+This release adds support for passing tabular constraints to the HMA1 model, and adds more explicit error handling for
+metric evaluation. It also includes a fix for using categorical columns in the PAR model and documentation updates
+for metadata and HMA1.
+
+### Bugs Fixed
+
+* Categorical column after sequence_index column - Issue [#314](https://github.com/sdv-dev/SDV/issues/314) by @fealho
+
+### New Features
+
+* Support passing tabular constraints to the HMA1 model - Issue [#296](https://github.com/sdv-dev/SDV/issues/296) by @katxiao
+* Metric evaluation error handling metrics - Issue [#638](https://github.com/sdv-dev/SDV/issues/638) by @katxiao
+
+### Documentation Changes
+
+* Make true/false values lowercase in Metadata Schema specification - Issue [#664](https://github.com/sdv-dev/SDV/issues/664) by @katxiao
+* Update docstrings for hma1 methods - Issue [#642](https://github.com/sdv-dev/SDV/issues/642) by @katxiao
+
 ## 0.13.0 - 2021-11-22
 
 This release makes multiple improvements to different `Constraint` classes. The `Unique` constraint can now

README.md

+57-16
@@ -1,8 +1,7 @@
-<p align="left">
-<a href="https://dai.lids.mit.edu">
-<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
-</a>
-<i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i>
+<div align="center">
+<br/>
+<p align="center">
+<i>This repository is part of <a href="https://sdv.dev">The Synthetic Data Vault Project</a>, a project from <a href="https://datacebo.com">DataCebo</a>.</i>
 </p>
 
 [![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
@@ -13,17 +12,16 @@
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sdv-dev/SDV/master?filepath=tutorials)
 [![Slack](https://img.shields.io/badge/Slack%20Workspace-Join%20now!-36C5F0?logo=slack)](https://join.slack.com/t/sdv-space/shared_invite/zt-gdsfcb5w-0QQpFMVoyB2Yd6SRiMplcw)
 
-<img width=30% src="docs/images/SDV-Logo-Color-Tagline.png">
+<div align="left">
+<br/>
+<p align="center">
+<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/SDV-DataCebo.png"></img>
+</p>
+</div>
 
-* Website: https://sdv.dev
-* Documentation: https://sdv.dev/SDV
-* [User Guides](https://sdv.dev/SDV/user_guides/index.html)
-* [Developer Guides](https://sdv.dev/SDV/developer_guides/index.html)
-* Github: https://github.com/sdv-dev/SDV
-* License: [MIT](https://github.com/sdv-dev/SDV/blob/master/LICENSE)
-* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
+</div>
 
-## Overview
+# Overview
 
 The **Synthetic Data Vault (SDV)** is a **Synthetic Data Generation** ecosystem of libraries
 that allows users to easily learn [single-table](
@@ -41,7 +39,27 @@ Underneath the hood it uses several probabilistic graphical modeling and deep le
 techniques. To enable a variety of data storage structures, we employ unique
 hierarchical generative modeling and recursive sampling techniques.
 
-### Current functionality and features:
+| Important Links | |
+| -------------------------- | -------------------------------------------------------------- |
+| :computer: **[Website]** | Check out the SDV Website for more information about the project. |
+| :orange_book: **[SDV Blog]** | Regular publshing of useful content about Synthetic Data Generation. |
+| :book: **[Documentation]** | Quickstarts, User and Development Guides, and API Reference. |
+| :octocat: **[Repository]** | The link to the Github Repository of this library. |
+| :scroll: **[License]** | The entire ecosystem is published under the MIT License. |
+| :keyboard: **[Development Status]** | This software is in its Pre-Alpha stage. |
+| ![](slack.png) **[Community]** | Join our Slack Workspace for announcements and discussions. |
+| ![](mybinder.png) **[Tutorials]** | Run the SDV Tutorials in a Binder environment. |
+
+[Website]: https://sdv.dev
+[SDV Blog]: https://sdv.dev/blog
+[Documentation]: https://sdv.dev/SDV
+[Repository]: https://github.com/sdv-dev/SDV
+[License]: https://github.com/sdv-dev/SDV/blob/master/LICENSE
+[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha
+[Community]: https://join.slack.com/t/sdv-space/shared_invite/zt-gdsfcb5w-0QQpFMVoyB2Yd6SRiMplcw
+[Tutorials]: https://mybinder.org/v2/gh/sdv-dev/SDV/master?filepath=tutorials
+
+## Current functionality and features:
 
 * Synthetic data generators for [single tables](
 https://sdv.dev/SDV/user_guides/single_table/index.html) with the following
@@ -89,7 +107,7 @@ pip install sdv
 **Using `conda`:**
 
 ```bash
-conda install -c sdv-dev -c pytorch -c conda-forge sdv
+conda install -c pytorch -c conda-forge sdv
 ```
 
 For more installation options please visit the [SDV installation Guide](
@@ -254,3 +272,26 @@ Neha Patki, Roy Wedge, Kalyan Veeramachaneni. [The Synthetic Data Vault](https:/
 month={Oct}
 }
 ```
+
+---
+
+
+<div align="center">
+<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/DataCebo.png"></img></a>
+</div>
+<br/>
+<br/>
+
+The [DataCebo team](https://datacebo.com) is the proud developer of [The Synthetic Data Vault Project](
+https://sdv.dev), the largest open source ecosystem for synthetic data generation & evaluation.
+The ecosystem is home to multiple libraries that support synthetic data, including:
+
+* 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
+* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,
+multi table and time series data.
+* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data
+generation models.
+
+[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully
+integrated solution and your one-stop shop for synthetic data.Or, use the standalone libraries
+for specific needs.

conda/meta.yaml

+3-3
@@ -1,5 +1,5 @@
 {% set name = 'sdv' %}
-{% set version = '0.13.0' %}
+{% set version = '0.13.1.dev1' %}
 
 package:
 name: "{{ name|lower }}"
@@ -28,7 +28,7 @@ requirements:
 - ctgan >=0.5.0,<0.6
 - deepecho >=0.3.0.post1,<0.4
 - rdt >=0.6.1,<0.7
-- sdmetrics >=0.4.0,<0.5
+- sdmetrics >=0.4.1,<0.5
 run:
 - graphviz
 - python >=3.6,<3.10
@@ -41,7 +41,7 @@ requirements:
 - ctgan >=0.5.0,<0.6
 - deepecho >=0.3.0.post1,<0.4
 - rdt >=0.6.1,<0.7
-- sdmetrics >=0.4.0,<0.5
+- sdmetrics >=0.4.1,<0.5
 
 about:
 home: "https://sdv.dev"
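The recipe (together with `sdv/__init__.py`) moves to `0.13.1.dev1` and raises the `sdmetrics` floor to 0.4.1. The sketch below checks both with the third-party `packaging` library, using its PEP 440 rules as a stand-in for conda's own version matcher. For a simple `>=,<` range and a `.devN` suffix the two are expected to agree, but that is an assumption, not something this commit tests.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# sdmetrics range from the updated recipe (both the host and run requirements).
sdmetrics_spec = SpecifierSet('>=0.4.1,<0.5')

for candidate in ['0.4.0', '0.4.1', '0.4.9', '0.5.0']:
    print(candidate, candidate in sdmetrics_spec)

# A ".devN" suffix marks a development release: it sorts after 0.13.0
# but before the final 0.13.1.
assert Version('0.13.0') < Version('0.13.1.dev1') < Version('0.13.1')
```

Only `0.4.1` and `0.4.9` satisfy the new range, so environments pinned to sdmetrics 0.4.0 would no longer resolve.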

docs/developer_guides/sdv/metadata.rst

+3-3
@@ -130,7 +130,7 @@ the following keys.
 "fields": {
 "social_security_number": {
 "type": "categorical",
-"pii": True,
+"pii": true,
 "pii_category": "ssn"
 },
 ...
@@ -180,7 +180,7 @@ A list of all possible localizations can be found on the `Faker documentation si
 "fields": {
 "address": {
 "type": "categorical",
-"pii": True,
+"pii": true,
 "pii_category": "address"
 "pii_locales": ["sv_SE", "en_US"]
 },
@@ -215,7 +215,7 @@ If a field is specified as a ``primary_key`` of the table, then the field must b
 ...
 }
 
-If the subtype of the primary key is integer, an optional regular expression can be passed to
+If the subtype of the primary key is string, an optional regular expression can be passed to
 generate keys that match it:
 
 .. code-block:: python
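The `True` to `true` change matters because these metadata examples are JSON: JSON spells its booleans `true` and `false`, and Python's standard `json` module rejects the capitalized form. A quick standard-library check using the corrected `social_security_number` field:

```python
import json

# The corrected field definition, with the lowercase JSON literal.
snippet = '{"type": "categorical", "pii": true, "pii_category": "ssn"}'
field = json.loads(snippet)
print(field['pii'])  # True (parsed into a Python bool)

# The previous, capitalized spelling is not valid JSON.
try:
    json.loads('{"pii": True}')
    capitalized_is_valid = True
except json.JSONDecodeError:
    capitalized_is_valid = False
print(capitalized_is_valid)  # False
```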

docs/images/CTGAN-DataCebo.png

50.9 KB

docs/images/Copulas-DataCebo.png

49.7 KB

docs/images/DataCebo-Blue.png

59.2 KB

docs/images/DataCebo.png

52.8 KB

docs/images/DeepEcho-DataCebo.png

45.2 KB

docs/images/RDT-DataCebo.png

25 KB

docs/images/SDGym-DataCebo.png

26.6 KB

docs/images/SDMetrics-DataCebo.png

34.8 KB

docs/images/SDV-DataCebo.png

21.9 KB
@@ -0,0 +1,82 @@
+.. _relational_constraints:
+
+Constraints
+===========
+
+SDV supports adding constraints within a single table. See :ref:`single_table_constraints`
+for more information about the available single table constraints.
+
+In order to use single-table constraints within a relational model, you can pass
+in a list of applicable constraints when adding a table to your relational ``Metadata``.
+(See :ref:`relational_metadata` for more information on constructing a ``Metadata`` object.)
+
+In this example, we wish to add a ``UniqueCombinations`` constraint to our ``sessions`` table,
+which is a child table of ``users``. First, we will create a ``Metadata`` object and add the
+``users`` table.
+
+.. ipython:: python
+:okwarning:
+
+from sdv import load_demo, Metadata
+
+tables = load_demo()
+
+metadata = Metadata()
+
+metadata.add_table(
+name='users',
+data=tables['users'],
+primary_key='user_id'
+)
+
+The metadata now contains the ``users`` table.
+
+.. ipython:: python
+:okwarning:
+
+metadata
+
+Now, we want to add a child table ``sessions`` which contains a single table constraint.
+In the ``sessions`` table, we wish to only have combinations of ``(device, os)`` that
+appear in the original data.
+
+.. ipython:: python
+:okwarning:
+
+from sdv.constraints import UniqueCombinations
+
+constraint = UniqueCombinations(columns=['device', 'os'])
+
+metadata.add_table(
+name='sessions',
+data=tables['sessions'],
+primary_key='session_id',
+parent='users',
+foreign_key='user_id',
+constraints=[constraint],
+)
+
+If we get the table metadata for ``sessions``, we can see that the constraint has been added.
+
+.. ipython:: python
+:okwarning:
+
+metadata.get_table_meta('sessions')
+
+We can now use this metadata to fit a relational model and synthesize data.
+
+.. ipython:: python
+:okwarning:
+
+from sdv.relational import HMA1
+
+model = HMA1(metadata)
+model.fit(tables)
+new_data = model.sample()
+
+In the sampled data, we should see that our constraint is being satisfied.
+
+.. ipython:: python
+:okwarning:
+
+new_data

docs/user_guides/relational/index.rst

+1
@@ -10,3 +10,4 @@ Relational Data
 
 data_description
 models
+constraints

docs/user_guides/single_table/custom_constraints.rst

+1-1
@@ -23,7 +23,7 @@ Let's look at a demo dataset:
 employees = load_tabular_demo()
 employees
 
-The dataset defined in :ref:`_single_table_constraints` contains basic details about employees.
+The dataset defined in :ref:`handling_constraints` contains basic details about employees.
 We will use this dataset to demonstrate how you can create your own constraint.
 
 

sdv/__init__.py

+1-1
@@ -6,7 +6,7 @@
 
 __author__ = """MIT Data To AI Lab"""
 __email__ = '[email protected]'
-__version__ = '0.13.0'
+__version__ = '0.13.1.dev1'
 
 from sdv import constraints, evaluation, metadata, relational, tabular
 from sdv.demo import get_available_demos, load_demo

sdv/evaluation.py

-1
@@ -133,7 +133,6 @@ def evaluate(synthetic_data, real_data=None, metadata=None, root_path=None,
 synthetic_data = synthetic_data[table]
 
 scores = sdmetrics.compute_metrics(metrics, real_data, synthetic_data, metadata=metadata)
-scores.dropna(inplace=True)
 
 if aggregate:
 return scores.normalized_score.mean()
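With `scores.dropna(inplace=True)` gone, metrics that come back without a score now remain in `scores` instead of being silently dropped. For rows whose only missing value is the score, the aggregate path is unchanged, because pandas `Series.mean()` skips NaN by default (`skipna=True`). A small illustration with a made-up scores table; only the `normalized_score` column name comes from the diff, the rest is assumed:

```python
import numpy as np
import pandas as pd

# Hypothetical scores table: one metric failed and has no score.
scores = pd.DataFrame({
    'metric': ['MetricA', 'MetricB', 'MetricC'],
    'normalized_score': [0.5, np.nan, 0.75],
})

# Before this commit: any row containing a NaN was dropped.
old_scores = scores.dropna()

# After: all rows are kept, and the mean ignores the NaN score.
print(len(old_scores), len(scores))        # 2 3
print(old_scores.normalized_score.mean())  # 0.625
print(scores.normalized_score.mean())      # 0.625
```

Note that `dropna()` with no `subset` drops a row if any column is missing, so the removed line could also have discarded rows whose score was present.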

sdv/metadata/dataset.py

+16-2
@@ -10,6 +10,7 @@
 import pandas as pd
 from rdt import HyperTransformer, transformers
 
+from sdv.constraints import Constraint
 from sdv.metadata import visualization
 from sdv.metadata.errors import MetadataError
 
@@ -871,7 +872,7 @@ def _get_field_details(self, data, fields):
 return fields_metadata
 
 def add_table(self, name, data=None, fields=None, fields_metadata=None,
-primary_key=None, parent=None, foreign_key=None):
+primary_key=None, parent=None, foreign_key=None, constraints=None):
 """Add a new table to this metadata.
 
 ``fields`` list can be a mixture of field names, which will be build automatically
@@ -902,7 +903,10 @@ def add_table(self, name, data=None, fields=None, fields_metadata=None,
 parent (str):
 Table name to refere a foreign key field. Defaults to ``None``.
 foreign_key (str):
-Foreing key field name to ``parent`` table primary key. Defaults to ``None``.
+Foreign key field name to ``parent`` table primary key. Defaults to ``None``.
+constraints (list[Constraint, dict]):
+List of Constraint objects or dicts representing the constraints for the
+given table.
 
 Raises:
 ValueError:
@@ -938,6 +942,16 @@ def add_table(self, name, data=None, fields=None, fields_metadata=None,
 
 self._metadata['tables'][name] = table_metadata
 
+if constraints:
+meta_constraints = []
+for constraint in constraints:
+if isinstance(constraint, Constraint):
+meta_constraints.append(constraint.to_dict())
+else:
+meta_constraints.append(constraint)
+
+table_metadata['constraints'] = meta_constraints
+
 try:
 if primary_key:
 self.set_primary_key(name, primary_key)
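The new `constraints` argument accepts a mix of `Constraint` objects and plain dicts, and stores only dicts in the table metadata. The sketch below reproduces that normalization outside SDV so it runs on its own. `Constraint`, `UniqueCombinations`, the `normalize_constraints` helper, and the dict shape returned by `to_dict()` are hypothetical stand-ins for illustration, not SDV's actual classes or output.

```python
from typing import Any, Dict, List, Optional, Union


class Constraint:
    """Hypothetical stand-in for sdv.constraints.Constraint; only to_dict() is modelled."""

    def __init__(self, **params: Any) -> None:
        self.params = params

    def to_dict(self) -> Dict[str, Any]:
        # Assumed serialized shape for illustration; the real to_dict() output may differ.
        return {'constraint': type(self).__name__, **self.params}


class UniqueCombinations(Constraint):
    """Hypothetical subclass mirroring the name used in the new constraints guide."""


ConstraintSpec = Union[Constraint, Dict[str, Any]]


def normalize_constraints(constraints: Optional[List[ConstraintSpec]]) -> Optional[List[Dict[str, Any]]]:
    """Mirror the loop added to add_table: objects become dicts, dicts pass through unchanged."""
    if not constraints:
        return None  # add_table only sets 'constraints' for a non-empty list
    meta_constraints = []
    for constraint in constraints:
        if isinstance(constraint, Constraint):
            meta_constraints.append(constraint.to_dict())
        else:
            meta_constraints.append(constraint)
    return meta_constraints


table_metadata: Dict[str, Any] = {'fields': {}}
normalized = normalize_constraints([
    UniqueCombinations(columns=['device', 'os']),
    {'constraint': 'MyCustomConstraint'},  # hypothetical dict-form constraint
])
if normalized is not None:
    table_metadata['constraints'] = normalized

print(table_metadata['constraints'])
```

In the diff, the assignment runs after `self._metadata['tables'][name] = table_metadata`. That still works because the stored value is the same dict object, so the later mutation is visible through `self._metadata`.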