dataverse
evaluation
.gitattributes
PUMS.csv
PUMS.yaml
PUMS_dup.csv
PUMS_dup.yaml
PUMS_dup_twotable.yaml
PUMS_dup_twotable_reverse.yaml
PUMS_large.csv
PUMS_large.yaml
PUMS_null.csv
PUMS_pid.csv
PUMS_pid.yaml
PUMS_two_table.yaml
README.md
askreddit.csv.zip
clean_askreddit.csv
create_example_dataset.py
d1.csv
d2.csv
example.csv
example.yaml
iris.csv
iris.yaml
reddit.csv
reddit.yaml
simulation.csv

README.md

Datasets for Unit Tests

CSV files with associated metadata (.yaml).

PUMS: A 1,000-row sample from PUMS (US Census Public Use Microdata Sample). Metadata has row_privacy set.
PUMS_pid: A 1,000-row sample from PUMS with an extra column, pid, a primary key that can be used to bound user contribution.
PUMS_large: A sample of 1.2 million records from PUMS, which includes a primary key (PersonId) and a slightly different schema.
PUMS_null: Same as PUMS_pid, but with values randomly missing. Useful for testing nullable support.
iris: The standard iris dataset.
reddit: A collection of n-grams from reddit posts.
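
As a rough illustration of how a file such as PUMS_null.csv might be inspected when testing nullable support, the sketch below counts empty cells per column using only the standard library. The helper name `count_missing`, the sample column names, and the assumption that a missing value appears as an empty cell are illustrative and are not taken from this repository.

```python
import csv
import io


def count_missing(csv_file):
    """Count empty cells per column in an open CSV file that has a header row."""
    reader = csv.DictReader(csv_file)
    missing = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            if name is None:
                # Extra cells beyond the header are ignored in this sketch.
                continue
            # Assumption: a missing value is an absent, empty, or whitespace-only cell.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


# Tiny in-memory stand-in for a real file; the columns here are made up.
sample = io.StringIO("a,b,pid\n1,,1\n,,2\n3,4,3\n")
print(count_missing(sample))
```

With the real data you would pass an open file handle instead, for example `open("PUMS_null.csv", newline="")`, adjusting the path to wherever the datasets were downloaded.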

Downloading Datasets

The datasets are downloaded automatically the first time you run the pytest tests under sql/. To download the test datasets without running the unit tests, run the following:

cd sql
pip install -r tests/requirements.txt
python tests/check_databases.py
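
To confirm that the files arrived, a small check like the one below may help. It is only a sketch and does not reproduce what tests/check_databases.py does; the function name `missing_datasets`, the `EXPECTED` list (taken from the CSV files described above), and the `datasets` directory path are assumptions to adapt to your checkout.

```python
from pathlib import Path

# CSV files named in the dataset descriptions above.
EXPECTED = [
    "PUMS.csv",
    "PUMS_pid.csv",
    "PUMS_large.csv",
    "PUMS_null.csv",
    "iris.csv",
    "reddit.csv",
]


def missing_datasets(directory, names=EXPECTED):
    """Return the expected file names that are not present in the directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the directory that contains "datasets".
    absent = missing_datasets("datasets")
    print("All datasets present" if not absent else f"Missing: {absent}")
```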

You are encouraged to use these datasets in unit tests where the data can be accessed from a CSV. Some of these datasets are also loaded automatically into the SQL database engines installed into engine-specific GitHub Actions images.
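
A unit test that reads one of these CSVs could look roughly like the sketch below. The helper names `load_rows` and `has_row_count` are hypothetical, and the demonstration writes a temporary file so that the block runs without the real data.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file with a header row into a list of dictionaries."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))


def has_row_count(csv_path, expected):
    """Return True if the CSV has exactly the expected number of data rows."""
    return len(load_rows(csv_path)) == expected


# Demonstration with a temporary stand-in file; the columns are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_text("x,y\n1,2\n3,4\n")
    assert has_row_count(demo, 2)
    print(load_rows(demo)[0])
```

Against the real data, a test might assert `has_row_count("datasets/PUMS.csv", 1000)`, based on the description of PUMS above; the path is an assumption about where the file lives in your checkout.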
