
Commit 45e8bde

jdye64, ayushdg, charlesbluca, Jeremy Dyer, and sarahyurick authored
[REVIEW] Bump Arrow DataFusion Python dependency to 28.0.0 (#1181)
* Bump ADP -> 26.0.0
* warn on optimization failure instead of erroring and exiting
* Resolve initial build errors
* Switch to crates release, add zlib to host/build deps
* Add zlib to aarch build deps
* Bump to ADP 27 and introduce support for wildcard expressions, a wildcard expression name will be subbed with the first column in the incoming schema plan
* remove bit of logic that is no longer needed to manually check the wildcard 'name' as a '*'
* experiment with removing zlib, hoping that fixes os x build
* Change expected_df result to 1.5 from 1. 3/2 is in fact 1.5 and not 1
* Fix cargo test
* add .cargo/config.toml in hopes of fixing linker build issues on osx
* Remove extra config.toml
* Try overriding runner-installed toolchain
* Revert "Try overriding runner-installed toolchain"
  This reverts commit b2e85df.
* Initial migration to maturin build system
* Make some modifications to Rust package name
* Adjust native library name from _.internal to dask_planner
* Resolve initial conda build issues
* Replace setuptools-rust with maturin in CI
* Constrain maturin, remove setuptools-rust from CI envs
* Update docs and Rust CI
* Remove more dask_planner appearances
* Bump pyarrow min version to resolve 3.8 conflicts
* test commit seeing how CI will respond without cmd_loop import
* Rename module to _datafusion_lib
* Switch to maturin develop for CI installs
* Fix failing cargo tests, changed output, from datafusion version bump
* Fix cargo test syntax issue
* Fix failing Rust tests
* Remove linux config.toml options
* Fix Rust object import
* Apply code suggestions
* Bump to recent ADP commit
* Initial unblocker for pyarrow string handling
* Compatibility code for old or no pyarrow installation
* Added RexCall Operation to handle InSubquery Expr and also adjusted column_name function to examine InSubquery nested Expr instance for name
* Add Sarah's fix for datetime.time error
* Add condition to guard against complex function names that contain a '.' in their column name
* unmarked xfail for queries 6, 9, & 54
* Quick fix for pydantic upstream breakage
* Update dask_sql/physical/utils/filter.py
  Co-authored-by: Sarah Yurick <[email protected]>
* Apply Sarah's suggestions
* Attempt to unblock failures at parse_datetime
* Disable pyarrow strings for now
* Remove breakpoint
* Remove pydantic constraint now that fastapi is bumped
* Apply pyproject suggestions
* Bump build system to maturin 1.1
* Move filter datetime handling, remove string datetime handling for now
* Actually check containment in InSubquery call
* bring back decorrelated_where_exists and decorrelate_when_in
* Checkstyle fixes
* Remove xfail for queries 58 and 61 which pass now
* Fix pytest syntax issue
* whatever, have it your way black
* Remove debugging println
* re-add support for ilike using the case_insensitive member of like
* Handle non-decimal scalar args for cuDF in RexCall
* Try using maturin with zig for wheel builds
* Install protoc for all wheel builds and zlib1g-dev in linux builds
* Remove Cargo tests because that code is already being tested in DataFusion anyway
* Adjust optimizer/utils test includes
* Adjust import path for doctest
* Adjust import path for doctest (more)
* Check if zlib is installed on ubuntu runners
* Try invoking maturin directly for conda builds
* Revert "Try invoking maturin directly for conda builds"
  This reverts commit 24f465f.
* Install protoc via apt
* Add zlib to conda environment so that conda install c-compiler can locate the necessary zlib header files
* Remove pytest coalesce option for Sum(b) with a string conditional result as that is not valid sql in some cases
* Revert "Install protoc via apt"
  This reverts commit 488cbaf.
* Try not using zig for x86_64 builds
* Try installing protoc from apt again
* Revert "Try installing protoc from apt again"
  This reverts commit 66ebed4.
* Try explicitly setting PROTOC location for x86_64 builds
* Where is protoc?
* Fix protoc binary location
* Disable docker container for linux x86_64 build
* Properly upload artifacts for ARM/intel
* Disable aarch64 builds for now
* Constrain mlflow to avoid import error
* Set wheel tags to manylinux_2_17
* Use manylinux docker container for x86_64 builds
* No sudo for protoc installation
* Install protoc directly from github
* Specify PROTOC environment variable for x86_64 runs
* More doc updates to reflect new installation style
* Fix docker builds
* Bump ADP to stable 28.0.0

---------

Co-authored-by: Ayush Dattagupta <[email protected]>
Co-authored-by: Charles Blackmon-Luca <[email protected]>
Co-authored-by: Jeremy Dyer <[email protected]>
Co-authored-by: Sarah Yurick <[email protected]>
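The ILIKE item in the commit message re-adds ILIKE as a LIKE expression with its `case_insensitive` member set. As a rough illustration of that idea only, here is a minimal Python sketch; it is not dask-sql's or DataFusion's code. `like_to_regex` and `sql_like` are hypothetical helpers, ESCAPE clauses are not supported, and matching works by translating the LIKE pattern into an anchored regular expression:

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regular expression.

    '%' matches any run of characters (including none), '_' matches exactly
    one character, and every other character is matched literally. ESCAPE
    clauses are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def sql_like(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    """Hypothetical helper: `value LIKE pattern`, or ILIKE when case_insensitive."""
    flags = re.DOTALL  # let '%' and '_' match newlines too
    if case_insensitive:
        flags |= re.IGNORECASE
    # fullmatch anchors the pattern at both ends, as LIKE requires.
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


if __name__ == "__main__":
    print(sql_like("DataFusion", "data%"))                         # False
    print(sql_like("DataFusion", "data%", case_insensitive=True))  # True
```

Keeping one matcher and toggling a flag mirrors the commit's approach of reusing LIKE rather than adding a separate ILIKE code path.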
1 parent 3fad76b commit 45e8bde


140 files changed (+2169 / -1957 lines)

File renamed without changes.

.github/CODEOWNERS

+4-1
```diff
@@ -2,4 +2,7 @@
 * @ayushdg @charlesbluca @galipremsagar
 
 # rust codeowners
-dask_planner/ @ayushdg @charlesbluca @galipremsagar @jdye64
+.cargo/ @ayushdg @charlesbluca @galipremsagar @jdye64
+src/ @ayushdg @charlesbluca @galipremsagar @jdye64
+Cargo.toml @ayushdg @charlesbluca @galipremsagar @jdye64
+Cargo.lock @ayushdg @charlesbluca @galipremsagar @jdye64
```
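GitHub documents that, in a CODEOWNERS file, the last matching pattern for a path takes precedence. The Python sketch below applies that rule to the updated rules. Its matcher is a deliberate simplification that only understands the pattern shapes used in this file (`*`, a directory name with a trailing slash, and a bare file name). It is not GitHub's implementation, and `owners_for` is a hypothetical helper:

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Rules from the updated .github/CODEOWNERS, in file order.
CORE = ["@ayushdg", "@charlesbluca", "@galipremsagar"]
RULES = [
    ("*", CORE),
    (".cargo/", CORE + ["@jdye64"]),
    ("src/", CORE + ["@jdye64"]),
    ("Cargo.toml", CORE + ["@jdye64"]),
    ("Cargo.lock", CORE + ["@jdye64"]),
]


def _matches(pattern: str, path: str) -> bool:
    """Simplified matcher covering only the pattern shapes used above."""
    parts = PurePosixPath(path).parts
    if pattern == "*":
        return True
    if pattern.endswith("/"):
        # A directory name with no other slash matches that directory at any depth.
        return pattern.rstrip("/") in parts[:-1]
    # A bare file name matches that file name at any depth.
    return parts[-1] == pattern


def owners_for(path: str) -> list[str]:
    """Return the owners of `path`; later matching rules override earlier ones."""
    owners: list[str] = []
    for pattern, rule_owners in RULES:
        if _matches(pattern, path):
            owners = list(rule_owners)
    return owners


if __name__ == "__main__":
    print(owners_for("src/lib.rs"))
    print(owners_for("dask_sql/context.py"))
```

Because `*` comes first, every path has the three core owners, and the Rust-specific rules later in the file replace that list for crate files.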

.github/workflows/conda.yml

+6-5
```diff
@@ -6,10 +6,9 @@ on:
   pull_request:
     paths:
       - setup.py
-      - dask_planner/Cargo.toml
-      - dask_planner/Cargo.lock
-      - dask_planner/pyproject.toml
-      - dask_planner/rust-toolchain.toml
+      - Cargo.toml
+      - Cargo.lock
+      - pyproject.toml
       - continuous_integration/recipe/**
       - .github/workflows/conda.yml
   schedule:
@@ -34,7 +33,9 @@ jobs:
       fail-fast: false
       matrix:
         python: ["3.8", "3.9", "3.10"]
-        arch: ["linux-64", "linux-aarch64"]
+        # FIXME: aarch64 builds are consuming too much memory to run on GHA
+        # arch: ["linux-64", "linux-aarch64"]
+        arch: ["linux-64"]
     steps:
       - uses: actions/checkout@v3
         with:
```
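On pull requests, the `paths` filter above starts the conda workflow only when a changed file matches one of the listed patterns. The sketch below approximates that check with Python's `fnmatch.fnmatchcase`. This is not GitHub's glob engine (its `*` also crosses `/`), but it should give the same answers for these particular patterns. `triggers_conda` is a hypothetical name. The sketch shows that, after this change, an edit to the root `Cargo.toml` triggers the workflow, while the old `dask_planner/Cargo.toml` location no longer matches:

```python
from fnmatch import fnmatchcase

# pull_request paths filter of .github/workflows/conda.yml after this commit.
CONDA_PATHS = [
    "setup.py",
    "Cargo.toml",
    "Cargo.lock",
    "pyproject.toml",
    "continuous_integration/recipe/**",
    ".github/workflows/conda.yml",
]


def triggers_conda(changed_files):
    """Approximate whether a PR touching `changed_files` runs the workflow."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in CONDA_PATHS
    )


if __name__ == "__main__":
    print(triggers_conda(["Cargo.toml"]))               # True
    print(triggers_conda(["dask_planner/Cargo.toml"]))  # False
```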

.github/workflows/release.yml

+113-65
```diff
@@ -15,111 +15,159 @@ concurrency:
 env:
   upload: ${{ github.event_name == 'release' && github.repository == 'dask-contrib/dask-sql' }}
 
-# Required shell entrypoint to have properly activated conda environments
-defaults:
-  run:
-    shell: bash -l {0}
-
 jobs:
-  wheels:
-    name: Build and publish py3.${{ matrix.python }} wheels on ${{ matrix.os }}
-    runs-on: ${{ matrix.os }}
+  linux:
+    name: Build and publish wheels for linux ${{ matrix.target }}
+    runs-on: ubuntu-latest
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-latest, windows-latest, macos-latest]
-        python: ["8", "9", "10"] # 3.x
+        target: [x86_64, aarch64]
     steps:
       - uses: actions/checkout@v3
+      - name: Install Protoc
+        uses: arduino/setup-protoc@v1
+        if: matrix.target == 'aarch64'
+        with:
+          version: '3.x'
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+      - uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+      - name: Build wheels for x86_64
+        if: matrix.target == 'x86_64'
+        uses: PyO3/maturin-action@v1
+        with:
+          target: ${{ matrix.target }}
+          args: --release --out dist
+          sccache: 'true'
+          manylinux: '2_17'
+          before-script-linux: >
+            DOWNLOAD_URL=$(curl --retry 6 --retry-delay 10 -s https://api.github.com/repos/protocolbuffers/protobuf/releases/latest | grep -o '"browser_download_url": "[^"]*' | cut -d'"' -f4 | grep "\linux-x86_64.zip$") &&
+            curl --retry 6 --retry-delay 10 -LO $DOWNLOAD_URL &&
+            unzip protoc-*-linux-x86_64.zip -d $HOME/.local
+          docker-options: --env PROTOC=/root/.local/bin/protoc
+      - name: Build wheels for aarch64
+        if: matrix.target == 'aarch64'
+        uses: PyO3/maturin-action@v1
+        with:
+          target: ${{ matrix.target }}
+          args: --release --out dist --zig
+          sccache: 'true'
+          manylinux: '2_17'
+      - name: Check dist files
+        run: |
+          pip install twine
+
+          twine check dist/*
+          ls -lh dist/
+      - name: Upload binary wheels
+        uses: actions/upload-artifact@v3
         with:
-          fetch-depth: 0
+          name: wheels for linux ${{ matrix.target }}
+          path: dist/*
+      - name: Publish package
+        if: env.upload == 'true'
+        env:
+          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+        run: twine upload dist/*
+
+  windows:
+    name: Build and publish wheels for windows
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v3
       - name: Install Protoc
-        if: matrix.os != 'ubuntu-latest'
         uses: arduino/setup-protoc@v1
         with:
           version: '3.x'
           repo-token: ${{ secrets.GITHUB_TOKEN }}
-      - name: Set up QEMU for linux-aarch64
-        if: matrix.os == 'ubuntu-latest'
-        uses: docker/setup-qemu-action@v2
+      - uses: actions/setup-python@v4
         with:
-          platforms: arm64
-      - name: Add rust toolchain target for macos-aarch64
-        if: matrix.os == 'macos-latest'
-        run: rustup target add aarch64-apple-darwin
+          python-version: '3.10'
+          architecture: x64
       - name: Build wheels
-        uses: pypa/[email protected]
+        uses: PyO3/maturin-action@v1
+        with:
+          target: x64
+          args: --release --out dist
+          sccache: 'true'
+      - name: Check dist files
+        run: |
+          pip install twine
+
+          twine check dist/*
+          ls dist/
+      - name: Upload binary wheels
+        uses: actions/upload-artifact@v3
+        with:
+          name: wheels for windows
+          path: dist/*
+      - name: Publish package
+        if: env.upload == 'true'
         env:
-          CIBW_BUILD: 'cp3${{ matrix.python }}-*'
-          CIBW_SKIP: '*musllinux*'
-          CIBW_ARCHS_LINUX: 'aarch64 x86_64'
-          CIBW_ARCHS_WINDOWS: 'AMD64'
-          CIBW_ARCHS_MACOS: 'x86_64 arm64'
-          # Without CARGO_NET_GIT_FETCH_WITH_CLI we oom (https://github.com/rust-lang/cargo/issues/10583)
-          CIBW_ENVIRONMENT_LINUX: >
-            CARGO_NET_GIT_FETCH_WITH_CLI="true"
-            PATH="$HOME/.cargo/bin:$HOME/.local/bin:$PATH"
-          CIBW_ENVIRONMENT_WINDOWS: 'PATH="$UserProfile\.cargo\bin;$PATH"'
-          CIBW_BEFORE_BUILD: 'pip install -U setuptools-rust'
-          CIBW_BEFORE_BUILD_LINUX: >
-            ARCH=$([ $(uname -m) == x86_64 ] && echo x86_64 || echo aarch_64) &&
-            DOWNLOAD_URL=$(curl --retry 6 --retry-delay 10 -s https://api.github.com/repos/protocolbuffers/protobuf/releases/latest | grep -o '"browser_download_url": "[^"]*' | cut -d'"' -f4 | grep "\linux-${ARCH}.zip$") &&
-            curl --retry 6 --retry-delay 10 -LO $DOWNLOAD_URL &&
-            unzip protoc-*-linux-$ARCH.zip -d $HOME/.local &&
-            protoc --version &&
-            pip install -U setuptools-rust &&
-            pip list &&
-            curl --retry 6 --retry-delay 10 https://sh.rustup.rs -sSf | sh -s -- --default-toolchain=stable --profile=minimal -y &&
-            rustup show
+          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+        run: twine upload dist/*
+
+  macos:
+    name: Build and publish wheels for macos ${{ matrix.target }}
+    runs-on: macos-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        target: [x86_64, aarch64]
+    steps:
+      - uses: actions/checkout@v3
+      - name: Install Protoc
+        uses: arduino/setup-protoc@v1
+        with:
+          version: '3.x'
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+      - uses: actions/setup-python@v4
         with:
-          package-dir: .
-          output-dir: dist
-          config-file: "dask_planner/pyproject.toml"
-      - name: Set up Python
-        uses: conda-incubator/[email protected]
+          python-version: '3.10'
+      - name: Build wheels
+        uses: PyO3/maturin-action@v1
         with:
-          miniforge-variant: Mambaforge
-          use-mamba: true
-          python-version: "3.8"
-          channel-priority: strict
+          target: ${{ matrix.target }}
+          args: --release --out dist
+          sccache: 'true'
       - name: Check dist files
         run: |
-          mamba install twine
+          pip install twine
 
           twine check dist/*
           ls -lh dist/
       - name: Upload binary wheels
         uses: actions/upload-artifact@v3
         with:
-          name: wheels for py3.${{ matrix.python }} on ${{ matrix.os }}
+          name: wheels for macos ${{ matrix.target }}
           path: dist/*
       - name: Publish package
         if: env.upload == 'true'
         env:
           TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
           TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
         run: twine upload dist/*
+
   sdist:
-    name: Build and publish source distribution
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v3
+      - name: Build sdist
+        uses: PyO3/maturin-action@v1
         with:
-          fetch-depth: 0
-      - name: Set up Python
-        uses: conda-incubator/setup-[email protected]
+          command: sdist
+          args: --out dist
+      - uses: actions/setup-python@v4
         with:
-          miniforge-variant: Mambaforge
-          use-mamba: true
-          python-version: "3.8"
-          channel-priority: strict
-      - name: Build source distribution
-        run: |
-          mamba install setuptools-rust twine
-
-          python setup.py sdist
+          python-version: '3.10'
       - name: Check dist files
         run: |
+          pip install twine
+
           twine check dist/*
           ls -lh dist/
       - name: Publish source distribution
```
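The x86_64 `before-script-linux` step queries the GitHub API for the latest protobuf release and keeps the `browser_download_url` that ends in `linux-x86_64.zip`. The removed cibuildwheel script did the same, but first mapped `uname -m` to protoc's asset spelling, where aarch64 machines use `aarch_64`. The Python sketch below mirrors that `grep`/`cut` filtering on an already-parsed release payload. It makes no network calls, and `protoc_asset_url` and the sample payload (with its `example.invalid` URLs and `0.0` version) are made up for illustration:

```python
def protoc_asset_url(release: dict, machine: str = "x86_64") -> str:
    """Return the browser_download_url of the linux protoc zip for `machine`.

    Mirrors the removed CIBW_BEFORE_BUILD_LINUX mapping: x86_64 keeps its
    name, and anything else is treated as protoc's 'aarch_64' build.
    """
    arch = "x86_64" if machine == "x86_64" else "aarch_64"
    suffix = f"linux-{arch}.zip"
    for asset in release.get("assets", []):
        url = asset["browser_download_url"]
        if url.endswith(suffix):
            return url
    raise LookupError(f"no protoc asset ending in {suffix!r}")


# Made-up stand-in for a releases/latest API response (assets list only).
SAMPLE_RELEASE = {
    "assets": [
        {"browser_download_url": "https://example.invalid/protoc-0.0-linux-aarch_64.zip"},
        {"browser_download_url": "https://example.invalid/protoc-0.0-linux-x86_64.zip"},
        {"browser_download_url": "https://example.invalid/protoc-0.0-win64.zip"},
    ]
}

if __name__ == "__main__":
    print(protoc_asset_url(SAMPLE_RELEASE))             # ...protoc-0.0-linux-x86_64.zip
    print(protoc_asset_url(SAMPLE_RELEASE, "aarch64"))  # ...protoc-0.0-linux-aarch_64.zip
```

Raising instead of returning an empty string matches the intent of the `&&` chain in the shell version, which stops before `curl` runs if no URL is found.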

.github/workflows/rust.yml

-5
```diff
@@ -51,7 +51,6 @@ jobs:
       - name: Optionally update upstream dependencies
         if: needs.detect-ci-trigger.outputs.triggered == 'true'
         run: |
-          cd dask_planner
           bash update-dependencies.sh
       - name: Install Protoc
         uses: arduino/setup-protoc@v1
@@ -60,11 +59,9 @@ jobs:
           repo-token: ${{ secrets.GITHUB_TOKEN }}
       - name: Check workspace in debug mode
         run: |
-          cd dask_planner
           cargo check
       - name: Check workspace in release mode
         run: |
-          cd dask_planner
           cargo check --release
 
   # test the crate
@@ -84,7 +81,6 @@ jobs:
       - name: Optionally update upstream dependencies
         if: needs.detect-ci-trigger.outputs.triggered == 'true'
         run: |
-          cd dask_planner
           bash update-dependencies.sh
       - name: Install Protoc
         uses: arduino/setup-protoc@v1
@@ -93,5 +89,4 @@ jobs:
           repo-token: ${{ secrets.GITHUB_TOKEN }}
       - name: Run tests
         run: |
-          cd dask_planner
           cargo test
```
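With the crate moved to the repository root, the `cd dask_planner` lines are no longer needed. When no `--manifest-path` is given, Cargo looks for `Cargo.toml` in the current directory and then in each parent directory. A minimal Python sketch of that lookup follows; `find_manifest` is a hypothetical helper, and Cargo's workspace resolution is ignored:

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_manifest(start: Path) -> Optional[Path]:
    """Return the nearest Cargo.toml at or above `start`, or None if there is none."""
    start = start.resolve()
    # Check the starting directory first, then walk up through its parents.
    for directory in (start, *start.parents):
        candidate = directory / "Cargo.toml"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Build a throwaway layout: <tmp>/Cargo.toml and <tmp>/src/.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        (root / "Cargo.toml").write_text('[package]\nname = "example"\n')
        (root / "src").mkdir()
        print(find_manifest(root / "src") == root / "Cargo.toml")
```

Running from the repository root, as the updated steps now do, finds the root manifest immediately.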

.github/workflows/test-upstream.yml

+1-4
```diff
@@ -68,11 +68,10 @@ jobs:
       - name: Optionally update upstream cargo dependencies
         if: env.which_upstream == 'DataFusion'
         run: |
-          cd dask_planner
           bash update-dependencies.sh
       - name: Build the Rust DataFusion bindings
         run: |
-          python setup.py build install
+          maturin develop
       - name: Install hive testing dependencies
         if: matrix.os == 'ubuntu-latest'
         run: |
@@ -124,11 +123,9 @@ jobs:
         env:
           UPDATE_ALL_CARGO_DEPS: false
         run: |
-          cd dask_planner
           bash update-dependencies.sh
       - name: Install dependencies and nothing else
         run: |
-          mamba install setuptools-rust
           pip install -e . -vv
 
           which python
```

.github/workflows/test.yml

+1-2
```diff
@@ -72,7 +72,7 @@ jobs:
           shared-key: test
       - name: Build the Rust DataFusion bindings
         run: |
-          python setup.py build install
+          maturin develop
       - name: Install hive testing dependencies
         if: matrix.os == 'ubuntu-latest'
         run: |
@@ -118,7 +118,6 @@ jobs:
           repo-token: ${{ secrets.GITHUB_TOKEN }}
       - name: Install dependencies and nothing else
         run: |
-          mamba install "setuptools-rust>=1.5.2"
           pip install -e . -vv
 
           which python
```

.gitignore

+1-9
```diff
@@ -46,23 +46,15 @@ venv
 # IDE
 .idea
 .vscode
-planner/.classpath
-planner/.project
-planner/.settings/
-planner/.idea
-planner/*.iml
 *.swp
 
 # project specific
-planner/dependency-reduced-pom.xml
-planner/target/
-dask_sql/jar
-.next/
 dask-worker-space/
 node_modules/
 docs/source/_build/
 tests/unit/queries
 tests/unit/data
+target/*
 
 # Ignore development specific local testing files
 dev_tests
```

.pre-commit-config.yaml

+3-3
```diff
@@ -20,9 +20,9 @@ repos:
     rev: v1.0
     hooks:
       - id: cargo-check
-        args: ['--manifest-path', './dask_planner/Cargo.toml', '--verbose', '--']
+        args: ['--manifest-path', './Cargo.toml', '--verbose', '--']
       - id: clippy
-        args: ['--manifest-path', './dask_planner/Cargo.toml', '--verbose', '--', '-D', 'warnings']
+        args: ['--manifest-path', './Cargo.toml', '--verbose', '--', '-D', 'warnings']
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.2.0
     hooks:
@@ -39,4 +39,4 @@ repos:
         entry: cargo +nightly fmt
         language: system
         types: [rust]
-        args: ['--manifest-path', './dask_planner/Cargo.toml', '--verbose', '--']
+        args: ['--manifest-path', './Cargo.toml', '--verbose', '--']
```
