Skip to content

fix: restrict Substrait physical local file imports#23172

Open
ametel01 wants to merge 2 commits into
apache:mainfrom
ametel01:fix-substrait-local-file-policy
Open

fix: restrict Substrait physical local file imports#23172
ametel01 wants to merge 2 commits into
apache:mainfrom
ametel01:fix-substrait-local-file-policy

Conversation

@ametel01

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

Imported Substrait physical plans currently turn ReadRel.LocalFiles entries into a local filesystem-backed Parquet DataSourceExec without requiring the embedding host to approve the referenced local paths. In hosts that accept physical Substrait plans from lower-trust callers, that can let serialized plan input select process-local Parquet files outside intended dataset roots.

This change makes local file access during physical plan import explicit instead of ambient.

What changes are included in this PR?

  • Adds PhysicalPlanConsumerOptions for Substrait physical plan import.
  • Keeps from_substrait_rel as a default-deny wrapper for local file imports.
  • Adds from_substrait_rel_with_options so callers can opt in with allowed local file roots.
  • Canonicalizes imported local file paths and configured roots before comparing them.
  • Rejects local file globs and folders in physical plan import rather than accepting them without a policy.
  • Updates physical roundtrip tests to pass explicit allowed roots.
  • Adds regression coverage for missing allowed roots and paths outside the allowed root.

Are these changes tested?

Yes.

  • cargo fmt --all --check
  • cargo test -p datafusion-substrait
  • cargo clippy -p datafusion-substrait --all-targets --all-features -- -D warnings
  • cargo clippy --all-targets --all-features -- -D warnings

Are there any user-facing changes?

Yes. Substrait physical plan consumers that import ReadRel.LocalFiles now need to call from_substrait_rel_with_options and explicitly configure allowed local file roots. The existing from_substrait_rel API no longer imports local files by default.

This is an intentional security hardening change. It may require the api change label because it changes behavior for the existing public API and adds a new opt-in API.

@github-actions github-actions Bot added the substrait Changes to the substrait crate label Jun 24, 2026
@ametel01 ametel01 marked this pull request as ready for review June 24, 2026 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Substrait physical plan import should not accept arbitrary local file paths

1 participant