[CT-983] Support dbt Python models in OSS Apache Spark #510

jtcohen6 · 2022-08-03T12:51:43Z

The current implementation depends on Databricks APIs that are not available in OSS Apache Spark. We would like help from knowledgeable and interested community members, who could spec out an implementation using Spark-only functionality.

The entry point is submit_python_job:

https://github.com/dbt-labs/dbt-spark/blob/7f6cffecf38b7c41aa441eb020d464ba1e20bf9e/dbt/adapters/spark/impl.py#L392

potentially useful Spark doc: Submitting Applications
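
For anyone picking this up, here is a rough sketch of what a Spark-only path might look like. It assumes the compiled model code is written to a temporary file and handed to the `spark-submit` CLI described in the Spark doc above. The helper names `build_spark_submit_command` and `submit_compiled_model` are hypothetical and not part of dbt-spark. How results get back to dbt is left open.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def build_spark_submit_command(
    script_path: str,
    master: str = "local[*]",
    extra_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a spark-submit invocation for a single Python script.

    Options such as --master or --conf must come before the script path.
    """
    cmd = ["spark-submit", "--master", master]
    cmd.extend(extra_args or [])
    cmd.append(script_path)
    return cmd


def submit_compiled_model(
    compiled_code: str, master: str = "local[*]"
) -> subprocess.CompletedProcess:
    """Hypothetical helper: write compiled model code to disk and run it.

    Assumes the spark-submit binary is on PATH. Raises CalledProcessError
    if the Spark application exits with a non-zero status.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        script = Path(tmp_dir) / "model.py"
        script.write_text(compiled_code)
        cmd = build_spark_submit_command(str(script), master=master)
        return subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Keeping the command construction separate from the process call makes the Spark-specific part easy to test without a cluster.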


lostmygithubaccount · 2022-11-01T17:24:45Z

let us know if you'd like to help on this issue!

Waltherr · 2022-11-03T07:41:45Z

Hi Cody, thanks for reaching out. Yes, I would like to help, but my available time is really stretched. Regards, Sebastian

Adricu8 · 2023-02-28T14:43:27Z

Hi, is there any update or current plan on this?

huydeelll · 2023-11-21T04:53:32Z

I'd like to vouch that this would be an important feature. It would open dbt up to a different set of data engineers who mostly work with Spark but also want the rigor and data quality framework of dbt.

timvw · 2023-11-21T07:32:48Z

I have a non-production-grade sample that uses the same approach as duckdb to run Python models: https://github.com/timvw/dbt-spark/tree/support-sparksession-python-local
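
As a minimal illustration of the in-process idea (not taken from the linked branch), the sketch below executes compiled model code in the current Python process and calls its `model(dbt, session)` function directly. The helper name `run_model_in_process` is hypothetical. The `dbt` and `session` arguments are whatever objects the adapter would provide, which is an assumption here.

```python
from typing import Any, Dict


def run_model_in_process(compiled_code: str, dbt: Any, session: Any) -> Any:
    """Execute compiled model code locally and return the model's result.

    Assumes the code defines a top-level function model(dbt, session),
    which is the signature dbt Python models use.
    """
    namespace: Dict[str, Any] = {}
    # Run the model source in an isolated namespace so its definitions
    # do not leak into the caller's globals.
    exec(compile(compiled_code, "<dbt_python_model>", "exec"), namespace)

    model_fn = namespace.get("model")
    if not callable(model_fn):
        raise ValueError("compiled code does not define model(dbt, session)")

    # With OSS Spark, session would be a SparkSession already running
    # in this process.
    return model_fn(dbt, session)
```

With PySpark installed, `session` could be obtained from `SparkSession.builder.getOrCreate()`. Materializing the returned DataFrame as a table would still need to be handled separately.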

amychen1776 · 2025-02-07T16:19:04Z

I think there's a very meaningful approach we should think about in terms of how we expand support for Spark in dbt. For scalability and maintenance, it would make more sense to split the adapter apart to support specific Spark services, but we haven't had the capacity to prioritize this work yet. Due to the age of this issue and this broader plan, I'm going to go ahead and close this issue.

jtcohen6 added the type:enhancement, help-wanted, and feature:python-models labels Aug 3, 2022

github-actions bot changed the title ~~Support dbt Python models in OSS Apache Spark~~ [CT-983] Support dbt Python models in OSS Apache Spark Aug 3, 2022

mikealfare added the pkg:dbt-spark Issue affects dbt-spark label Jan 13, 2025

mikealfare transferred this issue from dbt-labs/dbt-spark Jan 13, 2025

amychen1776 closed this as not planned Feb 7, 2025

jtcohen6's original post (Aug 3, 2022) was edited by dataders
