[CT-983] Support dbt Python models in OSS Apache Spark #510

Closed
jtcohen6 opened this issue Aug 3, 2022 · 6 comments
Labels
feature:python-models (Issues related to python models), help-wanted (Extra attention is needed), pkg:dbt-spark (Issue affects dbt-spark), type:enhancement (New feature request)

Comments

@jtcohen6
Contributor

jtcohen6 commented Aug 3, 2022

Context: dbt-labs/dbt-spark#407

The current implementation depends on Databricks APIs that are not available in OSS Apache Spark. We would like help from knowledgeable and interested community members, who could spec out an implementation using Spark-only functionality.

The entry point is submit_python_job:

https://github.com/dbt-labs/dbt-spark/blob/7f6cffecf38b7c41aa441eb020d464ba1e20bf9e/dbt/adapters/spark/impl.py#L392

potentially useful Spark doc: Submitting Applications
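As a rough illustration of what a Spark-only path might look like, the sketch below writes a model's compiled Python code to a file and builds a `spark-submit` command line for it, in the spirit of the "Submitting Applications" doc. This is not the dbt-spark implementation: the function name `build_spark_submit_command`, its parameters, and the file layout are all hypothetical, and how `submit_python_job` would actually receive the compiled code is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def build_spark_submit_command(
    compiled_code: str,
    model_name: str,
    master: str = "local[*]",
    workdir: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: persist compiled model code and return a
    spark-submit argv list. It only builds the command; it does not run it."""
    # Use the caller's directory if given, otherwise a fresh temp directory.
    target_dir = Path(workdir) if workdir else Path(tempfile.mkdtemp(prefix="dbt_py_"))
    target_dir.mkdir(parents=True, exist_ok=True)

    # spark-submit takes a .py file as the application for Python jobs.
    script_path = target_dir / f"{model_name}.py"
    script_path.write_text(compiled_code, encoding="utf-8")

    return [
        "spark-submit",
        "--master", master,                          # e.g. local[*], yarn, spark://host:port
        "--name", f"dbt-python-model-{model_name}",  # application name shown in the Spark UI
        str(script_path),
    ]
```

An adapter could then hand the returned list to `subprocess.run(cmd, check=True)` and treat a non-zero exit code as a failed model. Whether results should be reported back through the exit code, logs, or a written table is left open here.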

@jtcohen6 jtcohen6 added type:enhancement New feature request help-wanted Extra attention is needed feature:python-models Issues related to python models labels Aug 3, 2022
@github-actions github-actions bot changed the title Support dbt Python models in OSS Apache Spark [CT-983] Support dbt Python models in OSS Apache Spark Aug 3, 2022
@lostmygithubaccount

let us know if you'd like to help on this issue!

@Waltherr

Waltherr commented Nov 3, 2022 via email

@Adricu8

Adricu8 commented Feb 28, 2023

Hi, is there any update or current plan on this?

@huydeelll

Would like to vouch that this would be an important feature. It would open dbt up to a different set of data engineers, who mostly work with Spark but at the same time want the rigor and data quality framework of dbt.

@timvw

timvw commented Nov 21, 2023

I have a non-production-grade sample that uses the same approach as duckdb to run Python models: https://github.com/timvw/dbt-spark/tree/support-sparksession-python-local

@mikealfare mikealfare added the pkg:dbt-spark Issue affects dbt-spark label Jan 13, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-spark Jan 13, 2025
@amychen1776
Contributor

I think there's a very meaningful approach we should think about in terms of how we expand support for Spark in dbt. For scalability and maintenance, it would make more sense to split the adapter apart to support specific Spark services, but we haven't had capacity to prioritize this work yet. Due to the age of this issue and this broader plan, I'm going to go ahead and close this issue.

@amychen1776 amychen1776 closed this as not planned Feb 7, 2025
8 participants