---
title: "Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community"
date: 2024-04-15
layout: post
toc: false
comments: true
image:
hide: false
categories: [operators]
permalink:
author: "<a href='https://www.linkedin.com/in/varaprofile/'>Vara Bonthu</a>, <a href='https://www.linkedin.com/in/yuchaoran/'>Chaoran Yu</a>, <a href='https://www.linkedin.com/in/andrey-velichkevich/'>Andrey Velichkevich</a>, <a href='https://www.linkedin.com/in/wielgusmarcin/'>Marcin Wielgus</a>"
---

We're excited to announce the migration of Google's Spark Operator to
the [Kubeflow Spark Operator](https://github.com/kubeflow/spark-operator),
marking the launch of a significant addition to the [Kubeflow](https://www.kubeflow.org/) ecosystem. The
Kubeflow Spark Operator simplifies the deployment and management of
[Apache
Spark](https://spark.apache.org/docs/latest/index.html)
applications on [Kubernetes](https://kubernetes.io/). This
announcement isn't just about a new piece of technology; it's about
building a stronger, openly governed, and more collaborative community
around Spark on Kubernetes.
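For readers new to the operator, here is roughly what "deployment and management" looks like in practice. A user describes a Spark job as a `SparkApplication` custom resource, and the operator then launches and tracks the driver and executor pods.

The Python sketch below builds such a manifest as a plain dictionary. It is modeled on the upstream `spark-pi` example and assumes the `sparkoperator.k8s.io/v1beta2` API. The image tag, jar path, and `spark` service account name are all assumptions. `spark_pi_application` is a hypothetical helper for illustration, not part of the operator.

```python
import json


def spark_pi_application(name: str = "spark-pi", namespace: str = "default",
                         spark_version: str = "3.5.0") -> dict:
    """Hypothetical helper: build a SparkApplication manifest as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            # Assumed image tag: any Spark image that ships the examples jar works.
            "image": f"spark:{spark_version}",
            "mainClass": "org.apache.spark.examples.SparkPi",
            # Assumed jar path inside the image (Scala 2.12 build of Spark).
            "mainApplicationFile": (
                f"local:///opt/spark/examples/jars/spark-examples_2.12-{spark_version}.jar"
            ),
            "sparkVersion": spark_version,
            "restartPolicy": {"type": "Never"},
            "driver": {
                "cores": 1,
                "memory": "512m",
                # Hypothetical service account: it must exist and be allowed
                # to create executor pods in this namespace.
                "serviceAccount": "spark",
            },
            "executor": {"cores": 1, "instances": 2, "memory": "512m"},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be piped to kubectl.
    print(json.dumps(spark_pi_application(), indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed manifest can be piped to `kubectl apply -f -` on a cluster where the operator is installed.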

## The Journey to Kubeflow Spark Operator

The journey of the Kubeflow Spark Operator began with Google Cloud
Platform's [Spark on Kubernetes Operator](https://cloud.google.com/blog/products/data-analytics/data-analytics-meet-containers-kubernetes-operator-for-apache-spark-now-in-beta).
With over 2.3k stars and 1.3k forks on GitHub, this project laid the
foundation for a robust Spark on Kubernetes experience, enabling users
to deploy Spark workloads seamlessly across Kubernetes clusters.

Growth and innovation require not just code but also community.
Acknowledging the resource and time constraints faced by the original
maintainers at Google Cloud, Kubeflow has taken up the mantle. This
transition is not merely administrative; it is a strategic move towards
fostering a vibrant, diverse, and more actively engaged community.

## Why Kubeflow?

- **Enhanced Community Engagement**: Transitioning to Kubeflow opens
the door to a broader developer base, encouraging contributions and
collaboration. Because Kubeflow is a CNCF incubating project, this
transition will bring the cloud-native and Spark communities closer
together to build robust infrastructure for running Spark
applications on Kubernetes.

- **Stronger Governance**: Kubeflow's governance model provides a
structured environment for decision-making and project management,
ensuring sustainable growth for the Spark Operator.

- **Unified Ecosystem**: By bringing the Spark Operator under the
Kubeflow umbrella, we're not just merging projects; we're building
a cohesive ecosystem that enhances the Spark on Kubernetes
experience.

- **Integration with AI/ML**: Kubeflow provides components that
address many stages of the AI/ML lifecycle. Spark's distributed
data processing capabilities are a natural extension, allowing the
Spark community to collaborate closely and integrate more tightly with
the end-to-end ML lifecycle.

## What's Next?

We are dedicated to not just maintaining but enhancing the Kubeflow
Spark Operator for the long term. Here's what you can look forward to:

- **Upcoming roadmap**: For the first release, we aim to update
the documentation to reference Kubeflow, fix GitHub workflow
issues, move the container images to Kubeflow's container registry
(ghcr.io), and address any other critical issues.

- **Ongoing Support and Enhancements**: At the time of the migration to
the Kubeflow organization, the repository had more than 450 issues and
more than 60 pull requests. We ask contributors with open pull requests
to rebase their code and add a comment to the PR confirming that it is
still relevant. Open issues will be considered for resolution as the
broader community and contributors engage in upcoming releases. The
operator will continue to evolve, incorporating new features and
improvements to stay at the forefront of Kubernetes deployments.

- **Rich Community Resources**: From detailed documentation to
hands-on tutorials, we're crafting resources to help you succeed
with the Spark Operator. We are planning to host regular Spark
Operator calls to discuss user issues, questions, and the future
roadmap.

- **Open Doors for Contributions**: This is a call to arms for
developers, writers, and enthusiasts! Your contributions are the
lifeblood of this project, and there's a place for everyone to make
a mark.

- **Kubeflow Working Group Data**: To consolidate efforts around new
data tools in the Kubeflow ecosystem, such as the Spark Operator and
Model Registry, a new Working Group Data will be formalized soon.
Review [this PR](https://github.com/kubeflow/community/pull/673) to
get involved and share your feedback on the charter.

## Join the Movement

The Kubeflow Spark Operator is more than just software. It's a
community endeavor. Here's how you can be a part of this journey:

- **Dive In**: Visit our [GitHub repository](https://github.com/kubeflow/spark-operator)
to start your journey with the Kubeflow Spark Operator.

- **Contribute**: Every code snippet, documentation update, and piece
of feedback counts. Find out how you can contribute on GitHub.

- **Be Part of the Community**: Join the conversation in the
[#kubeflow-spark-operator](https://kubeflow.slack.com/archives/C06627U3XU3)
channel on Kubeflow Slack. Whether you're seeking advice, sharing
insights, or just listening in, your presence enriches us. Follow
[this guide](https://www.kubeflow.org/docs/about/community/)
to join Kubeflow Slack and learn more about the Kubeflow community.

- **Spark Operator Community Call**: We are planning to host regular
community calls for Spark Operator questions and roadmap discussions.
Please help us find the best time by taking this poll:
[https://forms.gle/foTkwkuv3U4gk7M77](https://forms.gle/foTkwkuv3U4gk7M77)

In the spirit of collaboration fostered on platforms like Slack, and
with the generous support of the Google Cloud team, we're set to sail
into a promising future. The Kubeflow Spark Operator isn't just a tool;
it's our collective step towards harnessing the full potential of Spark
on Kubernetes. Together, let's shape the future of cloud-native big
data processing.

**_Reference Issues_**

- [Action items for adoption of Spark Kubernetes Operator in Kubeflow](https://github.com/kubeflow/spark-operator/issues/1928#issue-2066490838)

- [WG Data (name provisional) proposal](https://github.com/kubeflow/community/pull/673)

- [Update Documentation: Redirect Helm Chart Installation Links to Kubeflow Repository](https://github.com/kubeflow/spark-operator/issues/1929)

- [Update Release Workflows: Change Container Registry to Kubeflow's ghcr.io](https://github.com/kubeflow/spark-operator/issues/1930)
