From 339630b60214eacae30217801dcaec0a5230e81d Mon Sep 17 00:00:00 2001 From: Vara Bonthu Date: Sun, 14 Apr 2024 00:46:19 -0700 Subject: [PATCH] Kubeflow spark operator launch blog Signed-off-by: Vara Bonthu --- _posts/2024-04-15-kubeflow-spark-operator.md | 138 +++++++++++++++++++ 1 file changed, 138 insertions(+) create mode 100644 _posts/2024-04-15-kubeflow-spark-operator.md diff --git a/_posts/2024-04-15-kubeflow-spark-operator.md b/_posts/2024-04-15-kubeflow-spark-operator.md new file mode 100644 index 0000000..e1630f1 --- /dev/null +++ b/_posts/2024-04-15-kubeflow-spark-operator.md @@ -0,0 +1,138 @@ +--- +title: "Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community" +layout: post +toc: false +comments: true +image: +hide: false +categories: [operators] +permalink: +author: "Vara Bonthu, Chaoran Yu, Andrey Velichkevich, Marcin Wielgus" +--- + +We're excited to announce the migration of Google's Spark Operator to +the [Kubeflow Spark Operator](https://github.com/kubeflow/spark-operator), +marking the launch of a significant addition to the [Kubeflow](https://www.kubeflow.org/) ecosystem. The +Kubeflow Spark Operator simplifies the deployment and management of +[Apache +Spark](https://spark.apache.org/docs/latest/index.html) +applications on [Kubernetes](https://kubernetes.io/). This +announcement isn't just about a new piece of technology, it's about +building a stronger, open-governed, and more collaborative community +around Spark on Kubernetes. + +## The Journey to Kubeflow Spark Operator + +The journey of the Kubeflow Spark Operator began with Google Cloud +Platform's Spark on Kubernetes Operator +(https://cloud.google.com/blog/products/data-analytics/data-analytics-meet-containers-kubernetes-operator-for-apache-spark-now-in-beta). +With over 2.3k stars and 1.3k forks on GitHub, this project laid the +foundation for a robust Spark on Kubernetes experience, enabling users +to deploy Spark workloads seamlessly across Kubernetes clusters. + +Growth and innovation require not just code but also community. +Acknowledging the resource and time limitations faced by Google Cloud's +original maintainers, Kubeflow has taken up the mantle.This transition +is not merely administrative but a strategic move towards fostering a +vibrant, diverse, and more actively engaged community. + +## Why Kubeflow? + +- **Enhanced Community Engagement:** Transitioning to Kubeflow opens + the door to a broader developer base, encouraging contributions and + collaboration. Since Kubeflow is a CNCF incubating project this + transition will help consolidate Cloud Native and Spark communities + to work more closely to build robust infrastructure to run Spark + applications on Kubernetes. + +- **Stronger Governance**: Kubeflow's governance model provides a + structured environment for decision-making and project management, + ensuring sustainable growth for the Spark Operator. + +- **Unified Ecosystem**: By bringing the Spark Operator under the + Kubeflow umbrella, we're not just merging projects; we're building + a cohesive ecosystem that enhances the Spark on Kubernetes + experience. + +- **Integration with AI/ML:** Kubeflow provides several components to + address many stages of the AI/ML lifecycle. The Spark distributed + data processing capabilities are a natural expansion, allowing the + Spark community to closely collaborate and better integrate within + the end-to-end ML lifecycle. + +## What's Next? + +We are dedicated to not just maintaining but enhancing the Kubeflow +Spark Operator for the long term. Here's what you can look forward to: + +- **Upcoming roadmap**: As part of the first release, we aim to update + the documentation with references to Kubeflow, address GitHub + workflow issues, and update the container registry with Kubeflow, + along with any other critical issues. + +- **Ongoing Support and Enhancements**: At the time of migration to + the Kubeflow repository, the repository comprised 450+ issues and + 60+ pull requests. We kindly request contributors to rebase their + code and update the PR with a comment indicating its continued + relevance. As for open issues, they will be considered for + resolution as the broader community and contributors engage in + upcoming releases.The operator will continue to evolve, + incorporating new features and improvements to stay at the forefront + of Kubernetes deployments. + +- **Rich Community Resources**: From detailed documentation to + hands-on tutorials, we're crafting resources to help you succeed + with the Spark Operator. We are planning to host regular Spark + Operator calls to discuss users issues, questions, and future + roadmaps. + +- **Open Doors for Contributions**: This is a call to arms for + developers, writers, and enthusiasts! Your contributions are the + lifeblood of this project, and there's a place for everyone to make + a mark. + +- **Kubeflow Working Group Data:** To consolidate efforts around new + data tools in the Kubeflow ecosystem such as Spark Operator and + Model Registry the new Working Group Data will be formalized soon. + Feel free to review [this PR](https://github.com/kubeflow/community/pull/673) to + get involved and provide your feedback on the charter. + +## Join the Movement + +The Kubeflow Spark Operator is more than just software. It's a +community endeavor. Here's how you can be a part of this journey: + +- **Dive In**: Visit our [GitHub repository](https://github.com/kubeflow/spark-operator) + to start your journey with the Kubeflow Spark Operator. + +- **Contribute**: Every code snippet, documentation update, and piece + of feedback counts. Find out how you can contribute on GitHub. + +- **Be Part of the Community**: Join the conversation in the + [#kubeflow-spark-operator](https://kubeflow.slack.com/archives/C06627U3XU3) + channel on Kubeflow Slack. Whether you're seeking advice, sharing + insights, or just listening in, your presence enriches us. Follow + [this guide](https://www.kubeflow.org/docs/about/community/) + to join Kubeflow Slack and learn more about Kubeflow community. + +- **Spark Operator Community Call:** We are planning to host regular + community calls for Spark Operator questions and roadmap. Please + help us to find the best time by taking this poll: + [https://forms.gle/foTkwkuv3U4gk7M77](https://forms.gle/foTkwkuv3U4gk7M77) + +In the spirit of collaboration fostered on platforms like Slack, and +with the generous support of the Google Cloud team, we're set to sail +into a promising future. The Kubeflow Spark Operator isn't just a tool, +it's our collective step towards harnessing the true potential of Spark +on Kubernetes. Together, let's shape the future of cloud-native big +data processing. + +**_Reference Issues_** + +- [Action items for adoption of Spark Kubernetes Operator in Kubeflow](https://github.com/kubeflow/spark-operator/issues/1928#issue-2066490838) + +- [WG Data(name provisional)proposal](https://github.com/kubeflow/community/pull/673) + +- [Update Documentation: Redirect Helm Chart Installation Links to Bflow Repository](https://github.com/kubeflow/spark-operator/issues/1929) + +- [Update Release Workflows: Change Container Registry to Kubeflow's ghcr.io](https://github.com/kubeflow/spark-operator/issues/1930) \ No newline at end of file