Skip to content

Create a single binary distribution bundle #1589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
May 27, 2025

Conversation

flyrain
Copy link
Contributor

@flyrain flyrain commented May 14, 2025

What’s included in the PR:

  1. Introduced a new module that combines both Admin Tools and Server. Please refer to my original email for the motivation behind this change.
  2. Removed the now-redundant run-scripts module.
  3. Consolidated the README to reflect the unified binary distribution.
  4. Standardized the binary distribution package naming to polaris-{version}.tgz and polaris-{version}.zip, following common conventions used by other projects (e.g., spark-3.5.5-bin-hadoop3.tgz).

TODOs:

  • Consolidate LICENSE and NOTICE files from both Admin Tools and Server.
  • Consolidate the shared libs between Admin Tool and Server

The PR is technically ready, but I plan to wait until the 0.10 release is finalized to avoid disrupting the release process.

@jbonofre
Copy link
Member

@flyrain thanks for this draft PR ! Just a note: we should keep the changes proposed in #1588 and #1568 . I will double check here.

@flyrain
Copy link
Contributor Author

flyrain commented May 14, 2025

@flyrain thanks for this draft PR ! Just a note: we should keep the changes proposed in #1588 and #1568 . I will double check here.

Yes, I added the disclaimer in the new commit.

├── README.md
├── admin/ # Admin tool files
├── server/ # Server files
└── run.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark was mentioned as an example, but IIRC, Spark's distribution has a bin directory for runnable files... Why not put run.sh into bin in Polaris too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of reasons Spark has a dedicated bin directory is due to the number of items under the root directory like the following. Our distribution is different though, we don't have a lot of top level items, but I'm OK either way.

drwxr-xr-x@   3 ygu  staff    96B Aug  6  2024 yarn/
drwxr-xr-x@  31 ygu  staff   992B Aug  6  2024 sbin/
-rw-r--r--@   1 ygu  staff   166B Aug  6  2024 RELEASE
-rw-r--r--@   1 ygu  staff   4.5K Aug  6  2024 README.md
drwxr-xr-x@   3 ygu  staff    96B Aug  6  2024 R/
drwxr-xr-x@  19 ygu  staff   608B Aug  6  2024 python/
-rw-r--r--@   1 ygu  staff    56K Aug  6  2024 NOTICE
drwxr-xr-x@  58 ygu  staff   1.8K Aug  6  2024 licenses/
-rw-r--r--@   1 ygu  staff    22K Aug  6  2024 LICENSE
drwxr-xr-x@   4 ygu  staff   128B Aug  6  2024 kubernetes/
drwxr-xr-x@   4 ygu  staff   128B Aug  6  2024 examples/
drwxr-xr-x@   6 ygu  staff   192B Aug  6  2024 data/
drwxr-xr-x@   8 ygu  staff   256B Aug  6  2024 conf/
drwxr-xr-x@  30 ygu  staff   960B Aug  6  2024 bin/
drwxr-xr-x@ 255 ygu  staff   8.0K Oct 14  2024 jars/

Copy link
Contributor

@dimas-b dimas-b May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with that rationale. I believe bin directories have roots in the old *nix tradition as the location of runnable program files.

If having a shared bin does not look convenient, how about admin/bin and server/bin?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If having a shared bin does not look convenient, how about admin/bin and server/bin?

Please disregard this proposal.

Since this PR provides a single distribution for all Polaris binaries, I think having one bin directory is preferable from the UX perspective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will add a bin directory under the root. I think the executables are buried too deep if we follow the proposal here, #1589 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed in the new commit


description = "Apache Polaris Binary Distribution"

val adminProject = project(":polaris-quarkus-admin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we do this change, I would suggest to rename polaris-quarkus-admin and polaris-quarkus-server as polaris-admin and polaris-server, and also maybe rename quarkus folder as dist folder or so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I don't think we need quarkus within the module name. Quarkus is the way in Polaris, not a way. Adding Quarkus into name isn't necesary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I think it makes sense to do it as part of this PR (for consistency).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to polaris-admin and polaris-server

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! Let me do it as a follow-up.

snazy
snazy previously requested changes May 15, 2025
Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change uses a bunch of highly discouraged Gradle practices:

  • It's accessing another Gradle projects and those tasks - that's bad practice and highly discouraged.
  • This leads to task-dependency and in turn build and Gradle caching issues

Please take a look at Gradle artifacts and configurations and defining portable artifact dependencies including the proper task dependencies implied by these mechanisms.

@@ -0,0 +1,69 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this file is placed under quarkus/distribution/distribution – is that intentional? I'd advocate for moving it one level up (it would become the README for the distribution module, but I think that's a good thing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intentional, which follows the same structure in admin and server module.

@adutra
Copy link
Contributor

adutra commented May 15, 2025

@flyrain just to confirm, we are not planning to remove the individual distributions for server and tool, are we?

@flyrain
Copy link
Contributor Author

flyrain commented May 15, 2025

@flyrain just to confirm, we are not planning to remove the individual distributions for server and tool, are we?

Good question. We will remove the individual distribution once the single binary distribution is ready. Their jar file will still be published individually though.

@adutra
Copy link
Contributor

adutra commented May 15, 2025

@flyrain just to confirm, we are not planning to remove the individual distributions for server and tool, are we?

Good question. We will remove the individual distribution once the single binary distribution is ready. Their jar file will still be published individually though.

Are we sure about this? I assume that most people downloading the distribution would be interested in the server, not the tool. Imposing the combined distribution would mean for them to download a bunch of jars they don't need. And the opposite is true as well: if I only need the tool, why would I need to download 200Mb of server jars?

@dimas-b
Copy link
Contributor

dimas-b commented May 16, 2025

With real (not in-memory) persistence the admin tool is pretty much always required for proper bootstrapping, I guess. So it makes sense to bundle it with the server.

As to whether to keep a separate downloadable archive just for the admin tool, I suppose it may be useful only if the server runs in docker. However, in that case, it might be preferable to have a separate docker image for the admin tool and have only the combined archive-based distribution. WDYT?

@adutra
Copy link
Contributor

adutra commented May 16, 2025

it might be preferable to have a separate docker image for the admin tool and have only the combined archive-based distribution. WDYT?

If that's the plan I can live with that 👍

@snazy
Copy link
Member

snazy commented May 19, 2025

@flyrain just to confirm, we are not planning to remove the individual distributions for server and tool, are we?

Good question. We will remove the individual distribution once the single binary distribution is ready. Their jar file will still be published individually though.

Are we sure about this? I assume that most people downloading the distribution would be interested in the server, not the tool. Imposing the combined distribution would mean for them to download a bunch of jars they don't need. And the opposite is true as well: if I only need the tool, why would I need to download 200Mb of server jars?

I second @adutra's concern. Polaris server and admin-tool are separate things. You deploy the server and let it run in k8s (or whatever). Bootstrapping happens rarely and is rather performed from an administrator's machine, whereas the servers run elsewhere. So server and admin-tool are run in very different ways.

The admin-tool is (at best) just overhead in a server and increases the size of the image + container. Playing devil's advocate here, having the admin-tool executable accessible from the server gives easy access to the purge functionality. So if you're (in whatever way) able to instruct the Polaris server container to start an arbitrary executable, it could nuke existing data.

Yufei Gu added 12 commits May 22, 2025 10:45
This script combines the Polaris admin tool and server distributions into a single package: Maintains separate admin and server components, provides a unified run script to launch either component, preserves all necessary dependencies and configurations, and simplifies deployment by having both components in one distribution.
… run script - Remove repository configuration from plugins, restore run.sh script, fix build issues
…rojects.main.properties - Remove run-script dependency from admin and server modules - Fix code formatting in build files
@flyrain
Copy link
Contributor Author

flyrain commented May 22, 2025

This change uses a bunch of highly discouraged Gradle practices:

  • It's accessing another Gradle projects and those tasks - that's bad practice and highly discouraged.
  • This leads to task-dependency and in turn build and Gradle caching issues

Please take a look at Gradle artifacts and configurations and defining portable artifact dependencies including the proper task dependencies implied by these mechanisms.

This is fixed in the new commit, cc @snazy.

@flyrain
Copy link
Contributor Author

flyrain commented May 23, 2025

Local test passed. Here are the structure now in the binary bundle. All comments should be addressed. Please take a look @jbonofre @dimas-b @adutra @snazy @singhpk234. Consolidation of LICENSE and NOTICE files from both Admin Tools and Server will be done in a follow-up PR.

polaris-quarkus-distribution-@version@/
├── LICENSE
├── NOTICE
├── README.md
├── admin/           # Admin tool files
├── bin/             # Executable scripts
│   ├── admin
│   └── server
└── server/          # Server files

@jbonofre
Copy link
Member

@flyrain I will do the LICENSE/NOTICE/DISCLAIMER work on another PR.

@jbonofre
Copy link
Member

By the way, as we still have separate docker images, we also have to keep LICENSE/NOTICE/DISCLAIMER in both admin and server.

@flyrain
Copy link
Contributor Author

flyrain commented May 23, 2025

By the way, as we still have separate docker images, we also have to keep LICENSE/NOTICE/DISCLAIMER in both admin and server.

Thanks @jbonofre ! I intentionally left them in these two modules in this PR, but I'd love to figure out a way to have a combined one later.

@jbonofre
Copy link
Member

@flyrain you are good here. My previous comment is "for information" (to explain the reason why we have to keep those files :)).

@jbonofre
Copy link
Member

Any artifact we distribute should have license/notice/disclaimer


## Prerequisites

- Java SE 21 or higher
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just say 21 here, please ref thread here : #1517 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that Polaris target jdk version is 21,

tasks.withType(JavaCompile::class.java).configureEach { options.release = 21 }
. However, this is a binary distribution. Users should only care about the runtime jdk version here. I think it's the right call to say 21+ here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 21+ is fine. We can always adjust docs if we hit incompatibilities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, 21+ is fine.

exec "${JAVACMD}" -jar "${script_dir}/quarkus-run.jar" $@
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "${SCRIPT_DIR}/../admin" || exit
java -jar quarkus-run.jar "$@"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] new line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "${SCRIPT_DIR}/../server" || exit
java -jar quarkus-run.jar "$@"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] new line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@singhpk234
Copy link
Contributor

we still have separate docker images

I have a doubt in this area, and would like to understand this more, why we need two separate image if we agree on same binary distribution ?
My understanding is, When an admin tool is supposed to run, the DB user needs to have a different set of privileges (CREATE TABLE and DROP) rather than what DB user a server need to be functional ? Is the concern that it might be confusing for the user and they forget they need to do this and might end up giving the same ?

@flyrain
Copy link
Contributor Author

flyrain commented May 23, 2025

we still have separate docker images

I have a doubt in this area, and would like to understand this more, why we need two separate image if we agree on same binary distribution ? My understanding is, When an admin tool is supposed to run, the DB user needs to have a different set of privileges (CREATE TABLE and DROP) rather than what DB user a server need to be functional ? Is the concern that it might be confusing for the user and they forget they need to do this and might end up giving the same ?

I also lean to a single docker image, which makes users and developers life much easier. We just need to a different entry for different functionality. As the comment pointed out, size shouldn't be a problem. But I'd suggest to move the discussion to the ML thread, https://lists.apache.org/thread/h2zxo9ho2xl6l2dpsyovllwjgv8hqp08

@dimas-b
Copy link
Contributor

dimas-b commented May 23, 2025

why we need two separate image if we agree on same binary distribution ?

Docker images define what runs in a container. Normally there's only one entry point. Therefore, it is more natural to have one image per tool / executable (rather than controlling what actually runs via args or cryptic env. vars, etc.).

That said, reducing image size is also valuable. A user running 100 server replicas and 1 admin tool probably would not want the extra burden of 100 admin tool binaries replicated wherever servers run.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, but it looks like README and scripts are not aligned 🤷

}
}
}
add("distributionElements", layout.buildDirectory.dir("quarkus-app")) { builtBy("quarkusBuild") }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After untar:

$ ls server/
app  lib  logs  quarkus  quarkus-app-dependencies.txt  quarkus-run.jar

Is quarkus-app-dependencies.txt intended to be in the distro? (same for admin)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with it. Any concern about the file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is noise and distraction from my POV. Not useful to end users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in the new commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we exclude quarkus-app-dependencies.txt here , rather than in the main dist module?


```bash
# Configure server port
JAVA_OPTS="-Dquarkus.http.port=8080" ./bin/server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe env JAVA_OPTS="-Dquarkus.http.port=8080" bin/server is more portable. IIRC, current syntax assumes running in sh or bash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also suggest using env. vars like POLARIS_JAVA_OPTS so that they could be saved in local env. without affecting other applications.

Copy link
Contributor Author

@flyrain flyrain May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, it doesn't work with the current script anymore. Removed this section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for POLARIS_JAVA_OPTS. All POSIX sh should work well without env, given this is a readme doc, I think it should be good.

JAVA_OPTS="-Dquarkus.http.port=8080" ./bin/server

# Configure admin tool
JAVA_OPTS="-Dpolaris.persistence.type=relational-jdbc" ./bin/admin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think JAVA_OPTS work with the latest scripts 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this config example got removed from the README. How is the user supposed to pass config options to the server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only options 2 and 3 are applicable when using these distribution's scripts. It would be nice to mention that in the README and perhaps include an sample config/application.properties in the archive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me fix it, I think we should allow user to use option 1 as well.
option 4(The application.properties files packaged in Polaris.) should also work, right? User will need to put a application.properties file in the classpath.

Copy link
Contributor

@dimas-b dimas-b May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think users of the distribution bundle should put extra application.properties on the class path. The behaviour will be very uncertain in that case (with respect to the built-in application.properties in Polaris jars).

### Start the Server

```bash
./bin/server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: bin/server (./ is not necessary, PATH is not used anyway)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -58,12 +58,16 @@ distributions {
contents {
// Copy admin distribution contents
into("admin") {
from(adminDistribution)
from(adminDistribution) {
exclude("quarkus-app-dependencies.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably preferable to do this excluding in the server and admin modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to exclude it in server and admin module. Didn't find a clean solution. It may not be worth it. Let me know if you got better solutioin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is sub-optimal from the gradle build perspective, but I do not have a solution ATM, so I'm fine merging this change "as is" unless someone has a solution handy.

@dimas-b
Copy link
Contributor

dimas-b commented May 23, 2025

@snazy : gradle script changes LGTM now... WDYT?


```bash
# Configure server port
POLARIS_JAVA_OPTS="-Dquarkus.http.port=8080" bin/server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: env POLARIS_JAVA_OPTS="-Dquarkus.http.port=8080" bin/server is more portable, AFAIK.

}
}
}
add("distributionElements", layout.buildDirectory.dir("quarkus-app")) { builtBy("quarkusBuild") }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we exclude quarkus-app-dependencies.txt here , rather than in the main dist module?

@@ -58,12 +58,16 @@ distributions {
contents {
// Copy admin distribution contents
into("admin") {
from(adminDistribution)
from(adminDistribution) {
exclude("quarkus-app-dependencies.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is sub-optimal from the gradle build perspective, but I do not have a solution ATM, so I'm fine merging this change "as is" unless someone has a solution handy.

@flyrain
Copy link
Contributor Author

flyrain commented May 27, 2025

The only solution I found for excluding quarkus-app-dependencies.txt from server and admin module is to zip it, in which case, the distribution module has to unzip it. I don't think it's worth to add additional zip and unzip process. My concern is that it is a complicated solution with worse build perf.

val quarkusAppZip by
  tasks.registering(Zip::class) {
    dependsOn(quarkusBuild)
    archiveClassifier.set("dist") 
    from(layout.buildDirectory.dir("quarkus-app")) {
      exclude("quarkus-app-dependencies.txt")
    }
  }

artifacts {
  add(quarkusRunner.name, provider { quarkusBuild.get().fastJar.resolve("quarkus-run.jar") }) {
    builtBy(quarkusBuild)
  }

  add("distributionElements", quarkusAppZip) {
    builtBy(quarkusAppZip) 
  }
}

@flyrain flyrain dismissed snazy’s stale review May 27, 2025 17:18

Comments have been resolved. The module uses artifact dependency now.

@flyrain flyrain merged commit 9001785 into apache:main May 27, 2025
9 checks passed
@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board May 27, 2025
@flyrain
Copy link
Contributor Author

flyrain commented May 27, 2025

Thanks @jbonofre @adutra @snazy @singhpk234 @dimas-b for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants