Add Polaris benchmarks that use the REST APIs #1208
Conversation
Related to #1120, although these are Gatling benchmarks, not JMeter.
Some files don't contain the ASF header.
Would it be possible to use SVG instead of PNG (to avoid packaging binary files in the source distribution)?
Do we want to include this tool in the main polaris repo? Maybe we can consider https://github.com/apache/polaris-tools?
(I ran these benchmarks before)
This work is very useful not only to measure performance but also to identify problems and bugs.
The code is pretty solid IMO - nice work!
A couple of build-related issues need to be fixed, but from my PoV it's ready after that.
spotless {
  scala {
    // Use scalafmt for Scala formatting
    scalafmt("3.9.3").configFile("../.scalafmt.conf")
  }
}
It's probably better to have the spotless parts in polaris-java.gradle.kts. Adding polaris-server as a plugin would then pull that in.
Do you think there should be a polaris-scala.gradle.kts file? I wouldn't mind adding polaris-server there, really. But the code does not strictly depend on the Polaris server. It might actually be moved to the tools repository.
Opened a PR against your branch...
Or better: #1211
    maxRetries: Int = 10,
    retryableHttpCodes: Set[Int] = Set(409, 500)
) {
  private val logger = LoggerFactory.getLogger(getClass)
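The `maxRetries`/`retryableHttpCodes` configuration above implies a retry loop along these lines. This is only an illustrative sketch, not the PR's actual code: the `RetryingClient` and `Response` names, the `withRetries` method, and the re-invocation strategy are all assumptions made for the example.

```scala
// Hypothetical sketch of the retry behavior suggested by the configuration
// above: re-issue a call while its HTTP status is in the retryable set,
// up to maxRetries additional attempts. Names here are illustrative only.
final case class Response(status: Int, body: String)

class RetryingClient(
    maxRetries: Int = 10,
    retryableHttpCodes: Set[Int] = Set(409, 500)
) {
  // Invokes `call`, retrying while the status code is retryable.
  def withRetries(call: () => Response): Response = {
    var attempt = 0
    var resp = call()
    while (retryableHttpCodes.contains(resp.status) && attempt < maxRetries) {
      attempt += 1
      resp = call()
    }
    resp
  }
}
```

A real client would typically add backoff between attempts; the sketch omits that for brevity.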
Not sure about the usual Scala nomenclature - should logger be all upper case to indicate that it's a constant?
It's not a constant. It's an immutable field.
Indeed, this one is a bit confusing. As per the naming conventions, it is not a constant: although it is an immutable, final field, it is not static.
And really, a logger should not be created in each instance of a class, which is where the confusion comes from. To be completely correct, there should be a companion object CatalogActions where that constant is defined. But I think it would make the code unnecessarily more complex, especially considering that there should be only one instance of CatalogActions per simulation.
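The companion-object pattern described above can be sketched as follows. This is not the PR's code: a plain `String` stands in for an actual SLF4J `Logger` so the example has no external dependencies, and the member names are hypothetical.

```scala
// Sketch of the pattern discussed above: a field shared by all instances
// lives on the companion object (one per class, like a Java static final),
// not on each instance. A real implementation would hold an SLF4J Logger
// here; a String stands in to keep the sketch dependency-free.
object CatalogActions {
  private val Logger: String = "logger-for-CatalogActions"
  def logMessage(msg: String): String = s"[$Logger] $msg"
}

class CatalogActions {
  // Every instance delegates to the single companion-level field.
  def doWork(): String = CatalogActions.logMessage("working")
}
```

As the comment notes, this is only worth the ceremony when many instances exist; with one instance per simulation, a `private val` on the class is a reasonable trade-off.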
Ah, true - Scala stuff. All good.
@@ -50,7 +50,7 @@ curl -s -H "Authorization: Bearer ${token}" \
    "storageConfigInfo": {
      "storageType": "FILE",
      "allowedLocations": [
-       "file:///tmp"
+       "file:///tmp/polaris/"
Unrelated change?
Kind of. That one is a bit of a 🥷. The current getting-started code prevents any catalog from being created under /tmp/ altogether, even though a single catalog is created under /tmp/polaris. See L48 of this file. Updating the allowedLocations attribute to match that of the catalog makes it possible to create any number of catalogs under /tmp/.
@@ -36,6 +36,7 @@ aggregated-license-report=aggregated-license-report
 polaris-immutables=tools/immutables
 polaris-container-spec-helper=tools/container-spec-helper
 polaris-version=tools/version
+polaris-benchmarks=benchmarks
Wonder if this work should go under benchmarks/iceberg-rest so we can have all benchmarks grouped together.
It could be. I am also considering removing the Iceberg mention from the benchmarks folder altogether, as the benchmarks also rely on the Management API and are not just an Iceberg REST API benchmark. Let's wait until we reach consensus on where the benchmarks should live before we move this around, though.
Co-authored-by: Robert Stupp <[email protected]>
Adds build-infra related work for Scala projects, like apache#1208
The SVGs also need a (C) header (it's effectively just XML)
Thanks @pingtimeout for the benchmark tools. It's pretty useful. Can we convert the code to Java? I think we should avoid introducing a new language into the Polaris main repo. It increases complexity and reduces maintainability, especially considering the Scala compatibility issues across multiple versions.
Scala is natural to Gatling. I do not see it as a blocker for adopting a well-written tool (like this PR).
(same comment as in #1211) So, let me summarize: I'm not a Scala fan, but for Spark + Gatling there's just no way around that.
@flyrain let's continue this discussion on the dev mailing list and keep the PR comments for actual code review. JB has already raised the question of the best location for this tool (polaris or polaris-tools) and no decision has been made yet. Depending on the decision, that could answer your comment "I think we should avoid introducing a new language into the Polaris main repo.".
You can write Gatling benchmarks in a language other than Scala. There are also frameworks other than Gatling.
@pingtimeout, I'm not against a new language like Scala; my main concern is how we manage the complexity. I'm supportive of the idea of putting this tool in a different repo, either
@flyrain In case it can help, Gatling is only compatible with, and compiled against, Scala 2.13. There is no other supported version whatsoever. So assuming we only use Gatling (as opposed to creating Gatling plugins), the integration should be limited to pointing to the right executable. @eric-maynard @snazy @dimas-b I just sent a response on the
Closed this PR and opened a new one against the polaris-tools repository: apache/polaris-tools#2
This pull request introduces a comprehensive set of benchmarks to Polaris. The current set includes:
Documentation is provided in the README.md file, including examples of the different datasets that can be generated. The datasets are procedural, which means that given the same input parameters, the same datasets will be generated, thus enabling reproducible benchmarks.
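The reproducibility property described above (same input parameters produce the same dataset) can be illustrated with a seeded generator. This is a minimal sketch, not the PR's actual generator; the `ProceduralDataset` object, the `tableNames` method, and the naming scheme are all assumed for the example.

```scala
import scala.util.Random

// Illustrative sketch of procedural dataset generation: all randomness is
// derived from an explicit seed, so the same (seed, count) parameters yield
// the exact same names on every run, making benchmarks reproducible.
object ProceduralDataset {
  def tableNames(seed: Long, count: Int): Seq[String] = {
    val rng = new Random(seed) // fresh, seed-determined generator per call
    (1 to count).map(i => f"table_${i}%04d_${rng.nextInt(1000000)}%06d")
  }
}
```

Because the generator is re-seeded per invocation, two benchmark runs configured identically operate on identical datasets, which makes their results directly comparable.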