Skip to content

Add Polaris benchmarks that use the REST APIs #1208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

pingtimeout
Copy link
Contributor

This pull request introduces a comprehensive set of benchmarks to Polaris. The current set includes:

  • A benchmark that populates an empty Polaris server with a dataset that have predefined attributes
  • A benchmark that issues only read queries over that dataset
  • A benchmark that issues read and write queries (entity updates) over that dataset, with a configurable read/write ratio

A documentation if provided in the README.md file, including examples of the different datasets that can be generated. The datasets are procedural, which means that given the same input parameters, the same datasets will be generated, thus enabling reproducible benchmarks.

@pingtimeout
Copy link
Contributor Author

Related to #1120 although these are Gatling benchmarks, not JMeter.

Copy link
Member

@jbonofre jbonofre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some files don't contain ASF header.
Would is possible to use svg instead of png (to avoid to package binary files in the source distribution) ?
Do we want to include this tool in the main polaris repo ? Maybe we can consider https://github.com/apache/polaris-tools ?

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I ran these benchmarks before)
This work is very useful not only to measure performance but also to identify problems and issues and bugs.

The code is pretty solid IMO - nice work!

A couple build related issues need to be fixed, but from my PoV it's ready then.

Comment on lines +32 to +37
spotless {
scala {
// Use scalafmt for Scala formatting
scalafmt("3.9.3").configFile("../.scalafmt.conf")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably better to have the spotless parts in polaris-java.gradle.kts. Adding polaris-server as a plugin would then pull that in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there should be a polaris-scala.gradle.kts file? I wouldn't mind adding polaris-server there, really. But the code does not strictly depend on Polaris server. It might actually be moved to the tools repository

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened a PR against your branch...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or better: #1211

maxRetries: Int = 10,
retryableHttpCodes: Set[Int] = Set(409, 500)
) {
private val logger = LoggerFactory.getLogger(getClass)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the usual Scala nomenclature - should logger be all upper case to indicate that it's a constant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a constant. It's an immutable field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this one is a bit confusing. As per the naming conventions, it is not a constant. Given that although it is an immutable, final field, it is not static.

And really, a logger should not be created in each instance of a class. So that's where the confusion comes from. To be completely correct, there should be a companion object CatalogActions where that constant is defined. But I think it would make the code unnecessarily more complex, especially considering that there should be only one instance of CatalogActions per simulation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, true - Scala stuff. All good.

@@ -50,7 +50,7 @@ curl -s -H "Authorization: Bearer ${token}" \
"storageConfigInfo": {
"storageType": "FILE",
"allowedLocations": [
"file:///tmp"
"file:///tmp/polaris/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of. That one is a bit of a 🥷. The current getting-started code prevents any catalog to be created under /tmp/ altogether, even though a single catalog is created under /tmp/polaris. See L48 of this file. Updating the allowedLocations attribute to match that of the catalog makes it possible to created any number of catalogs under /tmp/.

@@ -36,6 +36,7 @@ aggregated-license-report=aggregated-license-report
polaris-immutables=tools/immutables
polaris-container-spec-helper=tools/container-spec-helper
polaris-version=tools/version
polaris-benchmarks=benchmarks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if this work should go under benchmarks/iceberg-rest so we can have all benchmarks grouped together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be. I am also considering removing the Iceberg mention from the benchmarks folder altogether as the benchmarks also relies on the Management API, and is not just an Iceberg REST API benchmark. Let's wait until we reach consensus on where the benchmarks should live before we move this around though.

snazy added a commit to snazy/polaris that referenced this pull request Mar 19, 2025
Adds build-infra related work for Scala projects, like apache#1208
@snazy snazy mentioned this pull request Mar 19, 2025
snazy added a commit to snazy/polaris that referenced this pull request Mar 19, 2025
Adds build-infra related work for Scala projects, like apache#1208
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SVGs also need a (C) header (it's effectively just XML)

@flyrain
Copy link
Contributor

flyrain commented Mar 19, 2025

Thanks @pingtimeout for the benchmark tools. It's pretty useful. Can we convert the code to Java code? I think we should avoid introducing a new language into the Polaris main repo. It increases complexity and reduce maintainability, especially considering the Scala compatibilities issue across multiple versions.

@dimas-b
Copy link
Contributor

dimas-b commented Mar 20, 2025

Scala is natural to Gatling. I do not see it as a blocker for adopting a well-written tool (like this PR)

@snazy
Copy link
Member

snazy commented Mar 20, 2025

(same comment as in #1211)

So, let me summarize:
a) the project wants a Spark plugin, and Spark relies on Scala
b) the project wants the benchmarks, and those run on Gatling, ẃhich relies on Scala

I'm not a Scala fan, but for Spark + Gatling there's just no way around that.

@pingtimeout
Copy link
Contributor Author

@flyrain let's continue this discussion on the dev mailing list and keep the PR comments for actual code review. JB has already raised the question of the best location for this tool (polaris or polaris-tools) and no decision has been made yet. Depending on the decision, that could answer your comment "I think we should avoid introducing a new language into the Polaris main repo.".

@eric-maynard
Copy link
Contributor

eric-maynard commented Mar 21, 2025

b) the project wants the benchmarks, and those run on Gatling, ẃhich relies on Scala

You can write gatling benchmarks in a language other than Scala.

There are also frameworks other than gatling.

@flyrain
Copy link
Contributor

flyrain commented Mar 21, 2025

@pingtimeout , I'm not against a new language like Scala, my main concern is how we manage the complexity. I'm supportive on the idea of putting this tool in a different repo, either Polaris-tools or other one if needed. This is also what a lot of other OSS projects do for implementations and tools in different languages.
An alternative is converting it in Java might also not a bad idea, we could avoid Scala version issues largely, while still keep the tool within one repo.

@pingtimeout
Copy link
Contributor Author

@flyrain In case it can help, Gatling is only compatible and compiled against Scala 2.13. There is no other supported version whatsoever. So assuming we only use Gatling (as opposed to creating Gatling plugins), the integration should be limited to pointing to the right executable.

@eric-maynard @snazy @dimas-b I just sent a response on the Polaris benchmarks proposal thread on the dev ML. This is a discussion that should happen outside of PR comments. Let's have it there please.

@pingtimeout pingtimeout closed this Apr 1, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Apr 1, 2025
@pingtimeout
Copy link
Contributor Author

Closed this PR and opened a new one against the polaris-tools repository: apache/polaris-tools#2

@pingtimeout pingtimeout deleted the persistence-benchmarks branch April 2, 2025 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants