Add Polaris benchmarks that use the REST APIs #1208
Conversation
Related to #1120, although these are Gatling benchmarks, not JMeter.
Some files don't contain the ASF header.
Would it be possible to use SVG instead of PNG (to avoid packaging binary files in the source distribution)?
Do we want to include this tool in the main polaris repo? Maybe we can consider https://github.com/apache/polaris-tools?
(I ran these benchmarks before)
This work is very useful not only to measure performance but also to identify problems and bugs.
The code is pretty solid IMO - nice work!
A couple of build-related issues need to be fixed, but from my PoV it's ready after that.
spotless {
  scala {
    // Use scalafmt for Scala formatting
    scalafmt("3.9.3").configFile("../.scalafmt.conf")
  }
}
It's probably better to have the spotless parts in polaris-java.gradle.kts. Adding polaris-server as a plugin would then pull that in.
Do you think there should be a polaris-scala.gradle.kts file? I wouldn't mind adding polaris-server there, really. But the code does not strictly depend on the Polaris server. It might actually be moved to the tools repository.
Opened a PR against your branch...
Or better: #1211
    maxRetries: Int = 10,
    retryableHttpCodes: Set[Int] = Set(409, 500)
) {
  private val logger = LoggerFactory.getLogger(getClass)
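The `maxRetries`/`retryableHttpCodes` configuration above implies a retry loop along these lines. This is only an illustrative sketch, not the PR's actual code: the `RetryingClient` and `Response` names, the `withRetries` method, and the re-invocation strategy are all assumptions made for the example.

```scala
// Hypothetical sketch of the retry behavior suggested by the configuration
// above: re-issue a call while its HTTP status is in the retryable set,
// up to maxRetries additional attempts. Names here are illustrative only.
final case class Response(status: Int, body: String)

class RetryingClient(
    maxRetries: Int = 10,
    retryableHttpCodes: Set[Int] = Set(409, 500)
) {
  // Invokes `call`, retrying while the status code is retryable.
  def withRetries(call: () => Response): Response = {
    var attempt = 0
    var resp = call()
    while (retryableHttpCodes.contains(resp.status) && attempt < maxRetries) {
      attempt += 1
      resp = call()
    }
    resp
  }
}
```

A real client would typically add backoff between attempts; the sketch omits that for brevity.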
Not sure about the usual Scala nomenclature - should logger be all upper case to indicate that it's a constant?
It's not a constant. It's an immutable field.
Indeed, this one is a bit confusing. As per the naming conventions, it is not a constant: although it is an immutable, final field, it is not static.
And really, a logger should not be created in each instance of a class, which is where the confusion comes from. To be completely correct, there should be a companion object CatalogActions where that constant is defined. But I think it would make the code unnecessarily more complex, especially considering that there should be only one instance of CatalogActions per simulation.
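The companion-object pattern described above can be sketched as follows. This is not the PR's code: a plain `String` stands in for an actual SLF4J `Logger` so the example has no external dependencies, and the member names are hypothetical.

```scala
// Sketch of the pattern discussed above: a field shared by all instances
// lives on the companion object (one per class, like a Java static final),
// not on each instance. A real implementation would hold an SLF4J Logger
// here; a String stands in to keep the sketch dependency-free.
object CatalogActions {
  private val Logger: String = "logger-for-CatalogActions"
  def logMessage(msg: String): String = s"[$Logger] $msg"
}

class CatalogActions {
  // Every instance delegates to the single companion-level field.
  def doWork(): String = CatalogActions.logMessage("working")
}
```

As the comment notes, this is only worth the ceremony when many instances exist; with one instance per simulation, a `private val` on the class is a reasonable trade-off.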
Ah, true - Scala stuff. All good.
@@ -50,7 +50,7 @@ curl -s -H "Authorization: Bearer ${token}" \
    "storageConfigInfo": {
      "storageType": "FILE",
      "allowedLocations": [
-       "file:///tmp"
+       "file:///tmp/polaris/"
Unrelated change?
Kind of. That one is a bit of a 🥷. The current getting-started code prevents any catalog from being created under /tmp/ altogether, even though a single catalog is created under /tmp/polaris. See L48 of this file. Updating the allowedLocations attribute to match that of the catalog makes it possible to create any number of catalogs under /tmp/.
@@ -36,6 +36,7 @@ aggregated-license-report=aggregated-license-report
 polaris-immutables=tools/immutables
 polaris-container-spec-helper=tools/container-spec-helper
 polaris-version=tools/version
+polaris-benchmarks=benchmarks
Wonder if this work should go under benchmarks/iceberg-rest so we can have all benchmarks grouped together.
It could be. I am also considering removing the Iceberg mention from the benchmarks folder altogether, as the benchmarks also rely on the Management API and are not just an Iceberg REST API benchmark. Let's wait until we reach consensus on where the benchmarks should live before we move this around, though.
Co-authored-by: Robert Stupp <[email protected]>
Adds build-infra related work for Scala projects, like apache#1208
The SVGs also need a (C) header (it's effectively just XML)
Thanks @pingtimeout for the benchmark tools. It's pretty useful. Can we convert the code to Java? I think we should avoid introducing a new language into the Polaris main repo. It increases complexity and reduces maintainability, especially considering the Scala compatibility issues across multiple versions.
Scala is natural to Gatling. I do not see it as a blocker for adopting a well-written tool (like this PR).
(same comment as in #1211) So, let me summarize: I'm not a Scala fan, but for Spark + Gatling there's just no way around that.
@flyrain let's continue this discussion on the dev mailing list and keep the PR comments for actual code review. JB has already raised the question of the best location for this tool (polaris or polaris-tools) and no decision has been made yet. Depending on the decision, that could answer your comment "I think we should avoid introducing a new language into the Polaris main repo.".
You can write Gatling benchmarks in a language other than Scala. There are also frameworks other than Gatling.
@pingtimeout, I'm not against a new language like Scala; my main concern is how we manage the complexity. I'm supportive of the idea of putting this tool in a different repo, either
@flyrain In case it can help, Gatling is only compatible with, and compiled against, Scala 2.13. There is no other supported version whatsoever. So assuming we only use Gatling (as opposed to creating Gatling plugins), the integration should be limited to pointing to the right executable. @eric-maynard @snazy @dimas-b I just sent a response on the
Closed this PR and opened a new one against the polaris-tools repository: apache/polaris-tools#2
This pull request introduces a comprehensive set of benchmarks to Polaris. The current set includes:
Documentation is provided in the README.md file, including examples of the different datasets that can be generated. The datasets are procedural, which means that given the same input parameters, the same datasets will be generated, thus enabling reproducible benchmarks.
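The reproducibility property described above (same input parameters produce the same dataset) can be illustrated with a seeded generator. This is a minimal sketch, not the PR's actual generator; the `ProceduralDataset` object, the `tableNames` method, and the naming scheme are all assumed for the example.

```scala
import scala.util.Random

// Illustrative sketch of procedural dataset generation: all randomness is
// derived from an explicit seed, so the same (seed, count) parameters yield
// the exact same names on every run, making benchmarks reproducible.
object ProceduralDataset {
  def tableNames(seed: Long, count: Int): Seq[String] = {
    val rng = new Random(seed) // fresh, seed-determined generator per call
    (1 to count).map(i => f"table_${i}%04d_${rng.nextInt(1000000)}%06d")
  }
}
```

Because the generator is re-seeded per invocation, two benchmark runs configured identically operate on identical datasets, which makes their results directly comparable.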