Benchmark for comparing the merge/galloping join algorithm with the hash join algorithm #956
…ning of the benchmark infrastructure development and then became deprecated.
…hmark and tried updating it to work with the new version of the benchmark infrastructure.
…ark configuration.
…c benchmark table generation into configuration options.
…n a template function.
…esult as a table column.
…Added getter for the size of the table.
…t to a ResultTable.
…now no longer creates names, but takes them as an argument.
…mplementation and added the general point in time, in which the benchmark measuring was finished, to the general metadata.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #956 +/- ##
==========================================
- Coverage 88.85% 88.85% -0.01%
==========================================
Files 326 326
Lines 28926 28926
Branches 3205 3205
==========================================
- Hits 25703 25702 -1
Misses 2070 2070
- Partials 1153 1154 +1
…ames to fit standard.
…iers to make things more readable.
…tures with explicit captures.
…rsion with explicit conversion.
…ames to fit standard.
…ing the time to a string in accordance with SonarCloud's wishes.
…here the smaller table keeps the same size and the bigger table grows. Now there are multiple benchmark tables, each with a different number of rows in the smaller table.
…sing, inside the master branch.
…setting the minimum amount of rows in the smaller table.
…arking classes for debugging purpose.
…e benchmarking classes for debugging purpose." This reverts commit aeef1f1.
… memory has to be big enough for the min number of rows given via configuration options.
…n [1,10] to the row ratios, used by the benchmark.
/*
The number of rows.
*/
Suggested change:
// The number of rows.
/*
The number of columns.
*/
Same here.
/*
If nobody played around with the private member variables, every row
should have the same amount of columns and there should be AT LEAST one row,
and one column. So we can just return the length of the first row.
*/
Suggested change:
// Every row has the same number of columns, so we just return the length of the first row.
Note: This will cause an exception, if the row and column is bigger
than the table. However, such a situation should only happen, if this
function was called with the wrong column numbers, in which case it
ain't our problem and can only be fixed by the user.
*/
I would replace this Note by the three AD_CONTRACT_CHECK(columnTo... < table->numColumns()).
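To illustrate the suggestion, here is a minimal sketch of replacing a "this may throw" note with explicit precondition checks. Everything in it is a stand-in: `CONTRACT_CHECK_SKETCH` only imitates the project's `AD_CONTRACT_CHECK` (assuming the real macro also signals a violated precondition by throwing), and `SketchTable`, `sumColumnsSketch`, and the three `columnTo...` parameter names are hypothetical, since the reviewed function is not shown in full here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for the project's AD_CONTRACT_CHECK (assumption: the real macro
// also reports a violated precondition by throwing an exception).
#define CONTRACT_CHECK_SKETCH(condition)                             \
  do {                                                               \
    if (!(condition)) {                                              \
      throw std::invalid_argument("Contract violated: " #condition); \
    }                                                                \
  } while (false)

// Simplified, hypothetical stand-in for the table class under review.
struct SketchTable {
  std::vector<std::vector<double>> rows;

  // Every row has the same number of columns, so the first row suffices.
  std::size_t numColumns() const {
    return rows.empty() ? 0 : rows.front().size();
  }
};

// Hypothetical function with three column arguments. Each index is checked
// up front, so a wrong call fails with a clear message instead of relying
// on a comment that warns about out-of-range access.
void sumColumnsSketch(SketchTable* table, std::size_t columnToReadFirst,
                      std::size_t columnToReadSecond,
                      std::size_t columnToWrite) {
  CONTRACT_CHECK_SKETCH(columnToReadFirst < table->numColumns());
  CONTRACT_CHECK_SKETCH(columnToReadSecond < table->numColumns());
  CONTRACT_CHECK_SKETCH(columnToWrite < table->numColumns());

  for (auto& row : table->rows) {
    row[columnToWrite] = row[columnToReadFirst] + row[columnToReadSecond];
  }
}
```

With checks like these, the precondition is enforced where the caller's mistake happens, and the explanatory note in the comment becomes unnecessary.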
// Author of the file this file is based on: Björn Buchhold
// ([email protected])
// Author of the file this file is based on: Björn Buchhold
// ([email protected])
(I think you wrote this one all by yourself, didn't you?)
benchmark/JoinAlgorithmBenchmark.cpp
Outdated
static void createOverlapRandomly(IdTableAndJoinColumn* const smallerTable,
                                  const IdTableAndJoinColumn& biggerTable,
                                  const double probabilityToCreateOverlap) {
I think this helper should be in the same helper/util file that also creates random IdTables.
And all of those should also work with multiple join columns to make life easier later when we also benchmark join on multiple columns.
This requires basically changing IdTableAndJoinColumn to IdTableAndJoinColumns etc.
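A rough sketch of what a multi-column variant of such a helper could look like is given below. It rests on several assumptions: the tables are plain `std::vector<std::vector<int>>` instead of `IdTable`, the type `TableAndJoinColumnsSketch` and the function `createOverlapRandomlySketch` are hypothetical names, the random generator is passed in explicitly, and "creating overlap" is taken to mean copying the join-column values of a randomly chosen row of the bigger table into a row of the smaller table.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for IdTableAndJoinColumns: a table plus the indices
// of all columns that take part in the join.
struct TableAndJoinColumnsSketch {
  std::vector<std::vector<int>> table;
  std::vector<std::size_t> joinColumns;
};

// For every row of the smaller table, decide with the given probability
// whether to overwrite its join-column entries with those of a random row
// of the bigger table, which guarantees a join match for that row.
void createOverlapRandomlySketch(TableAndJoinColumnsSketch* smallerTable,
                                 const TableAndJoinColumnsSketch& biggerTable,
                                 double probabilityToCreateOverlap,
                                 std::mt19937& randomEngine) {
  if (probabilityToCreateOverlap < 0.0 || probabilityToCreateOverlap > 1.0) {
    throw std::invalid_argument("The probability must be in [0, 1].");
  }
  if (smallerTable->joinColumns.size() != biggerTable.joinColumns.size()) {
    throw std::invalid_argument("Both tables need equally many join columns.");
  }
  if (biggerTable.table.empty()) {
    return;  // Nothing to copy from.
  }

  std::bernoulli_distribution createOverlap(probabilityToCreateOverlap);
  std::uniform_int_distribution<std::size_t> pickRow(
      0, biggerTable.table.size() - 1);

  for (auto& row : smallerTable->table) {
    if (!createOverlap(randomEngine)) {
      continue;
    }
    const auto& sourceRow = biggerTable.table[pickRow(randomEngine)];
    // Copy all join columns together, so the whole join key matches.
    for (std::size_t i = 0; i < smallerTable->joinColumns.size(); ++i) {
      row[smallerTable->joinColumns[i]] = sourceRow[biggerTable.joinColumns[i]];
    }
  }
}
```

Copying all join columns from the same source row matters here: with several join columns, overwriting each column from a different row would not necessarily produce a matching key.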
benchmark/JoinAlgorithmBenchmark.cpp
Outdated
*/
void parseConfiguration(const BenchmarkConfiguration& config) {
  numberRows =
      config.getValueByNestedKeys<size_t>("numberRows").value_or(100);
maybe rather numRows. numberRows sounds strange and numberOfRows is too long :)
An exhaustive benchmark for the comparison of the two implementations that also showcases the usage of our benchmarking library.