Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial refactoring to use sane OSITrace reader #70

Merged
merged 2 commits into from
May 15, 2024

Conversation

pmai
Copy link
Contributor

@pmai pmai commented Apr 14, 2024

This is currently a mostly minimally invasive attempt to sanitize the trace input handling of osi-validation. It rips out all of the old complexity, which does not give any performance benefits as far as I can tell, and replaces it with very streamlined sequential I/O. This simple code is currently around 4x as fast as the original code on my limited testing, and just uses the new OSI osi3trace module.

It is for the most part bug-for-bug and to some extent performance-bug-for-performance-bug compatible with the old code (except for the 4x speed up), as it tries to not touch the other code that desperately needs refactoring. This will have to be handled in seperate PRs, with the log handling an obvious next candidate, as that is currently mostly performance limiting.

This also does not update the referenced submodule right now, as that needs other fixes that are part of e.g. PR #61.

@pmai pmai self-assigned this Apr 14, 2024
@pmai pmai force-pushed the refactor/simplify-trace-handling branch from 958206e to 2db9449 Compare April 14, 2024 13:41
@pmai pmai force-pushed the refactor/simplify-trace-handling branch from 2db9449 to ce4ec62 Compare April 14, 2024 14:27
@jdsika jdsika requested a review from ClemensLinnhoff April 15, 2024 10:09
@jdsika jdsika added the quality Quality improvements. label Apr 15, 2024
@jdsika
Copy link
Contributor

jdsika commented Apr 15, 2024

@masipp can you tell us where the requirements for parallel came form?

@pmai
Copy link
Contributor Author

pmai commented Apr 15, 2024

@masipp can you tell us where the requirements for parallel came form?

And just to be clear: There is of course potential for single-trace speed-ups from proper parallelization, however:

  • Currently it just was not there, likely also due to too fine-grained parallelization and other problems with the approach taken
  • It only makes sense to introduce this once the rest of the code is refactored, as currently the log-handling is fully memory-bound and memory-limited, and is the principal bottleneck
  • It really should be handled transparently to the user, i.e. if anything there should only be an option to disable parallelization, for cases where multiple traces are validated in parallel, as that is a much better use of multiple cores, instead of intra-file parallelization.

The same is btw. true for most of the other argument options: This really should be handled internally (e.g. limiting memory use, buffering where useful, etc.), and should not be exposed as user arguments. But that has to be handled in a separate PR.

@pmai
Copy link
Contributor Author

pmai commented Apr 15, 2024

And as another aside: The code retains the influence of the blast argument on resetting of the LOGS variable. However given that all log entries are currently retained in the logger component, I do not see the point in the LOGS variable and especially the resetting of it: It definitely just looks like a bug somewhere, however this is mostly related to logging and not the I/O, hence I retained the resetting for now.

Similarly I did not rewrite the use of global variables, and various other suspect aspects of data flow, as that should be handled in a separate PR.

@ClemensLinnhoff
Copy link
Contributor

I fully agree. We should limit the user options to the things that actually should be set by the user. In my (somewhat limited) tests with a couple of trace files I did not notice any performance issues from a user perspective. I can really only see the need for performance improvements, if a big database of trace files should be tested. And then I also agree, that testing multiple trace files should be parallelized and not parallelization within individual trace file. And if at some point we want to implement cross-time-step rules (see #65) parallelizing time steps within one trace file will not work anyways.

@pmai pmai marked this pull request as ready for review May 10, 2024 11:26
@jdsika jdsika merged commit 71e8e24 into master May 15, 2024
6 checks passed
@jdsika jdsika deleted the refactor/simplify-trace-handling branch May 15, 2024 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
quality Quality improvements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants