Run importers in parallel within a single importer instance #1278
Conversation
I think we should have a concept of what parallel means.
I think it makes sense to:
- Allow running multiple instances of the importer
- Allow running multiple importer runs in parallel
But I don't think it makes sense to run a single importer in a parallel mode.
Sure
I'm thinking we'll need another column in the
Can you clarify this? Are you referring to the
You're specifically referring to the
Also ... I wonder at what point we would consider bringing in a job queue crate versus building one ourselves ... I have no experience with any rust crates, so maybe there are none fit for purpose?
I'm thinking that threshold is very high. Because if we're requiring a distributed job queue, then I expect any crate would wrap redis/rabbit/etc.
I think there's a bunch of ways we could deal with this. One way could be a postgres-based locking mechanism. Downside: it only works with postgres. Which might be ok. Another way could be to have a hash approach: have a max number of importers, and an individual instance number, and then only process those jobs. Should work with a StatefulSet. We could also hold state in the database. But then we'd need to define such things as timeouts, stale detection, etc.
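A minimal sketch of the postgres-based locking idea, using an advisory lock so only one instance runs a given importer at a time. The helper name, key scheme, and use of sqlx here are assumptions for illustration, not trustify's actual code:

```rust
use sqlx::PgPool;

/// Run `work` only if this instance wins the postgres advisory lock for the
/// given importer. (Hypothetical helper; the key scheme is an assumption.)
async fn run_if_claimed<F, Fut>(pool: &PgPool, importer_key: i64, work: F) -> anyhow::Result<bool>
where
    F: FnOnce() -> Fut,
    Fut: std::future::Future<Output = anyhow::Result<()>>,
{
    // Advisory locks are bound to the database session, so hold a dedicated
    // connection for the whole run instead of going through the pool each time.
    let mut conn = pool.acquire().await?;

    // Non-blocking attempt: returns false right away if another instance holds the lock.
    let claimed: bool = sqlx::query_scalar("SELECT pg_try_advisory_lock($1)")
        .bind(importer_key)
        .fetch_one(&mut *conn)
        .await?;

    if claimed {
        let outcome = work().await;
        // Release on the same session that acquired the lock.
        sqlx::query("SELECT pg_advisory_unlock($1)")
            .bind(importer_key)
            .execute(&mut *conn)
            .await?;
        outcome?;
    }

    Ok(claimed)
}
```

The downside mentioned above still applies: this ties the mechanism to postgres, and the lock lives only as long as the session holding it.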
If we think StatefulSet is the best option, then I'll re-assign this to you. :)
StatefulSet is just a way of ensuring we have X number of importers running and are able to assign ascending numbers to those instances. The input to the importer would be "x of y", which could be transported via env-vars, even when using ansible. Having a default of "1 of 1", which means "process them all".
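A minimal sketch of that "x of y" idea, reading the instance index and count from env-vars with a "1 of 1" default. The env-var names and the hashing scheme are made up for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::env;
use std::hash::{Hash, Hasher};

/// Hypothetical partitioning config read from env-vars (e.g. set by a StatefulSet).
struct Partition {
    index: u64, // this instance's number, 0-based ("x")
    count: u64, // total number of importer instances ("y")
}

impl Partition {
    /// Defaults to "1 of 1", i.e. a single instance that processes everything.
    fn from_env() -> Self {
        let index = env::var("IMPORTER_INSTANCE_INDEX")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(0);
        let count = env::var("IMPORTER_INSTANCE_COUNT")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(1);
        Self { index, count }
    }

    /// Decide whether this instance is responsible for a given importer by
    /// hashing its name into one of `count` buckets.
    fn is_mine(&self, importer_name: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        importer_name.hash(&mut hasher);
        hasher.finish() % self.count == self.index
    }
}
```

With the default of "1 of 1", `is_mine` always returns true, i.e. "process them all".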
To my understanding, this PR simply checks all importers in parallel. With no limitation. Waiting for the joint outcome of a single check run. I think this would result in the case that one importer runs for a long time, blocking the next check, and thus not really improving the situation. On an initial startup with pending import runs, yes. But afterwards, no. I think we'd need a strategy first, with the goal mentioned above, before having an implementation.
True. So no worse than what we have, but it does address the primary motivation for this PR: that QE will only have to wait as long as it takes the longest enabled importer to run for them all to be finished. Plus, better logging.
Agree. Still thinking about that.
I would not want to merge something which only suits some specific QE task, potentially causing issues for the overall system. Like the case where most of the time it runs sequentially, while sometimes it does run in parallel, without any way of controlling that behavior.
It's not a QE task. They just happen to be the first users who care about how long it takes the app to import data. We already have a means to disable importers. I think there's a cancellation mechanism in there, no? I don't love that those git-based importers are sync and block threads, but it's not like we expect to have hundreds of them. Is your reluctance so high that you'd like me to close this PR?
I added a limitation. Defaults to the old behavior: concurrency=1. |
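For reference, a bounded-concurrency loop along these lines can be built with a tokio semaphore. This is only a sketch of the idea, not the code in the PR, and the job representation is a placeholder:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;

/// Run all pending import jobs, but never more than `concurrency` at once.
/// With concurrency = 1 this degenerates to the old sequential behavior.
async fn run_pending(jobs: Vec<String>, concurrency: usize) {
    let semaphore = Arc::new(Semaphore::new(concurrency));
    let mut set = JoinSet::new();

    for job in jobs {
        // Wait here until a permit is free, so at most `concurrency`
        // importer runs are in flight at any time.
        let permit = semaphore.clone().acquire_owned().await.unwrap();
        set.spawn(async move {
            let _permit = permit; // permit is released when the task finishes
            run_importer(&job).await;
        });
    }

    // Wait for the whole batch before starting the next check cycle.
    while set.join_next().await.is_some() {}
}

/// Stand-in for the real importer run.
async fn run_importer(name: &str) {
    println!("running importer {name}");
}
```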
Signed-off-by: Jim Crossley <[email protected]>
Signed-off-by: Jim Crossley <[email protected]>
Signed-off-by: Jim Crossley <[email protected]>
Signed-off-by: Jim Crossley <[email protected]>
Force-pushed from 55ad1bb to eaa5c2c
@@ -294,7 +282,7 @@ where
        }
    }

-    #[instrument(skip(self), err)]
+    #[instrument(skip(self))]
I'd like to keep errors. It's ok to reduce the level to e.g. INFO
I personally think that's better. And it should also be possible to easily extend that implementation to a distributed version (running X instances at the same time).
I'd just prefer keeping the `err` instrumentation. Also converting the `ret` to `err` instead. And maybe lowering that level to `Info`, as it's not an application-level error. But that's not a blocker to me.
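For what it's worth, recent versions of tracing-attributes let you keep the error recording while lowering its level, roughly like this. The struct and method are placeholders, not the PR's actual code:

```rust
use tracing::instrument;

struct Importer;

impl Importer {
    // Keep recording errors, but emit them at INFO instead of ERROR,
    // since a failed import run is not an application-level error.
    #[instrument(skip(self), err(level = "info"))]
    async fn run_once(&self) -> Result<(), std::io::Error> {
        // ... the actual import work would go here ...
        Ok(())
    }
}
```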
I am ok with this in general. But it also feels incomplete due to ongoing discussions on the ADR. Let's finish that discussion first, and then merge implementations.
Come on, man! They're independent. This PR simply adds the ability for a single instance to run multiple import jobs concurrently. Heck, this PR alone might mitigate the need for distributing work at all. It certainly satisfies the need that motivated the creation of the original downstream issue.
Before we go distributed, we should make sure the DB is not bottlenecking us.
Amen. 100%. I think it's also worth looking at the logic within individual importers. The git-based ones are flawed since they're sync, but they only take a few minutes to run. The sbom importer's rate varies widely during a single run, and the CSAF importer just seems broken -- I've never seen it run to completion.
IIRC they run on the tokio worker thread pool. Not blocking anything.
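As an aside, if the sync git work really does run directly on tokio worker threads, the usual mitigation is to push it onto the blocking pool, roughly like this (the function names here are placeholders, not the real importer API):

```rust
use tokio::task;

/// Run a synchronous, git-based import without tying up a tokio worker thread.
async fn run_git_importer(repo_url: String) -> anyhow::Result<()> {
    // spawn_blocking moves the closure onto tokio's dedicated blocking pool,
    // so async tasks (and shutdown signals) keep making progress meanwhile.
    task::spawn_blocking(move || sync_clone_and_walk(&repo_url)).await??;
    Ok(())
}

/// Stand-in for the actual synchronous git work.
fn sync_clone_and_walk(_repo_url: &str) -> anyhow::Result<()> {
    Ok(())
}
```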
I have. Multiple times: https://console-openshift-console.apps.cluster.trustification.rocks/search/ns/trustify-dumps?kind=core~v1~Pod&q=app.kubernetes.io%2Fcomponent%3Dcreate-dump
Blocks a clean shutdown when those sync
Maybe that's a local problem then. It looks ok on our OCP instance:
Using the timestamps in your image, if you had attempted to
Try it for yourself.
You might want to include an update to https://github.com/trustification/trustify/blob/main/docs/env-vars.md too.
LGTM.
Relates trustification#1207
Allows enabled/disabled state of jobs to be reflected immediately when manipulated by the UX.
Signed-off-by: Jim Crossley <[email protected]>
Signed-off-by: Jim Crossley <[email protected]>
Signed-off-by: Jim Crossley <[email protected]>
Done, thx
Suggestion:
The default value of the new
My goal for the PR was to match current behavior by default after merging, so would the
ya I think so!
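If it helps, the kind of knob being discussed could look something like this in a clap-based config. The field and env-var names are guesses, not what the PR actually uses, and this assumes clap's `derive` and `env` features:

```rust
use clap::Parser;

#[derive(Parser, Debug)]
struct ImporterConfig {
    /// Maximum number of importer runs executing at the same time.
    /// Defaults to 1, which preserves the old sequential behavior.
    #[arg(long, env = "IMPORTER_CONCURRENCY", default_value_t = 1)]
    concurrency: usize,
}

fn main() {
    let config = ImporterConfig::parse();
    println!("running with concurrency = {}", config.concurrency);
}
```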
LGTM with the caveat that testing might reveal need for more work in this area. I think this should go in.
Partial impl of #1207