Create thread for mask calculation #59
IMHO it's an advantage to have the grabber thread and the mask calculation in different threads. Ideally, and this is somewhat suboptimal right now, the grabber thread just provides images it captured from the camera while the mask calculation thread latches onto the last grabbed frame whenever it is ready for the next frame for processing. That way frames could be silently discarded when processing a mask is too slow.
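A minimal sketch of the "latch onto the last grabbed frame" idea, assuming a single-slot mailbox between the two threads. `Frame`, `LatestFrameSlot`, `publish`, `try_take` and `dropped` are hypothetical names chosen for illustration (they are not from deepseg.cc), and `Frame` stands in for `cv::Mat` so the example builds without OpenCV.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for cv::Mat, so the sketch has no OpenCV dependency.
struct Frame {
    std::uint64_t seq = 0;              // capture counter, starting at 1
    std::vector<std::uint8_t> pixels;
};

// Single-slot mailbox: the grabber overwrites, the mask thread takes the newest.
class LatestFrameSlot {
public:
    // Grabber side: replace whatever is in the slot; an unread frame counts as dropped.
    void publish(Frame f) {
        std::lock_guard<std::mutex> guard(lock_);
        assert(f.seq > last_seq_ && "frames must be published in capture order");
        last_seq_ = f.seq;
        if (fresh_) ++dropped_;
        slot_ = std::move(f);
        fresh_ = true;
    }

    // Mask side: fetch the newest frame if one arrived since the last call.
    bool try_take(Frame& out) {
        std::lock_guard<std::mutex> guard(lock_);
        if (!fresh_) return false;
        out = std::move(slot_);
        fresh_ = false;
        return true;
    }

    // Number of frames that were overwritten before the mask thread read them.
    std::uint64_t dropped() const {
        std::lock_guard<std::mutex> guard(lock_);
        return dropped_;
    }

private:
    mutable std::mutex lock_;
    Frame slot_;
    std::uint64_t last_seq_ = 0;
    std::uint64_t dropped_ = 0;
    bool fresh_ = false;
};
```

With this shape the grabber never waits for the mask thread: a slow mask calculation only raises the `dropped()` count instead of building up a queue.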
My fork (https://github.com/phlash/deepbacksub/) takes a similar approach to that suggested by Ben: a grab-mask-output thread that runs at full video frame rate using only OpenCV and V4L2, into which the mask calculator thread inserts an updated mask, generated at whatever rate it can maintain, from the latest frame grabbed. This is also the approach taken by ViBa and Jitsi to minimise video lag, even if the ML is a bit challenged on cycles. Note that the grab thread at present exists to ensure we read input frames promptly, again to reduce lag, as OpenCV will queue up video rather than discard it (and there are few/no controls over this). As long as we continue to read-mask-write quickly I don't mind which thread has which responsibility :)
The approach of updating the mask asynchronously and passing frames through with a possibly slightly outdated mask seems fine too for most situations. It might be worth offering an option to either discard frames or use the old mask.
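The "discard frames or use the old mask" option could be sketched as a small policy switch on the shared mask. This is only an illustration under assumptions: `Mask`, `StalePolicy`, `SharedMask`, `update` and `select` are hypothetical names, and frames are assumed to carry a capture counter so a mask can be matched to the frame it was computed from.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical mask type; frame_seq records which captured frame it was computed from.
struct Mask {
    std::uint64_t frame_seq = 0;
    std::vector<std::uint8_t> alpha;
};

enum class StalePolicy { UseOldMask, DropFrame };

class SharedMask {
public:
    // Mask thread: publish a freshly computed mask.
    void update(Mask m) {
        std::lock_guard<std::mutex> guard(lock_);
        mask_ = std::move(m);
        have_mask_ = true;
    }

    // Output thread: pick the mask to apply to frame `frame_seq`.
    // Returns false when the frame should be skipped under the given policy.
    bool select(std::uint64_t frame_seq, StalePolicy policy, Mask& out) const {
        std::lock_guard<std::mutex> guard(lock_);
        if (!have_mask_) return false;             // nothing computed yet
        assert(mask_.frame_seq <= frame_seq);      // a mask cannot come from a future frame
        const bool stale = mask_.frame_seq != frame_seq;
        if (stale && policy == StalePolicy::DropFrame) return false;
        out = mask_;                               // copy while holding the lock
        return true;
    }

private:
    mutable std::mutex lock_;
    Mask mask_;
    bool have_mask_ = false;
};
```

`StalePolicy::UseOldMask` keeps the output at full frame rate with a slightly lagging mask, while `StalePolicy::DropFrame` approximates the old synchronous behaviour by emitting only frames that have a matching mask.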
@peckto I have now read through your changes in detail and tried them out locally; it looks good. I'll mark up a couple of syntax things I spotted, but otherwise great stuff! Do we think anyone would want this to operate synchronously below the input frame rate (i.e. the old behaviour)? If so, how?
deepseg.cc
// copy frame from shared buffer
cv::Mat raw;
pthread_mutex_lock(&info.lock_raw);
raw = info.raw.clone();
Can we avoid the double cloning (in/out of shared buffer)? Unsure of performance impact... see below...
Structurally we could pre-allocate a few picture buffers and just hand out the pointers to them. This would require proper tracking of which pointers are in use at any time, plus some preparations not yet present in the current code.
deepseg.cc
pthread_mutex_unlock(&capinfo.lock);
// we can now guarantee capinfo.raw will remain unchanged while we process it..
calcinfo.raw = *capinfo.raw;
calcinfo.raw = raw.clone();
see above comment regarding double clone, we may want to avoid this one, if there is any material performance improvement.
Regarding OpenCV and the grab thread: with this, I think we can work with the single-thread option.
@phlash: Thank you for your feedback! I will address your points in the next few days. I'm just not sure how we should proceed with the "double copy" thing.
It's preferable to avoid any large copies of memory buffers, both for performance reasons and for reasons of memory/resource usage. An easy solution is to pre-allocate several buffers large enough to hold a frame and just hand pointers to these allocated objects around. With the current architecture of one thread for grabbing, one for mask generation and the main thread handling the final frame merging, this means we'd be fine with a pool of about 8 buffers (one active and one inactive per thread, plus some small reserve). That way each thread could hold on to the last active buffer it acquired from the previous step without blocking, and there would still be enough slack that slow mask calculations won't block the main thread too much, all while keeping memory usage constant.
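For illustration, a pre-allocated pool along these lines might look like the sketch below. `FramePool`, `acquire` and `release` are hypothetical names, and plain byte vectors stand in for frame storage; nothing here is taken from the PR itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Fixed-size pool of pre-allocated frame buffers handed out by pointer.
class FramePool {
public:
    using Buffer = std::vector<std::uint8_t>;

    FramePool(std::size_t count, std::size_t frame_bytes)
        : buffers_(count, Buffer(frame_bytes)), in_use_(count, false) {}

    // Returns a free buffer, or nullptr when every buffer is currently held.
    Buffer* acquire() {
        std::lock_guard<std::mutex> guard(lock_);
        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                return &buffers_[i];
            }
        }
        return nullptr;
    }

    // Hands a buffer back to the pool; it must have come from acquire().
    void release(Buffer* buf) {
        std::lock_guard<std::mutex> guard(lock_);
        const std::size_t i = static_cast<std::size_t>(buf - buffers_.data());
        assert(i < buffers_.size() && in_use_[i]);
        in_use_[i] = false;
    }

private:
    std::mutex lock_;
    std::vector<Buffer> buffers_;   // allocated once, never resized
    std::vector<bool> in_use_;      // the "which pointers are in use" bookkeeping
};
```

A caller that gets `nullptr` has to decide whether to drop the frame or wait, which is the tracking cost mentioned above. Memory usage stays constant because no buffer is allocated after construction.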
I'd like to hold off on a buffer pool until we know the impact of a couple of clone() calls between threads - the timing we have should make it obvious if we have a problem or not. Thanks for the speedy work @peckto!
I added some more measurements for the AI thread.
(For some reason I currently don't get 30fps??) I agree with you that in general we should avoid the double copy.
@peckto, @BenBE: I'm not sure where we are with this PR? To me it looks like @peckto is waiting on a question about double buffering, and then we'll want to check performance is ok before we merge? We can address a switch to the C++11 thread toolkit in a separate PR (as per experimental branch). Regarding the double buffering - I think that should work, with the 'slower' thread (always mask calculation) swapping buffer pointers (while holding a lock) as it either consumes or writes them. This avoids buffer pool management, while also avoiding multiple copies of video frames.
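A rough sketch of that pointer-swapping double buffer, under assumptions: `DoubleBuffer`, `write` and `swap_and_read` are hypothetical names, byte vectors stand in for `cv::Mat`, and only the input side (frames flowing to the mask thread) is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Two pre-allocated buffers: one shared with the writer, one owned by the reader.
class DoubleBuffer {
public:
    using Buffer = std::vector<std::uint8_t>;

    explicit DoubleBuffer(std::size_t frame_bytes)
        : a_(frame_bytes), b_(frame_bytes), shared_(&a_), owned_(&b_) {}

    // Fast (main) thread: the single copy of the frame into the shared buffer.
    void write(const Buffer& frame) {
        std::lock_guard<std::mutex> guard(lock_);
        *shared_ = frame;
        fresh_ = true;
    }

    // Slow (mask) thread: swap pointers under the lock, then read without it.
    // Returns nullptr if no new frame arrived since the previous call.
    const Buffer* swap_and_read() {
        std::lock_guard<std::mutex> guard(lock_);
        if (!fresh_) return nullptr;
        std::swap(shared_, owned_);
        assert(shared_ != owned_);
        fresh_ = false;
        return owned_;   // untouched by write() until the next swap
    }

private:
    std::mutex lock_;
    Buffer a_, b_;
    Buffer* shared_;
    Buffer* owned_;
    bool fresh_ = false;
};
```

The returned pointer stays valid and unchanged while the mask is being calculated, because the writer only ever touches the other buffer. The lock is still held for the duration of the one remaining copy in `write()`, so this removes the second clone rather than all copying.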
Haven't tested this yet, and with the larger structural rework I think having this land on experimental first is preferable (@floe What do you think?). This also has the advantage of already having the C++11 patches as a basis to work from. Haven't had too much time for thorough reviews and tests of the experimental branch over the last few days, so I'd recommend the experimental changes ripen there for another few days before we merge them to main. Will try to get a review for this PR tomorrow or on Monday as time permits.
I added the double buffer structure for the frame buffer Mat. The main thread copies the received frame to the |
I had a misunderstanding regarding the |
You might be looking for this: |
@peckto - yep you are correct, I re-read the manual (https://linux.die.net/man/3/pthread_cond_wait):
So there should indeed be a new frame indicator, set just before raising the signal and checked/cleared after wakeup in the other thread, e.g.:
// In calc_mask()
// ...
pthread_mutex_lock(&info.lock_raw);
while (!info.new_frame) {
pthread_cond_wait(&info.condition_new_frame, &info.lock_raw);
}
info.new_frame = false;
// ...any other stuff requiring the lock
pthread_mutex_unlock(&info.lock_raw);
...
// In main()
// ...
pthread_mutex_lock(&calcinfo.lock_raw);
// ...any other stuff requiring the lock
calcinfo.new_frame = true;
pthread_cond_signal(&calcinfo.condition_new_frame);
pthread_mutex_unlock(&calcinfo.lock_raw);
This will avoid the issue you noticed where the mask calculation rate drops to half frame rate, since it's waiting for the next signal even when a new frame is already available.
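Since a later switch to the C++11 thread toolkit was mentioned, here is how the same flag-plus-condition pattern might look with `std::condition_variable`. This is a sketch, not the PR's code: `FrameSignal`, `notify` and `wait_for_frame` are hypothetical names, and the timeout exists only so the behaviour can be exercised without a second thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// A "new frame" flag guarded by a mutex, paired with a condition variable.
class FrameSignal {
public:
    // Producer: set the flag under the lock, then wake the waiter.
    void notify() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            new_frame_ = true;
        }
        cv_.notify_one();
    }

    // Consumer: returns true (and clears the flag) if a frame was signalled,
    // including one signalled before this call; false if the timeout expired.
    bool wait_for_frame(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> guard(lock_);
        if (!cv_.wait_for(guard, timeout, [this] { return new_frame_; })) {
            return false;
        }
        assert(new_frame_);
        new_frame_ = false;
        return true;
    }

private:
    std::mutex lock_;
    std::condition_variable cv_;
    bool new_frame_ = false;
};
```

The predicate overload of `wait_for` re-checks the flag before and after every wakeup, which plays the role of the `while (!info.new_frame)` loop in the pthread version: a signal raised while the consumer was still busy is remembered by the flag rather than lost.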
Combining |
Major refactoring work should normally go to experimental; main should be left for minor fixups and continuous improvements that are unlikely to cause stability issues. Cf. here. Thus this should probably be going to experimental too, as noted above.
ACK.
I'm working on the rebase to the experimental branch. I noticed PR #86, which seems to have overlapping code changes with this PR.
@peckto Any preference on your side? IMHO splitting things into a library first and afterwards extracting some operations into their own thread is probably the more efficient way to go, as the library split forces a separation of concerns in some sense. What do you think?
I think #86 is going to need a fair amount of tidying up done before it gets merged and I'd like this one in sooner to improve UX. IMO any threading should be external to TF/the library (which aims to do one job well), so there shouldn't be a clash (although I expect a few function name changes to come along with the separation, they should be trivial fixes).
The MR is now rebased onto the experimental branch and the commits are squashed.
@peckto Would you mind rebasing to resolve the merge conflicts?
Hi @BenBE, sorry for the delay. A rebase turns out not to be straightforward.
Sure, don't worry. Absolutely no need to rush anything.
Having the actual mask calculation in the lib makes this MR much cleaner, I think.
Thanks @peckto, that's looking really tidy!
I adapted the logging.
This LGTM, works as expected, merging!
This is a new implementation of #38
I moved the grab functionality into the main thread and created a new thread for the mask calculation.