create face detection demo app #29

Open
goeddea opened this issue Jan 22, 2019 · 4 comments

goeddea commented Jan 22, 2019

We want an application that detects human faces in a live video stream and shows the results in a browser-based frontend.

This is microservice-based, and I see three components as part of it:

  • camera capture component - this runs on a device to which a camera is connected. It captures a video stream/stream of images from the camera and transmits it as subscription events (see the sketch after this list).
  • face detection component - this receives a stream of images and emits the positions of human faces in that stream.
  • video display component - this receives a stream of images and may also receive a stream of human face positions for that stream. It displays the stream as a video and may mark the face positions within it.
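
A minimal sketch of the camera capture loop, assuming Autobahn|Python for the WAMP side; the router URL, realm, topic name, and frame rate are placeholders, not agreed values:

```python
import asyncio
import time

import cv2
from autobahn.asyncio.component import Component, run

# placeholder router URL and realm
component = Component(transports="ws://localhost:8080/ws", realm="realm1")

@component.on_join
async def on_join(session, details):
    cap = cv2.VideoCapture(0)  # first attached camera
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        _, jpeg = cv2.imencode(".jpg", frame)
        # the timecode attached here is reused by the face detection stream;
        # binary payloads assume a serializer that supports bytes (e.g. CBOR)
        session.publish("com.example.camera.frames",
                        jpeg.tobytes(), timecode=time.time())
        await asyncio.sleep(1 / 15)  # throttle to ~15 fps

run([component])
```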

Interaction:

  • The display and processing in our demo are triggered from the video display component, e.g. by a user there selecting a camera from a menu and additionally picking whether to display just the video or the video plus marked faces (potentially: how to mark the faces).
  • Based on this, the camera capture component is instructed to start transmitting images. (In addition to a simple on/off switch, this may also take a topic as an argument, if a single camera may serve consumers across multiple apps.)
  • If face detection is desired, the face detection component is given the topic of the camera capture component, as well as the topic to which to publish the face detection data.
  • In the video display component, the video data from the camera capture component is displayed, and markings of face positions are overlaid if desired.

With this, we can use the video display component twice in our demo: once to show the raw image stream and once to show the face positions.

My initial (naive) assumption regarding coordination between the two data streams is that this happens via a timecode generated by the camera capture component, which is then also used for the face detection data stream. The video display component can then cache either stream until the required pairs are present.
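
A minimal sketch of that caching logic, in Python for brevity (the real video display component would run in the browser):

```python
class StreamPairer:
    """Caches frames and face positions keyed by timecode and emits a
    pair as soon as both sides for the same timecode are present."""

    def __init__(self, on_pair):
        self._frames = {}   # timecode -> frame
        self._faces = {}    # timecode -> face positions
        self._on_pair = on_pair

    def add_frame(self, timecode, frame):
        self._frames[timecode] = frame
        self._try_emit(timecode)

    def add_faces(self, timecode, positions):
        self._faces[timecode] = positions
        self._try_emit(timecode)

    def _try_emit(self, timecode):
        if timecode in self._frames and timecode in self._faces:
            self._on_pair(self._frames.pop(timecode),
                          self._faces.pop(timecode))
```

In practice the caches would also need an eviction policy for timecodes that never receive a matching pair.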

@om26er - does the above sound reasonable?

goeddea added the app label (a set of components which fulfill a (business) use case) on Jan 22, 2019
goeddea added this to the embedded world 2019 showcase milestone on Jan 22, 2019

goeddea commented Jan 22, 2019

related/kinda superset of #27

oberstet commented

Let me add some info and background from my side, in particular regarding the ML stuff.

In ML, software is often split into two pieces: a) training/learning and b) detection/run-time.

"face detection component": this would be the b) in above. It needs to load/access an already trained model, and only apply that model to new incoming data and output prediction ("Is this picture/video frame a human face, yes or no?")

So we should actually have two components for the ML part:

  • pattern detection component
  • pattern detection training component

For the demo, pattern == human face is perfect, but we should design the components in a way that generalizes to arbitrary patterns (see below).


The specific ML algorithm that we should use for this is "Haar cascades". A good intro can be found here: http://www.willberger.org/cascade-haar-explained/

The output (when using OpenCV for Haar cascades via cv2.CascadeClassifier) is exactly one XML file == the trained model.

The OpenCV project provides a bunch of ready-to-use trained models here: https://github.com/opencv/opencv/blob/master/data/haarcascades/

One model provided is haarcascade_frontalface_default.xml, a Haar cascade model trained to detect human faces.

Being XML, it is verbose and can be compressed ~10x: https://gist.github.com/oberstet/5f91645cb6d4497676b8cca7b83d12e5
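
A quick way to check that ratio locally, assuming the stock model file has been downloaded (bz2 here; the gist may use a different codec):

```python
import bz2

xml = open("haarcascade_frontalface_default.xml", "rb").read()
packed = bz2.compress(xml)
print(f"{len(xml)} -> {len(packed)} bytes ({len(xml) / len(packed):.1f}x)")
```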


The training component (different from the run-time component) essentially needs to do the following (see the sketch after this list):

  • Input: two sets of raw images (positive and negative examples)
  • Preprocess/normalize all images (e.g. size, color/grayscale, etc.)
  • Split the preprocessed images into a training set and a test set
  • Train a model using the training set
  • Test the trained model on the test set (to compute expected precision/recall and such)
  • Output: store the model as an XML file
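
A sketch of the preprocessing and splitting steps, assuming hypothetical positives/ and negatives/ directories; note that the actual Haar cascade training step is done with the opencv_traincascade CLI tool, which is not exposed through cv2:

```python
import glob
import random

import cv2

def preprocess(paths, size=(24, 24)):
    """Normalize all images to a common size and grayscale."""
    images = []
    for path in paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        images.append(cv2.resize(img, size))
    return images

def split(images, test_fraction=0.2):
    """Split into a training set and a test set."""
    random.shuffle(images)
    cut = int(len(images) * (1 - test_fraction))
    return images[:cut], images[cut:]

positives = preprocess(glob.glob("positives/*.jpg"))
negatives = preprocess(glob.glob("negatives/*.jpg"))
pos_train, pos_test = split(positives)
neg_train, neg_test = split(negatives)
```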

The detection run-time component (processing the live video frames) needs to (see the sketch after this list):

  • load the model into a run-time classifier (which does not contain training ability)
  • shuffle video frames through the detector
  • publish notifications when faces are detected
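
A minimal sketch of the classifier core, using cv2.CascadeClassifier with the stock frontal face model; the local camera input and the print stand in for the WAMP subscribe/publish wiring:

```python
import cv2

classifier = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # returns a list of (x, y, w, h) bounding boxes, one per detected face
    faces = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        print("face at", x, y, w, h)  # would be published as a WAMP event
```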

We could, for example, have these WAMP procs in the ML run-time component:

  1. store_model(compressed_xml, label, description) -> UUID (= SHA fingerprint of the XML): store the XML locally on the run-time component's disk
  2. load_model(UUID) -> ok|error: load a previously stored model - only works if no model is currently running
  3. run_model() -> ok|error: start the previously loaded model; it will begin to process live video frames (received from the camera capture component)
  4. stop_model() -> ok|error: stop the currently running model (if any)
  5. list_models() -> [{UUID, label, description}]

And then, e.g., have the ML training component call into store_model and so on (see the sketch below).
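
A sketch of what registering the first of these procedures could look like, again assuming Autobahn|Python; the URI, compression codec (zlib), and on-disk layout are placeholders:

```python
import hashlib
import os
import zlib

from autobahn.asyncio.component import Component, run

component = Component(transports="ws://localhost:8080/ws", realm="realm1")
MODEL_DIR = "models"  # local storage within the run-time component

@component.register("com.example.ml.store_model")
def store_model(compressed_xml, label, description):
    xml = zlib.decompress(compressed_xml)
    uuid = hashlib.sha256(xml).hexdigest()  # UUID = SHA fingerprint of the XML
    os.makedirs(MODEL_DIR, exist_ok=True)
    with open(os.path.join(MODEL_DIR, uuid + ".xml"), "wb") as f:
        f.write(xml)
    # label/description would go into a small local index (omitted here)
    return uuid

run([component])
```

The training component would then simply call into com.example.ml.store_model over WAMP after a training run finishes.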


some more links:

oberstet commented

The reason for the above generalization (patterns vs. only faces, plus separate run-time and training components) is that doing so makes this actually much more than a demo!

E.g. we could, further down the line, add a UI that allows an end user to upload and define training sets of arbitrary pictures/images for other applications:

  • industrial user wants to detect "broken parts vs ok parts" in an industrial imaging setup

Because: face detection is obviously not something an industrial user would practically do. However, "broken parts vs. ok parts" is actually very relevant.


goeddea commented Feb 5, 2019

For the initial version, which can use the existing model that OpenCV provides:

  • the video capture component and the analysis component are in Python
  • the video display component is in the browser
