
classifier is very slow #69

Open
keighrim opened this issue Feb 10, 2024 · 6 comments
Labels
🐛B Something isn't working

Comments

@keighrim
Member

keighrim commented Feb 10, 2024

Bug Description

The Classifier class has a process_video method that iterates over the video frame by frame (at a sample rate) and, for each sampled frame, extracts the image, computes feature vectors, and runs classification. We found that the method is very slow, even on GPU. I did some manual profiling, and I believe we should work on improving the video seeking code.

Reproduction steps

With

$ clams source video:/some.mp4 | curl -d@- "localhost:23231?stopAt=120000"  # use first 12 secs only

logging says:

2024-02-08 12:22:25 http://apps.clams.ai/swt-detection/unresolvable INFO     139987732137536 Processing video d1                                                               
2024-02-08 12:22:38 http://apps.clams.ai/swt-detection/unresolvable INFO     139987732137536 Processing took 12.783082723617554 seconds

Hmm, that's real-time. This was measured on a GPU machine, and GPU utilization was confirmed.

So I added some quick and simple profiling:

    def process_video(self, mp4_file: str) -> list:
        """Loops over the frames in a video and for each frame extracts the features
        and applies the classifier. Returns a list of predictions, where each prediction
        is an instance of numpy.ndarray."""
        featurizing_time = 0
        classifier_time = 0
        extract_time = 0
        seek_time = 0
        if self.debug:
            print(f'Processing {mp4_file}...')
            print(f'Labels: {self.prebin_labels}')
        logging.info(f'processing {mp4_file}...')
        predictions = []
        vidcap = cv2.VideoCapture(mp4_file)
        if not vidcap.isOpened():
            raise IOError(f'Could not open {mp4_file}')
        fps = round(vidcap.get(cv2.CAP_PROP_FPS), 2)
        fc = vidcap.get(cv2.CAP_PROP_FRAME_COUNT)
        dur = round(fc / fps, 3) * 1000
        for ms in range(0, sys.maxsize, self.sample_rate):
            if ms < self.start_at:
                continue
            if ms > self.stop_at:
                break
            t = time.time()
            vidcap.set(cv2.CAP_PROP_POS_MSEC, ms)
            seek_time += time.time() - t
            t = time.time()
            success, image = vidcap.read()
            extract_time += time.time() - t
            if not success:
                break
            img = Image.fromarray(image[:,:,::-1])
            t = time.time()
            features = self.featurizer.get_full_feature_vectors(img, ms, dur)
            featurizing_time += time.time() - t
            t = time.time()
            prediction = self.classifier(features).detach()
            prediction = Prediction(ms, self.prebin_labels, prediction)
            classifier_time += time.time() - t
            if self.debug:
                print(prediction)
            predictions.append(prediction)
        sys.stderr.write(f'Featurizing time: {featurizing_time:.2f} seconds\n')
        sys.stderr.write(f'Classifier time: {classifier_time:.2f} seconds\n')
        sys.stderr.write(f'Extract time: {extract_time:.2f} seconds\n')
        sys.stderr.write(f'Seeking time: {seek_time:.2f} seconds\n')
        return predictions

And with the same curl call, I got

2024-02-10 02:38:13 http://apps.clams.ai/swt-detection/unresolvable INFO     139892458784320 Processing video d1
Featurizing time: 6.11 seconds
Classifier time: 0.05 seconds
Extract time: 0.07 seconds
Seeking time: 5.98 seconds
2024-02-10 02:38:26 http://apps.clams.ai/swt-detection/unresolvable INFO     139892458784320 Processing took 12.812922716140747 seconds

So it turns out that "moving the cursor" in the video takes about as much time as the fairly complex CNN feature extraction.

The cv2 build I was using was compiled with ffmpeg, and to my knowledge seeking to a keyframe is a constant-time operation (please correct me if I'm wrong).

I can only guess that seeking by POS_MSEC instead of by frame number adds significant overhead. Further experiments are probably needed.
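
One way to test that guess is a quick standalone timing sketch (the file path, sample rate, and stop time below are placeholders, not the app's actual config): time millisecond-based seeking against frame-number seeking and against plain sequential decoding with grab().

    import time
    import cv2

    MP4_FILE = '/some.mp4'      # placeholder path
    SAMPLE_RATE_MS = 1000       # sample one frame per second
    STOP_AT_MS = 120000

    def time_seek(prop, ms_to_units):
        """Seek to each sample point with the given property, decode one frame,
        and return the total time spent."""
        vidcap = cv2.VideoCapture(MP4_FILE)
        fps = vidcap.get(cv2.CAP_PROP_FPS)
        total = 0.0
        for ms in range(0, STOP_AT_MS, SAMPLE_RATE_MS):
            t = time.time()
            vidcap.set(prop, ms_to_units(ms, fps))
            vidcap.read()
            total += time.time() - t
        vidcap.release()
        return total

    def time_sequential():
        """Decode frames in order, fully reading only the sampled ones and
        skipping the rest with grab(); return the total time spent."""
        vidcap = cv2.VideoCapture(MP4_FILE)
        fps = vidcap.get(cv2.CAP_PROP_FPS)
        step = max(1, int(round(fps * SAMPLE_RATE_MS / 1000)))
        frame_idx = 0
        t = time.time()
        while frame_idx * 1000 / fps <= STOP_AT_MS:
            if frame_idx % step == 0:
                ok, _ = vidcap.read()   # decode a sampled frame
            else:
                ok = vidcap.grab()      # advance without retrieving the image
            if not ok:
                break
            frame_idx += 1
        vidcap.release()
        return time.time() - t

    print('POS_MSEC seeks  :', time_seek(cv2.CAP_PROP_POS_MSEC, lambda ms, fps: ms))
    print('POS_FRAMES seeks:', time_seek(cv2.CAP_PROP_POS_FRAMES, lambda ms, fps: int(ms / 1000 * fps)))
    print('sequential grab :', time_sequential())

If the POS_MSEC numbers dominate, that would back up the guess; if sequential decoding wins, restructuring the extraction loop around that pattern would be the natural fix.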

Expected behavior

No response

Screenshots

No response

Additional context

https://forum.opencv.org/t/cap-prop-pos-frames-is-abnormal-slow/11651/3 (this is possibly related to #67 too)

video file info

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/llc_data/clams/wgbh/Peabody/cpb-aacip-526-4b2x34nn45.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42isom
    creation_time   : 2020-01-22T10:27:50.000000Z
  Duration: 00:33:09.37, start: 0.000000, bitrate: 795 kb/s
  Stream #0:0[0x1](und): Video: h264 (Main) (avc1 / 0x31637661), none(progressive), 480x360, 657 kb/s, SAR 1:1 DAR 4:3, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      creation_time   : 2020-01-22T10:27:50.000000Z
      vendor_id       : TELE
      encoder         : AVC
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2020-01-22T10:27:50.000000Z
      vendor_id       : [0][0][0][0]
@keighrim
Member Author

keighrim commented Feb 10, 2024

Another data point:

$ ffmpeg -i /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.4.101
  Duration: 00:59:25.22, start: 0.000000, bitrate: 830 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), none(progressive), 480x360, 694 kb/s, 29.97 fps, 29.97 tbr, 60k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]

Then, running this video through SWT @ develop, with all default params:

2024-02-10 04:27:52 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Processing video d1 at /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4
2024-02-10 04:27:52 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 processing /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4...
2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Featurizing time: 166.46 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Classifier time: 1.53 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Extract time: 1.76 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Seeking time: 37.90 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Processing took 224.91390013694763 seconds
2024-02-10 04:31:37 werkzeug INFO     140602549245504 127.0.0.1 - - [10/Feb/2024 04:31:37] "POST /?pretty HTTP/1.1" 200 -

@keighrim
Member Author

So for NewsHour videos, it takes up to 5 minutes to process the whole thing (that's roughly 12x real time). Does the Peabody video have something special? Does the stopAt parameter do something wonky?

@marcverhagen
Contributor

Interesting. Here is another data point.

On my 9 year old iMac (no usable GPU) I get the following on a 10 second video:

total time: 11.32s
seek time: 0.59s

So that would be less than a second seek time for 10 seconds of video, versus the 6 seconds you see for a 12 second video.

And for what it's worth, at work on my iMac Pro total processing time for a one hour video is about 15 minutes.

"Does the stopAt parameter do something wonky?"

Maybe, but all it should be doing is breaking out of the loop:

    if ms < self.start_at:
        continue
    if ms > self.stop_at:
        break
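
Back-of-the-envelope (assuming a default sample rate of 1000 ms, which I haven't double-checked), stopAt only caps how many times the seek/read/featurize body runs:

    # rough iteration count, assuming a 1000 ms sample rate (not verified)
    sample_rate = 1000
    stop_at = 120000
    iterations = stop_at // sample_rate   # ~120 seeks, reads, and featurizations

so the break itself shouldn't add any measurable cost.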

@keighrim
Member Author

keighrim commented Mar 1, 2024

Another data point from a fresh experiment.

2024-02-29 19:02:25,869 - INFO - Initiating classifier with /home/krim/Projects/clams/clamsproject/app-swt-detection/modeling/models/20240212-131937.convnext_tiny.kfold_000.pt
2024-02-29 19:02:26,655 - INFO - Classifier initiation took 0.7860872745513916 seconds
2024-02-29 19:02:26,656 - INFO - Minimum time frame score: 0.5
2024-02-29 19:02:26,657 - INFO - Processing video d1 at /home/krim/cpb-aacip-259-c53f255d.mp4
2024-02-29 19:04:06,317 - INFO - Featurizing time: 35.16 seconds

2024-02-29 19:04:06,328 - INFO - Classifier time: 1.20 seconds

2024-02-29 19:04:06,329 - INFO - Extract time: 0.87 seconds

2024-02-29 19:04:06,330 - INFO - Seeking time: 53.57 seconds

2024-02-29 19:04:06,331 - INFO - Processing took 99.6482183933258 seconds
2024-02-29 19:04:07,520 - INFO - Initiating classifier with /home/krim/Projects/clams/clamsproject/app-swt-detection/modeling/models/20240212-132306.convnext_lg.kfold_000.pt
2024-02-29 19:04:11,093 - INFO - Classifier initiation took 3.5734410285949707 seconds
2024-02-29 19:04:11,097 - INFO - Minimum time frame score: 0.5
2024-02-29 19:04:11,098 - INFO - Processing video d1 at /home/krim/cpb-aacip-259-c53f255d.mp4
2024-02-29 19:06:43,625 - INFO - Featurizing time: 87.94 seconds

2024-02-29 19:06:43,626 - INFO - Classifier time: 1.26 seconds

2024-02-29 19:06:43,627 - INFO - Extract time: 0.87 seconds

2024-02-29 19:06:43,628 - INFO - Seeking time: 53.59 seconds

2024-02-29 19:06:43,629 - INFO - Processing took 152.51135778427124 seconds

This was run on a desktop GPU machine. For this file (cpb-aacip-259-c53f255d), seeking time is still a significant portion of the runtime with both the tiny and lg backbones. In terms of CNN application time, the tiny and lg models took 35 and 88 seconds, respectively.

I will keep experimenting with other files (NewsHour and other videos @owencking shared earlier this week).

@keighrim
Member Author

keighrim commented Mar 4, 2024

Ran some additional experiments over the weekend, and here are the results:

https://gist.github.com/keighrim/fac746647af02c039c1dbcca2a27db6d

Some findings from simply eyeballing the numbers:

  1. Seeking time varied quite a bit across the 8 "new" videos, while on NewsHour videos seeking consistently took roughly 35-40 seconds per 60-minute video.
  2. Feature extraction time grows roughly linearly with the total number of frames.
  3. Image extraction and classification time make up a very small portion of the total time.

@keighrim
Member Author

With the new implementation of frame image extraction in #86 (although I haven't accurately measured the timing for a number of videos yet), the seeking + extraction time for the 2 videos I was experimenting with is significantly reduced (50 sec → 22 sec, 54 sec → 24 sec).
