
classifier is very slow #69

Open
keighrim opened this issue Feb 10, 2024 · 6 comments
Labels
🐛B Something isn't working

Comments

@keighrim
Member

keighrim commented Feb 10, 2024

Bug Description

The Classifier class has a process_video method that iterates over the video frame by frame (at a sample rate) and, for each sampled frame, extracts the image, computes feature vectors, and runs classification. We found that the method is very slow, even on GPU. I did some manual profiling, and I believe we should work on improving the video seeking code.

Reproduction steps

With

$ clams source video:/some.mp4 | curl -d@- "localhost:23231?stopAt=120000"  # use first 12 secs only

logging says:

2024-02-08 12:22:25 http://apps.clams.ai/swt-detection/unresolvable INFO     139987732137536 Processing video d1                                                               
2024-02-08 12:22:38 http://apps.clams.ai/swt-detection/unresolvable INFO     139987732137536 Processing took 12.783082723617554 seconds

Hmm, that's real-time. This was measured on a GPU machine, and GPU utilization was confirmed.

So I added some quick and simple profiling:

    def process_video(self, mp4_file: str) -> list:
        """Loops over the frames in a video and for each frame extracts the features
        and applies the classifier. Returns a list of predictions, where each prediction
        is an instance of numpy.ndarray."""
        featurizing_time = 0
        classifier_time = 0
        extract_time = 0
        seek_time = 0
        if self.debug:
            print(f'Processing {mp4_file}...')
            print(f'Labels: {self.prebin_labels}')
        logging.info(f'processing {mp4_file}...')
        predictions = []
        vidcap = cv2.VideoCapture(mp4_file)
        if not vidcap.isOpened():
            raise IOError(f'Could not open {mp4_file}')
        fps = round(vidcap.get(cv2.CAP_PROP_FPS), 2)
        fc = vidcap.get(cv2.CAP_PROP_FRAME_COUNT)
        dur = round(fc / fps, 3) * 1000
        for ms in range(0, sys.maxsize, self.sample_rate):
            if ms < self.start_at:
                continue
            if ms > self.stop_at:
                break
            t = time.time()
            vidcap.set(cv2.CAP_PROP_POS_MSEC, ms)
            seek_time += time.time() - t
            t = time.time()
            success, image = vidcap.read()
            extract_time += time.time() - t
            if not success:
                break
            img = Image.fromarray(image[:,:,::-1])
            t = time.time()
            features = self.featurizer.get_full_feature_vectors(img, ms, dur)
            featurizing_time += time.time() - t
            t = time.time()
            prediction = self.classifier(features).detach()
            prediction = Prediction(ms, self.prebin_labels, prediction)
            classifier_time += time.time() - t
            if self.debug:
                print(prediction)
            predictions.append(prediction)
        sys.stderr.write(f'Featurizing time: {featurizing_time:.2f} seconds\n')
        sys.stderr.write(f'Classifier time: {classifier_time:.2f} seconds\n')
        sys.stderr.write(f'Extract time: {extract_time:.2f} seconds\n')
        sys.stderr.write(f'Seeking time: {seek_time:.2f} seconds\n')
        return predictions

And with the same curl call, I got

2024-02-10 02:38:13 http://apps.clams.ai/swt-detection/unresolvable INFO     139892458784320 Processing video d1
Featurizing time: 6.11 seconds
Classifier time: 0.05 seconds
Extract time: 0.07 seconds
Seeking time: 5.98 seconds
2024-02-10 02:38:26 http://apps.clams.ai/swt-detection/unresolvable INFO     139892458784320 Processing took 12.812922716140747 seconds

So it turns out that "moving the cursor" in the video takes about as much time as the fairly complex CNN feature extraction.

The cv2 build I was using was compiled with ffmpeg, and to my knowledge seeking to a keyframe is a constant-time operation (please correct me if I'm wrong).

I can only guess that seeking by POS_MSEC instead of by frame number adds significant overhead. Further experiments are probably needed.
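
One way to test that guess is a quick standalone timing sketch (the file path, sample rate, and stop time below are placeholders, not the app's actual config): time millisecond-based seeking against frame-number seeking and against plain sequential decoding with grab().

    import time
    import cv2

    MP4_FILE = '/some.mp4'      # placeholder path
    SAMPLE_RATE_MS = 1000       # sample one frame per second
    STOP_AT_MS = 120000

    def time_seek(prop, ms_to_units):
        """Seek to each sample point with the given property, decode one frame,
        and return the total time spent."""
        vidcap = cv2.VideoCapture(MP4_FILE)
        fps = vidcap.get(cv2.CAP_PROP_FPS)
        total = 0.0
        for ms in range(0, STOP_AT_MS, SAMPLE_RATE_MS):
            t = time.time()
            vidcap.set(prop, ms_to_units(ms, fps))
            vidcap.read()
            total += time.time() - t
        vidcap.release()
        return total

    def time_sequential():
        """Decode frames in order, fully reading only the sampled ones and
        skipping the rest with grab(); return the total time spent."""
        vidcap = cv2.VideoCapture(MP4_FILE)
        fps = vidcap.get(cv2.CAP_PROP_FPS)
        step = max(1, int(round(fps * SAMPLE_RATE_MS / 1000)))
        frame_idx = 0
        t = time.time()
        while frame_idx * 1000 / fps <= STOP_AT_MS:
            if frame_idx % step == 0:
                ok, _ = vidcap.read()   # decode a sampled frame
            else:
                ok = vidcap.grab()      # advance without retrieving the image
            if not ok:
                break
            frame_idx += 1
        vidcap.release()
        return time.time() - t

    print('POS_MSEC seeks  :', time_seek(cv2.CAP_PROP_POS_MSEC, lambda ms, fps: ms))
    print('POS_FRAMES seeks:', time_seek(cv2.CAP_PROP_POS_FRAMES, lambda ms, fps: int(ms / 1000 * fps)))
    print('sequential grab :', time_sequential())

If the POS_MSEC numbers dominate, that would back up the guess; if sequential decoding wins, restructuring the extraction loop around that pattern would be the natural fix.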

Expected behavior

No response

Screenshots

No response

Additional context

https://forum.opencv.org/t/cap-prop-pos-frames-is-abnormal-slow/11651/3 (this is possibly related to #67 too)

video file info

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/llc_data/clams/wgbh/Peabody/cpb-aacip-526-4b2x34nn45.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42isom
    creation_time   : 2020-01-22T10:27:50.000000Z
  Duration: 00:33:09.37, start: 0.000000, bitrate: 795 kb/s
  Stream #0:0[0x1](und): Video: h264 (Main) (avc1 / 0x31637661), none(progressive), 480x360, 657 kb/s, SAR 1:1 DAR 4:3, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      creation_time   : 2020-01-22T10:27:50.000000Z
      vendor_id       : TELE
      encoder         : AVC
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2020-01-22T10:27:50.000000Z
      vendor_id       : [0][0][0][0]
@keighrim
Member Author

keighrim commented Feb 10, 2024

Another data point:

$ ffmpeg -i /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.4.101
  Duration: 00:59:25.22, start: 0.000000, bitrate: 830 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), none(progressive), 480x360, 694 kb/s, 29.97 fps, 29.97 tbr, 60k tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]

Then, running this video through SWT @ develop, with all default params:

2024-02-10 04:27:52 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Processing video d1 at /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4
2024-02-10 04:27:52 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 processing /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4...
2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Featurizing time: 166.46 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Classifier time: 1.53 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Extract time: 1.76 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Seeking time: 37.90 seconds

2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO     140602549245504 Processing took 224.91390013694763 seconds
2024-02-10 04:31:37 werkzeug INFO     140602549245504 127.0.0.1 - - [10/Feb/2024 04:31:37] "POST /?pretty HTTP/1.1" 200 -

@keighrim
Member Author

So for NewsHour videos, it takes up to 5 minutes to process the whole thing (that's roughly 12x real time). Does the Peabody video have something special? Does the stopAt parameter do something wonky?

@marcverhagen
Contributor

Interesting. Here is another data point.

On my 9 year old iMac (no usable GPU) I get the following on a 10 second video:

total time: 11.32s
seek time: 0.59s

So that would be less than a second seek time for 10 seconds of video, versus the 6 seconds you see for a 12 second video.

And for what it's worth, at work on my iMac Pro total processing time for a one hour video is about 15 minutes.

"Does the stopAt parameter do something wonky?"

Maybe, but all it should be doing is breaking out of the loop:

    if ms < self.start_at:
        continue
    if ms > self.stop_at:
        break
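
Back-of-the-envelope (assuming a default sample rate of 1000 ms, which I haven't double-checked), stopAt only caps how many times the seek/read/featurize body runs:

    # rough iteration count, assuming a 1000 ms sample rate (not verified)
    sample_rate = 1000
    stop_at = 120000
    iterations = stop_at // sample_rate   # ~120 seeks, reads, and featurizations

so the break itself shouldn't add any measurable cost.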

@keighrim
Member Author

keighrim commented Mar 1, 2024

Another data point from a fresh experiment.

2024-02-29 19:02:25,869 - INFO - Initiating classifier with /home/krim/Projects/clams/clamsproject/app-swt-detection/modeling/models/20240212-131937.convnext_tiny.kfold_000.pt
2024-02-29 19:02:26,655 - INFO - Classifier initiation took 0.7860872745513916 seconds
2024-02-29 19:02:26,656 - INFO - Minimum time frame score: 0.5
2024-02-29 19:02:26,657 - INFO - Processing video d1 at /home/krim/cpb-aacip-259-c53f255d.mp4
2024-02-29 19:04:06,317 - INFO - Featurizing time: 35.16 seconds

2024-02-29 19:04:06,328 - INFO - Classifier time: 1.20 seconds

2024-02-29 19:04:06,329 - INFO - Extract time: 0.87 seconds

2024-02-29 19:04:06,330 - INFO - Seeking time: 53.57 seconds

2024-02-29 19:04:06,331 - INFO - Processing took 99.6482183933258 seconds
2024-02-29 19:04:07,520 - INFO - Initiating classifier with /home/krim/Projects/clams/clamsproject/app-swt-detection/modeling/models/20240212-132306.convnext_lg.kfold_000.pt
2024-02-29 19:04:11,093 - INFO - Classifier initiation took 3.5734410285949707 seconds
2024-02-29 19:04:11,097 - INFO - Minimum time frame score: 0.5
2024-02-29 19:04:11,098 - INFO - Processing video d1 at /home/krim/cpb-aacip-259-c53f255d.mp4
2024-02-29 19:06:43,625 - INFO - Featurizing time: 87.94 seconds

2024-02-29 19:06:43,626 - INFO - Classifier time: 1.26 seconds

2024-02-29 19:06:43,627 - INFO - Extract time: 0.87 seconds

2024-02-29 19:06:43,628 - INFO - Seeking time: 53.59 seconds

2024-02-29 19:06:43,629 - INFO - Processing took 152.51135778427124 seconds

This was run on a desktop GPU machine. For this file (cpb-aacip-259-c53f255d), seeking time is still a significant portion of the runtime with both the tiny and lg backbones. In terms of CNN application time, the tiny and lg models took 35 and 88 seconds, respectively.

I will keep experimenting with other files (NewsHour and other videos @owencking shared earlier this week).

@keighrim
Member Author

keighrim commented Mar 4, 2024

Ran some additional experiments over the weekend, and here are the results:

https://gist.github.com/keighrim/fac746647af02c039c1dbcca2a27db6d

Some findings from simply eyeballing the numbers:

  1. Seeking time varied quite a bit across the 8 "new" videos, while on NewsHour videos seeking consistently took roughly 35-40 seconds per 60-minute video.
  2. Feature extraction time grows roughly linearly with the total number of frames.
  3. Image extraction and classification time make up a very small portion of the total time.

@keighrim
Member Author

With the new implementation of frame image extraction in #86 (although I haven't accurately measured the timing for a number of videos yet), the seeking + extraction time for the 2 videos I was experimenting with is significantly reduced (50 sec → 22 sec, 54 sec → 24 sec).
