Add NVDEC reconfiguration capability to improve NVDEC cache hit rate and reduce initialization overhead. #1176
Conversation
NicolasHug left a comment:
Thanks a lot for the great PR @IteratorandIterator! This is definitely something we'll want to include if we can verify the performance improvements.
Just checking my understanding first: before, we would never reconfigure a decoder, so a cache hit required all of codecType, width, height, chromaFormat, bitDepthLumaMinus8, and numDecodeSurfaces to match exactly. Now, we still require codecType, chromaFormat, and bitDepthLumaMinus8 to match exactly to re-use a decoder, but we allow width, height, and numDecodeSurfaces to differ, as long as width and height are no larger than the cache entry's, in which case we can reconfigure.
Is that understanding correct?
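For illustration, here is a rough C++ sketch of the two matching rules being described. The struct and helper names are hypothetical (they are not taken from the actual NVDECCache code), while the `CUVIDEOFORMAT` fields are real CUVID API fields; the numDecodeSurfaces condition discussed further down is omitted here:

```cpp
#include <nvcuvid.h>

// Hypothetical cache-entry parameters, mirroring the fields listed above.
struct CachedDecoderParams {
  cudaVideoCodec codecType;
  unsigned int ulMaxWidth;   // set from the first video's coded_width
  unsigned int ulMaxHeight;  // set from the first video's coded_height
  cudaVideoChromaFormat chromaFormat;
  unsigned char bitDepthLumaMinus8;
};

// Before: every field had to match exactly for a cache hit.
bool exactMatch(const CachedDecoderParams& c, const CUVIDEOFORMAT* f) {
  return c.codecType == f->codec && c.chromaFormat == f->chroma_format &&
      c.bitDepthLumaMinus8 == f->bit_depth_luma_minus8 &&
      c.ulMaxWidth == f->coded_width && c.ulMaxHeight == f->coded_height;
}

// After this PR: codec, chroma format, and bit depth must still match, but a
// smaller-or-equal resolution is enough, because the cached decoder can be
// reconfigured down to the new dimensions.
bool reconfigurable(const CachedDecoderParams& c, const CUVIDEOFORMAT* f) {
  return c.codecType == f->codec && c.chromaFormat == f->chroma_format &&
      c.bitDepthLumaMinus8 == f->bit_depth_luma_minus8 &&
      c.ulMaxWidth >= f->coded_width && c.ulMaxHeight >= f->coded_height;
}
```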
Regarding the performance improvement, would you be able to share a quick reproducible benchmark showing the performance gain? I assume we should observe the most gain when decoding multiple videos sequentially with decreasing height and width?
Let me know if that's something you have the bandwidth for, and thanks a lot for the PR!
src/torchcodec/_core/NVDECCache.cpp (outdated)

```cpp
context.second.ulMaxWidth >= videoFormat->coded_width &&
    context.second.ulMaxHeight >= videoFormat->coded_height
```
Quick question: does there need to be any condition on numDecodeSurfaces for the decoder to be eligible for re-configuration?
The numDecodeSurfaces value returned by each reconfiguration should be less than the ulMaxNumDecodeSurfaces specified when creating the video parser. Since the current code fixes ulMaxNumDecodeSurfaces to 8 when creating the video parser, if the tests pass under this setting, it indicates that for most videos the required ulNumDecodeSurfaces is less than or equal to 8. Therefore, I did not add an explicit check here, but it is indeed necessary and important to enforce this constraint.
I will add this constraint soon~
Hi @NicolasHug! When creating a decoder for the first time, it is usually necessary to specify ulMaxHeight and ulMaxWidth. During subsequent reconfiguration, the only requirement is that the new bitstream's coded_height and coded_width are less than or equal to the ulMaxHeight and ulMaxWidth specified at creation time. In other words, if relatively large ulMaxHeight and ulMaxWidth values (for example, 4K) are specified manually when creating the decoder, the cache hit rate will be higher. In this PR, I set these values to the resolution of the first video that uses the decoder in order to stay consistent with the existing code logic, and I believe this approach is also reasonable.

Regarding ulNumDecodeSurfaces: based on my reading of the official NVIDIA documentation as well as my own experimental results, the value set when creating the decoder usually does not affect the outcome, because the pfnSequenceCallback function returns the actual ulNumDecodeSurfaces required by the current bitstream and updates the value dynamically. However, the ulNumDecodeSurfaces value returned during reconfiguration must not exceed the ulMaxNumDecodeSurfaces specified when creating the video parser; otherwise, the decoding result will be incorrect.

Oh, I just realized that I did not add a constraint to ensure that the current bitstream's ulNumDecodeSurfaces is less than or equal to ulMaxNumDecodeSurfaces. That said, the current code fixes ulMaxNumDecodeSurfaces to 8, so the test videos likely all use a ulNumDecodeSurfaces value of 8 or less by default.

I would be very happy to provide a reproducible benchmark. I also noticed that some tests did not pass, and I am not sure whether that is due to a logic issue in my code or edge cases I did not consider. If you could let me know how to inspect the source code of the tests being run, I would be glad to fix these bugs and help merge the reconfiguration functionality into torchcodec.
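For reference, this is roughly what a reconfiguration call looks like with the CUVID API. The `reconfigure` helper is hypothetical, but `cuvidReconfigureDecoder` and `CUVIDRECONFIGUREDECODERINFO` are the actual NVDEC entry points; error handling is elided:

```cpp
#include <cstring>
#include <nvcuvid.h>

// Sketch: reconfigure an existing decoder for a new sequence. The decoder
// must have been created with ulMaxWidth/ulMaxHeight at least as large as
// the new stream's coded dimensions.
CUresult reconfigure(CUvideodecoder decoder, const CUVIDEOFORMAT* format) {
  CUVIDRECONFIGUREDECODERINFO info;
  std::memset(&info, 0, sizeof(info));
  info.ulWidth = format->coded_width;    // must be <= create-time ulMaxWidth
  info.ulHeight = format->coded_height;  // must be <= create-time ulMaxHeight
  info.ulTargetWidth = format->coded_width;
  info.ulTargetHeight = format->coded_height;
  // Must not exceed the surface count the decoder was created with.
  info.ulNumDecodeSurfaces = format->min_num_decode_surfaces;
  info.display_area.left = static_cast<short>(format->display_area.left);
  info.display_area.top = static_cast<short>(format->display_area.top);
  info.display_area.right = static_cast<short>(format->display_area.right);
  info.display_area.bottom = static_cast<short>(format->display_area.bottom);
  return cuvidReconfigureDecoder(decoder, &info);
}
```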
Thanks for the details! Regarding the test failures, if you click on one of those links you should then have access to the logs, like this one: https://github.com/meta-pytorch/torchcodec/actions/runs/21201605282/job/61015703549?pr=1176. The logs aren't always easy to read because stdout and stderr are interleaved. Here, I think the relevant errors are:
… ulMaxNumDecodeSurfaces.
Thanks a lot! I've already fixed the bug. If this code passes testing, I'll start preparing the reproducible benchmark.
Sorry @NicolasHug, I explained the ulNumDecodeSurfaces part incorrectly yesterday. It seems that ulMaxNumDecodeSurfaces, specified when creating the parser, is actually the one that can be dynamically changed, and the number of surfaces used during reconfiguration must not exceed the ulNumDecodeSurfaces set when creating the decoder. If this passes testing, that conclusion should be correct.
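Under that corrected reading, the eligibility guard would look roughly like the following sketch; `canReconfigureSurfaces` is a hypothetical helper name, and `createInfo` stands for the `CUVIDDECODECREATEINFO` the cached decoder was created with:

```cpp
#include <nvcuvid.h>

// Sketch of the guard implied by the corrected conclusion above: the surface
// count needed by the new sequence must fit within the ulNumDecodeSurfaces
// that the cached decoder was originally created with.
bool canReconfigureSurfaces(
    const CUVIDDECODECREATEINFO& createInfo,
    const CUVIDEOFORMAT* newFormat) {
  return newFormat->min_num_decode_surfaces <= createInfo.ulNumDecodeSurfaces;
}
```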

Added NVDEC decoder reconfiguration capability on top of the NVDEC cache, which improves the cache hit rate. Creating a completely new decoder takes approximately 50–70 milliseconds to initialize, whereas reconfiguring an existing decoder reduces initialization time to about 5–6 milliseconds, a roughly 10× reduction in initialization latency (tested on an H100 GPU).
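A micro-benchmark of the shape behind those numbers might look like the following sketch; the timing harness is hypothetical, and `reconfigure` refers to the helper sketched earlier, while `cuvidCreateDecoder` is the real CUVID creation call:

```cpp
#include <chrono>

// Time a single call in milliseconds using a steady clock.
template <typename F>
double timeMs(F&& fn) {
  auto t0 = std::chrono::steady_clock::now();
  fn();
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Usage (sketch; decoder, createInfo, and newFormat assumed set up):
//   double createMs = timeMs([&] { cuvidCreateDecoder(&decoder, &createInfo); });
//   double reconfMs = timeMs([&] { reconfigure(decoder, newFormat); });
// Per the PR, expect roughly 50-70 ms for creation vs 5-6 ms for
// reconfiguration on an H100.
```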