Conversation

@jzadnik (Contributor) commented May 8, 2025

No description provided.

@jzadnik requested a review from a team as a code owner May 8, 2025 14:52
@gmlueck (Contributor) left a comment

We'd like to put this out for review in order to get feedback from the wider SPIR-V community.

@againull changed the title from "Add SPV_INTEL_function_variants extension" to "[SYCL][Doc] Add SPV_INTEL_function_variants extension" May 9, 2025
@juliusikkala commented May 14, 2025

I don't fully understand what new capability the decoration provides, compared to using specialization constants with conditionals. E.g., assuming platformId is a specialization constant, what will this decoration do better than:

void func()
{
    if (platformId == INTEL_IGPU)
    {
        // code for Intel integrated GPUs
    }
    else if (platformId == RADEON)
    {
        // code for Radeon
    }
    else if (platformId == NVIDIA)
    {
        // code for NVIDIA
    }
    else
    {
        // fallback code
    }
}

The above is what I've usually done to have functions with different behaviour on separate platforms.

Conditional extensions & capabilities are very welcome in the computer graphics context as well. I've had to generate separate binaries for Nvidia in some cases, e.g. when I want to use the shader execution reordering extension. This would allow shipping only one binary.

What happens with OpConditionalCopyObjectINTEL if multiple booleans are true? I know it's "not allowed", but it doesn't seem like it can be picked up during validation either? Perhaps specifying that it takes the "first pair with true condition" would be nice? Since this happens at the JIT level and not in hardware, I don't think leaving room for undefined behaviour here is useful for optimization.
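
For concreteness, I'm imagining something like the sketch below (purely illustrative; I'm assuming the instruction takes (condition, object) pairs, which is how I read the proposal):

; illustrative only: operand layout assumed to be (condition, object) pairs
; %cond_a and %cond_b are boolean spec constants; if both specialize to true,
; a "first true pair wins" rule would make %result a copy of %obj_a
%result = OpConditionalCopyObjectINTEL %type %cond_a %obj_a %cond_b %obj_b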

Also, I don't fully see the value of OpSpecConstantTarget & OpSpecConstantArchitecture & OpSpecConstantCapabilities: isn't this information already available through whichever API is being used, so the user could pass it along as external specialization constants?

For portions of the "targets registry" (at least for GPU vendors), you may be able to leverage or sync with existing Vulkan or OpenCL vendor / device IDs.

One capability that would be nice to have (although probably not in scope for this extension) would be target-specific structure types. There are sometimes entries that only make sense on one platform. This would likely cause lots of complications for validation though.

@jzadnik (Contributor, Author) commented May 14, 2025

I don't fully understand what new capability the decoration provides

The ConditionalINTEL decoration gives you #ifdef-like functionality: it turns the annotated instruction on or off during the specialization pass. In your example, you'd need to rely on optimizations to clear the unused branches away. You can also annotate types and global variables; e.g., if some variant uses a special type only present on that platform, you can annotate it with ConditionalINTEL as well. I think that would be problematic to do in your example.
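
For instance, a global variable that should exist only in one variant could be annotated like this (a rough sketch in the same spirit as the type example further down; the exact encoding of the condition is defined by the extension spec):

; rough sketch: the variable below is kept only when %b1 evaluates to true
OpDecorate %gvar ConditionalINTEL %b1

%b1    = OpSpecConstantTarget <nvptx64>
%float = OpTypeFloat 32
%ptr   = OpTypePointer CrossWorkgroup %float
%gvar  = OpVariable %ptr CrossWorkgroup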

it doesn't seem like it can be picked up during validation either? Perhaps specifying that it takes the "first pair with true condition" would be nice?

You're right, we can only check it in the consumer during specialization. The "first pair with true condition" rule makes the semantics simpler; sounds like a good idea.

Also, I don't fully see the value of OpSpecConstantTarget & OpSpecConstantArchitecture & OpSpecConstantCapabilities

They make the module standalone, i.e., all the required information is contained within the module in a standardized format. You can still choose to use external specialization and pass the side information separately, and that's fine if you control both the producer and the consumer (e.g., that's how SYCL intends to use it). But one potential use case could be, for example, generating the multi-target SPIR-V from the OpenMP declare variant construct and then shipping it to a customer device where it is consumed by an OpenCL implementation such as PoCL. Without these spec constants, OpenMP and PoCL would need to come up with some ad-hoc mechanism to pass this information along.

target-specific structure types

You can annotate types with ConditionalINTEL. Maybe something like the following could be used?

OpDecorate %struct1 ConditionalINTEL %b1
OpDecorate %struct2 ConditionalINTEL %b2

%b1 = OpSpecConstantTarget <nvptx64>
%b2 = OpSpecConstantTarget <amdgcn>

%struct1 = OpTypeStruct ...
%struct2 = OpTypeStruct ...
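
During specialization, the consumer would then keep only the struct whose condition evaluates to true and drop the other, the same way other ConditionalINTEL-annotated instructions are turned on or off.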

@juliusikkala commented May 14, 2025

Is there an existing public implementation of this extension? Do you have an implementation or a plan for how to expose this functionality in a high-level language like OpenCL C (or something other than OpenMP)? If some day in the future a Vulkan implementation happens to gain support for this extension, I may be interested in implementing this for Slang.

@jzadnik (Contributor, Author) commented May 15, 2025

Not public yet, but the goal is to upstream a proof-of-concept producer to SPIRV-Tools (extending spirv-link) and a consumer to SPIRV-LLVM-Translator; that is currently WIP. Using spirv-link requires compiling the module for each target separately, so it's really just a way to get the basic functionality, but exposing it in a high-level language is definitely the end goal.

@pjaaskel

@juliusikkala note that, since the variant can be picked in a preprocessing step, the driver doesn't need to support the extension directly.

@juliusikkala

Nice, I didn't think of that. Implementing this in a Vulkan Layer would be a great way to make this immediately available there as well!

@pjaaskel

Yep, this was one of the design ideas behind this extension and one of the reasons we didn't want to require complicated compiler passes for specialization. It should ease adoption a lot.

@againull merged commit 637476e into intel:sycl May 15, 2025
4 checks passed
@devshgraphicsprogramming

IMHO anything that's not a full-blown metaprogramming system is but a band-aid.

We deal with it by just compiling a SPIR-V with multiple entry points.

Ideally, the way SPIR-V modules declare capabilities and extensions should change, because right now, if any entry point uses a capability, it is declared for the whole module.

This leads to obvious problems (try linking/combining two SPIR-V entry points meant for GPUs with vastly different capabilities).

As stated in my Vulkanised 2023 and 2024 presentations, what I'd REALLY appreciate is constant (as in SPIR-V constants and spec constants) function pointers, decorated with capabilities and extensions; then you could fill a constant struct with constant function pointers and gather your extensions and capabilities by walking the static call graph of some entry point (which would itself be a constant function pointer).

@jzadnik deleted the SPV_INTEL_function_variants branch June 24, 2025 06:21
@jzadnik (Contributor, Author) commented Jun 24, 2025

@devshgraphicsprogramming You mean that if an entry point requires a capability/extension, it will be implicitly declared for the whole module? It is ultimately up to the producer algorithm, but each entry point has an associated boolean spec constant, so when implicitly declaring a cap/ext, declare it as a conditional cap/ext using the same spec constant. In practice, this can be easily done by compiling a module for each architecture separately. In that case, each compiled module ends up with all the capabilities/extensions. These modules can then be linked, with each module getting its own boolean spec constant to be used as the Condition operand in instructions from this extension as necessary.

What you're proposing sounds like being able to associate capabilities/extensions directly with the entry points, but I believe the same thing can also be achieved with this extension. Note that you can combine spec constants with OpSpecConstantOp, so if, let's say, 2 out of 3 entry points use the same capability, you combine their spec constants with the OR operation and use that to annotate the conditional capability. So instead of using a struct, the hierarchy is encoded with spec constants, and it is indeed at entry-point rather than module granularity.
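
As a sketch (assuming %ep1_cond and %ep2_cond stand for the per-entry-point boolean spec constants described above), the combination itself is just core SPIR-V:

; combine two entry-point conditions; the result can then be used as the
; Condition for the capability/extension shared by both entry points
%bool        = OpTypeBool
%ep1_or_ep2  = OpSpecConstantOp %bool LogicalOr %ep1_cond %ep2_cond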

Walking a static call tree can be used to track which capabilities/extensions are used by which entry point, but it would work only for code that doesn't use dynamic function pointers.

(Constant function pointers are a part of the SPV_INTEL_function_pointers extension btw.)

@devshgraphicsprogramming

It is ultimately up to the producer algorithm, but each entry point has an associated boolean spec constant, so when implicitly declaring a cap/ext, declare it as a conditional cap/ext using the same spec constant.

That just moves the problem into the application space or tooling.

My personal, very biased take is that specialization constants are near useless; everything I do that requires that sort of flexibility I do through HLSL 2021 (and soon C++ with vcc, I hope) metaprogramming. TL;DR: spec constants don't let me programmatically codegen, e.g., iterations of FFTs, shared memory size, etc.

They also inhibit many optimizations/transformations; I have to trust that the IHV driver/runtime actually performs them after turning spec constants into constants.

But again I'm a Vulkan developer and not a SYCL one.

Also, it's a problem we've already solved: we take massive unity-build SPIR-Vs and run SPIR-V optimizers after specialization to remove all entry points which aren't used by a Vulkan pipeline, then do several rounds of aggressive DCE, such that any types, intrinsics, etc. requiring capabilities or extensions drop out.

Then we run the capability stripping pass to get what we want.

Walking a static call tree can be used to track which capabilities/extensions are used by which entry point, but it would work only for code that doesn't use dynamic function pointers.

Yes, I'm aware, but dynamic function pointers should match some signature (like in C, specifically WASM); you can't just cast a void (*)(float *) to an int *(*)(void), that's UB in C if I remember correctly (makes sense because of calling conventions, stack frames, resetting, unwinding, etc.).

IMHO the caps should be part of the signature (kinda like how the storage class should be part of a type).

(Constant function pointers are a part of the SPV_INTEL_function_pointers extension btw.)

I know, but it's for the Kernel environment; I really wish it were more cross-vendor and supported in Vulkan.

@juliusikkala

TL;DR Spec constants don't let me programmatically codegen, e.g. iterations of FFTs, shared memory size, etc.

Just a quick note, GLSL and Slang do allow specifying shared memory size with a specialization constant, so it has to already be supported in SPIR-V. That specific problem is then probably just an HLSL problem.

@devshgraphicsprogramming

TL;DR Spec constants don't let me programmatically codegen, e.g. iterations of FFTs, shared memory size, etc.

Just a quick note, GLSL and Slang do allow specifying shared memory size with a specialization constant, so it has to already be supported in SPIR-V. That specific problem is then probably just an HLSL problem.

Nice to know.

What's the GLSL syntax for making complex OpSpecConstantOp chains?

E.g., to compute the amount of groupshared memory needed for a prefix sum, you need to sum a geometric series up to a point.

@juliusikkala

You can do basic arithmetic but AFAIK no loops or function calls unfortunately, so no complex codegen there :( In complex cases like that, I've just computed the necessary size on the host and passed it as a separate specialization constant (like SHARED_MEM_SIZE).

So basically you can do this:

layout (constant_id = 0) const uint BLOCK_SIZE = 64; // default value, overridable at pipeline creation
shared float buf_a[(BLOCK_SIZE + 5) * 4];

but that's about it.
