Align UR_API fields (8 byte) for optimization create/move/copy structs on x64 cpus #17683

Open
wants to merge 1 commit into base: sycl

GermanAizek

PR migrated from oneapi-src/unified-runtime#2747

Would you consider migrating the Unified Runtime API structures to a layout that packs without padding on modern x64 processors? Smaller structs occupy fewer cache-line bytes, so they are more likely to stay in CPU cache, which can noticeably help performance where these structures are created, moved, or copied frequently. I expect that you (as Intel employees) know the benefits of this optimization at the architectural level of the codebase, as well as its main drawback: it breaks the ABI.

In short, the API break is small: I have only swapped the first and second fields (pNext and sType) in each of the affected structures.
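
To illustrate where the saved bytes come from, here is a minimal sketch in C. The `demo_desc_before` and `demo_desc_after` structs are hypothetical stand-ins, not the real UR definitions, and `uint32_t` is assumed to model the 4-byte structure-type tag. The offsets in the comments assume a typical x64 ABI with 8-byte, 8-byte-aligned pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with the 4-byte tag first (assumed "before" order). */
typedef struct demo_desc_before {
    uint32_t    stype;  /* offset 0, followed by 4 bytes of padding   */
    const void *pNext;  /* offset 8, must start on an 8-byte boundary */
    uint32_t    flags;  /* offset 16, followed by 4 bytes of tail pad */
} demo_desc_before;

/* Same fields with the pointer first (assumed "after" order). */
typedef struct demo_desc_after {
    const void *pNext;  /* offset 0                                   */
    uint32_t    stype;  /* offset 8                                   */
    uint32_t    flags;  /* offset 12, fills the slot padding occupied */
} demo_desc_after;

/* Size of the tag-first layout in bytes. */
size_t demo_size_before(void) { return sizeof(demo_desc_before); }

/* Size of the pointer-first layout in bytes. */
size_t demo_size_after(void) { return sizeof(demo_desc_after); }

/* Bytes saved per struct instance by the reordering. */
size_t demo_bytes_saved(void) {
    return demo_size_before() - demo_size_after();
}
```

Under those assumptions the tag-first struct takes 24 bytes and the pointer-first struct takes 16, which matches the most common entry in the list of affected structures. Structs whose remaining fields already fill the padding gap would not shrink from this swap.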

More info about the technique:
https://stackoverflow.com/a/20882083
https://zijishi.xyz/post/optimization-technique/learning-to-use-data-alignment/
https://en.wikipedia.org/wiki/Data_structure_alignment

Affected API structures:

  • ur_image_desc_t 80 -> 72 bytes
  • ur_exp_command_buffer_update_value_arg_desc_t 48 -> 40 bytes
  • ur_exp_command_buffer_update_memobj_arg_desc_t 40 -> 32 bytes
  • ur_exp_command_buffer_update_pointer_arg_desc_t 40 -> 32 bytes
  • ur_sampler_desc_t 32 -> 24 bytes
  • ur_program_properties_t 32 -> 24 bytes
  • ur_exp_sampler_addr_modes_t 32 -> 24 bytes
  • ur_platform_native_properties_t 24 -> 16 bytes
  • ur_device_native_properties_t 24 -> 16 bytes
  • ur_context_properties_t 24 -> 16 bytes
  • ur_context_native_properties_t 24 -> 16 bytes
  • ur_buffer_channel_properties_t 24 -> 16 bytes
  • ur_buffer_alloc_location_properties_t 24 -> 16 bytes
  • ur_mem_native_properties_t 24 -> 16 bytes
  • ur_sampler_native_properties_t 24 -> 16 bytes
  • ur_usm_host_desc_t 24 -> 16 bytes
  • ur_usm_device_desc_t 24 -> 16 bytes
  • ur_usm_alloc_location_desc_t 24 -> 16 bytes
  • ur_usm_pool_desc_t 24 -> 16 bytes
  • ur_physical_mem_properties_t 24 -> 16 bytes
  • ur_program_native_properties_t 24 -> 16 bytes
  • ur_kernel_arg_mem_obj_properties_t 24 -> 16 bytes
  • ur_kernel_native_properties_t 24 -> 16 bytes
  • ur_queue_properties_t 24 -> 16 bytes
  • ur_queue_index_properties_t 24 -> 16 bytes
  • ur_queue_native_properties_t 24 -> 16 bytes
  • ur_event_native_properties_t 24 -> 16 bytes
  • ur_exp_async_usm_alloc_properties_t 24 -> 16 bytes
  • ur_exp_file_descriptor_t 24 -> 16 bytes
  • ur_exp_sampler_cubemap_properties_t 24 -> 16 bytes
  • ur_exp_command_buffer_desc_t 24 -> 16 bytes
  • ur_exp_enqueue_ext_properties_t 24 -> 16 bytes
  • ur_exp_enqueue_native_command_properties_t 24 -> 16 bytes
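
The size changes above come from moving the first two fields, so the field offsets change as well, which is the ABI break mentioned in the description. The sketch below uses hypothetical `abi_old_header` and `abi_new_header` structs (assumed shapes, not the real UR headers) to show that code compiled against one layout would look for `pNext` at a different offset than code compiled against the other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common header in the assumed original order. */
typedef struct abi_old_header {
    uint32_t    stype;
    const void *pNext;
} abi_old_header;

/* Hypothetical common header in the assumed reordered form. */
typedef struct abi_new_header {
    const void *pNext;
    uint32_t    stype;
} abi_new_header;

/* Offset at which a binary built against the old header reads pNext. */
size_t abi_old_pnext_offset(void) { return offsetof(abi_old_header, pNext); }

/* Offset at which a binary built against the new header reads pNext. */
size_t abi_new_pnext_offset(void) { return offsetof(abi_new_header, pNext); }

/* Returns 1 only if both layouts agree on the offsets of both fields,
 * i.e. if old and new binaries could exchange these structs safely. */
int abi_headers_compatible(void) {
    return offsetof(abi_old_header, stype) == offsetof(abi_new_header, stype) &&
           offsetof(abi_old_header, pNext) == offsetof(abi_new_header, pNext);
}
```

On an ABI with 8-byte pointers the two offsets for `pNext` differ (8 versus 0), so an application and an adapter built against different header versions would misread each other's structs. That is why a change like this would need a coordinated rebuild or a version bump.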

Signed-off-by: Herman Semenov <[email protected]>
@GermanAizek GermanAizek requested a review from a team as a code owner March 27, 2025 14:55
@aarongreig (Contributor)

Thanks for the contribution, we're discussing this internally.
