[CUDA] Exclude lean attention from Linux build (#25203)

tianleiwu · web-flow · commit f3c18ed1b070 · 2025-06-27T15:57:23.000-07:00
### Description

Exclude lean attention from the Linux build by turning the `onnxruntime_USE_LEAN_ATTENTION` CMake option off by default.

### Motivation and Context

Previously, lean attention was built on Linux but not on Windows.
It is not yet used by GenAI, so we disable it in the build by default to reduce
binary size and build time.
diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt
@@ -93,7 +93,7 @@ option(onnxruntime_BUILD_BENCHMARKS "Build ONNXRuntime micro-benchmarks" OFF)
 option(onnxruntime_USE_VSINPU "Build with VSINPU support" OFF)
 
 cmake_dependent_option(onnxruntime_USE_FLASH_ATTENTION "Build flash attention kernel for scaled dot product attention" ON "onnxruntime_USE_CUDA" OFF)
-cmake_dependent_option(onnxruntime_USE_LEAN_ATTENTION "Build lean attention kernel for scaled dot product attention" ON "onnxruntime_USE_CUDA; NOT WIN32" OFF)
+option(onnxruntime_USE_LEAN_ATTENTION "Build lean attention kernel for scaled dot product attention" OFF)
 option(onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION "Build memory efficient attention kernel for scaled dot product attention" ON)
 
 option(onnxruntime_BUILD_FOR_NATIVE_MACHINE "Enable this option for turning on optimization specific to this machine" OFF)