
Commit 734df81

tikikun authored and ggerganov committed
ggml : add ggml_cpu_has_avx_vnni() (ggml-org#4589)
* feat: add avx_vnni based on intel documents
* ggml: add avx vnni based on intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for intel processors
* Update ggml.c: fix indentation

Co-authored-by: Georgi Gerganov <[email protected]>
1 parent a0f383b commit 734df81

5 files changed: +33 -8 lines changed

README.md

+22 -8

@@ -385,16 +385,30 @@ Building the program with BLAS support may lead to some performance improvements
 
   Check [BLIS.md](docs/BLIS.md) for more information.
 
-- #### Intel MKL
+- #### Intel oneMKL
+  - Using manual oneAPI installation:
+    By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you already sourced intel environment script and assign `-DLLAMA_BLAS=ON` in cmake, the mkl version of Blas will automatically been selected. Otherwise please install oneAPI and follow the below steps:
+    ```bash
+    mkdir build
+    cd build
+    source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-runtime docker image, only required for manual installation
+    cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON
+    cmake --build . --config Release
+    ```
 
-  By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you already sourced intel environment script and assign `-DLLAMA_BLAS=ON` in cmake, the mkl version of Blas will automatically been selected. You may also specify it by:
+  - Using oneAPI docker image:
+    If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel docker container: [oneAPI-runtime](https://hub.docker.com/r/intel/oneapi-runtime)
 
-  ```bash
-  mkdir build
-  cd build
-  cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
-  cmake --build . --config Release
-  ```
+    ```bash
+    mkdir build
+    cd build
+    cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_NATIVE=ON
+    cmake --build . --config Release
+    ```
+
+  Building through oneAPI compilers will make avx_vnni instruction set available for intel processors that do not support avx512 and avx512_vnni.
+
+  Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html) for more information.
 
 - #### cuBLAS
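A quick way to confirm that a oneAPI build actually picked up AVX-VNNI is to query the new ggml API added by this commit. The sketch below is illustrative only: the file name and build invocation are not from this commit, and you should link it against ggml however your build tree is set up.

```c
// check_avx_vnni.c - hypothetical helper, not part of this commit.
// Build by linking against ggml (exact compiler flags depend on your setup).
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // ggml_cpu_has_avx_vnni() returns 1 only when the binary was compiled
    // with AVX-VNNI enabled (i.e. __AVXVNNI__ was defined at build time).
    printf("AVX_VNNI = %d\n", ggml_cpu_has_avx_vnni());
    printf("AVX2     = %d\n", ggml_cpu_has_avx2());
    printf("AVX512   = %d\n", ggml_cpu_has_avx512());
    return 0;
}
```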

common/common.cpp

+1
Original file line numberDiff line numberDiff line change
@@ -1394,6 +1394,7 @@ void dump_non_result_info_yaml(FILE * stream, const gpt_params & params, const l
13941394
fprintf(stream, "build_number: %d\n", LLAMA_BUILD_NUMBER);
13951395
fprintf(stream, "cpu_has_arm_fma: %s\n", ggml_cpu_has_arm_fma() ? "true" : "false");
13961396
fprintf(stream, "cpu_has_avx: %s\n", ggml_cpu_has_avx() ? "true" : "false");
1397+
fprintf(stream, "cpu_has_avx_vnni: %s\n", ggml_cpu_has_avx_vnni() ? "true" : "false");
13971398
fprintf(stream, "cpu_has_avx2: %s\n", ggml_cpu_has_avx2() ? "true" : "false");
13981399
fprintf(stream, "cpu_has_avx512: %s\n", ggml_cpu_has_avx512() ? "true" : "false");
13991400
fprintf(stream, "cpu_has_avx512_vbmi: %s\n", ggml_cpu_has_avx512_vbmi() ? "true" : "false");

ggml.c

+8
Original file line numberDiff line numberDiff line change
@@ -19638,6 +19638,14 @@ int ggml_cpu_has_avx(void) {
 #endif
 }
 
+int ggml_cpu_has_avx_vnni(void) {
+#if defined(__AVXVNNI__)
+    return 1;
+#else
+    return 0;
+#endif
+}
+
 int ggml_cpu_has_avx2(void) {
 #if defined(__AVX2__)
     return 1;
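Note that this is a compile-time report: `__AVXVNNI__` is defined by GCC/Clang/ICX only when the target enables AVX-VNNI (e.g. `-march=alderlake` or `-mavxvnni`), so the function tells you what the binary was built with rather than probing the CPU at runtime. The same preprocessor guard is the usual way to select a VNNI kernel; the following is a rough sketch, not code from this commit, and it assumes an AVX2 fallback is acceptable.

```c
// Sketch only (not from this commit): using the same __AVXVNNI__ guard to
// select a dot-product kernel at compile time.
#include <immintrin.h>

// Multiply 32 unsigned bytes by 32 signed bytes and accumulate the
// 32-bit partial sums into `acc` (eight 32-bit lanes).
static inline __m256i dot_u8s8_accum(__m256i acc, __m256i u8, __m256i s8) {
#if defined(__AVXVNNI__)
    // Single VPDPBUSD instruction on CPUs with AVX-VNNI (VEX encoding).
    return _mm256_dpbusd_avx_epi32(acc, u8, s8);
#else
    // AVX2 fallback: widen to 16-bit products, then pair-sum to 32-bit.
    // (maddubs saturates the 16-bit intermediate, which VPDPBUSD does not.)
    __m256i p16 = _mm256_maddubs_epi16(u8, s8);
    __m256i p32 = _mm256_madd_epi16(p16, _mm256_set1_epi16(1));
    return _mm256_add_epi32(acc, p32);
#endif
}
```

On AVX-VNNI hardware the guarded branch compiles to one `vpdpbusd`, replacing the two-instruction `maddubs`/`madd` sequence of the fallback.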

ggml.h

+1
Original file line numberDiff line numberDiff line change
@@ -2198,6 +2198,7 @@ extern "C" {
     //
 
     GGML_API int ggml_cpu_has_avx        (void);
+    GGML_API int ggml_cpu_has_avx_vnni   (void);
     GGML_API int ggml_cpu_has_avx2       (void);
     GGML_API int ggml_cpu_has_avx512     (void);
     GGML_API int ggml_cpu_has_avx512_vbmi(void);

llama.cpp

+1
Original file line numberDiff line numberDiff line change
@@ -10780,6 +10780,7 @@ const char * llama_print_system_info(void) {
 
     s  = "";
     s += "AVX = "         + std::to_string(ggml_cpu_has_avx())         + " | ";
+    s += "AVX_VNNI = "    + std::to_string(ggml_cpu_has_avx_vnni())    + " | ";
     s += "AVX2 = "        + std::to_string(ggml_cpu_has_avx2())        + " | ";
     s += "AVX512 = "      + std::to_string(ggml_cpu_has_avx512())      + " | ";
     s += "AVX512_VBMI = " + std::to_string(ggml_cpu_has_avx512_vbmi()) + " | ";
