[SYCL] Overcoming workaround for mmap() allocation on Windows and remove useless wait #13482
@s-Nick Could you clear the other code changes in this PR?
The default queue is in-order, so many synchronizations with the host are unnecessary.
After some testing, I found that mmap is supported on Windows and on many GPUs on Linux. I therefore removed the workaround for Windows, since it is not necessary there.
The SYCL backend introduced a workaround that allows llama-bench to run without specifying the `--mmp 0` flag.
0e1009f to 083f56b
All `wait()` calls in the SYCL backend have been confirmed with the value.
Thank you for your review @NeoZhangJianyu. I modified the description, adding many logs of
Note:
- When using SYCL backend, there would be hang issue in some cases. Please set `--mmp 0`.
Doesn't this still exist on Linux, and hence we still have the workaround for Linux?
Maybe we should mention that Linux has it at the moment and Windows does not?
This PR removes, on Windows, the workaround for an mmap bug that affects some Intel GPUs on Linux. The bug is not present on Windows, so there is no reason to keep the workaround in place there.
This causes a small split in the codebase according to the OS in use, but it shows good performance improvements.
Moreover, it also removes some `wait()` calls on copy that are not necessary in the SYCL backend, due to the usage of in_order queues. The work introduced here is based on #13109.
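As a rough picture of the OS-dependent split mentioned above, the sketch below shows a compile-time switch that keeps a workaround path everywhere except Windows. It is an illustration only and is not taken from the PR diff: the names `needs_mmap_workaround` and `upload_strategy` and the two strategy labels are hypothetical, and the only grounded assumption is that `_WIN32` is the macro predefined by Windows compilers.

```cpp
#include <string>

// Hypothetical helper (not from the PR): decides at compile time whether
// the mmap workaround path is compiled in for the current platform.
constexpr bool needs_mmap_workaround() {
#if defined(_WIN32)
    // Per the PR description, the bug is not present on Windows.
    return false;
#else
    // The workaround stays in place on other platforms, e.g. Linux.
    return true;
#endif
}

// Hypothetical label for the path a tensor upload would take.
// The strings are placeholders, not log messages from llama.cpp.
inline std::string upload_strategy() {
    return needs_mmap_workaround() ? "workaround path" : "direct path";
}
```

Making the decision at compile time means the Windows build carries no trace of the workaround. That matches the "small split in the codebase" trade-off described above.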
N.B. All numbers were assessed with `GGML_SYCL_DISABLE_OPT=0`.
Lunar Lake's performance (this PR)
build: 0e1009f (5334)
Lunar Lake's performance (#13109)
build: f7e7d2a (5331)
Battlemage (B580) performance (this PR)
build: 0e1009f (5334)
Battlemage (B580) performance (#13109)
build: f7e7d2a (5331)
Logs for different GPUs on Linux
This section collects logs showing this patch working on Linux without affecting performance or correctness.
Lunar Lake
lnl-test.txt
lnl_bench.txt
master_lnl.txt
Battlemage B580
bmg-test.txt
bmg_bench.txt
master_bmg.txt
PVC
pvc-test.txt
pvc_bench.txt
master_pvc.txt
ARC A770
arc-test.txt
arc_bench.txt
master_arc.txt
llama-cli output
bmg_cli_output.txt
lnl_cli_output.txt
pvc_cli_output.txt
arc_cli_output.txt