
Commit 5ca7694

Authored by tye1, jingxu10 and Huiyan2021

Doc update for IPEX v2.5.10 (#5119)

* Update deepspeed version in requirement.txt
* Fix IPEX version in compile_bundle.bat/compile_bundle.sh
* Correct typo in transformers
* Update releases.md and known_issues.md
* update basekit components version in helper script
* add model list for BMG in README and llm/inference Readme
* change BMG to B580
* align footnotes
* remove ===

---------

Co-authored-by: Jing Xu <[email protected]>
Co-authored-by: Huiyan2021 <[email protected]>

1 parent 19f53fa · commit 5ca7694

File tree: 8 files changed (+93, -123 lines)


README.md

Lines changed: 24 additions & 88 deletions
@@ -1,12 +1,12 @@
 <div align="center">

 Intel® Extension for PyTorch*
-===========================
+=============================

-[💻Examples](./docs/tutorials/examples.md)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖CPU Documentations](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖GPU Documentations](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)
 </div>

-
+**CPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/llm) <br>
+**GPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main/examples/gpu/llm)<br>

 Intel® Extension for PyTorch\* extends PyTorch\* with up-to-date features and optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel X<sup>e</sup> Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, through the PyTorch\* `xpu` device, Intel® Extension for PyTorch\* provides easy GPU acceleration for Intel discrete GPUs with PyTorch\*.

@@ -21,29 +21,31 @@ The extension can be loaded as a Python module for Python programs or linked as

 ## Large Language Models (LLMs) Optimization

-In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/llm) for details.
+In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/llm) and [LLM optimizations GPU](./examples/gpu/llm) for details.

 ### Optimized Model List

 #### LLM Inference

-| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Arc™ A-Series Graphics (A770) |
-|---|:---:|:---:|:---:|:---:|:---:|
-|Llama 2| "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf" |🟩| 🟩|🟩|🟩|
-|Llama 3| "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-70B" |🟩| 🟩|🟩|🟩|
-|Phi-3 mini| "microsoft/Phi-3-mini-128k-instruct" |🟩| 🟩|🟩|🟩|
-|GPT-J| "EleutherAI/gpt-j-6b" | 🟩 | 🟩 |🟩 | 🟩|
-|Qwen|"Qwen/Qwen-7B"|🟩 | 🟩 |🟩 | 🟩|
-|OPT|"facebook/opt-6.7b", "facebook/opt-30b"| 🟩 | 🟥 |🟩 | 🟥 |
-|Bloom|"bigscience/bloom-7b1", "bigscience/bloom"| 🟩 | 🟥 |🟩 | 🟥 |
-|ChatGLM3-6B|"THUDM/chatglm3-6b"| 🟩 | 🟥 |🟩 | 🟥 |
-|Baichuan2-13B|"baichuan-inc/Baichuan2-13B-Chat"| 🟩 | 🟥 |🟩 | 🟥 |
+| MODEL FAMILY | Verified < MODEL ID > (Huggingface hub)| FP16 | Weight only quantization INT4 | Optimized on Intel® Data Center GPU Max Series (1550/1100) | Optimized on Intel® Arc™ A-Series Graphics (A770) | Optimized on Intel® Arc™ B-Series Graphics (B580) |
+|---|:---:|:---:|:---:|:---:|:---:|:---:|
+|Llama 2| "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf", "meta-llama/Llama-2-70b-hf" |🟩| 🟩|🟩|🟩|$🟩^1$|
+|Llama 3| "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-70B" |🟩| 🟩|🟩|🟩|$🟩^2$|
+|Phi-3 mini| "microsoft/Phi-3-mini-128k-instruct", "microsoft/Phi-3-mini-4k-instruct" |🟩| 🟩|🟩|🟩|$🟩^3$|
+|GPT-J| "EleutherAI/gpt-j-6b" | 🟩 | 🟩 |🟩 | 🟩||
+|Qwen|"Qwen/Qwen2-7B"|🟩 | 🟩 |🟩 | 🟩||
+|Qwen|"Qwen/Qwen2-7B-Instruct"| | | | |🟩|
+|OPT|"facebook/opt-6.7b", "facebook/opt-30b"| 🟩 | 🟥 |🟩 | 🟥 ||
+|Bloom|"bigscience/bloom-7b1", "bigscience/bloom"| 🟩 | 🟥 |🟩 | 🟥 ||
+|ChatGLM3-6B|"THUDM/chatglm3-6b"| 🟩 | 🟥 |🟩 | 🟥 ||
+|Baichuan2-13B|"baichuan-inc/Baichuan2-13B-Chat"| 🟩 | 🟥 |🟩 | 🟥 ||

 | Benchmark mode | FP16 | Weight only quantization INT4 |
 |---|:---:|:---:|
 |Single instance | 🟩 | 🟩 |
 | Distributed (autotp) | 🟩 | 🟥 |

+
 #### LLM fine-tuning

 **Note**:
@@ -67,82 +69,16 @@ In the current technological landscape, Generative AI (GenAI) workloads and mode
 - 🟩 signifies that it is supported.

 - 🟥 signifies that it is not supported yet.
+
+- 1: signifies that Llama-2-7b-hf is verified.

+- 2: signifies that Meta-Llama-3-8B is verified.
+
+- 3: signifies that Phi-3-mini-4k-instruct is verified.

-## Installation
-
-### CPU version
-
-You can use either of the following 2 commands to install the Intel® Extension for PyTorch\* CPU version.
-
-```bash
-python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
-python -m pip install intel-extension-for-pytorch --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
-# for PRC user, you can check with the following link
-python -m pip install intel-extension-for-pytorch --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/cn/
-```
-
-**Note:** Intel® Extension for PyTorch\* has a PyTorch version requirement. Please check more detailed information via the URL below.
-
-More installation methods can be found at the [CPU Installation Guide](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).
-
-Compilation instructions for the latest CPU code base `main` branch can be found in the section Package `source` at the [CPU Installation Guide](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).
-
-### GPU version
-
-You can install Intel® Extension for PyTorch\* for GPU via the command below.
-
-```bash
-python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu oneccl_bind_pt==2.3.100+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-# for PRC user, you can check with the following link
-python -m pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu oneccl_bind_pt==2.3.100+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
-```
-
-More installation methods can be found at the [GPU Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html).
-
-Compilation instructions for the latest GPU code base `xpu-main` branch can be found in the section Package `source` at the [GPU Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html).
-
-## Getting Started
-
-Minor code changes are required for users to get started with Intel® Extension for PyTorch\*. Both PyTorch imperative mode and TorchScript mode are supported. You just need to import the Intel® Extension for PyTorch\* package and apply its optimize function against the model object. If it is a training workload, the optimize function also needs to be applied against the optimizer object.
-
-The following code snippet shows inference code with the FP32 data type. More examples on CPU, including training and C++ examples, are available at the [CPU Example page](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/examples.html). More examples on GPU are available at the [GPU Example page](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/examples.html).
-
-### Inference on CPU
-
-```python
-import torch
-import torchvision.models as models
-
-model = models.resnet50(pretrained=True)
-model.eval()
-data = torch.rand(1, 3, 224, 224)
-
-import intel_extension_for_pytorch as ipex
-model = ipex.optimize(model)
-
-with torch.no_grad():
-    model(data)
-```
-
-### Inference on GPU
-
-```python
-import torch
-import torchvision.models as models
-
-model = models.resnet50(pretrained=True)
-model.eval()
-data = torch.rand(1, 3, 224, 224)
-
-import intel_extension_for_pytorch as ipex
-model = model.to('xpu')
-data = data.to('xpu')
-model = ipex.optimize(model)
+## Support

-with torch.no_grad():
-    model(data)
-```
+The team tracks bugs and enhancement requests using [GitHub issues](https://github.com/intel/intel-extension-for-pytorch/issues/). Before submitting a suggestion or bug report, search the existing GitHub issues to see if your issue has already been reported.

 ## License

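For context on the optimized-model table above, a minimal sketch of running one of the verified models through the extension's LLM path on an XPU device follows. It is illustrative only (not part of this commit): the `ipex.llm.optimize` entry point is assumed from the linked `examples/gpu/llm` recipes, and the model ID, prompt, and generation settings are placeholders.

```python
# Illustrative sketch only. Assumes an XPU build of Intel® Extension for
# PyTorch* and the ipex.llm.optimize API from the linked LLM examples.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # a B580-verified entry (footnote 1)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# Apply the LLM-specific optimizations described in the README section above;
# see examples/gpu/llm for the maintained recipes.
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

inputs = tokenizer("What does XMX accelerate?", return_tensors="pt").to("xpu")
with torch.no_grad():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same flow applies on CPU with `device="cpu"`; weight-only INT4 quantization and distributed (autotp) runs are covered by the linked examples rather than this sketch.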
docs/tutorials/known_issues.md

Lines changed: 5 additions & 22 deletions
@@ -18,26 +18,20 @@ Troubleshooting
 ```bash
 ImportError: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev
 ```
-- **Cause**: DPC++ does not support `_GLIBCXX_USE_CXX11_ABI=0`, Intel® Extension for PyTorch\* is always compiled with `_GLIBCXX_USE_CXX11_ABI=1`. This undefined symbol issue appears when PyTorch\* is
+- **Cause**: Intel® Extension for PyTorch\* is compiled with `_GLIBCXX_USE_CXX11_ABI=1`. This undefined symbol issue appears when PyTorch\* is
   compiled with `_GLIBCXX_USE_CXX11_ABI=0`.
 - **Solution**: Pass `export GLIBCXX_USE_CXX11_ABI=1` and compile PyTorch\* with a compiler which supports `_GLIBCXX_USE_CXX11_ABI=1`. We recommend using prebuilt wheels
-  in [download server](https://developer.intel.com/ipex-whl-stable-xpu) to avoid this issue.
-- **Problem**: `-997 runtime error` when running some AI models on Intel® Arc™ A-Series GPUs.
-- **Cause**: Some of the `-997 runtime error` are actually out-of-memory errors. As Intel® Arc™ A-Series GPUs have less device memory than Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU
-  Max Series, running some AI models on them may trigger out-of-memory errors and cause them to report failure such as `-997 runtime error` most likely. This is expected. Memory usage optimization is a work in progress to allow Intel® Arc™ A-Series GPUs to support more AI models.
+  in [download server](https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/) to avoid this issue.
+- **Problem**: `-997 runtime error` when running some AI models on the Intel® Arc™ Graphics family.
+- **Cause**: Some of the `-997 runtime error` are actually out-of-memory errors. As Intel® Arc™ Graphics GPUs have less device memory than Intel® Data Center GPU Flex Series 170 and Intel® Data Center GPU
+  Max Series, running some AI models on them may trigger out-of-memory errors and cause them to report failure such as `-997 runtime error` most likely. This is expected. Memory usage optimization is a work in progress to allow Intel® Arc™ Graphics GPUs to support more AI models.
 - **Problem**: Building from source for Intel® Arc™ A-Series GPUs fails on WSL2 without any error thrown.
 - **Cause**: Your system probably does not have enough RAM, so the Linux kernel's Out-of-memory killer was invoked. You can verify this by running `dmesg` on bash (WSL2 terminal).
 - **Solution**: If the OOM killer had indeed killed the build process, then you can try increasing the swap-size of WSL2, and/or decreasing the number of parallel build jobs with the environment
   variable `MAX_JOBS` (by default, it's equal to the number of logical CPU cores. So, setting `MAX_JOBS` to 1 is a very conservative approach that would slow things down a lot).
 - **Problem**: Some workloads terminate with an error `CL_DEVICE_NOT_FOUND` after some time on WSL2.
 - **Cause**: This issue is due to the [TDR feature](https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys#tdrdelay) on Windows.
 - **Solution**: Try increasing TDRDelay in your Windows Registry to a large value, such as 20 (it is 2 seconds, by default), and reboot.
-- **Problem**: Random bad termination after an AI model convergence test (>24 hours) finishes.
-- **Cause**: This is a random issue when some AI model convergence test execution finishes. It is not user-friendly as the model execution ends ungracefully.
-- **Solution**: Kill the process after the convergence test finishes, or use checkpoints to divide the convergence test into several phases and execute them separately.
-- **Problem**: Runtime error `munmap_chunk(): invalid pointer` when executing some scaling LLM workloads on the Intel® Data Center GPU Max Series platform.
-- **Cause**: Users targeting GPU use must set the environment variable `FI_HMEM=system` to disable GPU support in the underlying libfabric, as Intel® MPI Library 2021.13.1 will offload the GPU support instead. This avoids a potential bug in libfabric GPU initialization.
-- **Solution**: Set the environment variable `FI_HMEM=system` to work around this issue when encountered.

 ## Library Dependencies

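On the undefined-symbol entry above: a quick way to check which C++ ABI a given PyTorch build reports is the sketch below. `torch.compiled_with_cxx11_abi` is a standard PyTorch helper; the interpretation in the comment is ours, not part of this commit.

```python
import torch

# Intel® Extension for PyTorch* is compiled with _GLIBCXX_USE_CXX11_ABI=1, so
# a PyTorch build that prints False here will hit the ImportError above.
print(torch.compiled_with_cxx11_abi())
```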
@@ -92,7 +86,6 @@ Troubleshooting
 conda activate
 ```

-
 - **Problem**: Runtime error related to the C++ compiler when using `torch.compile`: `RuntimeError: Failed to find C++ compiler. Please specify via CXX environment variable.`
 - **Cause**: The DPC++/C++ Compiler is not installed and activated correctly.
 - **Solution**: [Install DPC++/C++ Compiler](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html) and activate it with the following commands.
@@ -111,7 +104,6 @@ Troubleshooting
 pip install --pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu
 ```

-
 - **Problem**: LoweringException: ImportError: cannot import name 'intel' from 'triton._C.libtriton'
 - **Cause**: Installing Triton causes pytorch-triton-xpu to stop working.
 - **Solution**: Resolve the issue with the following command:
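The reinstall command referenced above appears as diff context in the next hunk. Separately, once the correct `pytorch-triton-xpu` build is in place, a minimal smoke test (our sketch, assuming an XPU-enabled build; not part of this commit) can confirm that the `torch.compile` path imports Triton cleanly instead of raising the LoweringException:

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the xpu backend

def f(x):
    return torch.sin(x) + torch.cos(x)

# Compiling and running triggers the Inductor/Triton import that fails when a
# mismatched Triton package is installed.
compiled = torch.compile(f)
print(compiled(torch.randn(16, device="xpu")).shape)
```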
@@ -125,18 +117,9 @@ Troubleshooting
 pip install --pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu
 ```

-
 ## Performance Issue

 - **Problem**: Extended durations for data transfers from the host system to the device (H2D) and from the device back to the host system (D2H).
 - **Cause**: Absence of certain Dynamic Kernel Module Support (DKMS) packages on Ubuntu 22.04 or earlier versions.
 - **Solution**: For those running Ubuntu 22.04 or below, it's crucial to follow all the recommended installation procedures, including those labeled as [optional](https://dgpu-docs.intel.com/driver/client/overview.html#optional-out-of-tree-kernel-mode-driver-install). These steps are likely necessary to install the missing DKMS packages and ensure your system is functioning optimally. The Kernel Mode Driver (KMD) package that addresses this issue has been integrated into the Linux kernel for Ubuntu 23.04 and subsequent releases.

-## Unit Test
-
-- Unit test failures on Intel® Data Center GPU Flex Series 170
-
-  The following unit test fails on Intel® Data Center GPU Flex Series 170 but the same test case passes on Intel® Data Center GPU Max Series. The root cause of the failure is under investigation.
-  - `test_weight_norm.py::TestNNMethod::test_weight_norm_differnt_type`
-
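Related to the `-997 runtime error` entry earlier in this file: before sizing a workload for an Arc part, it can help to check how much device memory is actually available. A sketch, assuming the `torch.xpu` device-properties API mirrors its CUDA counterpart (not part of this commit):

```python
import torch
import intel_extension_for_pytorch as ipex  # provides the torch.xpu namespace

props = torch.xpu.get_device_properties(0)
# Arc Graphics parts expose far less memory than Flex 170 / Max Series, so a
# model that fits there can still fail with -997 (out of memory) on Arc.
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")
```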