
Commit 9318fae

jingxu10, XiaobingSuper, zhuhaozhe, and chunyuan-w authored and committed

tutorials (#375)

* doc fine tune
* add example for ddp, edit c++ example
* 1st review
* corrected package name in installation guide
* add model zoo to examples page
* update int8 doc (#377)
* update int8 doc
* version 2
* modify optimizers optimization (#378)
* review 20211130
* add INT8 fusion patterns and API in graph_optimization (#380)
* add INT8 fusion patterns and API in graph_optimization
* add integration with oneDNN graph
* Add BN folding for graph_optimization
* tutorials for v1.10.0
* int8.md fine tune
* finalized for v1.10.0 release

Co-authored-by: XiaobingZhang <[email protected]>
Co-authored-by: zhuhaozhe <[email protected]>
Co-authored-by: Chunyuan WU <[email protected]>

1 parent 11dbc83 · commit 9318fae

25 files changed: +1132 −582 lines

.github/workflows/publish.yml (+31 −31)

@@ -1,33 +1,33 @@
 name: Publish
 
-on:
-  push:
-    branches:
-      - ghpapers_style
-
-jobs:
-  build:
-
-    runs-on: ubuntu-latest
-
-    steps:
-    - uses: actions/checkout@v1
-    - name: Install dependencies
-      run: |
-        export PATH="$HOME/.local/bin:$PATH"
-        sudo apt-get install -y python3-setuptools
-        pip3 install --user --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
-        pip3 install --user -r requirements.txt
-        python3 setup.py install
-        pip3 install --user -r docs/requirements.txt
-    - name: Build the docs
-      run: |
-        export PATH="$HOME/.local/bin:$PATH"
-        cd docs
-        make html
-    - name: Push the docs
-      uses: peaceiris/actions-gh-pages@v3
-      with:
-        github_token: ${{ secrets.GITHUB_TOKEN }}
-        publish_dir: docs/_build/html
-        publish_branch: gh-pages
+#on:
+#  push:
+#    branches:
+#      - gh-pages
+#
+#jobs:
+#  build:
+#
+#    runs-on: ubuntu-latest
+#
+#    steps:
+#    - uses: actions/checkout@v1
+#    - name: Install dependencies
+#      run: |
+#        export PATH="$HOME/.local/bin:$PATH"
+#        sudo apt-get install -y python3-setuptools
+#        pip3 install --user torch=1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+#        pip3 install --user -r requirements.txt
+#        python3 setup.py install
+#        pip3 install --user -r docs/requirements.txt
+#    - name: Build the docs
+#      run: |
+#        export PATH="$HOME/.local/bin:$PATH"
+#        cd docs
+#        make html
+#    - name: Push the docs
+#      uses: peaceiris/actions-gh-pages@v3
+#      with:
+#        github_token: ${{ secrets.GITHUB_TOKEN }}
+#        publish_dir: docs/_build/html
+#        publish_branch: gh-pages

README.md (+21 −358)

Large diffs are not rendered by default.

docs/index.rst (+2 −3)

@@ -8,7 +8,7 @@ Welcome to Intel® Extension for PyTorch* documentation!
 
 Intel® Extension for PyTorch* extends PyTorch with optimizations for extra performance boost on Intel hardware. Most of the optimizations will be included in stock PyTorch releases eventually, and the intention of the extension is to deliver up-to-date features and optimizations for PyTorch on Intel hardware, examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).
 
-Intel® Extension for PyTorch* is structured as the following figure. It is a runtime extension. Users can enable it dynamically in script by importing `intel_extension_for_pytorch`. It covers optimizations for both imperative mode and graph mode. Optimized operators and kernels are registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel hardware. During execution, Intel® Extension for PyTorch* intercepts invocation of ATen operators, and replace the original ones with these optimized ones. In graph mode, further operator fusions are applied manually by Intel engineers or through a tool named *oneDNN Graph* to reduce operator/kernel invocation overheads, and thus increase performance.
+Intel® Extension for PyTorch* is structured as the following figure. It is loaded as a Python module for Python programs or linked as a C++ library for C++ programs. Users can enable it dynamically in script by importing `intel_extension_for_pytorch`. It covers optimizations for both imperative mode and graph mode. Optimized operators and kernels are registered through PyTorch dispatching mechanism. These operators and kernels are accelerated from native vectorization feature and matrix calculation feature of Intel hardware. During execution, Intel® Extension for PyTorch* intercepts invocation of ATen operators, and replace the original ones with these optimized ones. In graph mode, further operator fusions are applied manually by Intel engineers or through a tool named *oneDNN Graph* to reduce operator/kernel invocation overheads, and thus increase performance.
 
 .. image:: ../images/intel_extension_for_pytorch_structure.png
    :width: 800
@@ -24,8 +24,7 @@ Intel® Extension for PyTorch* has been released as an open–source project at
    :maxdepth: 1
 
    tutorials/features
-   tutorials/notices
-   tutorials/release_notes
+   tutorials/releases
    tutorials/installation
    tutorials/examples
    tutorials/api_doc
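The "enable it dynamically in script" description in the paragraph above amounts to a single import plus one optimize call. A minimal sketch of that flow, mirroring the FP32 inference examples elsewhere in this commit (torchvision's resnet50 is used as an illustrative stand-in model):

```
import torch
import torchvision.models as models
# importing the extension registers its optimized ATen kernels with PyTorch's dispatcher
import intel_extension_for_pytorch as ipex

model = models.resnet50().eval()
# apply the extension's operator-level optimizations to the module
model = ipex.optimize(model)

with torch.no_grad():
    output = model(torch.rand(1, 3, 224, 224))
```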

docs/tutorials/blogs_publications.md (+2 −0)

@@ -3,4 +3,6 @@ Blogs & Publications
 
 * [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
 * [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
+  * *Note*: APIs mentioned in it are deprecated.
 * [Scaling up BERT-like model Inference on modern CPU - Part 1 by the launcher of the extension](https://huggingface.co/blog/bert-cpu-scaling-part-1)
+* [KT Optimizes Performance for Personalized Text-to-Speech](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/KT-Optimizes-Performance-for-Personalized-Text-to-Speech/post/1337757)

docs/tutorials/contribution.md (+0 −9)

@@ -93,15 +93,6 @@ In case you want to reinstall, make sure that you uninstall Intel® Extension for PyTorch* first
 ENV_KEY1=ENV_VAL1[, ENV_KEY2=ENV_VAL2]* python setup.py develop
 ```
 
-## Codebase structure
-
-* [torch_ipex/csrc](https://github.com/intel/intel-extension-for-pytorch/tree/master/torch_ipex/csrc) - C++ library for Intel® Extension for PyTorch\*
-* [intel_extension_for_pytorch](https://github.com/intel/intel-extension-for-pytorch/tree/master/intel_extension_for_pytorch) - The actual Intel® Extension for PyTorch\* library. Everything that is not in [csrc](https://github.com/intel/intel-extension-for-pytorch/tree/master/torch_ipex/csrc) is a Python module, following the PyTorch Python frontend module structure.
-* [tools](https://github.com/intel/intel-extension-for-pytorch/tree/master/tools) -
-* [tests](https://github.com/intel/intel-extension-for-pytorch/tree/master/tests) - Python unit tests for Intel® Extension for PyTorch\* Python frontend.
-  * [cpu](https://github.com/intel/intel-extension-for-pytorch/tree/master/tests/cpu) -
-    * [cpp](https://github.com/intel/intel-extension-for-pytorch/tree/master/tests/cpu/cpp) - C++ unit tests for Intel® Extension for PyTorch\* C++ frontend.
-
 ## Unit testing
 
 ### Python Unit Testing

docs/tutorials/examples.md (+143 −23)

@@ -30,7 +30,6 @@ output = model(data)
 
 #### Complete - Float32
 
-
 ```
 import torch
 import torchvision
@@ -128,7 +127,69 @@ torch.save({
 
 ### Distributed Training
 
-Distributed training with PyTorch DDP is accelerated by oneAPI Collective Communications Library Bindings for Pytorch\* (oneCCL Bindings for Pytorch\*). More detailed information and examples are available at its [Github repo](https://github.com/intel/torch-ccl).
+Distributed training with PyTorch DDP is accelerated by oneAPI Collective Communications Library Bindings for Pytorch\* (oneCCL Bindings for Pytorch\*). The extension supports FP32 and BF16 data types. More detailed information and examples are available at its [Github repo](https://github.com/intel/torch-ccl).
+
+**Note:** When performing distributed training with BF16 data type, please use oneCCL Bindings for Pytorch\*. Due to a PyTorch limitation, distributed training with BF16 data type with Intel® Extension for PyTorch\* is not supported.
+
+```
+import os
+import torch
+import torch.distributed as dist
+import torchvision
+import torch_ccl
+import intel_extension_for_pytorch as ipex
+
+LR = 0.001
+DOWNLOAD = True
+DATA = 'datasets/cifar10/'
+
+transform = torchvision.transforms.Compose([
+    torchvision.transforms.Resize((224, 224)),
+    torchvision.transforms.ToTensor(),
+    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
+])
+train_dataset = torchvision.datasets.CIFAR10(
+    root=DATA,
+    train=True,
+    transform=transform,
+    download=DOWNLOAD,
+)
+train_loader = torch.utils.data.DataLoader(
+    dataset=train_dataset,
+    batch_size=128
+)
+
+os.environ['MASTER_ADDR'] = '127.0.0.1'
+os.environ['MASTER_PORT'] = '29500'
+# environment variables must be strings; PMI_RANK/PMI_SIZE are set by the MPI launcher
+os.environ['RANK'] = str(os.environ.get('PMI_RANK', 0))
+os.environ['WORLD_SIZE'] = str(os.environ.get('PMI_SIZE', 1))
+dist.init_process_group(
+    backend='ccl',
+    init_method='env://'
+)
+
+model = torchvision.models.resnet50()
+criterion = torch.nn.CrossEntropyLoss()
+optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
+model.train()
+model, optimizer = ipex.optimize(model, optimizer=optimizer)
+
+model = torch.nn.parallel.DistributedDataParallel(model)
+
+for batch_idx, (data, target) in enumerate(train_loader):
+    optimizer.zero_grad()
+    # Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+    data = data.to(memory_format=torch.channels_last)
+    output = model(data)
+    loss = criterion(output, target)
+    loss.backward()
+    optimizer.step()
+    print('batch_id: {}'.format(batch_idx))
+torch.save({
+    'model_state_dict': model.state_dict(),
+    'optimizer_state_dict': optimizer.state_dict(),
+}, 'checkpoint.pth')
+```
 
 ## Inference
 
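The note in the hunk above permits BF16 distributed training through oneCCL Bindings for Pytorch\*, but the added example runs in FP32 only. Presumably the BF16 variant differs only in the `ipex.optimize` call and an autocast context around the forward pass; a hedged sketch, reusing `model`, `criterion`, `optimizer`, and `train_loader` from the example above:

```
import torch
import intel_extension_for_pytorch as ipex

# BF16 variant of the training loop; data loading and process-group setup
# are identical to the FP32 example above
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)
model = torch.nn.parallel.DistributedDataParallel(model)

for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    with torch.cpu.amp.autocast():   # run the forward pass in BF16
        output = model(data)
        loss = criterion(output, target)
    loss.backward()                  # backward and step stay outside autocast
    optimizer.step()
```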

@@ -148,7 +209,7 @@ data = torch.rand(1, 3, 224, 224)
 
 import intel_extension_for_pytorch as ipex
 model = model.to(memory_format=torch.channels_last)
-model = ipex.optimize(model, dtype=torch.float32, level='O1')
+model = ipex.optimize(model)
 data = data.to(memory_format=torch.channels_last)
 
 with torch.no_grad():
@@ -170,7 +231,7 @@ seq_length = 512
 data = torch.randint(vocab_size, size=[batch_size, seq_length])
 
 import intel_extension_for_pytorch as ipex
-model = ipex.optimize(model, dtype=torch.float32, level="O1")
+model = ipex.optimize(model)
 
 with torch.no_grad():
   model(data)
@@ -190,7 +251,7 @@ data = torch.rand(1, 3, 224, 224)
 
 import intel_extension_for_pytorch as ipex
 model = model.to(memory_format=torch.channels_last)
-model = ipex.optimize(model, dtype=torch.float32, level='O1')
+model = ipex.optimize(model)
 data = data.to(memory_format=torch.channels_last)
 
 with torch.no_grad():
@@ -216,7 +277,7 @@ seq_length = 512
 data = torch.randint(vocab_size, size=[batch_size, seq_length])
 
 import intel_extension_for_pytorch as ipex
-model = ipex.optimize(model, dtype=torch.float32, level="O1")
+model = ipex.optimize(model)
 
 with torch.no_grad():
   d = torch.randint(vocab_size, size=[batch_size, seq_length])
@@ -242,7 +303,7 @@ data = torch.rand(1, 3, 224, 224)
 
 import intel_extension_for_pytorch as ipex
 model = model.to(memory_format=torch.channels_last)
-model = ipex.optimize(model, dtype=torch.bfloat16, level='O1')
+model = ipex.optimize(model, dtype=torch.bfloat16)
 data = data.to(memory_format=torch.channels_last)
 
 with torch.no_grad():
@@ -265,7 +326,7 @@ seq_length = 512
 data = torch.randint(vocab_size, size=[batch_size, seq_length])
 
 import intel_extension_for_pytorch as ipex
-model = ipex.optimize(model, dtype=torch.bfloat16, level="O1")
+model = ipex.optimize(model, dtype=torch.bfloat16)
 
 with torch.no_grad():
   with torch.cpu.amp.autocast():
@@ -286,7 +347,7 @@ data = torch.rand(1, 3, 224, 224)
 
 import intel_extension_for_pytorch as ipex
 model = model.to(memory_format=torch.channels_last)
-model = ipex.optimize(model, dtype=torch.bfloat16, level='O1')
+model = ipex.optimize(model, dtype=torch.bfloat16)
 data = data.to(memory_format=torch.channels_last)
 
 with torch.no_grad():
@@ -312,7 +373,7 @@ seq_length = 512
 data = torch.randint(vocab_size, size=[batch_size, seq_length])
 
 import intel_extension_for_pytorch as ipex
-model = ipex.optimize(model, dtype=torch.bfloat16, level="O1")
+model = ipex.optimize(model, dtype=torch.bfloat16)
 
 with torch.no_grad():
   with torch.cpu.amp.autocast():
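The eight hunks above all make the same API simplification: the explicit `dtype=torch.float32` and `level='O1'` arguments are dropped, leaving FP32 (and, per the v1.10 API, the default "O1" recipe) implicit, while BF16 remains an explicit opt-in. A minimal sketch of the resulting call pattern, with resnet50 chosen purely for illustration:

```
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50().eval()
model = model.to(memory_format=torch.channels_last)
model = ipex.optimize(model)   # FP32 and the default optimization level are implicit
# model = ipex.optimize(model, dtype=torch.bfloat16)   # BF16 stays an explicit opt-in

data = torch.rand(1, 3, 224, 224).to(memory_format=torch.channels_last)
with torch.no_grad():
    model(data)
```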
@@ -349,13 +410,12 @@ for d in calibration_data_loader():
   model(d)
 conf.save('int8_conf.json', default_recipe=True)
 model = ipex.quantization.convert(model, conf, torch.rand(<shape>))
-
-with torch.no_grad():
-  model(data)
 ```
 
 #### Deployment
 
+##### Imperative Mode
+
 ```
 import torch
 
371431
model(data)
372432
```
373433

434+
##### Graph Mode
435+
436+
```
437+
import torch
438+
import intel_extension_for_pytorch as ipex
439+
440+
model = torch.jit.load('<INT8 model file>')
441+
model.eval()
442+
data = torch.rand(<shape>)
443+
444+
with torch.no_grad():
445+
model(data)
446+
```
447+
374448
## C++
375449

376450
To work with libtorch, C++ library of PyTorch, Intel® Extension for PyTorch\* provides its C++ dynamic library as well. The C++ library is supposed to handle inference workload only, such as service deployment. For regular development, please use Python interface. Comparing to usage of libtorch, no specific code changes are required, except for converting input data into channels last data format. Compilation follows the recommended methodology with CMake. Detailed instructions can be found in [PyTorch tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html#depending-on-libtorch-and-building-the-application).
377451

378452
During compilation, Intel optimizations will be activated automatically once C++ dynamic library of Intel® Extension for PyTorch\* is linked.
379453

454+
The example code below works for all data types.
455+
380456
**example-app.cpp**
381457

382-
```
458+
```cpp
383459
#include <torch/script.h>
384460
#include <iostream>
385461
#include <memory>
@@ -405,25 +481,69 @@ int main(int argc, const char* argv[]) {
 
 **CMakeList.txt**
 
-```
+```cmake
 cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
 project(example-app)
 
-find_package(Torch REQUIRED)
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} -Wl,--no-as-needed")
+find_package(intel-ext-pt-cpu REQUIRED)
 
 add_executable(example-app example-app.cpp)
-# Link the binary against the C++ dynamic library file of Intel® Extension for PyTorch*
-target_link_libraries(example-app "${TORCH_LIBRARIES}" "${INTEL_EXTENSION_FOR_PYTORCH_PATH}/lib/libintel-ext-pt-cpu.so")
+target_link_libraries(example-app "${TORCH_LIBRARIES}")
 
 set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
 ```
 
-**Note:** Since Intel® Extension for PyTorch\* is still under development, name of the c++ dynamic library in the master branch may defer to *libintel-ext-pt-cpu.so* shown above. Please check the name out in the installation folder. The so file name starts with *libintel-*.
-
 **Command for compilation**
 
-```
-$ cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> -DINTEL_EXTENSION_FOR_PYTORCH_PATH=<INTEL_EXTENSION_FOR_PYTORCH_INSTALLATION_PATH> ..
+```bash
+$ cmake -DCMAKE_PREFIX_PATH=<LIBPYTORCH_PATH> ..
 $ make
 ```
+
+If *Found INTEL_EXT_PT_CPU* is shown as *TRUE*, the extension has been linked into the binary. This can be verified with the Linux command *ldd*.
+
+```bash
+$ cmake -DCMAKE_PREFIX_PATH=/workspace/libtorch ..
+-- The C compiler identification is GNU 9.3.0
+-- The CXX compiler identification is GNU 9.3.0
+-- Check for working C compiler: /usr/bin/cc
+-- Check for working C compiler: /usr/bin/cc -- works
+-- Detecting C compiler ABI info
+-- Detecting C compiler ABI info - done
+-- Detecting C compile features
+-- Detecting C compile features - done
+-- Check for working CXX compiler: /usr/bin/c++
+-- Check for working CXX compiler: /usr/bin/c++ -- works
+-- Detecting CXX compiler ABI info
+-- Detecting CXX compiler ABI info - done
+-- Detecting CXX compile features
+-- Detecting CXX compile features - done
+-- Looking for pthread.h
+-- Looking for pthread.h - found
+-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
+-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
+-- Looking for pthread_create in pthreads
+-- Looking for pthread_create in pthreads - not found
+-- Looking for pthread_create in pthread
+-- Looking for pthread_create in pthread - found
+-- Found Threads: TRUE
+-- Found Torch: /workspace/libtorch/lib/libtorch.so
+-- Found INTEL_EXT_PT_CPU: TRUE
+-- Configuring done
+-- Generating done
+-- Build files have been written to: /workspace/build
+
+$ ldd example-app
+        ...
+        libtorch.so => /workspace/libtorch/lib/libtorch.so (0x00007f3cf98e0000)
+        libc10.so => /workspace/libtorch/lib/libc10.so (0x00007f3cf985a000)
+        libintel-ext-pt-cpu.so => /workspace/libtorch/lib/libintel-ext-pt-cpu.so (0x00007f3cf70fc000)
+        libtorch_cpu.so => /workspace/libtorch/lib/libtorch_cpu.so (0x00007f3ce16ac000)
+        ...
+        libdnnl_graph.so.0 => /workspace/libtorch/lib/libdnnl_graph.so.0 (0x00007f3cde954000)
+        ...
+```
+
+## Model Zoo
+
+Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.10-models). A bunch of PyTorch use cases for benchmarking are also available on the [Github page](https://github.com/IntelAI/models/tree/pytorch-r1.10-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scripts in the Model Zoo.
