
Commit f87430a

XiaobingSuper, jingxu10, and EikanWang committed: update README (#357)
Co-authored-by: Jing Xu <[email protected]>
Co-authored-by: Wang Weihan <[email protected]>
1 parent 8f88675 commit f87430a

File tree

1 file changed: +93 −48 lines changed

README.md

Lines changed: 93 additions & 48 deletions
@@ -56,7 +56,7 @@ From 1.8.0, compiling PyTorch from source is not required. If you still want to
 
 ### Install via wheel file
 
-```
+```python
 python -m pip install torch_ipex==1.9.0 -f https://software.intel.com/ipex-whl-stable
 ```
 
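A quick smoke test after installing the wheel (a minimal sketch; the `__version__` attribute on the extension module is an assumption, not something this commit documents — a clean import is the real check):

```python
# Verify that both PyTorch and the extension import cleanly.
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)
print(ipex.__version__)  # assumed attribute; the successful import matters most
```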
@@ -84,20 +84,26 @@ python setup.py install
 ```
 
 ## Features
+
 ### Ease-of-use Python API
+
 Intel® Extension for PyTorch* provides simple frontend Python APIs and utilities for users to get performance optimizations such as graph optimization and operator optimization with minor code changes. Typically, only 2 to 3 clauses need to be added to the original code.
 
 ### Channels Last
+
 Compared to the default NCHW memory format, the channels_last (NHWC) memory format can further accelerate convolutional neural networks. In Intel® Extension for PyTorch*, the NHWC memory format has been enabled for most key CPU operators, though not all of them have been merged into the PyTorch master branch yet. They are expected to land fully in PyTorch upstream soon.
 
 ### Auto Mixed Precision (AMP)
+
 The low-precision data type BFloat16 is natively supported on 3rd Generation Intel® Xeon® Scalable processors (codenamed Cooper Lake) with the AVX512 instruction set, and will be supported on the next generation of Intel® Xeon® Scalable Processors with the Intel® Advanced Matrix Extensions (Intel® AMX) instruction set, bringing a further performance boost. Auto Mixed Precision (AMP) with BFloat16 for CPU and BFloat16 optimization of operators have been enabled broadly in Intel® Extension for PyTorch*, and partially upstreamed to the PyTorch master branch. Most of these optimizations will land in PyTorch master through PRs that are being submitted and reviewed.
 
 ### Graph Optimization
+
 To further optimize performance with TorchScript, Intel® Extension for PyTorch* supports fusion of frequently used operator patterns such as Conv2D+ReLU and Linear+ReLU. The benefits of these fusions are delivered to users in a transparent fashion.
 
 ### Operator Optimization
-Intel® Extension for PyTorch* also optimizes operators and implements several customized operators for performance . A few ATen operators are replaced by their optimized counterparts in Intel® Extension for PyTorch* via ATen registration mechanism. Moreover, some customized operators are implemented for several popular topologies . For instance, ROIAlign and NMS are defined in Mask R-CNN. To improve performance of these topologies, Intel® Extension for PyTorch* also optimized these customized operators.
+
+Intel® Extension for PyTorch* also optimizes operators and implements several customized operators for performance. A few ATen operators are replaced by their optimized counterparts in Intel® Extension for PyTorch* via the ATen registration mechanism. Moreover, some customized operators are implemented for several popular topologies; for instance, ROIAlign and NMS are defined in Mask R-CNN. To improve performance of these topologies, Intel® Extension for PyTorch* has optimized these customized operators as well.
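Taken together, the features above typically cost only two or three added lines in an existing script. A minimal sketch of the pattern (torchvision and its resnet50 model are illustrative assumptions, not part of this README):

```python
import torch
import intel_extension_for_pytorch as ipex
import torchvision.models as models  # illustrative; any conv net works

model = models.resnet50().eval()
# Channels Last: lay out weights/activations as NHWC for conv-heavy models.
model = model.to(memory_format=torch.channels_last)
# One added clause: apply the extension's operator optimizations.
model = ipex.optimize(model)

x = torch.randn(1, 3, 224, 224).to(memory_format=torch.channels_last)
with torch.no_grad():
    y = model(x)
```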
 
 ## Getting Started
 
@@ -110,63 +116,72 @@ For training and inference with BFloat16 data type, torch.cpu.amp has been enabl
 The code changes that are required for Intel® Extension for PyTorch* are highlighted with a comment on the line above each change.
 
 ### Training
+
 #### Float32
+
 ```python
 import torch
 import torch.nn as nn
-# Import intel_extension_for_pytorch
 import intel_extension_for_pytorch as ipex
 
 class Model(nn.Module):
     def __init__(self):
         super(Model, self).__init__()
-        self.linear = nn.Linear(4, 5)
+        self.conv = nn.Conv2d(2, 3, 2)
 
-    def forward(self, input):
-        return self.linear(input)
+    def forward(self, x):
+        return self.conv(x)
 
 model = Model()
 model.load_state_dict(torch.load(PATH))
 optimizer.load_state_dict(torch.load(PATH))
 
+model.train()
+# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+model = model.to(memory_format=torch.channels_last)
 # Invoke optimize function against the model object and optimizer object
-model, optimizer = ipex.optimize(model, optimizer, dtype=torch.float32)
+model, optimizer = ipex.optimize(model, optimizer=optimizer)
 
 for images, label in train_loader:
-    # Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+    # Optional.
    images = images.to(memory_format=torch.channels_last)
 
     loss = criterion(model(images), label)
     loss.backward()
     optimizer.step()
+
 torch.save(model.state_dict(), PATH)
 torch.save(optimizer.state_dict(), PATH)
 ```
+
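The training snippets above leave `PATH`, `criterion`, `optimizer`, and `train_loader` to be defined elsewhere. For reference, here is a self-contained Float32 variant that runs end-to-end; the random batch, MSE loss, and SGD optimizer are illustrative stand-ins, not part of the commit:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = nn.Conv2d(2, 3, 2)

    def forward(self, x):
        return self.conv(x)

model = Model().train()
# Optional: channels_last can speed up 4D (NCHW) workloads.
model = model.to(memory_format=torch.channels_last)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer = ipex.optimize(model, optimizer=optimizer)

for _ in range(3):
    # Random stand-in data shaped for Conv2d(2, 3, 2): output is (8, 3, 4, 4).
    images = torch.randn(8, 2, 5, 5).to(memory_format=torch.channels_last)
    target = torch.randn(8, 3, 4, 4)
    optimizer.zero_grad()
    loss = criterion(model(images), target)
    loss.backward()
    optimizer.step()
```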
 #### BFloat16
+
 ```python
 import torch
 import torch.nn as nn
-# Import intel_extension_for_pytorch
 import intel_extension_for_pytorch as ipex
 
 class Model(nn.Module):
     def __init__(self):
         super(Model, self).__init__()
-        self.linear = nn.Linear(4, 5)
+        self.conv = nn.Conv2d(2, 3, 2)
 
-    def forward(self, input):
-        return self.linear(input)
+    def forward(self, x):
+        return self.conv(x)
 
 model = Model()
 model.load_state_dict(torch.load(PATH))
 optimizer.load_state_dict(torch.load(PATH))
 
+model.train()
+# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+model = model.to(memory_format=torch.channels_last)
 # Invoke optimize function against the model object and optimizer object with data type set to torch.bfloat16
-model, optimizer = ipex.optimize(model, optimizer, dtype=torch.bfloat16)
+model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)
 
 for images, label in train_loader:
     with torch.cpu.amp.autocast():
-        # Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+        # Optional.
        images = images.to(memory_format=torch.channels_last)
        loss = criterion(model(images), label)
     loss.backward()
@@ -176,59 +191,72 @@ torch.save(optimizer.state_dict(), PATH)
 ```

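In the BFloat16 examples, `torch.cpu.amp.autocast()` is what switches eligible operators to the lower precision. A minimal sketch of the effect (assuming, as the section above states, that importing the extension enables `torch.cpu.amp` in this release):

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex  # assumed to enable torch.cpu.amp here

conv = nn.Conv2d(2, 3, 2)
x = torch.randn(1, 2, 5, 5)

with torch.cpu.amp.autocast():
    print(conv(x).dtype)  # torch.bfloat16: convolution is autocast-eligible
print(conv(x).dtype)      # torch.float32: default precision outside the region
```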
 ### Inference - Imperative Mode
+
 #### Float32
+
 ```python
 import torch
 import torch.nn as nn
-
-# Import intel_extension_for_pytorch
 import intel_extension_for_pytorch as ipex
 
 class Model(nn.Module):
     def __init__(self):
         super(Model, self).__init__()
-        self.linear = nn.Linear(4, 5)
+        self.conv = nn.Conv2d(2, 3, 2)
 
-    def forward(self, input):
-        return self.linear(input)
+    def forward(self, x):
+        return self.conv(x)
 
-input = torch.randn(2, 4)
 model = Model()
+model.eval()
+# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+model = model.to(memory_format=torch.channels_last)
 # Invoke optimize function against the model object
 model = ipex.optimize(model)
-res = model(input)
+with torch.no_grad():
+    # Optional.
+    images = images.to(memory_format=torch.channels_last)
+
+    res = model(images)
 ```
+
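The Float32 example above consumes an `images` tensor that the hunk never defines. A self-contained variant with a random input, shaped to match `nn.Conv2d(2, 3, 2)`:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = nn.Conv2d(2, 3, 2)

    def forward(self, x):
        return self.conv(x)

model = Model().eval()
model = model.to(memory_format=torch.channels_last)  # optional for 4D data
model = ipex.optimize(model)

# A random NCHW batch standing in for real data.
images = torch.randn(1, 2, 5, 5).to(memory_format=torch.channels_last)
with torch.no_grad():
    res = model(images)
print(res.shape)  # torch.Size([1, 3, 4, 4])
```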
 #### BFloat16
+
 ```python
 import torch
 import torch.nn as nn
-
-# Import intel_extension_for_pytorch
 import intel_extension_for_pytorch as ipex
 
 class Model(nn.Module):
     def __init__(self):
         super(Model, self).__init__()
-        self.linear = nn.Linear(4, 5)
+        self.conv = nn.Conv2d(2, 3, 2)
 
-    def forward(self, input):
-        return self.linear(input)
+    def forward(self, x):
+        return self.conv(x)
 
-input = torch.randn(2, 4)
 model = Model()
+model.eval()
+# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+model = model.to(memory_format=torch.channels_last)
 # Invoke optimize function against the model object with data type set to torch.bfloat16
 model = ipex.optimize(model, dtype=torch.bfloat16)
-with torch.cpu.amp.autocast():
-    res = model(input)
+with torch.no_grad(), torch.cpu.amp.autocast():
+    # Optional.
+    images = images.to(memory_format=torch.channels_last)
+
+    res = model(images)
 ```
+
 ### Inference - TorchScript Mode
+
 TorchScript mode makes graph optimization possible, hence improves performance for some topologies. Intel® Extension for PyTorch* enables the most commonly used operator pattern fusions, and users can get the performance benefit without additional code changes.
+
 #### Float32
+
 ```python
 import torch
 import torch.nn as nn
-
-# Import intel_extension_for_pytorch
 import intel_extension_for_pytorch as ipex
 
 # oneDNN graph fusion is enabled by default, uncomment the line below to disable it explicitly
@@ -237,25 +265,31 @@ import intel_extension_for_pytorch as ipex
 class Model(nn.Module):
     def __init__(self):
         super(Model, self).__init__()
-        self.linear = nn.Linear(4, 5)
+        self.conv = nn.Conv2d(2, 3, 2)
 
-    def forward(self, input):
-        return self.linear(input)
+    def forward(self, x):
+        return self.conv(x)
 
-input = torch.randn(2, 4)
 model = Model()
+model.eval()
+# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+model = model.to(memory_format=torch.channels_last)
 # Invoke optimize function against the model object
 model = ipex.optimize(model)
-model = torch.jit.trace(model, torch.rand(args.batch_size, 3, 224, 224))
-model = torch.jit.freeze(model)
-res = model(input)
+with torch.no_grad():
+    # Optional.
+    images = images.to(memory_format=torch.channels_last)
+
+    model = torch.jit.trace(model, images)
+    model = torch.jit.freeze(model)
+    res = model(images)
 ```
+
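Here too, `images` is assumed to exist already. A runnable version of the trace-and-freeze pattern with a random input might read:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = nn.Conv2d(2, 3, 2)

    def forward(self, x):
        return self.conv(x)

model = Model().eval()
model = model.to(memory_format=torch.channels_last)  # optional
model = ipex.optimize(model)

images = torch.randn(1, 2, 5, 5).to(memory_format=torch.channels_last)
with torch.no_grad():
    # Trace once with a representative input, then freeze so the fused
    # graph is baked in for subsequent calls.
    traced = torch.jit.trace(model, images)
    traced = torch.jit.freeze(traced)
    res = traced(images)
```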
 #### BFloat16
+
 ```python
 import torch
 import torch.nn as nn
-
-# Import intel_extension_for_pytorch
 import intel_extension_for_pytorch as ipex
 
 # oneDNN graph fusion is enabled by default, uncomment the line below to disable it explicitly
@@ -264,22 +298,30 @@
 class Model(nn.Module):
     def __init__(self):
         super(Model, self).__init__()
-        self.linear = nn.Linear(4, 5)
+        self.conv = nn.Conv2d(2, 3, 2)
 
-    def forward(self, input):
-        return self.linear(input)
+    def forward(self, x):
+        return self.conv(x)
 
-input = torch.randn(2, 4)
 model = Model()
+model.eval()
+# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
+model = model.to(memory_format=torch.channels_last)
 # Invoke optimize function against the model with data type set to torch.bfloat16
 model = ipex.optimize(model, dtype=torch.bfloat16)
-with torch.cpu.amp.autocast():
+with torch.no_grad(), torch.cpu.amp.autocast():
+    # Optional.
+    images = images.to(memory_format=torch.channels_last)
+
     model = torch.jit.trace(model, torch.rand(args.batch_size, 3, 224, 224))
     model = torch.jit.freeze(model)
-    res = model(input)
+    res = model(images)
 ```
+
 ### Inference - C++
+
 To work with libtorch, the C++ library of PyTorch, Intel® Extension for PyTorch* provides its C++ dynamic library as well. The C++ library is intended to handle inference workloads only, such as service deployment; for regular development, please use the Python interface. Compared with plain libtorch usage, no specific code changes are required, except for converting input data into the channels-last data format. During compilation, Intel optimizations will be activated automatically once the C++ dynamic library of Intel® Extension for PyTorch* is linked.
+
 ```C++
 #include <torch/script.h>
 #include <iostream>
@@ -303,8 +345,11 @@ int main(int argc, const char* argv[]) {
   return 0;
 }
 ```
+
 ## Operator Optimizations
+
 ### Supported Customized Operators
+
 * ROIAlign
 * NMS
 * BatchScoreNMS
@@ -328,19 +373,19 @@ int main(int argc, const char* argv[]) {
 * View + Transpose + Contiguous + View
 
 ## Tutorials
+
 * [Performance Tuning](tutorials/Performance_Tuning.md)
 
 ## Joint-blogs
+
 * [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
 * [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
 * [Scaling up BERT-like model Inference on modern CPU - Part 1 by the launcher of the extension](https://huggingface.co/blog/bert-cpu-scaling-part-1)
 
-
 ## Contribution
 
 Please submit a PR or an issue to communicate with us or to contribute code.
 
-
 ## License
 
 _Apache License_, Version _2.0_. As found in the [LICENSE](https://github.com/intel/intel-extension-for-pytorch/blob/master/LICENSE.txt) file.
