Intel® Extension for PyTorch* provides simple frontend Python APIs and utilities for users to get performance optimizations such as graph optimization and operator optimization with minor code changes. Typically, only two to three lines need to be added to the original code.
### Channels Last
Compared with the default NCHW memory format, the channels_last (NHWC) memory format can further accelerate convolutional neural networks. In Intel® Extension for PyTorch*, the NHWC memory format has been enabled for most key CPU operators, though not all of them have been merged into the PyTorch master branch yet. They are expected to be fully landed in PyTorch upstream soon.
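As an illustration, switching to channels_last takes one `.to(memory_format=...)` call on the model and one on the input; this minimal sketch uses stock PyTorch only, since the memory format itself does not require the extension:

```python
import torch
import torch.nn as nn

# Sketch: move a convolution model and its 4D input to the
# channels_last (NHWC) memory format. The tensor shape stays NCHW;
# only the underlying memory layout changes.
model = nn.Conv2d(3, 8, kernel_size=3).eval()
model = model.to(memory_format=torch.channels_last)

x = torch.randn(1, 3, 224, 224).to(memory_format=torch.channels_last)
with torch.no_grad():
    y = model(x)

print(x.is_contiguous(memory_format=torch.channels_last))  # True
print(y.shape)  # torch.Size([1, 8, 222, 222])
```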
### Auto Mixed Precision (AMP)
The low-precision data type BFloat16 is natively supported on 3rd Generation Intel® Xeon® Scalable Processors (codenamed Cooper Lake) with the AVX-512 instruction set, and will be supported on the next generation of Intel® Xeon® Scalable Processors with the Intel® Advanced Matrix Extensions (Intel® AMX) instruction set for further boosted performance. Auto Mixed Precision (AMP) with BFloat16 for CPU and BFloat16 optimization of operators have been enabled extensively in Intel® Extension for PyTorch*, and partially upstreamed to the PyTorch master branch. Most of these optimizations will land in PyTorch master through PRs that are being submitted and reviewed.
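As a minimal sketch of the underlying mechanism, stock PyTorch's `torch.cpu.amp.autocast` (the CPU AMP API the extension's AMP support builds on) runs BFloat16-eligible ops such as `nn.Linear` in bfloat16 inside the context:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 5).eval()
x = torch.randn(2, 4)

# Inside the autocast context, BFloat16-eligible ops (linear, conv, ...)
# run in bfloat16 automatically; other ops keep their float32 precision.
with torch.no_grad(), torch.cpu.amp.autocast():
    y = model(x)

print(y.dtype)  # torch.bfloat16
```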
### Graph Optimization
To further optimize performance with TorchScript, Intel® Extension for PyTorch* supports fusion of frequently used operator patterns, such as Conv2D+ReLU and Linear+ReLU. The benefit of these fusions is delivered to users in a transparent fashion.
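For instance, a Conv2D+ReLU pattern becomes a fusion candidate once the model is converted to TorchScript; this sketch uses stock `torch.jit.trace` and `torch.jit.freeze` only, to show the workflow the fusions plug into:

```python
import torch
import torch.nn as nn

# Sketch: a Conv2d+ReLU pattern expressed in TorchScript via tracing.
# Freezing inlines the weights, which lets the fuser rewrite the graph.
class ConvRelu(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = ConvRelu().eval()
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    traced = torch.jit.freeze(torch.jit.trace(model, x))
    y = traced(x)

print(y.shape)  # torch.Size([1, 8, 30, 30])
```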
### Operator Optimization
Intel® Extension for PyTorch* also optimizes operators and implements several customized operators for performance. A few ATen operators are replaced by their optimized counterparts in Intel® Extension for PyTorch* via the ATen registration mechanism. Moreover, some customized operators are implemented for several popular topologies. For instance, ROIAlign and NMS are defined in Mask R-CNN. To improve performance of these topologies, Intel® Extension for PyTorch* also optimizes these customized operators.
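To make the NMS example concrete, here is a plain-PyTorch reference implementation of the algorithm (`nms_reference` is a hypothetical helper for illustration only; the extension's customized kernel computes the same result with an optimized implementation):

```python
import torch

def nms_reference(boxes, scores, iou_threshold):
    """Reference NMS on (x1, y1, x2, y2) boxes: repeatedly keep the
    highest-scoring box and drop remaining boxes whose IoU with it
    exceeds iou_threshold."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        top, rest = boxes[i], boxes[order[1:]]
        lt = torch.max(top[:2], rest[:, :2])   # intersection top-left corners
        rb = torch.min(top[2:], rest[:, 2:])   # intersection bottom-right corners
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_top = (top[2] - top[0]) * (top[3] - top[1])
        area_rest = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_top + area_rest - inter)
        order = order[1:][iou <= iou_threshold]
    return torch.tensor(keep)

boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
print(nms_reference(boxes, scores, 0.5).tolist())  # [0, 2]
```

The second box overlaps the first with IoU ≈ 0.68 and is suppressed; the third box is disjoint and survives.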
## Getting Started
The code changes that are required for Intel® Extension for PyTorch* are highlighted with comments in a line above.
### Training
#### Float32
```python
import torch
import torch.nn as nn
# Import intel_extension_for_pytorch
import intel_extension_for_pytorch as ipex

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = nn.Conv2d(2, 3, 2)

    def forward(self, x):
        return self.conv(x)

model = Model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model.load_state_dict(torch.load(PATH))
optimizer.load_state_dict(torch.load(PATH))

model.train()
# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
model = model.to(memory_format=torch.channels_last)
# Invoke optimize function against the model object and optimizer object
model, optimizer = ipex.optimize(model, optimizer=optimizer)
```
TorchScript mode makes graph optimization possible, hence improving performance for some topologies. Intel® Extension for PyTorch* enables the most commonly used operator pattern fusions, and users can get the performance benefit without additional code changes.
#### Float32
```python
import torch
import torch.nn as nn
# Import intel_extension_for_pytorch
import intel_extension_for_pytorch as ipex

# oneDNN graph fusion is enabled by default, uncomment the line below to disable it explicitly
# ipex.enable_onednn_fusion(False)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv = nn.Conv2d(2, 3, 2)

    def forward(self, x):
        return self.conv(x)

model = Model()
model.eval()
# Setting memory_format to torch.channels_last could improve performance with 4D input data. This is optional.
model = model.to(memory_format=torch.channels_last)
# Invoke optimize function against the model object
model = ipex.optimize(model)
# Sample input matching the model's 2 input channels
images = torch.rand(args.batch_size, 2, 224, 224).to(memory_format=torch.channels_last)
model = torch.jit.trace(model, images)
model = torch.jit.freeze(model)
res = model(images)
```
### Inference - C++
To work with libtorch, the C++ library of PyTorch, Intel® Extension for PyTorch* provides its C++ dynamic library as well. The C++ library is intended to handle inference workloads only, such as service deployment. For regular development, please use the Python interface. Compared with regular libtorch usage, no specific code changes are required, except for converting input data into the channels last data format. Intel optimizations will be activated automatically once the C++ dynamic library of Intel® Extension for PyTorch* is linked during compilation.
* [Intel and Facebook Accelerate PyTorch Performance with 3rd Gen Intel® Xeon® Processors and Intel® Deep Learning Boost’s new BFloat16 capability](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-facebook-boost-bfloat16.html)
* [Accelerate PyTorch with the extension and oneDNN using Intel BF16 Technology](https://medium.com/pytorch/accelerate-pytorch-with-ipex-and-onednn-using-intel-bf16-technology-dca5b8e6b58f)
* [Scaling up BERT-like model Inference on modern CPU - Part 1 by the launcher of the extension](https://huggingface.co/blog/bert-cpu-scaling-part-1)
## Contribution
Please submit PRs or issues to communicate with us or contribute code.
## License
_Apache License_, Version _2.0_. As found in [LICENSE](https://github.com/intel/intel-extension-for-pytorch/blob/master/LICENSE.txt) file.