Commit b200e11 (parent aafd6d5), authored by ATMxsp01

Add initial python api doc in mddoc (2/2) (#11388)

* add PyTorch-API.md
* small change
* small change

Co-authored-by: ATMxsp01 <[email protected]>

1 file changed: +85 −0 lines

docs/mddocs/PythonAPI/PyTorch-API.md

# IPEX-LLM PyTorch API

## Optimize Model

You can run any PyTorch model with `optimize_model` through only a one-line code change to benefit from IPEX-LLM optimizations, regardless of the library or API you are using.

### `ipex_llm.optimize_model`_`(model, low_bit='sym_int4', optimize_llm=True, modules_to_not_convert=None, cpu_embedding=False, lightweight_bmm=False, **kwargs)`_

A method to optimize any PyTorch model.
- **Parameters**:

  - **model**: The original PyTorch model (`nn.Module`).

  - **low_bit**: str value, options are `'sym_int4'`, `'asym_int4'`, `'sym_int5'`, `'asym_int5'`, `'sym_int8'`, `'nf3'`, `'nf4'`, `'fp4'`, `'fp8'`, `'fp8_e4m3'`, `'fp8_e5m2'`, `'fp16'` or `'bf16'`. `'sym_int4'` means symmetric int 4, `'asym_int4'` means asymmetric int 4, `'nf4'` means 4-bit NormalFloat, etc. The relevant low-bit optimization will be applied to the model.

  - **optimize_llm**: Whether to further optimize the LLM model.

    Default to be `True`.

  - **modules_to_not_convert**: list of str value, modules (`nn.Module`) that are skipped when conducting model optimizations.

    Default to be `None`.
  - **cpu_embedding**: Whether to replace the Embedding layer; you may need to set it to `True` when running IPEX-LLM on GPU on Windows.

    Default to be `False`.

  - **lightweight_bmm**: Whether to replace the `torch.bmm` ops; you may need to set it to `True` when running IPEX-LLM on GPU on Windows.

    Default to be `False`.
- **Returns**: The optimized model.

- **Example**:
```python
# Take an OpenAI Whisper model as an example
import whisper

from ipex_llm import optimize_model

model = whisper.load_model('tiny')  # Load the Whisper model under the PyTorch framework
model = optimize_model(model)  # With only one line of code change

# Use the optimized model without any other API changes
result = model.transcribe(audio, verbose=True, language="English")

# (Optional) you can also save the optimized model by calling 'save_low_bit'
model.save_low_bit(saved_dir)
```
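For intuition about the `low_bit` options, the following is a minimal, self-contained sketch of what symmetric int 4 (`'sym_int4'`) quantization means conceptually. It is plain Python for illustration only, not IPEX-LLM's implementation: a single shared scale maps the weights to signed integers in `[-8, 7]`, with no zero-point offset.

```python
# Conceptual sketch of symmetric int4 quantization -- NOT IPEX-LLM's
# implementation, only an illustration of what 'sym_int4' means.

def sym_int4_quantize(weights):
    """Map floats to signed 4-bit ints in [-8, 7] with one shared scale."""
    # Symmetric: the zero-point is 0, so only a scale is stored.
    # The 'or 1.0' guards against an all-zero weight list.
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def sym_int4_dequantize(q, scale):
    """Recover approximate floats from the quantized values."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.7, -0.91]
q, s = sym_int4_quantize(w)
w_hat = sym_int4_dequantize(q, s)  # close to w, within s / 2 per weight
```

The asymmetric variants (`'asym_int4'`, `'asym_int5'`) additionally store a zero-point, so the representable range need not be centred on zero.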

## Load Optimized Model

To avoid the high resource consumption of loading the original full-precision model, we provide a save/load API: the model can be saved after low-bit optimization, and the saved low-bit model can then be loaded directly. Saving and loading are platform-independent and work across operating systems.

### `ipex_llm.optimize.load_low_bit`_`(model, model_path)`_

Load the optimized PyTorch model.
- **Parameters**:

  - **model**: The PyTorch model instance.

  - **model_path**: The path of the saved optimized model.

- **Returns**: The optimized model.

- **Example**:
```python
# Example 1:
# Take the ChatGLM2-6B model as an example
# Make sure you have saved the optimized model by calling 'save_low_bit'
from transformers import AutoModel

from ipex_llm.optimize import low_memory_init, load_low_bit

with low_memory_init():  # Fast and low-cost by loading the model on the meta device
    model = AutoModel.from_pretrained(saved_dir,
                                      torch_dtype="auto",
                                      trust_remote_code=True)
model = load_low_bit(model, saved_dir)  # Load the optimized model
```

```python
# Example 2:
# If the model doesn't fit the 'low_memory_init' method, you can
# alternatively obtain the model instance through the traditional loading method.
# Take an OpenAI Whisper model as an example
# Make sure you have saved the optimized model by calling 'save_low_bit'
import whisper

from ipex_llm.optimize import load_low_bit

model = whisper.load_model('tiny')  # A model instance obtained through traditional loading
model = load_low_bit(model, saved_dir)  # Load the optimized model
```
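One reason to save the model after low-bit optimization is size: a 4-bit weight payload is roughly one eighth of its float32 original, which is what makes the saved model cheap to store and reload. The sketch below is a hypothetical packing scheme, not IPEX-LLM's actual on-disk format; it only illustrates how two signed 4-bit values fit in one byte.

```python
# Illustrative sketch only -- NOT IPEX-LLM's on-disk format. It shows
# why low-bit storage is compact: two signed 4-bit weights per byte,
# versus 4 bytes each for float32.

def pack_int4(q):
    """Pack signed 4-bit ints (range [-8, 7]) two per byte."""
    if len(q) % 2:  # pad odd-length input with a zero nibble
        q = q + [0]
    return bytes(((a & 0xF) << 4) | (b & 0xF) for a, b in zip(q[0::2], q[1::2]))

def unpack_int4(data, n):
    """Recover the first n signed 4-bit ints from packed bytes."""
    q = []
    for byte in data:
        for nib in ((byte >> 4) & 0xF, byte & 0xF):
            q.append(nib - 16 if nib >= 8 else nib)  # sign-extend the nibble
    return q[:n]

q = [-8, -1, 0, 3, 7]
packed = pack_int4(q)                   # 3 bytes instead of 5 * 4 float32 bytes
restored = unpack_int4(packed, len(q))  # round-trips back to q
```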
