
Commit 7ec82c6

LLM: add README.md for Long-Context examples. (#10765)
* LLM: add readme to long-context examples.
* add precision.
* update wording.
* add GPU type.
* add Long-Context example to GPU examples.
* fix comments.
* update max input length.
* update max length.
* add output length.
* fix wording.
1 parent 766fe45 commit 7ec82c6

File tree

2 files changed: +34 -0 lines changed

@@ -0,0 +1,33 @@
# Running Long-Context generation using IPEX-LLM on Intel Arc™ A770 Graphics
Long-context generation is critical in applications such as document summarization, extended conversation handling, and complex question answering. Effective long-context generation leads to more coherent and contextually relevant responses, enhancing user experience and model utility.
This folder contains examples of running long-context generation with IPEX-LLM on Intel Arc™ A770 Graphics (16GB GPU memory):
<!-- TODO: Maybe like this after adding more examples:
- [Single GPU](Single GPU): single GPU examples w & w/o batch.
- [Multiple GPU](Multiple GPU): multiple GPU examples w & w/o batch. -->
- [LLaMA2-32K](LLaMA2-32K): examples of running LLaMA2-32K models with INT4/FP8 precision (see the loading sketch after this list).
- [ChatGLM3-32K](Chatglm3-32K): examples of running ChatGLM3-32K models with INT4/FP8 precision.
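As a quick orientation, here is a minimal, hedged sketch of what launching such a long-context run typically looks like with IPEX-LLM; the checkpoint path and prompt are placeholders, not taken from the examples, and the actual scripts live in the folders above.

```python
# Minimal sketch of INT4 long-context generation with IPEX-LLM on an Intel GPU.
# The checkpoint path and prompt are hypothetical placeholders; see the example
# folders above for the real scripts.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "path/to/Llama-2-7B-32K"  # placeholder local checkpoint

# load_in_4bit=True selects INT4; load_in_low_bit="fp8" would select FP8 instead.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    trust_remote_code=True,
)
model = model.half().to("xpu")  # run on the Intel Arc A770

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
prompt = "..."  # a long input, up to the limits in the tables below
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")

with torch.inference_mode():
    # max_new_tokens=512 matches the Output Length column in the tables below
    output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```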
### Maximum Input Length for Different Models with INT4/FP8 Precision
- **INT4**
| Model Name | Low Memory Mode | Maximum Input Length | Output Length |
| -- | -- | -- | -- |
| LLaMA2-7B-32K | Disabled | 10K | 512 |
| | Enabled | 12K | 512 |
| ChatGLM3-6B-32K | Disabled | 9K | 512 |
| | Enabled | 10K | 512 |
- **FP8**
| Model Name | Low Memory Mode | Maximum Input Length | Output Length |
| -- | -- | -- | -- |
| LLaMA2-7B-32K | Disabled | 7K | 512 |
| | Enabled | 9K | 512 |
| ChatGLM3-6B-32K | Disabled | 8K | 512 |
| | Enabled | 9K | 512 |
> Note: To run longer inputs or use less memory, set `IPEX_LLM_LOW_MEM=1` to enable **low memory mode**, which turns on additional memory optimizations and may slightly affect performance.
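For example (a minimal sketch, assuming the variable is read when the model is loaded), low memory mode can be enabled from Python before loading the model, or equivalently with `export IPEX_LLM_LOW_MEM=1` in the shell:

```python
import os

# Enable IPEX-LLM low memory mode; this must be set before the model is
# loaded for the memory optimization to take effect.
os.environ["IPEX_LLM_LOW_MEM"] = "1"

# ... then load and run the model as usual, e.g.
# AutoModelForCausalLM.from_pretrained(..., load_in_4bit=True)
```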

python/llm/example/GPU/README.md (+1)
@@ -12,6 +12,7 @@ This folder contains examples of running IPEX-LLM on Intel GPU:
- [PyTorch-Models](PyTorch-Models): running any PyTorch model on IPEX-LLM (with "one-line code change")
- [Speculative-Decoding](Speculative-Decoding): running any ***Hugging Face Transformers*** model with ***self-speculative decoding*** on Intel GPUs
- [ModelScope-Models](ModelScope-Models): running ***ModelScope*** model with IPEX-LLM on Intel GPUs
+ - [Long-Context](Long-Context): running **long-context** generation with IPEX-LLM on Intel Arc™ A770 Graphics.

## System Support
