llama.cpp: fill in missing operators #1

noemotiovon · 2025-04-07T08:21:10Z

Status	Name	Owner	Notes
Done	ABS	@hipudding
Done	ADD	@hipudding
Done	ARGMAX	@noemotiovon
Done	CONT	@noemotiovon	bf16 not yet supported
Done	CONV_TRANSPOSE_1D	@noemotiovon
Done	COS	@noemotiovon
Done	COUNT_EQUAL	@noemotiovon	Composed from multiple operators: SUM and EqTensor
N/A	CPY	@noemotiovon	Some types not supported
	CROSS_ENTROPY_LOSS		Requires composing multiple operators: logsoftmax, argmax, sub, BinaryCrossEntropy
	CROSS_ENTROPY_LOSS_BACK
Done	DUP	@noemotiovon
Done	ELU	@noemotiovon
Done	EXP	@hipudding
	FLASH_ATTN_EXT
	GATED_LINEAR_ATTN
N/A	GET_ROWS	@noemotiovon	Some types not supported
	GET_ROWS_BACK		Maps to the indexAdd operator, which is not supported; an alternative still needs to be found
	L2_NORM
Done	LOG	@noemotiovon
Done	MEAN	@noemotiovon
	MUL_MAT
	MUL_MAT_ID
Done	NEG	@hipudding
	OPT_STEP_ADAMW
	OUT_PROD
Done	PAD_REFLECT_1D	@noemotiovon
	REPEAT_BACK
	RMS_NORM_BACK
Done	ROPE	@noemotiovon	Some advanced features not supported; Qwen and Llama are unaffected
	ROPE_BACK
	RWKV_WKV6
	RWKV_WKV7
	SET
Done	SGN	@noemotiovon
Done	SIGMOID	@hipudding
	SILU_BACK
Done	SIN	@noemotiovon
	SOFT_MAX_BACK
Done	SQRT	@hipudding
	SSM_CONV
	SSM_SCAN
Done	STEP	@noemotiovon
Done	SUB
Done	SUM	@hipudding
N/A	UPSCALE		Some types not supported
Done	GELU_QUICK	@hipudding
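
The COUNT_EQUAL row notes that the operator is built by chaining an element-wise equality operator with a SUM reduction. The sketch below illustrates only that composition on plain host vectors. The names `eq_tensor`, `sum_tensor`, and `count_equal` are hypothetical stand-ins, not the backend's actual kernels or API, and both inputs are assumed to have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Step 1 (stand-in for an element-wise "equal" operator such as EqTensor):
// build a 0/1 mask marking the positions where a and b hold the same value.
// Assumes a.size() == b.size().
std::vector<std::uint8_t> eq_tensor(const std::vector<std::int32_t>& a,
                                    const std::vector<std::int32_t>& b) {
    std::vector<std::uint8_t> mask(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        mask[i] = (a[i] == b[i]) ? 1 : 0;
    }
    return mask;
}

// Step 2 (stand-in for a SUM reduction): add up the mask entries.
std::int64_t sum_tensor(const std::vector<std::uint8_t>& mask) {
    return std::accumulate(mask.begin(), mask.end(), std::int64_t{0});
}

// COUNT_EQUAL expressed as SUM(EqTensor(a, b)).
std::int64_t count_equal(const std::vector<std::int32_t>& a,
                         const std::vector<std::int32_t>& b) {
    return sum_tensor(eq_tensor(a, b));
}
```

For example, `count_equal({1, 2, 3, 4}, {1, 0, 3, 9})` counts the two matching positions (indices 0 and 2).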

noemotiovon · 2025-04-07T08:33:51Z

04-02: GET_ROWS && DUP && CPY

noemotiovon · 2025-04-07T08:35:07Z

04-03: SIN && COS && ARGMAX

noemotiovon · 2025-04-07T08:36:18Z

04-07: CONV_TRANSPOSE_1D && ELU

noemotiovon · 2025-04-09T02:52:58Z

04-08: Used std::function to work around the problem of using lambda expressions in templates
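
The comment does not say which template/lambda problem was hit. One common form, shown here purely as an assumption, is that an operation passed as a function-pointer template argument cannot be a capturing lambda, whereas a `std::function` parameter accepts one. All names below (`apply_unary_tpl`, `apply_unary`, `negate_value`, `scale`) are hypothetical and not taken from the backend.

```cpp
#include <functional>
#include <vector>

// Variant A: the operation is a non-type template parameter (function pointer).
// This works for free functions, but a capturing lambda cannot be used here,
// because it has no conversion to a plain function pointer.
template <float (*Op)(float)>
std::vector<float> apply_unary_tpl(const std::vector<float>& src) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (float x : src) dst.push_back(Op(x));
    return dst;
}

float negate_value(float x) { return -x; }

// Variant B: the operation is a runtime std::function argument, so free
// functions, captureless lambdas, and capturing lambdas are all accepted.
std::vector<float> apply_unary(const std::function<float(float)>& op,
                               const std::vector<float>& src) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (float x : src) dst.push_back(op(x));
    return dst;
}

// The lambda captures `factor`, which Variant A could not express.
std::vector<float> scale(const std::vector<float>& src, float factor) {
    return apply_unary([factor](float x) { return x * factor; }, src);
}
```

`apply_unary_tpl<negate_value>(v)` and `scale(v, 3.0f)` both compile. The trade-off is that `std::function` adds type erasure and an indirect call, which is usually negligible next to the cost of launching an operator.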

noemotiovon · 2025-04-09T09:03:21Z

04-09: LOG && MEAN && PAD_REFLECT_1D && COUNT_EQUAL && SGN && STEP

noemotiovon · 2025-04-10T08:54:45Z

04-10: ROPE optimization

noemotiovon · 2025-04-11T09:51:25Z

04-11: Improved the precision of the ROPE operator && studied profiling techniques

hipudding · 2025-04-14T06:24:25Z

Exclude operators that are not supported on 310p

noemotiovon · 2025-04-15T11:39:29Z

04-14 - 04-15:

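Investigated profiling and confirmed the operator requirements with the product side of the aipc project.
Investigated whether aclnn-FA can be used in llama.cpp. Conclusion: the results show a precision deviation.
- The key point is that in llama.cpp, FA converts the mask (F16) values, multiplies them by slope, and adds the result directly to the Query-Key scores. The scores at certain positions drop sharply, so those positions receive weights close to 0 in the softmax and are effectively excluded. In other words, masking is implemented as score + bias.
- The ACLNN operator requires the mask parameter to be of type bool or int8, which is used for binary masking.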
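
The mismatch between the two masking styles can be illustrated on a single row of attention scores. This is a minimal host-side sketch, not the llama.cpp or ACLNN implementation: `softmax`, `softmax_additive_mask`, and `softmax_binary_mask` are hypothetical helpers, `slope` is assumed to be positive, and at least one position is assumed to stay visible.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Numerically stable softmax over one row of scores.
// Assumes at least one finite score.
std::vector<float> softmax(const std::vector<float>& scores) {
    float max_v = -std::numeric_limits<float>::infinity();
    for (float s : scores) max_v = std::fmax(max_v, s);
    std::vector<float> out(scores.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        out[i] = std::exp(scores[i] - max_v);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
    return out;
}

// Additive style (score + bias): the mask holds 0 for visible positions and a
// negative value (possibly -inf) elsewhere; it is scaled by slope and added.
std::vector<float> softmax_additive_mask(const std::vector<float>& scores,
                                         const std::vector<float>& mask,
                                         float slope) {
    std::vector<float> biased(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        biased[i] = scores[i] + slope * mask[i];
    }
    return softmax(biased);
}

// Binary style: hidden[i] != 0 removes position i outright.
std::vector<float> softmax_binary_mask(const std::vector<float>& scores,
                                       const std::vector<std::uint8_t>& hidden) {
    std::vector<float> biased(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        biased[i] = hidden[i] ? -std::numeric_limits<float>::infinity()
                              : scores[i];
    }
    return softmax(biased);
}
```

When the additive mask contains only 0 and -inf, both functions return the same weights. When it contains finite negative values, the additive form still assigns a nonzero weight to every position, and a bool or int8 mask cannot express that bias.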

noemotiovon · 2025-04-16T02:03:17Z

Done. PR

Exclude operators that are not supported on 310p

hipudding · 2025-04-16T08:21:55Z

The complex operators do not currently affect LLM inference; the remaining operators will be supported as actual needs arise.

noemotiovon · 2025-04-24T01:24:20Z

04-21: Finished implementing the MUL_MAT_ID operator. It does not yet use the GroupedMatMul operator from the acceleration library. PR

noemotiovon · 2025-04-24T01:26:58Z

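04-22: Started reimplementing the MUL_MAT_ID operator with GroupedMatMul. Some parameter issues remain unresolved, so this is shelved for now in favor of implementing the deepseek-v2 KV cache with torchair + mindie.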

noemotiovon assigned noemotiovon and hipudding and unassigned noemotiovon Apr 7, 2025

noemotiovon assigned noemotiovon and hipudding and unassigned hipudding and noemotiovon Apr 8, 2025

hipudding assigned noemotiovon and unassigned hipudding Apr 9, 2025

hipudding closed this as completed Apr 16, 2025
