Commit a1631e5

compilade and ggerganov authored
llama : simplify Mamba with advanced batch splits (#8526)
* llama : advanced batch splits

  This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

  Otherwise, the server embeddings tests failed. This was likely an existing problem but was only detected here because of an additional assertion.

* llama : apply suggestions

  Co-authored-by: Georgi Gerganov <[email protected]>

* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

  Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

  ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

  Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

  This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <[email protected]>
1 parent fc54ef0 commit a1631e5

4 files changed

+1137
-678
lines changed

ggml/include/ggml.h

+3 -6
@@ -1777,10 +1777,8 @@ extern "C" {
 
     GGML_API struct ggml_tensor * ggml_ssm_conv(
             struct ggml_context * ctx,
-            struct ggml_tensor * s,
-            struct ggml_tensor * x,
-            struct ggml_tensor * c,
-            struct ggml_tensor * sq);
+            struct ggml_tensor * sx,
+            struct ggml_tensor * c);
 
     GGML_API struct ggml_tensor * ggml_ssm_scan(
             struct ggml_context * ctx,
@@ -1789,8 +1787,7 @@ extern "C" {
             struct ggml_tensor * dt,
             struct ggml_tensor * A,
             struct ggml_tensor * B,
-            struct ggml_tensor * C,
-            struct ggml_tensor * sq);
+            struct ggml_tensor * C);
 
     // partition into non-overlapping windows with padding if needed
     // example:
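
The ggml_ssm_conv hunk above merges the separate state (s) and input (x) tensors into a single sx argument and drops the sequence-id tensor sq. The diff shows only the signature, so the following is a minimal sketch of one plausible reading, not ggml's implementation: it assumes sx holds the d_conv - 1 carried-over state values followed by the n_t new inputs for one channel of one sequence, so that the convolution becomes a plain sliding window. The helper name ssm_conv_channel and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of ggml: depthwise 1-D convolution for a
 * single channel of a single sequence.
 *
 * Assumed layout (an assumption, not confirmed by the diff):
 *   sx : d_conv - 1 carried-over state values, then n_t new inputs
 *   c  : d_conv kernel weights for this channel
 *   y  : n_t outputs, one per new input
 */
static void ssm_conv_channel(const float *sx, const float *c, float *y,
                             size_t d_conv, size_t n_t) {
    for (size_t t = 0; t < n_t; ++t) {
        float sum = 0.0f;
        /* The window for output t covers sx[t] .. sx[t + d_conv - 1]. */
        for (size_t k = 0; k < d_conv; ++k) {
            sum += sx[t + k] * c[k];
        }
        y[t] = sum;
    }
}
```

Under this reading, every sequence in a batch split contributes the same number of tokens, so no per-token sequence lookup is needed inside the loop, which would be consistent with the removal of sq.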
