
piolib: large transfer regression between 6.6.74+rpt-rpi-v8 & 6.12.18-v8-16k+ #123


Closed
jepler opened this issue Mar 17, 2025 · 4 comments

Comments

@jepler

jepler commented Mar 17, 2025

#107 added the ability to transfer more than 65532 bytes at a time.

On my system with kernel 6.6.74+rpt-rpi-v8 this works. On my colleague's system with 6.12.18-v8-16k+, large transfers result in "pio_sm_xfer_data: Operation not permitted".

This is easy to reproduce with a modified version of the "pull noblock" test from #116 (comment); the source is included further down.

On an affected system, running it with no arguments fails:

$ uname -a
Linux raspberrypi 6.12.18-v8-16k+ #1862 SMP PREEMPT Wed Mar 12 12:33:09 GMT 2025 aarch64 GNU/Linux
$ ./bench1
Loaded program at 29, using sm 0
Actual frequency 10.000000MHz
Bounce buffer size 65532
Transfer size 262144
pio_sm_xfer_data: Operation not permitted
Aborted

while specifying a transfer size of 65532 will succeed:

$ ./bench1 10e6 65532
Loaded program at 29, using sm 0
Actual frequency 10.000000MHz
Bounce buffer size 65532
Transfer size 65532
32241744 bytes in 3000.3ms (10.2MiB/s)
{"frequency": 1e+07, "rate": 1.07463e+07}
pull noblock test with block size control
#include <errno.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include "piolib.h"
#include "ws2812.pio.h"

#define bench_wrap_target 1
#define bench_wrap 2

static const uint16_t bench_program_instructions[] = {
    0xe020, // set x,0
            // .wrap_target
    0x8080, // pull noblock
    0x6000, // out pins, 32
            // .wrap
};

static const struct pio_program bench_program = {
    .instructions = bench_program_instructions,
    .length = 3,
    .origin = -1,
};

static inline pio_sm_config bench_program_get_default_config(uint offset) {
    pio_sm_config c = pio_get_default_sm_config();
    sm_config_set_wrap(&c, offset + bench_wrap_target, offset + bench_wrap);
    sm_config_set_sideset(&c, 1, false, false);
    return c;
}

static inline float bench_program_init(PIO pio, int sm, int offset, float freq, int gpio_base) {
    pio_sm_config c = bench_program_get_default_config(offset);
    sm_config_set_out_shift(&c, false, false /* auto pull */, 32);
    sm_config_set_out_pins(&c, 0, 32);
    sm_config_set_fifo_join(&c, PIO_FIFO_JOIN_TX);
    float div = clock_get_hz(clk_sys) / freq;
    if(div < 1) div = 1;
    if(div > 65535) div = 65535;
    int div_int = (int)div;
    int div_frac = (int)((div - div_int) * 256);
    sm_config_set_clkdiv_int_frac(&c, div_int, div_frac);
    pio_sm_init(pio, sm, offset, &c);
    pio_sm_set_enabled(pio, sm, true);
    pio_gpio_init(pio, gpio_base);
    pio_gpio_init(pio, gpio_base+1);
    pio_sm_set_consecutive_pindirs(pio, sm, gpio_base, 2, true);
    return clock_get_hz(clk_sys) / (div_int + div_frac / 256.);
}


// Monotonic clock reading, in seconds.
static double monotonic(void) {
    struct timespec tv;
    clock_gettime(CLOCK_MONOTONIC, &tv);
    return tv.tv_sec + tv.tv_nsec * 1e-9;
}

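// 262144 bytes of 32-bit words; a fixed-width type keeps the size and the
// bit pattern independent of the platform's sizeof(long).
uint32_t databuf[65536];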

int main(int argc, const char **argv)
{
    float frequency = argc > 1 ? atof(argv[1]) : 10e6;
    size_t xfer_size = argc > 2 ? (size_t)atoi(argv[2]) : sizeof(databuf);
    size_t bounce_buffer_size = 65532;
    PIO pio;
    int sm;
    uint offset;

    if (xfer_size > sizeof(databuf)) xfer_size = sizeof(databuf);

    pio = pio0;
    sm = pio_claim_unused_sm(pio, true);
    pio_sm_config_xfer(pio, sm, PIO_DIR_TO_SM, bounce_buffer_size, 3);

    offset = pio_add_program(pio, &bench_program);
    fprintf(stderr, "Loaded program at %d, using sm %d\n", offset, sm);

    float actual_frequency = bench_program_init(pio, sm, offset, frequency, /* base pin */ 5);
    fprintf(stderr, "Actual frequency %fMHz\n", actual_frequency/1e6);
    fprintf(stderr, "Bounce buffer size %zu\n", bounce_buffer_size);
    fprintf(stderr, "Transfer size %zu\n", xfer_size);
    pio_sm_clear_fifos(pio, sm);

    for(size_t i=0; i<sizeof(databuf)/sizeof(databuf[0]); i++ )
        databuf[i] = i % 2 ? 0x55555555 : 0xaaaaaaaa;

    double t0 = monotonic();
    size_t xfer = 0;
    do {
        int r = pio_sm_xfer_data(pio, sm, PIO_DIR_TO_SM, xfer_size, databuf);
        if (r < 0) { errno = -r; perror("pio_sm_xfer_data"); abort(); }
        xfer += xfer_size;
    } while(monotonic() - t0 < 3);
    double t1 = monotonic();
    double dt = t1 - t0;
    double rate = xfer / dt; // bytes per second
    fprintf(stderr, "%zu bytes in %.1fms (%.1fMiB/s)\n",
        xfer, dt*1e3, rate / 1048576);
    printf("{\"frequency\": %g, \"rate\": %g}\n",
        actual_frequency, rate);
    return 0;
}

ping @ladyada @FoamyGuy for interest

@pelwell
Collaborator

pelwell commented Mar 17, 2025

It's working for me, albeit with a self-built kernel:

$ uname -a
Linux phil-lite 6.12.18-v8-16k+ #1862 SMP PREEMPT Wed Mar 12 12:33:09 GMT 2025 aarch64 GNU/Linux
$ sudo ./pull_noblock
Loaded program at 29, using sm 0
Actual frequency 10.000000MHz
33554432 bytes in 3125.1ms (10.2MiB/s)
{"frequency": 1e+07, "rate": 1.07371e+07}

What firmware version is your colleague running?

$ vcgencmd bootloader_version
$ dmesg | grep RP1

@jepler
Author

jepler commented Mar 17, 2025

FWIW, my error-reporting code is incorrect. The actual error when transferring more than 65532 bytes is ETIMEDOUT:

ioctl(3, _IOC(_IOC_WRITE, 0x66, 0x2, 0x10), 0x7fd5731d98) = -1 ETIMEDOUT (Connection timed out)
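The mismatch is consistent with a call that returns -1 and leaves the real cause in errno: `errno = -r` then turns every failure into errno 1 (EPERM, "Operation not permitted"). The sketch below illustrates this with a stand-in; `fake_xfer` is hypothetical and only mimics an ioctl-style failure, and whether pio_sm_xfer_data always follows the -1/errno convention is an assumption here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a failing ioctl-style call: returns -1 and
 * reports the real cause through errno. */
static int fake_xfer(void) {
    errno = ETIMEDOUT;
    return -1;
}

/* Pattern from the test program: treats the return value as a negated errno. */
int errno_from_return_value(void) {
    int r = fake_xfer();
    if (r < 0) errno = -r;  /* -(-1) == 1 == EPERM: the real cause is lost */
    return errno;
}

/* Alternative: leave errno exactly as the failing call set it. */
int errno_preserved(void) {
    errno = 0;
    int r = fake_xfer();
    assert(r == -1);        /* the stand-in always fails */
    return errno;
}

/* Prints both interpretations side by side. */
void show_difference(void) {
    printf("errno = -r : %s\n", strerror(errno_from_return_value()));
    printf("errno kept : %s\n", strerror(errno_preserved()));
}
```

Calling `show_difference()` from a `main` prints the two messages for comparison. In the test program, the equivalent change would be to call `perror` without first overwriting errno, assuming the library does not clobber it on the way out.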

Errors like the following appear in dmesg:

[   52.622787] rp1-pio 1f00178000.pio: DMA wait timed out
[   71.534683] rp1-pio 1f00178000.pio: DMA bounce timed out
[   72.558524] rp1-pio 1f00178000.pio: DMA wait timed out

Version info (now from my local machine, after running sudo rpi-update to get onto the 6.12.y kernels):

$ uname -a
Linux m5 6.12.19-v8-16k+ #1863 SMP PREEMPT Thu Mar 13 14:23:53 GMT 2025 aarch64 GNU/Linux
$ vcgencmd bootloader_version
2025/03/10 17:10:37
version 2bb2ae640058a2f3aa8dcbdc2da33302e509668d (release)
timestamp 1741626637
update-time 1742227390
capabilities 0x0000007f

$ dmesg | grep RP1
[    6.898924] rp1-firmware rp1_firmware: RP1 Firmware version eb39cfd516f8c90628aa9d91f52370aade5d0a55

@pelwell
Collaborator

pelwell commented Mar 19, 2025

I was able to reproduce this with the updated pull_noblock test. It's caused by a patch missing from the 6.12 RP1 DMA driver, now restored by raspberrypi/linux#6729. You can test this kernel with sudo rpi-update pulls/6729.
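On a kernel that does not yet carry that patch, the report above shows that transfers of 65532 bytes still succeed, so one possible stopgap is to split a large buffer into pieces no bigger than that. The following is a rough sketch only: `xfer_chunked`, `xfer_fn`, `MAX_CHUNK`, and `count_xfer` are hypothetical names, not part of piolib, and the callback stands in for a call such as pio_sm_xfer_data under the assumption that it returns a negative value on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest transfer size reported to work on the affected kernel. */
#define MAX_CHUNK 65532u

/* Hypothetical callback type: sends `len` bytes, returns < 0 on failure. */
typedef int (*xfer_fn)(void *ctx, size_t len, void *data);

/* Sends `len` bytes from `data` in pieces of at most MAX_CHUNK bytes.
 * Returns 0 on success, or the first negative value from `xfer`. */
int xfer_chunked(xfer_fn xfer, void *ctx, size_t len, void *data) {
    uint8_t *p = data;
    assert(len % 4 == 0);  /* assumption: transfers are whole 32-bit words */
    while (len > 0) {
        size_t n = len < MAX_CHUNK ? len : MAX_CHUNK;
        int r = xfer(ctx, n, p);
        if (r < 0) return r;
        p += n;
        len -= n;
    }
    return 0;
}

/* Example callback that only records what it was asked to send. */
struct xfer_stats { size_t calls, bytes, largest; };

int count_xfer(void *ctx, size_t len, void *data) {
    struct xfer_stats *s = ctx;
    (void)data;  /* payload is ignored in this dry run */
    s->calls++;
    s->bytes += len;
    if (len > s->largest) s->largest = len;
    return 0;
}
```

With piolib, the callback would wrap the real transfer call and carry the PIO handle and state machine number in `ctx`. Each chunk is a separate call into the driver, so throughput may be lower than with a single large transfer on a fixed kernel.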

@jepler
Author

jepler commented Mar 19, 2025

It looks like that's fixed it. Thank you.
