Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Program full sectors, longer read/write timeouts #52

Merged
merged 5 commits into from
Jul 6, 2019

Conversation

ewenmcneill
Copy link
Contributor

As diagnosed in timvideos/litex-buildenv#137, there seems to be an issue with programming the TinyFPGA BX SPI flash with less than 256 octets (minor sector size). In theory the SPI flash supports this (see datasheet pages 14-16), but in practice it always seems to fail. It's unclear whether this is caused by a state machine issue in the TinyFPGA-Bootloader verilog, or some bug or unclear documentation in the SPI flash. But since we've already erased the entire 256 octets (and on any subsequent write, we'll also erase the entire 256 octets) the simplest solution is to pad all minor sector writes out to a full 256 octets (with 0xff, which is what is read back after erase, as that hopefully saves one cycle of wear on the "unused" flash cells).

Also double the read timeouts (was 1.0 seconds, now 2.0 seconds), because the only way that pySerial will return a short read is if the readTimeout is hit, and there are multiple reports of users getting short reads and verify errors (and thus programming failures; some bricking their boards). And substantially increase the writeTimeout (was 1.0 seconds, now 5.0 seconds) because tinyprog gives up immediately if there is a writeTimeout, which could lead to a partially programmed board (possibly one that is bricked), so it seems worth being more patient. (IIRC there's another pull request around that retries the sector write if it times out/fails, but ISTR it's buried in a gigantic pull request that adds a whole other board, etc, and is against 6-12 month old code, so it is not easily merged directly.)

Seems to work for me, both images that are a multiple of 256 octets, and images that are not a multiple of 256 octets. (See timvideos/litex-buildenv#137 for the testing to demonstrate there was a problem; I re-ran the tests in timvideos/litex-buildenv#137 (comment) to check this version was working.)

Programing output:

(LX P=tinyfpga_bx.minimal F=none R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ make clean; export FIRMWARE=none; make image-flash
[...]
tinyprog --program-image build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+none.bin

    TinyProg CLI
    ------------
    Using device id 1d50:6130
    Only one board with active bootloader, using it.

    Programming /dev/ttyACM0 with build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+none.bin
    Programming at addr 028000
    Waking up SPI flash
    180192 bytes to program
    Programming and Verifying: 180kB [00:04, 39.6kB/s]                          
    Success!

(LX P=tinyfpga_bx.minimal F=stub R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ make clean; export FIRMWARE=stub; make image-flash
[...]
tinyprog --program-image build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+stub.bin

    TinyProg CLI
    ------------
    Using device id 1d50:6130
    Only one board with active bootloader, using it.

    Programming /dev/ttyACM0 with build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+stub.bin
    Programming at addr 028000
    Waking up SPI flash
    203904 bytes to program
    Programming and Verifying: 205kB [00:05, 39.7kB/s]                          
    Success!

(LX P=tinyfpga_bx.minimal F=stub R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ make clean; export FIRMWARE=micropython; make gateware; scripts/build-micropython.sh; make image-flash
[...]
tinyprog --program-image build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin

    TinyProg CLI
    ------------
    Using device id 1d50:6130
    Only one board with active bootloader, using it.

    Programming /dev/ttyACM0 with build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin
    Programming at addr 028000
    Waking up SPI flash
    442884 bytes to program
    Programming and Verifying: 446kB [00:11, 40.3kB/s]                          
    Success!

(LX P=tinyfpga_bx.minimal F=micropython R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ 

And the serial output (note that if you program FIRMWARE=none it just writes to the gateware section of the flash, which pretty much guarantees it'll boot whatever firmware was written previously, thus in my case it booted MicroPython; see further down for writing a different firmware and then re-writing MicroPython back again):

        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
 SoC BIOS / CPU: LM32 /  16MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jun  8 2019 12:59:13

BIOS CRC passed (494556cb)
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
Booting from flash...
Executing booted program at 0x20058008
MicroPython v1.9.4-534-gd2bd404 on 2019-05-20; litex with lm32
>>> 
        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
 SoC BIOS / CPU: LM32 /  16MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jun  8 2019 13:03:17

BIOS CRC passed (f68ee6ca)
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
Booting from flash...
Executing booted program at 0x20058008
Stub firmware booting...


LiteX-Buildenv Stub Firmware, built Jun  8 2019 13:03:18

Type "help" for help

STUB>
        __   _ __      _  __
       / /  (_) /____ | |/_/
      / /__/ / __/ -_)>  <
     /____/_/\__/\__/_/|_|
 SoC BIOS / CPU: LM32 /  16MHz
(c) Copyright 2012-2018 Enjoy-Digital
(c) Copyright 2007-2018 M-Labs Limited
Built Jun  8 2019 13:09:46

BIOS CRC passed (71f12f87)
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
Booting from flash...
Executing booted program at 0x20058008
MicroPython v1.9.4-534-gd2bd404 on 2019-06-08; litex with lm32
>>> 

Either the TinyProg-BootLoader state machines, or the SPI flash chip
used on the TinyFPGA BX, seems unable to *properly* program a partial
minor sector (ie, writing less than 256 octets), leading to a failed
write/verify cycle.  This seems to happen at the end of any programming
that is not an exact multiple of 256 octets.  Since the whole 256 octets
is already erased in program_sectors(), we pad out the short write to
a full 256 octets (with 0xff, which is the value read for "erased", to
reduce wear on on the flash cells).

For more detail on diagnosing this, see:
timvideos/litex-buildenv#137
"pip-wheel-metadata" is created alongside setup.py, by "pip install ."
(used for development).

See pypa/pip#6213 for discussion of this
clutter (issue currently open; might be moved to another location,
eg build/pip-wheel-metadata or .pip-wheel-metadata, in a later version).
Double the read timeout to reduce risk of short reads causing verify
errors.  Substantially increase the write timeout to try to reduce the
risk of an incomplete write, or a write being abandoned when it was
nearly finished due to a minor SPI flash delay.
Copy link
Collaborator

@mithro mithro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you start a write not on a sector boundary?

@ewenmcneill
Copy link
Contributor Author

@mithro According to the flash chip documentation, with program sectors, it'll "wrap around" within the 256 octet (minor) sector (start of page 15 of the datasheet), ie that method only adjusts certain low address values.

I think the way that program_sectors() is used (called by program_bitstream(), called by various places in perform_bootloader_update() and main()) probably means that it's always called with a starting address on a 256 octet boundary.

But you're right, tinyprog does accept arbitrary addresses, so I suppose it's possible that the first write might be a short one too. It looks like self.erase tries to handle this case (by read/erase/rewrite). But the rest of the writing is then done in 256-byte chunks, ignoring any "sector boundaries", so I think it would then hit the SPI flash "wrap around" behaviour with the SPI flash write method that program_sectors uses, and the result would be garbage in the SPI flash. So AFAICT it'll only actually work if the start address is on a 256-byte sector boundary.

We should probably (a) check that the start address is actually a 256-byte sector boundary (and fail out at the beginning if it is not), and (b) adjust the padding calculation so it can't wrap around in that case.

Ewen

@ewenmcneill
Copy link
Contributor Author

We should probably (a) check that the start address is actually a 256-byte sector boundary (and fail out at the beginning if it is not), and (b) adjust the padding calculation so it can't wrap around in that case.

I've added both of those. I believe the only regularly used programming locations are 0x28000 (start of user image) and 0x00000 (start of flash, for the bootloader), but it'll now throw an exception if someone tries to program somewhere else that is legal but not a multiple of 256. I also changed the code to calculate the offset from the start of the 256 octet sector boundary in the case where it's going to try to do a short write, and avoid wrapping in that case. (But in practice I think the only way you could trigger that behaviour and get sane results would be to be writing less than 256 octets total at an offset that is not a multiple of 256; which is now no longer supported.)

Ewen

PS: The programming offsets are shown in hex, but I only realised after testing that --addr would accept hex values (0xNNNN) due to the way parse_int() worked; I'd assumed it just did int()...

PPS: The "you can't do that" stack trace is ugly, but in theory should "never happen", so I've left it as just a safety check.

(LX P=tinyfpga_bx.minimal F=micropython R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ tinyprog -m
[
  {
    "boardmeta": {
      "name": "TinyFPGA BX",
      "fpga": "ice40lp8k-cm81",
      "hver": "1.0.0",
      "uuid": "1012cf3a-e168-4cf0-b517-d0e002789cfc"
    },
    "bootmeta": {
      "bootloader": "TinyFPGA USB Bootloader",
      "bver": "1.0.1",
      "update": "https://tinyfpga.com/update/tinyfpga-bx",
      "addrmap": {
        "bootloader": "0x000a0-0x28000",
        "userimage": "0x28000-0x50000",
        "userdata": "0x50000-0x100000"
      }
    },
    "port": "/dev/ttyACM0"
  }
]
(LX P=tinyfpga_bx.minimal F=micropython R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ perl -le 'print 0x28000;'
163840
(LX P=tinyfpga_bx.minimal F=micropython R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ tinyprog --program-image build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin --addr 163845

    TinyProg CLI
    ------------
    Using device id 1d50:6130
    Only one board with active bootloader, using it.

    Programming /dev/ttyACM0 with build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin
    Programming at addr 028005
    Waking up SPI flash
    442884 bytes to program
Traceback (most recent call last):
  File "/src/fpga/litex-buildenv/build/conda/bin/tinyprog", line 11, in <module>
    load_entry_point('tinyprog', 'console_scripts', 'tinyprog')()
  File "/src/fpga/dev-litex-tools/TinyFPGA-Bootloader/programmer/tinyprog/__main__.py", line 463, in main
    if not fpga.program_bitstream(addr, bitstream):
  File "/src/fpga/dev-litex-tools/TinyFPGA-Bootloader/programmer/tinyprog/__init__.py", line 612, in program_bitstream
    return self.program_sectors(addr, bitstream)
  File "/src/fpga/dev-litex-tools/TinyFPGA-Bootloader/programmer/tinyprog/__init__.py", line 510, in program_sectors
    format(minor_sector_size, addr))
ValueError: Address must be a multiple of 256, not 163845
(LX P=tinyfpga_bx.minimal F=micropython R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ tinyprog --program-image build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin --addr 163840

    TinyProg CLI
    ------------
    Using device id 1d50:6130
    Only one board with active bootloader, using it.

    Programming /dev/ttyACM0 with build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin
    Programming at addr 028000
    Waking up SPI flash
    442884 bytes to program
    Programming and Verifying: 446kB [00:11, 40.1kB/s]                          
    Success!

(LX P=tinyfpga_bx.minimal F=micropython R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ 

@ewenmcneill
Copy link
Contributor Author

Ah, it looks like self.write actually also tries to handle this 256 octet sector boundary (and do a short write at the start). So in theory writing at arbitrary addresses might actually work, it's just very inefficient (as program_sectors will self.erase in 4k chunks, which will do a bunch of rewriting at the margins to impedance match the SPI flash erase chunks to the relevant offsets, only for them to be erased/rewritten on the next sector written....). I'm dubious whether that's ever actually been used in anger though.

I'll repush a version that just adjusts the padding to reach the end of the 256 octet offset but not overrun it, and only apply if the original start address was a multiple of the minor sector size. Ie, only apply in the case where we were having problems.

Ewen

To avoid wrap around when writing beyond the 256 octet sector, if the
write starts part way into the 256 octet sector, only do the padding
to a full 256 octet sector if we are writing from the begining of the
sector (and thus writing a full 256 octets will not wrap around).

We also write 0xff for safety, as in most SPI flash that is the erased
value, and effectively one can only write 0 bits, so writing twice to
a cell (without erasing) writes (existing value AND new value), which
for a new value of 0xff should be the existing value.
@ewenmcneill ewenmcneill force-pushed the program-full-sectors branch from 6a81e58 to 0b573f3 Compare June 8, 2019 03:45
@ewenmcneill
Copy link
Contributor Author

ewenmcneill commented Jun 8, 2019

Now the special padding work around will only happen if we're aligned, which should be safe. But we don't try to force programming at aligned addresses. (I'm really not convinced that programming starting at unaligned addresses via program_sectors works, but it should work as well as it did previously.)

Still seems to solve the FIRWWARE=none problem.

While testing this I noticed that our MicroPython image is actually not a multiple of 256 octets (it's 4 bytes longer), nor is our stub image, yet both of always worked (give or take timeouts):

FIRMWARE=none: 180192 = 703 * 256 + 224
FIRMWARE=micropython: 442884 = 1730 * 256 + 4
FIRMWARE=stub: 203904 = 796 * 256 + 128

So there's apparently something Yet More Magical (tm) about the 224 byte value of FIRMWARE=none somewhere inside tinyprog, the TinyFPGA-Bootloader verilog, or the SPI flash, which causes it not to work when the others (4 bytes, 128 bytes) do work.

But:
(a) I've run out of patience to try to figure out precisely why the 224 byte partial sector is failing;
(b) rounding up aligned writes to a full SPI flash sector works; and
(c) as far as I'm aware, writing 0xff to a SPI flash is a no-op, because my understanding is that one gets a "1" in a flash cell by erasing a block of cells and not writing a "0" to it, and one gets a "0" in a flash cell by explicitly writing a zero to it (ie, draining that cell), and the extra 0xffs write no "0"s ; and
(d) in practice I think all the uses of TinyProg are writing at aligned addresses anyway, and that's the case of all the "address range" boundaries defined (eg, in the metadata); and
(e) AFAICT the TinyProg Bootloader doesn't have support for choosing among multiple images, so the only case where there is potentially trailing data that matters, after what is being written, is someone attempting to do a short write to patch the middle of their image, which honestly feels really really unlikely in practice.

So "this seems to work in practice", and I think at this point it shouldn't break anything in practice.

But if someone else wants to figure out why writing a 224 byte partial sector is the thing that fails (with weird data wrap around being observed -- see timvideos/litex-buildenv#137) then feel free. to do that instead.

Ewen

PS: Either way I think we should increase the read/write timeouts, which means if we're not going to merge this pull request we should probably cherry pick that change and the pip-wheel-metadata changes out of this branch and apply them separately.

@ewenmcneill
Copy link
Contributor Author

Joy. I was doing a final reprogram of MicroPython into the board before I put it away, and finally triggered a timeout. It seems there's something where even 5 seconds isn't enough and/or the bootloader/SPI state machine gets itself wedged.

tinyprog --program-image build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin

    TinyProg CLI
    ------------
    Using device id 1d50:6130
    Only one board with active bootloader, using it.

    Programming /dev/ttyACM0 with build/tinyfpga_bx_base_lm32.minimal//image-gateware+bios+micropython.bin
    Programming at addr 028000
    Waking up SPI flash
    442884 bytes to program
    Programming and Verifying:  94%|████████▍| 418k/443k [00:15<00:00, 26.8kB/s]
Failed to write to serial port:
Write timeout
targets/tinyfpga_bx/Makefile.mk:31: recipe for target 'image-flash-tinyfpga_bx' failed
make: *** [image-flash-tinyfpga_bx] Error 1
(LX P=tinyfpga_bx.minimal F=micropython R=default-target-tinyfpga-bx) ewen@parthenon:/src/fpga/litex-buildenv$ 

It worked fine when I did another "make image-flash" to program it a second time. So there might well be value in extracting out the "try programming each sector multiple times" logic from #16 as well as these timeouts changes. Especially for BootLoader reprogramming (which unfortunately get recommended to basically every new user, as a bunch of TinyFPGA BX boards shipped with an incorrectly programmed bootloader :-( )

Ewen

@mithro
Copy link
Collaborator

mithro commented Jun 8, 2019

@ewenmcneill Maybe we should have very large timeouts on programming the bootloader?

@ewenmcneill
Copy link
Contributor Author

Maybe we should have very large timeouts on programming the bootloader?

@mithro: That's a timeout on writing a single 256 octet minor sector!

5 seconds is a long timeout for writing 256 octets (bytes). Honestly even 1 second is a long timeout: that's roughly one octet ever 4ms. Even over 9600 bps serial that ought to be enough time, and it looks like the serial runs around 38400 bps in practice. 5 seconds/256 octets allows around 19ms per octet written. So I don't think it's the serial transfer itself that's the limiting factor.

5 seconds to write 256 octets is also a very long time in the context of writing to SPI flash. The write speed there is typically in the many hundreds of Kbps, if not Mbps (give or take how quickly they're clocked into the SPI flash).

So I expect we could set, eg, a 30 second timeout and still hit it in the same conditions as the 5 second / 1 second write timeout.

What still hitting a write timeout sometimes, even at 5 seconds, says to me is that there's a race condition in the USB / serial / TinyFPGA-Bootloader / SPI Flash interaction (and they're all basically chained together :-( ) which sometimes causes it to stop clocking through additional serial bytes. Causing the write to backlog indefinitely, and thus hit the really long timeout. No idea if that's on the USB/serial to BootLoader side or the BootLoader to SPI Flash side, or somewhere in between.

Short of debugging that TinyFPGA-Bootloader to eliminate all possible race conditions (formal methods?) the next best option seems to be to retry to the sector write instead of immediately bailing out with "oh, it didn't work, hope that didn't brick your board", which is the traditional/existing tinyprog handling of errors (ie, see #16, who report that retrying writes made it reliable for them).

Ewen

@mithro
Copy link
Collaborator

mithro commented Jun 8, 2019

I think retry is a good idea.

My guess is the timeout could occur because of issue in things like USB forwarding when using a VM.

@ewenmcneill
Copy link
Contributor Author

ewenmcneill commented Jun 8, 2019

FTR, I'm not using a VM. Ubuntu 16.04 LTS on Dell XPS9360, direct on the hardware. And still got the write timeout above (and a few others, with the same combination in the past). Same TinyFPGA BX, USB cable, USB port, laptop, etc, all work properly the other 95%+ of the time. Hence thinking there's an edge condition in the TinyFPGA Bootloader verilog (which AFAICT just got developed until it "almost always" worked).

The retry logic is in #16. Feel free to pull out the relevant bits and rework that and merge it too. (I've have many, many, other things on my todo list that need doing, so if it waits for me to pull out/rework bits of #16 it may well be Some Time (tm).)

Ewen

@mithro mithro merged commit 09b7046 into tinyfpga:master Jul 6, 2019
@ewenmcneill
Copy link
Contributor Author

FTR, this commit from Trammel Hudson (in a tree mentioned on Twitter) might be worth considering as well: osresearch@4367218. It skips erase/write on identical 4k sectors, and skips write/read back on 0xff filled 256-octet mini sectors (since 0xff is the erased value). Having merged this pull request it'll need a tiny bit of tweaking to insert in the right place, but the individual code blocks seem sensible to me.

Ewen

@mithro
Copy link
Collaborator

mithro commented Jul 6, 2019

@ewenmcneill - I think that commit is merged now?

@ewenmcneill
Copy link
Contributor Author

@ewenmcneill - I think that commit is merged now?

It's not merged yet. But it is in #54, so if that gets merged we'll pick it up. (Otherwise I think we probably want that erase/write optimisation logic anyway.)

Ewen

@mithro
Copy link
Collaborator

mithro commented Jul 6, 2019

@ewenmcneill - This pull request is merged now....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants