Add Disassembler #89

wnienhaus · 2023-07-02T14:17:41Z

This PR adds a disassembler for ESP32 ULP binaries. (See docs/disassembler.rst for details)

It can disassemble ULP binaries as well as snippets of hex bytes (e.g. from an xxd output) into ULP instructions.

This tool was built primarily for making debugging of the assembler easier but may be useful for other use cases.

Note that instructions printed by the disassembler show values as they are encoded in the binary instruction, not as they were originally specified during assembly. For example, JUMP instructions take an offset in bytes during assembly, whereas the binary instruction contains the offset as a number of words (bytes divided by 4). The disassembler shows the number of words, not the number of bytes, for JUMP instructions.
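As a rough illustration of the bytes-versus-words difference described above, the conversion can be sketched as follows. The helper names are made up for this example and are not taken from the PR.

```python
WORD_SIZE = 4  # one ULP instruction is one 32-bit word, i.e. 4 bytes

def bytes_to_words(byte_offset):
    # Offset as written in assembly source (bytes) -> offset as encoded (words).
    return byte_offset // WORD_SIZE

def words_to_bytes(word_offset):
    # Offset as shown by the disassembler (words) -> offset in bytes.
    return word_offset * WORD_SIZE

print(bytes_to_words(16))  # a 16-byte jump is encoded as 4 words
print(words_to_bytes(4))   # and 4 words correspond to 16 bytes
```

So a jump written with an offset of 16 in the assembly source would appear with the value 4 in the disassembler output.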

The workhorse code of this disassembler has existed for some time (I used it when implementing #50). This PR cleans it all up and turns it into a usable tool.

I am already using this to help with implementing S2 support (#85).

In this approach, each opcode has its own decoding (using the correct struct for each opcode). Each opcode (or opcode+subopcode) also has its own rendering function. The lookup table is hierarchical so the same structure used for opcodes is also used within opcodes for looking up subopcodes.
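The hierarchical lookup can be sketched roughly as below. This is not the PR's code (which decodes with uctypes structs); the bit positions, the opcode numbers 7 and 11, and all names (`LOOKUP`, `render_alu_imm`, `render_halt`, `disassemble_word`) are assumptions made for illustration. Each table level names the bit field to inspect, and an entry is either a rendering function or another table of the same shape for the sub-opcode.

```python
import binascii
import struct

def render_alu_imm(word):
    # Field positions are assumptions made for this sketch.
    dreg = word & 0x3
    imm = (word >> 4) & 0xFFFF
    sel = (word >> 21) & 0xF
    name = {4: "MOVE"}.get(sel, "ALU_IMM_%d" % sel)
    return "%s r%d, %d" % (name, dreg, imm)

def render_halt(word):
    return "HALT"

# A node is (shift, mask, table). A table entry is either a render function
# or another node, so sub-opcodes reuse the same structure as opcodes.
LOOKUP = (28, 0xF, {
    7: (25, 0x7, {1: render_alu_imm}),  # opcode with sub-opcodes
    11: render_halt,                    # opcode without sub-opcodes
})

def disassemble_word(word, node=LOOKUP):
    shift, mask, table = node
    entry = table.get((word >> shift) & mask)
    if entry is None:
        return "<unknown>"
    if isinstance(entry, tuple):
        return disassemble_word(word, entry)  # descend into the sub-opcode table
    return entry(word)

# The bytes 40 00 80 72 are read as one little-endian 32-bit word.
word = struct.unpack("<I", binascii.unhexlify("40008072"))[0]
print(disassemble_word(word))
```

Under these assumptions the example word renders as `MOVE r0, 4`, matching the sample output quoted elsewhere in this PR.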

Useful for running just one unit test file instead of all of them. One can now pass the name of a unit test (or a list of names) to the 00_unit_tests.sh script. Example:

    cd tests
    ./00_unit_tests.sh disassemble  # run only disassemble.py

The default (if nothing is passed to the script) is still to run all tests, as before.

The original "manual disassembling" now requires the "-m" option, followed by the sequence of hex digits representing the instructions. The sequence of hex digits does not need to be quoted. All parameters after -m will be joined together into a sequence of hex digits.
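Joining the parameters into one sequence of hex digits could look roughly like this. The function name `args_to_bytes` is hypothetical, and the PR's actual argument handling may differ.

```python
import binascii

def args_to_bytes(args):
    # Join all parameters and drop spaces, so quoted and unquoted
    # input produce the same sequence of hex digits.
    hex_digits = "".join(args).replace(" ", "")
    return binascii.unhexlify(hex_digits)

print(args_to_bytes(["401f", "0040"]))  # unquoted: two separate arguments
print(args_to_bytes(["401f 0040"]))     # quoted: one argument containing a space
```

Both calls yield the same four bytes, which is why quoting is optional.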

Now the instruction (hex) and the disassembled code appear next to each other on one line, and the bytes are no longer printed with Python-specific formatting (not wrapped in b''). This results in much cleaner-looking output. Example output:

    40008072 MOVE r0, 4
    010000d0 LD r1, r0, 0
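A minimal sketch of producing such a listing, with the byte offset added in front as done by another commit in this PR, might look like the following. The column widths, the `format_listing` name, and the stand-in `decode` function (a plain dictionary lookup) are assumptions for illustration only.

```python
import binascii

def format_listing(data, decode):
    lines = []
    for offset in range(0, len(data), 4):
        word_bytes = data[offset:offset + 4]
        # Plain hex digits, without Python's b'' wrapping.
        hex_text = binascii.hexlify(word_bytes).decode()
        lines.append("%04x  %s  %s" % (offset, hex_text, decode(word_bytes)))
    return lines

# Stand-in for a real decoder: maps known instruction bytes to their text.
KNOWN = {
    binascii.unhexlify("40008072"): "MOVE r0, 4",
    binascii.unhexlify("010000d0"): "LD r1, r0, 0",
}

def decode(word_bytes):
    return KNOWN.get(word_bytes, "<unknown>")

for line in format_listing(binascii.unhexlify("40008072010000d0"), decode):
    print(line)
```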

Some values are easier to read as hex than as decimal. One example is peripheral register addresses such as 0x123, where the first digit (1) indicates which peripheral register to address, while the remaining 2 digits (0x23) are the offset within that register in number of 32-bit words. Absolute JUMP addresses are also easier to find via their hex value, given that the disassembler prints the byte offset of each instruction in hex.
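To illustrate why hex is easier to read here, the split described above can be written as a short sketch. The function name is hypothetical, and the split simply follows the description in this commit message.

```python
def split_periph_addr(addr):
    # First hex digit selects the peripheral, the remaining two hex digits
    # are the offset in 32-bit words (as described in the text above).
    return addr >> 8, addr & 0xFF

periph, offset = split_periph_addr(0x123)
print("hex: 0x%03x -> peripheral %d, offset 0x%02x" % (0x123, periph, offset))
print("decimal: %d" % 0x123)  # the same value in decimal hides both parts
```

In hex both parts can be read directly off the digits, whereas the decimal form of the same value (291) shows neither.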

wnienhaus · 2023-07-02T14:23:10Z

@ThomasWaldmann If you have a bit of time, I'd value your feedback. You have a very critical eye (in a positive sense). But I know you're not spending any time on this project anymore, so no hard feelings if you pass.

Btw, ESP32-S2 support is basically done and I'm busy cleaning that up, so soon there will be something more useful to look at.

dpgeorge · 2023-07-12T00:42:04Z

tools/disassemble.py

@@ -0,0 +1,320 @@
+from uctypes import struct, addressof, LITTLE_ENDIAN, UINT16, UINT32


It might be worth adding an MIT license and copyright to this file.

Thanks for that comment. Would you suggest generally adding this to all files in the repo? Or is there something about the disassembler specifically that makes it better to add here?

We currently have a LICENSE file in the repository root stating this is licensed under MIT and copyright by those listed in the AUTHORS file.

I see MicroPython itself has the MIT licence at the beginning of all its files. I guess that is better, in case a file is distributed on its own somehow. Perhaps I'll make a commit separate to the PR to add the licence and copyright to all files.

> Would you suggest generally adding this to all files in the repo?

Yes. That makes it very clear for anyone who copies the file what the license/copyright is.

You can add a short header using the SPDX-License-Identifier format. Or a long one like in the micropython repo.

> Perhaps I'll make a commit separate to the PR to add the licence and copyright to all files.

That sounds good!

I'd rather not have the same (long) license text as a header in each file.

If there is a header in a file, I'd expect it to roughly describe what's inside that file, and after that there could also be a short notice identifying the license.

Ok, so perhaps we'll go the SPDX-License-Identifier route? I'll look at how that works, and how to add the copyright notice.

wnienhaus · 2023-07-12T15:58:04Z

I will now merge this PR. I have moved the License header topic to a new issue (#90) where we can discuss this further.

wnienhaus added 15 commits July 2, 2023 16:42

First crude version of a disassembler

ce82cbb

Pass bytes from a hexdump as command-line arguments, e.g. micropython -m tools.disassemble 401f 0040. If the byte sequence is not quoted, all arguments are joined together into a single byte sequence. Spaces are allowed and are ignored.

Add command line handling, implementing help (-h)

8325c2b

Tease apart decoding of instruction and printing. Add unit tests.

e4b34e2

Add unit tests for field level output

278bbf0

Show empty "instructions" as <empty>

6720584

These are likely memory left empty for storing data.

Add verbose option. Hide field level detail when not verbose.

a4867e8

Add byte offset to output to make seeing offsets easier

40ea7e9

Offsets are in number of bytes (matches how 'GNU as' outputs listings)

use text_offset from ULP header instead of hardcoded offset

08bb182

Output header in verbose mode. Also validate ULP header.

15a631a

If the magic bytes in the header are not 'ulp\0' then the file is not a ULP binary or otherwise corrupt.
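A header check along these lines could be sketched as follows. Only the magic bytes and the existence of a text_offset field come from this PR; the remaining fields, their order, and the 12-byte layout are assumptions for this example, as are the names `HEADER_FORMAT` and `parse_header`.

```python
import struct

# Assumed layout: 4-byte magic followed by four little-endian 16-bit fields.
HEADER_FORMAT = "<4sHHHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
MAGIC = b"ulp\x00"

def parse_header(binary):
    if len(binary) < HEADER_SIZE:
        raise ValueError("file too short to contain a ULP header")
    magic, text_offset, text_size, data_size, bss_size = struct.unpack(
        HEADER_FORMAT, binary[:HEADER_SIZE])
    if magic != MAGIC:
        raise ValueError("not a ULP binary (bad magic bytes)")
    return {
        "text_offset": text_offset,
        "text_size": text_size,
        "data_size": data_size,
        "bss_size": bss_size,
    }

# Build a tiny fake binary: header plus two 4-byte instruction words.
fake = struct.pack(HEADER_FORMAT, MAGIC, HEADER_SIZE, 8, 0, 0) + bytes(8)
header = parse_header(fake)
print(header["text_offset"], header["text_size"])
```

Reading text_offset from the parsed header, rather than hardcoding it, lets the .text section be located as `binary[text_offset:text_offset + text_size]`.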

Print .text and .data section separately

59766fb

wnienhaus force-pushed the disassembler branch 2 times, most recently from e9ad923 to eac277b, July 2, 2023 20:07

wnienhaus added 2 commits July 11, 2023 19:10

Add integration tests for disassembler

eff6f96

Test both disassembling a file (assembled from source for the test), and disassembling a byte sequence provided on the command line. Source code to be assembled and expected disassembler listings are provided in the tests/fixtures directory.

Add documentation for disassembler

06b277e

wnienhaus force-pushed the disassembler branch from eac277b to 06b277e, July 11, 2023 16:15

dpgeorge reviewed Jul 12, 2023

wnienhaus mentioned this pull request Jul 12, 2023

Add License/Copyright to all files #90

Closed

wnienhaus merged commit 5c4d016 into micropython:master Jul 12, 2023

wnienhaus deleted the disassembler branch August 5, 2023 14:53
