Add Disassembler #89
Pass bytes from a hexdump as command line arguments, e.g.:

    micropython -m tools.disassemble 401f 0040

If the byte sequence is not quoted, all args are joined together into a single byte sequence. Spaces are allowed and will be ignored.
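A minimal sketch of the argument joining described above, assuming the hex digits arrive as separate argv strings. `parse_hex_args` is a hypothetical name, not the tool's actual function; `binascii.unhexlify` exists in both CPython and MicroPython.

```python
import binascii

def parse_hex_args(args):
    """Join hexdump words from the command line into one byte sequence.

    Hypothetical helper: spaces (e.g. inside a quoted argument) are dropped
    before decoding, so ['401f 0040'] and ['401f', '0040'] give the same bytes.
    """
    hex_str = ''.join(args).replace(' ', '')
    return binascii.unhexlify(hex_str)

print(parse_hex_args(['401f', '0040']))   # b'@\x1f\x00@'
```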
In this approach, each opcode has its own decoding (using the correct struct for each opcode). Each opcode (or opcode+subopcode) also has its own rendering function. The lookup table is hierarchical so the same structure used for opcodes is also used within opcodes for looking up subopcodes.
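To illustrate the hierarchical lookup table described above, here is a trimmed sketch where every level has the same shape, so opcodes and sub-opcodes are resolved by the same recursive function. It deliberately swaps the uctypes structs for plain bit shifts so it also runs on CPython. The table contents, names (`OPCODES`, `decode`, `render_move_imm`, `render_ld`) and the bit positions (taken from the ESP32 ULP instruction layout as we understand it) are assumptions to check against the real encodings, not the PR's actual code.

```python
import struct

# Render functions for two instructions. Bit positions are assumptions
# based on the ESP32 ULP encodings; verify against ESP-IDF before relying on them.
def render_move_imm(word):
    dreg = word & 0x3               # bits 0-1: destination register
    imm = (word >> 4) & 0xFFFF      # bits 4-19: immediate value
    return 'MOVE r%d, %d' % (dreg, imm)

def render_ld(word):
    dreg = word & 0x3               # bits 0-1: destination register
    sreg = (word >> 2) & 0x3        # bits 2-3: address register
    offset = (word >> 10) & 0x7FF   # bits 10-20: offset
    return 'LD r%d, r%d, %d' % (dreg, sreg, offset)

# Every level has the same shape: (shift, mask, {field_value: entry}).
# An entry is either a render function or another table one level deeper.
OPCODES = (28, 0xF, {               # opcode: bits 28-31
    7: (25, 0x7, {                  # ALU sub-opcode: bits 25-27
        1: (21, 0xF, {              # ALU_IMM operation: bits 21-24
            4: render_move_imm,     # MOVE
        }),
    }),
    13: render_ld,                  # LD has no sub-opcode level
})

def decode(word, table=OPCODES):
    shift, mask, entries = table
    entry = entries.get((word >> shift) & mask)
    if entry is None:
        return '<unknown 0x%08x>' % word
    if callable(entry):
        return entry(word)
    return decode(word, entry)      # descend into the sub-opcode table

def disassemble(data):
    """Yield (byte_offset, text) for each little-endian 32-bit word."""
    for off in range(0, len(data) - 3, 4):
        (word,) = struct.unpack('<I', data[off:off + 4])
        yield off, decode(word)
```

For example, `list(disassemble(bytes.fromhex('40008072010000d0')))` gives `[(0, 'MOVE r0, 4'), (4, 'LD r1, r0, 0')]`. Adding an instruction then only means adding a render function and one table entry at the right level.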
Useful for running just one unit test file instead of all of them. One can now pass the name of a unit test (or a list of names) to the 00_unit_tests.sh script. Example:

    cd tests
    ./00_unit_tests.sh disassemble  # run only disassemble.py

The default (if nothing is passed to the script) is still to run all tests, as before.
These are likely memory locations left empty for storing data.
The original "manual disassembling" now requires the "-m" option, followed by the sequence of hex digits representing the instructions. The sequence of hex digits does not need to be quoted. All parameters after -m will be joined together into a sequence of hex digits.
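A minimal sketch of how the two modes could be told apart, assuming a file name is the default and `-m` switches to joining the remaining arguments into hex digits. `parse_command_line` and its return values are hypothetical, not the tool's actual interface.

```python
import binascii

def parse_command_line(argv):
    """Return ('hex', bytes) for '-m' mode, else ('file', filename).

    Hypothetical sketch: everything after '-m' is joined into one string of
    hex digits (spaces ignored), so no quoting is needed.
    """
    if not argv:
        raise ValueError('expected a filename, or -m followed by hex digits')
    if argv[0] == '-m':
        hex_str = ''.join(argv[1:]).replace(' ', '')
        return 'hex', binascii.unhexlify(hex_str)
    return 'file', argv[0]
```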
Now the instruction (hex) and the disassembled code appear on one line next to each other, and the bytes are no longer printed with Python-specific formatting (not wrapped in b''). This results in much cleaner output. Example output:

    40008072 MOVE r0, 4
    010000d0 LD r1, r0, 0
Offsets are given in number of bytes (matching how 'GNU as' outputs listings).
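A sketch of one listing line combining the byte offset and the raw bytes as plain hex, assuming each instruction is 4 bytes so the second instruction sits at offset 0x0004 rather than 1. `format_line` and the column layout are hypothetical; the real tool's formatting may differ.

```python
import binascii

def format_line(offset, raw, text):
    """Format one listing line: byte offset (hex), raw bytes (hex), instruction.

    Hypothetical layout; the real tool's column widths may differ.
    """
    hex_bytes = binascii.hexlify(raw).decode()   # '40008072' instead of b'@\x00\x80r'
    return '%04x  %s  %s' % (offset, hex_bytes, text)

print(format_line(4, b'\x01\x00\x00\xd0', 'LD r1, r0, 0'))
# 0004  010000d0  LD r1, r0, 0
```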
If the magic bytes in the header are not 'ulp\0', then the file is not a ULP binary or is otherwise corrupt.
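A sketch of the header check, assuming the header layout used by ESP-IDF's ULP binaries: a 4-byte magic followed by four little-endian 16-bit fields (text offset, text size, data size, bss size). The field names and `parse_header` are assumptions for illustration.

```python
import struct

HEADER_FMT = '<4sHHHH'                     # magic, text_offset, text_size, data_size, bss_size
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 12 bytes

def parse_header(data):
    """Validate the ULP binary header and return its size fields.

    Layout assumed from ESP-IDF's ULP binary format; verify before relying on it.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError('file too short for a ULP header')
    magic, text_offset, text_size, data_size, bss_size = struct.unpack(
        HEADER_FMT, data[:HEADER_SIZE])
    if magic != b'ulp\0':
        raise ValueError('not a ULP binary (or corrupt): bad magic %r' % magic)
    return text_offset, text_size, data_size, bss_size
```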
Some values are easier to read as hex than as decimal. For example, with peripheral register addresses like 0x123, the first digit (1) indicates which peripheral's registers to address, while the remaining 2 digits (0x23) are the offset within that peripheral's register block, in number of 32-bit words. Also, absolute JUMP addresses are easier to find via the hex value, given that the disassembler includes the byte offset of each instruction in hex format.
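Using the example address above, a small sketch shows why hex helps: the decimal form (291) hides the split that the hex form makes obvious. `split_reg_addr` is a hypothetical helper that assumes exactly the layout described in this commit message.

```python
def split_reg_addr(addr):
    """Split a peripheral register address into (peripheral, word_offset).

    Assumes the layout described above: the top hex digit selects the
    peripheral, the low two hex digits are the offset in 32-bit words.
    """
    return addr >> 8, addr & 0xFF

addr = 0x123
periph, words = split_reg_addr(addr)
print(addr)                                     # 291 (structure not visible)
print(hex(periph), hex(words), hex(words * 4))  # 0x1 0x23 0x8c (last: offset in bytes)
```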
@ThomasWaldmann If you have a bit of time, I'd value your feedback. You have a very critical eye (in a positive sense). But I know you're not spending any time on this project anymore, so no hard feelings if you pass. Btw, ESP32-S2 support is basically done and I'm busy cleaning that up, so soon there will be something more useful to look at.
Branch updated from e9ad923 to eac277b.
Test both disassembling a file (assembled from source for the test), and disassembling a byte sequence provided on the command line. Source code to be assembled and expected disassembler listings are provided in the tests/fixtures directory.
@@ -0,0 +1,320 @@
from uctypes import struct, addressof, LITTLE_ENDIAN, UINT16, UINT32
It might be worth adding an MIT license and copyright to this file.
Thanks for that comment. Would you suggest generally adding this to all files in the repo? Or is there something about the disassembler specifically that makes it better to add here?
We currently have a LICENSE file in the repository root stating this is licensed under MIT and copyright by those listed in the AUTHORS file.
I see MicroPython itself has the MIT licence at the beginning of all its files. I guess that is better, in case a file is somehow distributed on its own. Perhaps I'll make a commit separate from the PR to add the licence and copyright to all files.
> Would you suggest generally adding this to all files in the repo?
Yes. That makes it very clear for anyone who copies the file what the license/copyright is.
You can add a short header using the SPDX-License-Identifier format, or a long one like in the micropython repo.
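For reference, a short SPDX-style header at the top of a Python module could look like the first two comment lines below; the copyright line is a placeholder, not the project's actual notice. `spdx_license_id` is a hypothetical helper (not part of this repo) showing that such a tag is easy to find mechanically.

```python
# SPDX-FileCopyrightText: <year> <copyright holder(s)>
# SPDX-License-Identifier: MIT

def spdx_license_id(source, max_lines=10):
    """Return the SPDX license identifier declared near the top of a file, or None.

    Hypothetical helper: only the first few lines are scanned, because the
    tag is expected in the file header.
    """
    marker = 'SPDX-License-Identifier:'
    for line in source.splitlines()[:max_lines]:
        if marker in line:
            return line.split(marker, 1)[1].strip()
    return None
```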
> Perhaps I'll make a commit separate to the PR to add the licence and copyright to all files.
That sounds good!
I would dislike having the same (long) license text as a header in each file.
If there is a header in a file, I'd rather expect it to roughly tell what's inside that file; after that, there could also be a short notice identifying the license.
Ok, so perhaps we'll go the SPDX-License-Identifier route? I'll look at how that works, and how to add the copyright notice.
I will now merge this PR. I have moved the License header topic to a new issue (#90) where we can discuss this further.
This PR adds a disassembler for ESP32 ULP binaries. (See docs/disassembler.rst for details)
It can disassemble ULP binaries as well as snippets of hex bytes (e.g. from xxd output) into ULP instructions.

This tool was built primarily to make debugging the assembler easier, but may be useful for other use cases.
Note that instructions printed by the disassembler show values as they are encoded in the actual binary instruction, not the values originally specified during assembly. For example, JUMP instructions take an offset in bytes during assembly, whereas the binary instruction contains the offset as a number of words (bytes divided by 4). The disassembler shows the number of words, not the number of bytes, for JUMP instructions.

The work-horse code of this disassembler has existed for some time (I used it when implementing #50); this PR cleans it all up and makes it into a usable tool.
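As a worked example of the words-versus-bytes difference: a JUMP whose encoded target is 3 words points at the instruction listed at byte offset 0x000c (3 * 4 = 12). The helper names below are hypothetical and only illustrate the conversion.

```python
def jump_target_bytes(encoded_words):
    """Convert a JUMP target as shown by the disassembler (words) to bytes."""
    return encoded_words * 4

def jump_target_words(byte_offset):
    """Convert a byte offset (as written in assembly source) to words."""
    if byte_offset % 4:
        raise ValueError('JUMP targets must be 4-byte aligned')
    return byte_offset // 4

# An encoded target of 3 words corresponds to the listing line at offset 0x000c
print(hex(jump_target_bytes(3)))   # 0xc
```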
I am already using this to help with implementing S2 support (#85).