| 1 | +# RyuJIT: Porting to different platforms |
| 2 | + |
| 3 | +## What is a Platform? |
| 4 | +* Target instruction set and pointer size |
| 5 | +* Target calling convention |
| 6 | +* Runtime data structures (not really covered here) |
| 7 | +* GC encoding |
| 8 | + * So far only JIT32_GCENCODER and everything else |
| 9 | +* Debug info (so far mostly the same for all targets?) |
| 10 | +* EH info (not really covered here) |
| 11 | + |
| 12 | +One advantage of the CLR is that the VM (mostly) hides the (non-ABI) OS differences |
| 13 | + |
| 14 | +## The Very High Level View |
| 15 | +* 32 vs. 64 bits |
| 16 | + * This work is not yet complete in the backend, but should be sharable |
| 17 | +* Instruction set architecture: |
| 18 | + * instrsXXX.h, emitXXX.cpp and targetXXX.cpp |
| 19 | + * lowerXXX.cpp |
| 20 | + * codeGenXXX.cpp and simdcodegenXXX.cpp |
| 21 | + * unwindXXX.cpp |
| 22 | +* Calling Convention: all over the place |
| 23 | + |
| 24 | +## Front-end changes |
| 25 | +* Calling Convention |
| 26 | + * Struct args and returns seem to be the most complex differences |
| 27 | + * Importer and morph are highly aware of these |
| 28 | + * E.g. fgMorphArgs(), fgFixupStructReturn(), fgMorphCall(), fgPromoteStructs() and the various struct assignment morphing methods |
| 29 | + * HFAs on ARM |
| 30 | +* Tail calls are target-dependent, but probably should be less so |
| 31 | +* Intrinsics: each platform recognizes different methods as intrinsics (e.g. Sin only for x86, Round everywhere BUT amd64) |
| 32 | +* Target-specific morphs such as for mul, mod and div |
| 33 | + |
| 34 | +## Backend Changes |
| 35 | +* Lowering: fully expose control flow and register requirements |
| 36 | +* Code Generation: traverse blocks in layout order, generating code (InstrDescs) based on register assignments on nodes |
| 37 | + * Then, generate prolog & epilog, as well as GC, EH and scope tables |
| 38 | +* ABI changes: |
| 39 | + * Calling convention register requirements |
| 40 | + * Lowering of calls and returns |
| 41 | + * Code sequences for prologs & epilogs |
| 42 | + * Allocation & layout of frame |
| 43 | + |
| 44 | +## Target ISA "Configuration" |
| 45 | +* Conditional compilation (set in jit.h, based on incoming define, e.g. #ifdef X86) |
| 46 | +```C++ |
| 47 | +_TARGET_64BIT_ (32 bit target is just ! _TARGET_64BIT_) |
| 48 | +_TARGET_XARCH_, _TARGET_ARMARCH_ |
| 49 | +_TARGET_AMD64_, _TARGET_X86_, _TARGET_ARM64_, _TARGET_ARM_ |
| 50 | +``` |
| 51 | +* Target.h |
| 52 | +* InstrsXXX.h |
| 53 | + |
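As a rough illustration of the conditional-compilation scheme above, the sketch below derives a set of target macros and uses them to select target-specific values. The `SKETCH_*` macro names and both functions are hypothetical stand-ins for the `_TARGET_*` defines, and they are derived here from common compiler-predefined macros rather than from the incoming build define that jit.h uses.

```C++
#include <cassert>
#include <cstddef>

// Hypothetical SKETCH_* macros standing in for the JIT's _TARGET_* defines.
// Assumption: we key off compiler-predefined macros for the host; the real
// configuration is driven by a define supplied by the build.
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_TARGET_AMD64 1
#elif defined(__i386__) || defined(_M_IX86)
#define SKETCH_TARGET_X86 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#define SKETCH_TARGET_ARM64 1
#elif defined(__arm__) || defined(_M_ARM)
#define SKETCH_TARGET_ARM 1
#endif

// Architecture families group the targets that share most of their code.
#if defined(SKETCH_TARGET_AMD64) || defined(SKETCH_TARGET_X86)
#define SKETCH_TARGET_XARCH 1
#endif
#if defined(SKETCH_TARGET_ARM64) || defined(SKETCH_TARGET_ARM)
#define SKETCH_TARGET_ARMARCH 1
#endif

// Only a 64-bit define exists; a 32-bit target is simply its absence.
#if defined(SKETCH_TARGET_AMD64) || defined(SKETCH_TARGET_ARM64)
#define SKETCH_TARGET_64BIT 1
#endif

// Pointer size, in bytes, assumed for the selected target.
std::size_t sketchTargetPointerSize()
{
#ifdef SKETCH_TARGET_64BIT
    std::size_t size = 8;
#else
    std::size_t size = 4;
#endif
    assert(size == 4 || size == 8);
    return size;
}

// Name of the architecture family, or "unknown" if none was recognized.
const char* sketchTargetFamily()
{
#if defined(SKETCH_TARGET_XARCH)
    return "xarch";
#elif defined(SKETCH_TARGET_ARMARCH)
    return "armarch";
#else
    return "unknown";
#endif
}
```

Code shared by a whole family would test the family macro, and only genuinely target-specific code would test an individual target macro.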
| 54 | +## Instruction Encoding |
| 55 | +* The instrDesc is the data structure used for encoding |
| 56 | + * It is initialized with the opcode bits, and has fields for immediates and register numbers. |
| 57 | + * instrDescs are collected into groups |
| 58 | + * A label may only occur at the beginning of a group |
| 59 | +* The emitter is called to: |
| 60 | + * Create new instructions (instrDescs), during CodeGen |
| 61 | + * Emit the bits from the instrDescs after CodeGen is complete |
| 62 | + * Update GC info (live GC vars & safe points) |
| 63 | + |
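The two-phase flow described above (create instrDescs during CodeGen, emit the bits afterwards, with labels only at the start of a group) can be sketched with a toy emitter. Everything here is hypothetical: `SketchInstrDesc`, `SketchInsGroup`, `SketchEmitter` and the fixed 6-byte encoding are illustrative assumptions, not the JIT's actual data structures or any real instruction format.

```C++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for an instrDesc.
struct SketchInstrDesc
{
    std::uint8_t opcode; // opcode bits taken from the instruction table
    std::uint8_t reg;    // register number field
    std::int32_t imm;    // immediate field
};

// Hypothetical instruction group: a run of instrDescs, optionally labelled.
struct SketchInsGroup
{
    bool                         hasLabel = false;
    std::vector<SketchInstrDesc> instrs;
};

class SketchEmitter
{
public:
    SketchEmitter() : m_groups(1) {}

    // A label may only sit at the start of a group, so defining one starts
    // a new group unless the current group is still empty.
    std::size_t defineLabel()
    {
        if (!m_groups.back().instrs.empty())
        {
            m_groups.emplace_back();
        }
        m_groups.back().hasLabel = true;
        return m_groups.size() - 1; // the group index identifies the label
    }

    // Phase 1: record an instruction; no bytes are produced yet.
    void emitIns_R_I(std::uint8_t opcode, std::uint8_t reg, std::int32_t imm)
    {
        assert(!m_groups.empty());
        m_groups.back().instrs.push_back({opcode, reg, imm});
    }

    // Phase 2: walk all groups and write out the bytes. The layout
    // (opcode, reg, little-endian imm32) is made up for this sketch.
    std::vector<std::uint8_t> emitAll() const
    {
        std::vector<std::uint8_t> out;
        for (const SketchInsGroup& group : m_groups)
        {
            for (const SketchInstrDesc& id : group.instrs)
            {
                out.push_back(id.opcode);
                out.push_back(id.reg);
                const std::uint32_t bits = static_cast<std::uint32_t>(id.imm);
                for (int shift = 0; shift < 32; shift += 8)
                {
                    out.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFF));
                }
            }
        }
        return out;
    }

    std::size_t groupCount() const { return m_groups.size(); }

private:
    std::vector<SketchInsGroup> m_groups;
};
```

Deferring byte emission until all instrDescs exist is what lets an emitter resolve label addresses and choose branch sizes once the full layout is known; this sketch omits that step.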
| 64 | +## Adding Encodings |
| 65 | +* The instruction encodings are captured in instrsXXX.h. These are the opcode bits for each instruction |
| 66 | +* The structure of each instruction's encoding is target-dependent |
| 67 | +* An "instruction" is just the representation of the opcode |
| 68 | +* An instance of "instrDesc" represents the instruction to be emitted |
| 69 | +* For each "type" of instruction, emit methods need to be implemented. These follow a pattern but a target may have unique ones, e.g. |
| 70 | +```C++ |
| 71 | +emitter::emitInsMov(instruction ins, emitAttr attr, GenTree* node) |
| 72 | +emitter::emitIns_R_I(instruction ins, emitAttr attr, regNumber reg, ssize_t val) |
| 73 | +emitter::emitInsTernary(instruction ins, emitAttr attr, GenTree* dst, GenTree* src1, GenTree* src2) (currently Arm64 only) |
| 74 | +``` |
| 75 | + |
| 76 | +## Lowering |
| 77 | +* Lowering ensures that all register requirements are exposed for the register allocator |
| 78 | + * Use count, def count, "internal" reg count, and any special register requirements |
| 79 | + * Does half the work of code generation, since all computation is made explicit |
| 80 | + * But it is NOT necessarily a 1:1 mapping from lowered tree nodes to target instructions |
| 81 | + * Its first pass does a tree walk, transforming the instructions. Some of this is target-independent. Notable exceptions: |
| 82 | + * Calls and arguments |
| 83 | + * Switch lowering |
| 84 | + * LEA transformation |
| 85 | + * Its second pass walks the nodes in execution order |
| 86 | + * Sets register requirements |
| 87 | + * Sometimes changes the register requirements of children (which have already been traversed) |
| 88 | + * Sets the block order and node locations for LSRA |
| 89 | + * LinearScan::startBlockSequence() and LinearScan::moveToNextBlock() |
| 90 | + |
| 91 | +## Register Allocation |
| 92 | +* Register allocation is largely target-independent |
| 93 | + * The second phase of Lowering does nearly all the target-dependent work |
| 94 | +* Register candidates are determined in the front-end |
| 95 | + * Local variables or temps, or fields of local variables or temps |
| 96 | + * Not address-taken, plus a few other restrictions |
| 97 | + * Sorted by lvaSortByRefCount(), and marked "lvTracked" |
| 98 | + |
| 99 | +## Addressing Modes |
| 100 | +* The code to find and capture addressing modes is particularly poorly abstracted |
| 101 | +* genCreateAddrMode(), in CodeGenCommon.cpp, traverses the tree looking for an addressing mode, then captures its constituent elements (base, index, scale & offset) in "out parameters" |
| 102 | + * It optionally generates code |
| 103 | + * For RyuJIT, it NEVER generates code, and is only used by gtSetEvalOrder, and by lowering |
| 104 | + |
| 105 | +## Code Generation |
| 106 | +* For the most part, the code generation method structure is the same for all architectures |
| 107 | + * Most code generation methods start with "gen" |
| 108 | +* Theoretically, CodeGenCommon.cpp contains code "mostly" common to all targets (this factoring is imperfect) |
| 109 | + * Method prolog, epilog, |
| 110 | +* genCodeForBBList |
| 111 | + * walks the trees in execution order, calling genCodeForTreeNode, which needs to handle all nodes that are not "contained" |
| 112 | + * generates control flow code (branches, EH) for the block |
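The block walk described above can be sketched as follows. The types and the function `sketchGenCodeForBBList` are hypothetical, and the "generated code" is just a list of strings; the point is only the shape of the loop: visit blocks in layout order, visit nodes in execution order, skip "contained" nodes (their parent generates them as part of its own instruction), then generate the block's control flow.

```C++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy node: only a name and a "contained" flag.
struct SketchTreeNode
{
    std::string name;
    bool        contained; // true if the parent node generates this operand
};

// Hypothetical toy block: nodes in execution order plus an optional jump.
struct SketchBlock
{
    std::vector<SketchTreeNode> nodes;
    int                         jumpTarget = -1; // block index, or -1 for fall-through
};

// Walks blocks in layout order and returns one string per generated item.
std::vector<std::string> sketchGenCodeForBBList(const std::vector<SketchBlock>& blocks)
{
    std::vector<std::string> code;
    for (const SketchBlock& block : blocks)
    {
        for (const SketchTreeNode& node : block.nodes)
        {
            if (node.contained)
            {
                continue; // handled when its parent is generated
            }
            code.push_back("gen " + node.name);
        }
        // Control flow for the block comes after its nodes.
        if (block.jumpTarget >= 0)
        {
            assert(static_cast<std::size_t>(block.jumpTarget) < blocks.size());
            code.push_back("jmp BB" + std::to_string(block.jumpTarget));
        }
    }
    return code;
}
```

In this sketch a contained constant feeding an add produces no separate entry, which mirrors the idea that only non-contained nodes need their own handling.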