Commit 793de96

memory: Document taps, contention/interruptibility
1 parent 7501d0c commit 793de96

3 files changed

+317
-7
lines changed

Diff for: docs/source/techspecs/cpu_device.rst

+167
@@ -0,0 +1,167 @@
1+
CPU devices
2+
===========
3+
4+
.. contents:: :local:
5+
6+
7+
1. Overview
8+
-----------
9+
10+
CPU device derivatives are used, unsurprisingly, to implement the
11+
emulation of CPUs, MCUs and SoCs. A CPU device is first of all a
12+
combination of ``device_execute_interface``, ``device_memory_interface``,
13+
``device_state_interface`` and ``device_disasm_interface``. Refer to
14+
the associated documentation where it exists.
15+
16+
Two further features are specific to CPU devices: the DRC and
17+
interruptibility support.
18+
19+
20+
2. DRC
21+
------
22+
23+
TODO.
24+
25+
26+
3. Interruptibility
27+
-------------------
28+
29+
3.1 Definition
30+
~~~~~~~~~~~~~~
31+
32+
An interruptible CPU is defined as a core which is able to suspend the
33+
execution of an instruction at any time, exit ``execute_run``, then at
34+
the next call of ``execute_run`` keep going from where it was. This
35+
includes being able to abort an issued memory access, quit
36+
``execute_run``, then upon the next call of ``execute_run`` reissue the
37+
exact same access.
38+
39+
40+
3.2 Implementation requirements
41+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
42+
43+
Memory accesses must be done with ``read_interruptible`` or
44+
``write_interruptible`` on a ``memory_access_specific`` or a
45+
``memory_access_cache``. The access must be done at bus width and with
46+
bus alignment.
47+
48+
After each access the core must test whether ``icount <= 0``. This
49+
test should be done after ``icount`` is decremented by the time taken
50+
by the access itself, to limit the number of tests. When ``icount``
51+
reaches 0 or less it means that the instruction emulation needs to be
52+
suspended.
53+
54+
To know whether the access needs to be re-issued,
55+
``access_to_be_redone()`` needs to be called. If it returns true then
56+
the time taken by the access needs to be credited back, since it
57+
hasn't yet happened, and the access will need to be re-issued. The
58+
call to ``access_to_be_redone()`` clears the reissue flag. If you
59+
need to check the flag without clearing it use
60+
``access_to_be_redone_noclear()``.
61+
62+
The core needs to do enough bookkeeping to eventually restart the
63+
instruction execution just before the access or just after the test,
64+
depending on whether the access needs to be reissued.
65+
66+
Finally, to indicate its support to the rest of the infrastructure, the
67+
core must override ``cpu_is_interruptible()`` to return true.
68+
69+
70+
3.3 Example implementation with generators
71+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
72+
73+
To ensure decent performance, the current implementations (h8, 6502
74+
and 68000) use a Python generator to generate two versions of each
75+
instruction interpreter, one for the normal emulation, and one for
76+
restarting the instruction.
77+
78+
The restarted version looks like this (for a CPU taking 4 cycles per access):
79+
80+
.. code-block:: C++
81+
82+
void device::execute_inst_restarted()
83+
{
84+
    switch(m_inst_substate) {
85+
    case 0:
86+
        [...]
87+
88+
        m_address = [...];
89+
        m_mask = [...];
90+
        [[fallthrough]];
91+
    case 42:
92+
        m_result = specific.read_interruptible(m_address, m_mask);
93+
        m_icount -= 4;
94+
        if(m_icount <= 0) {
95+
            if(access_to_be_redone()) {
96+
                m_icount += 4;
97+
                m_inst_substate = 42;
98+
            } else
99+
                m_inst_substate = 43;
100+
            return;
101+
        }
102+
        [[fallthrough]];
103+
    case 43:
104+
        [...] = m_result;
105+
        [...]
106+
    }
107+
    m_inst_substate = 0;
108+
    return;
109+
}
110+
111+
The non-restarted version is the same thing with the switch and the
112+
final ``m_inst_substate`` clearing removed.
113+
114+
.. code-block:: C++
115+
116+
void device::execute_inst_non_restarted()
117+
{
118+
    [...]
119+
    m_address = [...];
120+
    m_mask = [...];
121+
    m_result = specific.read_interruptible(m_address, m_mask);
122+
    m_icount -= 4;
123+
    if(m_icount <= 0) {
124+
        if(access_to_be_redone()) {
125+
            m_icount += 4;
126+
            m_inst_substate = 42;
127+
        } else
128+
            m_inst_substate = 43;
129+
        return;
130+
    }
131+
    [...] = m_result;
132+
    [...]
133+
    return;
134+
}
135+
136+
The main loop then looks like this:
137+
138+
.. code-block:: C++
139+
140+
void device::execute_run()
141+
{
142+
    if(m_inst_substate)
143+
        call appropriate restarted instruction handler
144+
    while(m_icount > 0) {
145+
        debugger_instruction_hook(m_pc);
146+
        call appropriate non-restarted instruction handler
147+
    }
148+
}
149+
150+
The idea is thus that ``m_inst_substate`` indicates where in an
151+
instruction one is, but only when an interruption happens. It
152+
otherwise stays at 0 and is essentially never looked at. Having two
153+
versions of the interpreter makes it possible to remove the overhead of
154+
the switch and of the end-of-instruction substate clearing.
155+
156+
Using a generator-based method is not a requirement, but a different
157+
one which does not have unacceptable performance implications has not
158+
yet been found.
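
As a minimal, self-contained sketch of the substate technique (this is not MAME code: ``toy_bus``, ``toy_core`` and their members are hypothetical stand-ins, and only the names ``read_interruptible``, ``access_to_be_redone``, ``m_icount`` and ``m_inst_substate`` echo the text above), the following models one instruction doing a single 4-cycle read. For brevity the restarted and non-restarted versions are merged into one function, and the toy bus is assumed to exhaust the cycle budget when it aborts an access.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the memory system. It can demand that an
// access be reissued; when it does, it also exhausts the cycle budget,
// which is an assumption of this sketch.
struct toy_bus {
    int redo_requests = 0;     // number of accesses to abort before succeeding
    bool redo_flag = false;    // set when the latest access was aborted
    int completed_reads = 0;   // accesses that actually went through

    std::uint8_t read_interruptible(std::uint16_t address, int &icount) {
        if (redo_requests > 0) {
            --redo_requests;
            redo_flag = true;
            icount = 0;        // force the core out of its run loop
            return 0;
        }
        ++completed_reads;
        return static_cast<std::uint8_t>(address & 0xff);
    }

    // Returns the reissue flag and clears it.
    bool access_to_be_redone() {
        const bool redo = redo_flag;
        redo_flag = false;
        return redo;
    }
};

struct toy_core {
    toy_bus &bus;
    int m_icount = 0;
    int m_inst_substate = 0;
    std::uint16_t m_address = 0;
    std::uint8_t m_result = 0;
    std::uint8_t m_reg = 0;
    int instructions_done = 0;

    explicit toy_core(toy_bus &b) : bus(b) {}

    // One instruction: load a byte from a fixed address into m_reg.
    void execute_inst() {
        switch (m_inst_substate) {
        case 0:
            m_address = 0x1234;
            [[fallthrough]];
        case 1:
            m_result = bus.read_interruptible(m_address, m_icount);
            m_icount -= 4;
            if (m_icount <= 0) {
                if (bus.access_to_be_redone()) {
                    m_icount += 4;        // the access did not happen: credit it back
                    m_inst_substate = 1;  // resume just before the access
                } else {
                    m_inst_substate = 2;  // resume just after the test
                }
                return;
            }
            [[fallthrough]];
        case 2:
            m_reg = m_result;
            ++instructions_done;
        }
        m_inst_substate = 0;
    }

    void execute_run(int cycles) {
        assert(cycles > 0);
        m_icount = cycles;
        if (m_inst_substate)
            execute_inst();       // finish the suspended instruction first
        while (m_icount > 0)
            execute_inst();
    }
};
```

With ``redo_requests`` set to 1, a first ``execute_run`` leaves the core suspended just before the access (substate 1) with no read completed, and the next call reissues the same read and completes the instruction.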
159+
160+
161+
3.4 Interaction with DRC
162+
~~~~~~~~~~~~~~~~~~~~~~~~
163+
164+
At this point, interruptibility and DRC are entirely incompatible. We
165+
do not have a method to quit the generated code before or after an
166+
access. It's theoretically possible but definitely non-trivial.
167+

Diff for: docs/source/techspecs/index.rst

+1
@@ -16,6 +16,7 @@ MAME’s source or working on scripts that run within the MAME framework.
1616
device_rom_interface
1717
device_disasm_interface
1818
memory
19+
cpu_device
1920
floppy
2021
nscsi
2122
m6502

Diff for: docs/source/techspecs/memory.rst

+149-7
@@ -276,6 +276,77 @@ or the view can be disabled using the ``disable`` method. A disabled
276276
view can be re-enabled at any time.
277277

278278

279+
.. _3.5:
280+
281+
3.5 Bus contention handling
282+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
283+
284+
Some specific CPUs have been upgraded to be interruptible, which allows
285+
adding bus contention and wait state capabilities. Being
286+
interruptible means, in practice, that an instruction can be
287+
interrupted at any time and the ``execute_run`` method of the core exited.
288+
Other devices can then run, then eventually control returns to the
289+
core and the instruction continues from the point where it was suspended.
290+
Importantly, this can be triggered from a handler and even be used to
291+
interrupt just before the access that is currently being done
292+
(i.e. continuation will redo the access).
293+
294+
The CPUs supporting that declare their capability by overriding the
295+
method ``cpu_is_interruptible`` to return true.
296+
297+
Three intermediate contention handlers can be added to accesses:
298+
299+
* ``before_delay``: wait a number of cycles before doing the access.
300+
* ``after_delay``: wait a number of cycles after doing the access.
301+
* ``before_time``: wait for a given time before doing the access.
302+
303+
For the delay handlers, a method or lambda is called which returns the
304+
number of cycles to wait (as a u32).
305+
306+
The ``before_time`` handler is special. First, the time is compared to
307+
the current value of ``cpu->total_cycles()``. That value is the number of
308+
cycles elapsed since the last reset of the cpu. It is passed as a
309+
parameter to the method as a u64, and the method must return, as a u64,
310+
the earliest time at which the access can be done, which can be equal to
311+
the passed-in time. From there two things can happen. Either the running
312+
cpu has enough cycles left to consume to reach that time, in which case
313+
the necessary number of cycles is consumed and the access is done.
314+
Otherwise, when there aren't enough, the remaining cycles are consumed,
315+
the access is aborted, scheduling happens, and eventually the access is
316+
redone. In that case the method is called again with the new current
317+
time, and must return the (probably same) earliest time again. This
318+
will happen until enough cycles to consume are available to directly
319+
do the access.
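
The decision described above can be sketched in isolation. This is a simplified model, not the actual memory system code: ``try_before_time``, ``access_outcome`` and ``time_handler`` are hypothetical names chosen for illustration, and the choice to require strictly more remaining cycles than the wait is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Shaped like ws_time_delegate: (offset, current time) -> earliest access time.
using time_handler = std::function<std::uint64_t (std::uint32_t, std::uint64_t)>;

struct access_outcome {
    bool done;             // true if the access went through
    std::uint64_t now;     // cycle count after the attempt
    std::int64_t icount;   // cycles left in the current time slice
};

// One attempt at an access guarded by a before_time handler.
inline access_outcome try_before_time(std::uint32_t offset, std::uint64_t now,
                                      std::int64_t icount, const time_handler &handler)
{
    assert(icount > 0);
    const std::uint64_t target = handler(offset, now);
    const std::uint64_t wait = target > now ? target - now : 0;

    if (wait < static_cast<std::uint64_t>(icount)) {
        // Enough cycles left: consume the wait and perform the access.
        return { true, now + wait, icount - static_cast<std::int64_t>(wait) };
    }
    // Not enough: consume what remains and abort. The caller is expected
    // to try again, with the new current time, after scheduling happens.
    return { false, now + static_cast<std::uint64_t>(icount), 0 };
}
```

A caller would loop on ``try_before_time`` with a fresh cycle budget each time until ``done`` is true, which mirrors the handler being called again after every aborted attempt.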
320+
321+
This approach makes it possible, for instance, to handle consecutive
322+
DMAs. A first DMA grabs the bus for a transfer. This shows up as the
323+
method returning the end time of the DMA as the earliest time for the
324+
access. If no timer fires before that time, the access will happen
325+
just after the DMA finishes. But if a timer elapses before that and,
326+
as a consequence, another DMA is queued while the first is running, the
327+
cycle will be aborted for lack of remaining time, and the method will
328+
eventually be called again. It will then return the time at which the
329+
second DMA will finish, and all will be well.
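
The DMA scenario can be illustrated with a small model of the device side. This is only a sketch under assumptions: ``dma_model`` and its members are hypothetical, and a real handler would also take the access offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical DMA controller model; the names are illustrative only.
struct dma_model {
    std::uint64_t busy_until = 0;   // cycle at which the bus becomes free

    // Queue a transfer of `length` cycles behind any transfer in progress.
    void queue_dma(std::uint64_t now, std::uint64_t length) {
        assert(length > 0);
        const std::uint64_t start = busy_until > now ? busy_until : now;
        busy_until = start + length;
    }

    // Shaped like a before_time method: earliest time the CPU may use the bus.
    std::uint64_t before_time(std::uint64_t now) const {
        return busy_until > now ? busy_until : now;
    }
};
```

Because the method recomputes its answer from the current state on every call, a transfer queued while the CPU is waiting simply pushes the returned time later on the next call.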
330+
331+
It also makes it possible to reduce said earliest time when
332+
circumstances require it. For instance, a PIO latch that waits up to 64
333+
cycles for data to arrive can indicate current time + 64 as a target
334+
(which will trigger a bus error, for instance), but if a timer elapses
335+
and fills the latch meanwhile, the method will be called again and this
336+
time can just return the current time to let the access pass through.
337+
Beware that if the timer elapsing did not fill the latch then the
338+
method must return the time it returned previously, i.e. the initial
339+
access time + 64, otherwise irrelevant timers happening or simply
340+
scheduling quantum effects will delay the timeout, possibly to
341+
infinity if the quantum is small enough.
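
The pitfall above comes down to remembering the deadline between calls. A minimal sketch, assuming a hypothetical ``pio_latch`` class (none of these names come from MAME, and the real method would also receive the access offset):

```cpp
#include <cstdint>

// Hypothetical PIO latch model; the names are illustrative only.
struct pio_latch {
    static constexpr std::uint64_t timeout_cycles = 64;

    bool filled = false;          // set by whatever delivers the data
    bool waiting = false;         // true while an access is pending
    std::uint64_t deadline = 0;   // fixed when the wait starts, never recomputed

    // Shaped like a before_time method: takes the current time in cycles
    // and returns the earliest time at which the access may proceed.
    std::uint64_t before_time(std::uint64_t now) {
        if (filled) {
            waiting = false;
            return now;                        // data is there: let the access pass
        }
        if (!waiting) {
            waiting = true;
            deadline = now + timeout_cycles;   // computed once per access
        }
        // Always the same deadline on later calls, never now + timeout.
        return deadline > now ? deadline : now;
    }

    // To be called by the actual access handler once the access completes,
    // so that the next access starts a fresh wait.
    void access_completed() { waiting = false; }
};
```

Returning ``now + timeout_cycles`` on every call instead would push the deadline back each time the access is retried, which is exactly the unbounded delay the text warns about.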
342+
343+
Contention handlers on the same address are taken into account in the
344+
``before_time``, ``before_delay`` then ``after_delay`` order.
345+
Contention handlers of the same type on the same address are
346+
last-one-wins. Installing any non-contention handler on a range where
347+
a contention handler was present removes it.
348+
349+
279350
4. Address maps API
280351
-------------------
281352

@@ -292,13 +363,14 @@ The general syntax for entries uses method chaining:
292363

293364
.. code-block:: C++
294365

295-
map(start, end).handler(...).handler_qualifier(...).range_qualifier();
366+
map(start, end).handler(...).handler_qualifier(...).range_qualifier().contention();
296367

297368
The values start and end define the range, the handler() block
298369
determines how the access is handled, the handler_qualifier() block
299-
specifies some aspects of the handler (memory sharing for instance) and
300-
the range_qualifier() block refines the range (mirroring, masking, lane
301-
selection, etc.).
370+
specifies some aspects of the handler (memory sharing for instance)
371+
and the range_qualifier() block refines the range (mirroring, masking,
372+
lane selection, etc.). The contention methods handle bus contention
373+
and wait states for cpus supporting them.
302374

303375
The map follows a “last one wins” principle, where the handler specified
304376
last is selected when multiple handlers match a given address.
@@ -607,7 +679,20 @@ behaviour. An example of use the i960 which marks burstable zones
607679
that way (they have a specific hardware-level support).
608680

609681

610-
4.5 View setup
682+
4.5 Contention
683+
~~~~~~~~~~~~~~
684+
685+
.. code-block:: C++
686+
687+
(...).before_time(method).(...)
688+
(...).before_delay(method).(...)
689+
(...).after_delay(method).(...)
690+
691+
These three methods allow adding contention methods to a handler.
692+
See section `3.5`_. Multiple methods can be added to one handler.
693+
694+
695+
4.6 View setup
611696
~~~~~~~~~~~~~~
612697

613698
.. code-block:: C++
@@ -641,6 +726,7 @@ can be installed only once. A view can also be part of “what was there
641726
before”.
642727

643728

729+
644730
5. Address space dynamic mapping API
645731
------------------------------------
646732

@@ -803,8 +889,32 @@ with an optional mirror and flags.
803889
Install a device address with an address map in a space. The
804890
``unitmask``, ``cswidth`` and ``flags`` arguments are optional.
805891

806-
5.9 View installation
807-
~~~~~~~~~~~~~~~~~~~~~
892+
5.9 Contention
893+
~~~~~~~~~~~~~~
894+
895+
.. code-block:: C++
896+
897+
using ws_time_delegate = device_delegate<u64 (offs_t, u64)>;
898+
using ws_delay_delegate = device_delegate<u32 (offs_t)>;
899+
900+
space.install_read_before_time(addrstart, addrend, addrmirror, ws_time_delegate)
901+
space.install_write_before_time(addrstart, addrend, addrmirror, ws_time_delegate)
902+
space.install_readwrite_before_time(addrstart, addrend, addrmirror, ws_time_delegate)
903+
904+
space.install_read_before_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
905+
space.install_write_before_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
906+
space.install_readwrite_before_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
907+
908+
space.install_read_after_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
909+
space.install_write_after_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
910+
space.install_readwrite_after_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
911+
912+
Install a contention handler in the decode path. The addrmirror
913+
parameter is optional.
914+
915+
916+
5.10 View installation
917+
~~~~~~~~~~~~~~~~~~~~~~
808918

809919
.. code-block:: C++
810920

@@ -820,3 +930,35 @@ by indexing to call a dynamic mapping method on it.
820930

821931
A view can be installed into a variant of another view without issues,
822932
with only the usual constraint of single installation.
933+
934+
5.11 Taps
935+
~~~~~~~~~
936+
937+
.. code-block:: C++
938+
939+
using tap = std::function<void (offs_t offset, uNN &data, uNN mem_mask)>;
940+
941+
memory_passthrough_handler mph = space.install_read_tap(addrstart, addrend, name, read_tap, &mph);
942+
memory_passthrough_handler mph = space.install_write_tap(addrstart, addrend, name, write_tap, &mph);
943+
memory_passthrough_handler mph = space.install_readwrite_tap(addrstart, addrend, name, read_tap, write_tap, &mph);
944+
945+
mph.remove();
946+
947+
A tap is a method that is called when a specific range of addresses
948+
is accessed without overriding the actual access. Taps can change the
949+
data passed around. A write tap happens before the access, and can
950+
change the value to be written. A read tap happens after the access,
951+
and can change the value returned.
952+
953+
Taps must have the same width and alignment as the bus. Multiple
954+
taps can act over the same addresses.
955+
956+
The ``memory_passthrough_handler`` object collates a number of taps
957+
and allows removing them all in one call. The ``mph`` parameter is
958+
optional and a new one will be created if absent.
959+
960+
Taps are lost when a new handler is installed at the same addresses
961+
(under the usual principle of last one wins). If they need to be
962+
preserved, one should install a change notifier on the address space,
963+
and remove + reinstall the taps when notified.
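
The ordering of taps relative to the access can be shown with a toy model. This is not the MAME API: ``tapped_ram`` and ``tap8`` are hypothetical, with only the tap signature modelled on the one above for an 8-bit bus.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Same shape as the tap signature above, specialised for an 8-bit bus.
using tap8 = std::function<void (std::uint32_t offset, std::uint8_t &data, std::uint8_t mem_mask)>;

// Hypothetical memory with taps; the underlying access is never replaced.
struct tapped_ram {
    std::vector<std::uint8_t> storage = std::vector<std::uint8_t>(256, 0);
    std::vector<tap8> read_taps;
    std::vector<tap8> write_taps;

    std::uint8_t read(std::uint32_t offset) {
        assert(offset < storage.size());
        std::uint8_t data = storage[offset];   // the real access happens first
        for (auto &t : read_taps)
            t(offset, data, 0xff);             // read taps may alter the returned value
        return data;
    }

    void write(std::uint32_t offset, std::uint8_t data) {
        assert(offset < storage.size());
        for (auto &t : write_taps)
            t(offset, data, 0xff);             // write taps may alter the value to be written
        storage[offset] = data;                // the real access happens last
    }
};
```

A write tap therefore changes what ends up stored, while a read tap changes only what the reader sees and leaves the stored value untouched.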
964+
