
About MMU mapping on ARM64 #46477


Open
carlocaione opened this issue Jun 13, 2022 · 16 comments
Assignees
Labels
area: ARM64 (ARM 64-bit Architecture), Enhancement (Changes/Updates/Additions to existing features)

Comments

@carlocaione
Collaborator

Facts

On ARM64 we can MMU-map a memory region in two different ways:

  • Directly interfacing with the MMU code
  • Going through the Zephyr MMU / device MMIO APIs.

Direct interface with the MMU code for direct mapping

This is done by the ARM64 MMU code to set up the basic Zephyr regions (text, data, etc.) in:

static const struct arm_mmu_flat_range mmu_zephyr_ranges[] = {

	/* Mark the zephyr execution regions (data, bss, noinit, etc.)
	 * cacheable, read-write
	 * Note: read-write region is marked execute-never internally
	 */
	{ .name  = "zephyr_data",
	  .start = _image_ram_start,
	  .end   = _image_ram_end,
	  .attrs = MT_NORMAL | MT_P_RW_U_NA | MT_DEFAULT_SECURE_STATE },

	/* Mark text segment cacheable, read only and executable */
	{ .name  = "zephyr_code",
	  .start = __text_region_start,
	  .end   = __text_region_end,
	  .attrs = MT_NORMAL | MT_P_RX_U_RX | MT_DEFAULT_SECURE_STATE },

	/* Mark rodata segment cacheable, read only and execute-never */
	{ .name  = "zephyr_rodata",
	  .start = __rodata_region_start,
	  .end   = __rodata_region_end,
	  .attrs = MT_NORMAL | MT_P_RO_U_RO | MT_DEFAULT_SECURE_STATE },

#ifdef CONFIG_NOCACHE_MEMORY
	/* Mark nocache segment noncachable, read-write and execute-never */
	{ .name  = "nocache_data",
	  .start = _nocache_ram_start,
	  .end   = _nocache_ram_end,
	  .attrs = MT_NORMAL_NC | MT_P_RW_U_RW | MT_DEFAULT_SECURE_STATE },
#endif
};

but it is also used by the SoC-specific code to map regions for peripherals whose drivers do not support the device MMIO APIs, for example in:

static const struct arm_mmu_region mmu_regions[] = {
	MMU_REGION_FLAT_ENTRY("GIC",
			      DT_REG_ADDR_BY_IDX(DT_INST(0, arm_gic), 0),
			      DT_REG_SIZE_BY_IDX(DT_INST(0, arm_gic), 0),
			      MT_DEVICE_nGnRnE | MT_P_RW_U_NA | MT_DEFAULT_SECURE_STATE),

	MMU_REGION_FLAT_ENTRY("GIC",
			      DT_REG_ADDR_BY_IDX(DT_INST(0, arm_gic), 1),
			      DT_REG_SIZE_BY_IDX(DT_INST(0, arm_gic), 1),
			      MT_DEVICE_nGnRnE | MT_P_RW_U_NA | MT_DEFAULT_SECURE_STATE),
};

This mapping is done directly in the MMU driver code and it is usually a direct (1:1) mapping.

Using the device MMIO (or MMU) APIs

There has lately been an effort to make drivers use the device MMIO APIs. These APIs leverage the Zephyr MMU code to automatically map the physical MMIO region of a peripheral to a virtual memory region at init time (see include/zephyr/sys/device_mmio.h).

In general the mapping is not a direct mapping: the virtual region is carved out of a pool of virtual addresses configured using CONFIG_KERNEL_VM_BASE and CONFIG_KERNEL_VM_SIZE.
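
For reference, here is a minimal sketch of what a driver using these APIs looks like (the my_dev names and the my_periph node label are made up; DEVICE_DT_DEFINE() and the rest of the driver boilerplate are omitted):

#include <zephyr/device.h>
#include <zephyr/sys/device_mmio.h>
#include <zephyr/sys/sys_io.h>

struct my_dev_config {
	DEVICE_MMIO_ROM;	/* phys base + size, taken from the DT */
};

struct my_dev_data {
	DEVICE_MMIO_RAM;	/* virt base, filled in at init time */
};

static int my_dev_init(const struct device *dev)
{
	/* Maps the phys region to a virt region carved out of the VM pool */
	DEVICE_MMIO_MAP(dev, K_MEM_CACHE_NONE);

	return 0;
}

static void my_dev_write_reg(const struct device *dev, uint32_t off, uint32_t val)
{
	/* All register accesses go through the virt address */
	sys_write32(val, DEVICE_MMIO_GET(dev) + off);
}

static const struct my_dev_config my_dev_cfg = {
	DEVICE_MMIO_ROM_INIT(DT_NODELABEL(my_periph)),	/* made-up node */
};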

Problems

There are several.

  1. The two methods are orthogonal; the only point of contact is the MMU driver that actually does the mapping.
  2. The Zephyr MMU code uses a simple mechanism to keep track of the allocated pages, which is bypassed by the direct interface with the MMU code, so in theory there could be conflicts.
  3. Especially on ARM64 we (theoretically) have plenty of virtual memory, so we would really like to do direct mapping for driver MMIO regions, but this is not currently possible with the Zephyr MMU code.

Solution?

The easiest one is to give up the direct interface and instead rely exclusively on the Zephyr MMU code. This would force us to either give up the 1:1 mapping or add support for it.

Tagging the main actors involved: @dcpleung @npitre @povergoing

@carlocaione added the Enhancement label and removed the bug label on Jun 13, 2022
@carlocaione self-assigned this on Jun 13, 2022
@povergoing
Member

My concern would be that the MPU should support the MMIO region too, but can these MMIO APIs be reused by the MPU if it is not a 1:1 mapping design? You can simply consider the MPU as an MMU that only supports 1:1 mapping.

Does the 1:1 mapping or direct mapping mean virt_addr = phy_addr, IIUC? I don't see why the APIs in device_mmio.h need non-direct mapping, since Zephyr is designed to be a single-memory-space OS.

I am not sure if it is suitable or how difficult it is. Is it possible that we re-use the kernel partitions? Add mmu_zephyr_ranges and all (or part of them, marked by some label) peripheral regions defined in the DTS into kernel partitions, so that the MMU or MPU would only need to consider how to fulfill the kernel partitions. Also, the APIs in device_mmio.h would do nothing but add the region into kernel partitions.

@carlocaione
Collaborator Author

My concern would be that the MPU should support the MMIO region too, but can these MMIO APIs be reused by the MPU if it is not a 1:1 mapping design? You can simply consider the MPU as an MMU that only supports 1:1 mapping.

Well, I was not aware of that and this is definitely concerning (:hankey:)

Does the 1:1 mapping or direct mapping mean virt_addr = phy_addr, IIUC?

Yes.

I don't see why the APIs in device_mmio.h need non-direct mapping, since Zephyr is designed to be a single-memory-space OS.

I think @dcpleung could shed some light on this.

But the point is that when a physical address needs to be mapped using z_phys_map(), the destination virtual address is obtained from a pool of virtual addresses and then mapped using arch_mem_map(). See:

zephyr/kernel/mmu.c

Lines 736 to 751 in d130160

	/* Obtain an appropriately sized chunk of virtual memory */
	dest_addr = virt_region_alloc(aligned_size, align_boundary);
	if (!dest_addr) {
		goto fail;
	}

	/* If this fails there's something amiss with virt_region_get */
	__ASSERT((uintptr_t)dest_addr <
		 ((uintptr_t)dest_addr + (size - 1)),
		 "wraparound for virtual address %p (size %zu)",
		 dest_addr, size);

	LOG_DBG("arch_mem_map(%p, 0x%lx, %zu, %x) offset %lu", dest_addr,
		aligned_phys, aligned_size, flags, addr_offset);

	arch_mem_map(dest_addr, aligned_phys, aligned_size, flags);

so it definitely is not a 1:1 mapping (AFAICT).

I am not sure if it is suitable or how difficult it is. Is it possible that we re-use the kernel partitions? Add mmu_zephyr_ranges and all (or part of them, marked by some label) peripheral regions defined in the DTS into kernel partitions, so that the MMU or MPU would only need to consider how to fulfill the kernel partitions. Also, the APIs in device_mmio.h would do nothing but add the region into kernel partitions.

Uhm, this seems more complicated than adding support for 1:1 in the current API.

@carlocaione
Collaborator Author

Well, I was not aware of that and this is definitely concerning (hankey)

Oh well, maybe not. I just checked, and when the MMU is not present (i.e. you have an MPU), the device MMIO APIs do not map anything and you are basically back to accessing the phys address directly.

@dcpleung
Member

The device MMIO API was introduced before I took over userspace, so the design decision is a bit fuzzy. But IIRC, it works similarly to the Linux kernel, where the MMIO range is in general not 1:1 mapped (at least on x86).

Just wondering what would be the use case for having 1:1 mapping? I can see that it would make debugging easier, but in production, does it matter where the hardware registers are mapped?

@carlocaione
Collaborator Author

Just wondering what would be the use case for having 1:1 mapping? I can see that it would make debugging easier, but in production, does it matter where the hardware registers are mapped?

Well, the big issue with Zephyr is that 95% of the drivers are not using the device MMIO API, which means that they are basically accessing the physical address all the time (usually the physical address is retrieved from the DT with the usual DT_REG_ADDR, saved into the config struct and used to access the various registers).
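
For context, the pattern in those drivers typically looks something like this (a generic sketch, not any specific driver; the my_uart names and the uart0 node label are made up):

#include <zephyr/device.h>
#include <zephyr/devicetree.h>
#include <zephyr/sys/sys_io.h>

struct my_uart_config {
	uintptr_t base;		/* physical address, straight from the DT */
};

static const struct my_uart_config my_uart_cfg = {
	.base = DT_REG_ADDR(DT_NODELABEL(uart0)),	/* made-up node */
};

static void my_uart_poll_out(const struct device *dev, unsigned char c)
{
	const struct my_uart_config *cfg = dev->config;

	/* Only works if cfg->base is identity-mapped (or there is no MMU) */
	sys_write32(c, cfg->base + 0x0);
}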

So either you fix the driver by adding support for the MMIO API (so the driver uses the virt address instead of the phys one), or you add a 1:1 mapping and leave the driver unfixed. See for example what happened here #46443 (comment).

This is a huge problem IMHO.

@dcpleung
Member

Well, the big issue with Zephyr is that 95% of the drivers are not using the device MMIO API, which means that they are basically accessing the physical address all the time (usually the physical address is retrieved from the DT with the usual DT_REG_ADDR, saved into the config struct and used to access the various registers).

So either you fix the driver by adding support for the MMIO API (so the driver uses the virt address instead of the phys one), or you add a 1:1 mapping and leave the driver unfixed. See for example what happened here #46443 (comment).

This is a huge problem IMHO.

Drivers not using the MMIO API are indeed a huge issue when dealing with the MMU, as those addresses are not accessible by default. Though I was asking what the use cases are when using the MMIO API. I would assume a proper MMU implementation allows I/O addresses to be mapped into virtual space.

@carlocaione
Collaborator Author

Though I was asking what the use cases are when using the MMIO API.

Oh right, I probably explained myself badly.

So, if you are using the MMIO API and the driver supports it, there is indeed no problem; we are fine in that case even without a 1:1 mapping.

We still have to deal with the case where the driver is not using the MMIO API. In this case, on ARM64, we work around the problem by directly creating the 1:1 mapping in the MMU driver, entirely bypassing the Zephyr MMU code. So my suggestion was for this second case: removing the direct interface with the MMU driver and instead relying on the Zephyr MMU code to create the 1:1 mapping for all the drivers still not supporting the MMIO API.
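
Just to illustrate the shape of that idea (nothing below is an existing API: the K_MEM_DIRECT_MAP flag and the legacy_periph node label are hypothetical), the SoC code could ask the Zephyr MMU layer for the identity mapping instead of programming it behind its back:

#include <zephyr/devicetree.h>
#include <zephyr/sys/mem_manage.h>
#include <zephyr/sys/__assert.h>

static void soc_map_legacy_periph(void)
{
	uint8_t *virt;

	/* Hypothetical: a flag telling z_phys_map() to return virt == phys,
	 * so the Zephyr MMU code still tracks the range but unfixed drivers
	 * keep working with the physical address.
	 */
	z_phys_map(&virt, DT_REG_ADDR(DT_NODELABEL(legacy_periph)),
		   DT_REG_SIZE(DT_NODELABEL(legacy_periph)),
		   K_MEM_PERM_RW | K_MEM_CACHE_NONE | K_MEM_DIRECT_MAP);

	__ASSERT(virt == (uint8_t *)DT_REG_ADDR(DT_NODELABEL(legacy_periph)),
		 "mapping is not 1:1");
}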

@dcpleung
Member

dcpleung commented Jun 13, 2022

Maybe we can convert those drivers to use the device MMIO API when they are being included? TBH, anything we do now to make those non-"device MMIO API" enabled devices work would be a stop-gap effort. So I think the proper way going forward is to convert them to use device MMIO API. Though... I don't know how many you will need to do at the moment. Could you hazard a guess on what you need for your development at the moment?

@povergoing
Member

removing the direct interface with the MMU driver and instead relying on the Zephyr MMU code to create the 1:1 mapping for all the drivers still not supporting the MMIO API.

Cool, that means, if we want MPU to support MMIO API instead of a big device region, we can extend the non-MMU case?

@carlocaione
Collaborator Author

Maybe we can convert those drivers to use the device MMIO API when they are being included? TBH, anything we do now to make those non-"device MMIO API" enabled devices work would be a stop-gap effort. So I think the proper way going forward is to convert them to use device MMIO API.

Yes, this is indeed what I'm trying to do while reviewing new driver submissions: convince people to use the MMIO API.

I don't know how many you will need to do at the moment. Could you hazard a guess on what you need for your development at the moment?

I don't need any for my development, but: (1) this must be considered for new driver submissions and (2) this is part of a cleanup effort to remove mmu_regions for good.

About point (2): in general, having the two methods (the MMIO API and the direct mapping using mmu_regions) is confusing for developers and error-prone in the long term (what if the MMIO API maps to a virt address that is already mapped by the MMU driver, for example?).

@carlocaione
Collaborator Author

Cool, that means, if we want MPU to support MMIO API instead of a big device region, we can extend the non-MMU case?

Possibly? But the MPU case is definitely easier (and more limited, since you have a limited number of slots) and I'm not sure going through the MMIO API is worth it.

@ibirnbaum
Member

As far as I can tell from this discussion, the MMIO interface is intended for mapping devices' register spaces, but what about DMA areas?

Take the Xilinx Ethernet driver, for example: the DTs of the two SoC families that support it define an OCM memory area to be used for DMA. I can obtain that physical address via a 'chosen' entry which is configurable at the board level. At the SoC level, an identity mapping is set up via the mmu_regions table using just that information from the DT.

The driver declares the DMA area for each activated instance of the device (size may vary between instances, DMA parameters such as buffer count/size are configurable on a per-device basis) as a struct, of which one instance is placed in the OCM memory area using section and __aligned attributes:

#define ETH_XLNX_GEM_DMA_AREA_INST(port) \
static struct eth_xlnx_dma_area_gem##port eth_xlnx_gem##port##_dma_area\
__ocm_bss_section __aligned(4096);

Any access to those structs happens on the basis of the physical address, and the controller requires writing the physical addresses of certain members of that struct to its registers (namely the TX queue base address and RX queue base address), which can just be obtained using &eth_xlnx_gem##port##_dma_area.some_member.

Will there be a way to map a DMA area aside from a device's register space, and will there be a way to resolve its physical address? What about situations like this one where the linker inserts references to the physical address based on section placement of data?

@ibirnbaum
Member

ibirnbaum commented Jun 14, 2022

Also, if getting rid of the mmu_regions table entirely is the eventual goal, how will we handle required mappings that are not associated with any driver, but are required for the SoC code and maybe also some driver code to work properly? For example, the Zynq maps:

  • the 4k page @ 0x00000000 for the exception vectors. That one is initially RW upon power-up, the vectors are copied to that location early on, and once the MMU comes up, it is re-configured to RX.
  • System Level Control Registers (SLCR, required by all drivers which generate a clock frequency or baud rate of some sort, clock prescaler settings are bundled here) @ 0xF8000000
  • MPCore @ 0xF8F00000: the ARM Architected Timer and the GIC are located here; at least the timer driver maps its slice of that memory by itself.

Will all that be moved to the device tree, including permissions?

@carlocaione
Collaborator Author

carlocaione commented Jun 14, 2022

As far as I can tell from this discussion, the MMIO interface is intended for mapping devices' register spaces, but what about DMA areas?

That's not really part of this discussion. The MMIO API is used only to map the MMIO register space of the drivers; it's basically the Zephyr equivalent of the devm_ioremap_resource() Linux call.

Take the Xilinx Ethernet driver, for example: the DTs of the two SoC families that support it define an OCM memory area to be used for DMA. I can obtain that physical address via a 'chosen' entry which is configurable at the board level. At the SoC level, an identity mapping is set up via the mmu_regions table using just that information from the DT.

You can keep doing that if you want.

Will there be a way to map a DMA area aside from a device's register space, and will there be a way to resolve its physical address?

You can create a 1:1 mapping using the mmu_regions table and then use the physical address, or you can use something like z_phys_map() to create the mapping, taking care to use the returned virtual address.
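
For the second option, a minimal sketch of what that could look like (the ocm_dma node label is made up; the real driver gets the address from a chosen entry as described above):

#include <zephyr/devicetree.h>
#include <zephyr/sys/mem_manage.h>

static void map_dma_area(void)
{
	uint8_t *dma_virt;
	uintptr_t dma_phys = DT_REG_ADDR(DT_NODELABEL(ocm_dma));	/* made-up node */

	/* CPU accesses then go through dma_virt, while dma_phys is still
	 * what gets programmed into the controller's queue base registers.
	 */
	z_phys_map(&dma_virt, dma_phys, DT_REG_SIZE(DT_NODELABEL(ocm_dma)),
		   K_MEM_PERM_RW | K_MEM_CACHE_NONE);
}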

Also, if getting rid of the mmu_regions table entirely is the eventual goal, how will we handle required mappings that are not associated with any driver, but are required for the SoC code and maybe also some driver code to work properly?

I want to get rid of the mmu_regions table where it is used to map the MMIO regions of drivers, because this is something we should have done a long time ago. All the other use cases are to be evaluated on a case-by-case basis. If you need to map anything different from that, you can keep using it or you can use something fancier like z_phys_map() or k_mem_map().

Will all that be moved to the device tree, including permissions?

No.

@ibirnbaum
Member

@carlocaione Thanks for the info!

@dcpleung
Member

I am all for nudging everyone to use the device MMIO API. :)
