(docs) CP-53645 on xenguest: Add walk-through on claiming and populating VM memory #6373

Open · wants to merge 1 commit into master
1 change: 0 additions & 1 deletion doc/content/lib/_index.md
@@ -1,5 +1,4 @@
---
title: Libraries
@psafont (Member) commented on Mar 19, 2025:

Suggested change:
title: Libraries → title: C Libraries in Xen

hidden: true
---
{{% children description=true %}}
5 changes: 5 additions & 0 deletions doc/content/lib/xen/_index.md
@@ -0,0 +1,5 @@
---
title: Xen
description: Insights into Xen hypercall functions exposed to the toolstack
---
{{% children description=true %}}
68 changes: 68 additions & 0 deletions doc/content/lib/xen/get_free_buddy-flowchart.md
@@ -0,0 +1,68 @@
---
title: Flowchart of get_free_buddy() of the Xen Buddy allocator
hidden: true
---
```mermaid
flowchart TD

alloc_round_robin
--No free memory on the host-->
Failure

node_affinity_exact
--No free memory<br>on the Domain's
node_affinity nodes:<br>Abort exact allocation-->
Failure

get_free_buddy["get_free_buddy()"]
-->MEMF_node{memflags<br>&<br>MEMF_node?}
--Yes-->
try_MEMF_node{Alloc
from
node}
--Success: page-->
Success
try_MEMF_node
--No free memory on the node-->
MEMF_exact{memflags
&
MEMF_exact?}
--"No"-->
node_affinity_set{NUMA affinity set?}
-- Domain->node_affinity
is not set: Fall back to
round-robin allocation
--> alloc_round_robin

MEMF_exact
--Yes:
As there is not enough
free memory on the
exact NUMA node(s):
Abort exact allocation
-->Failure

MEMF_node
--No NUMA node in memflags-->
node_affinity_set{domain-><br>node_affinity<br>set?}
--Set-->
node_affinity{Alloc from<br>node_affinity<br>nodes}
--No free memory on
the node_affinity nodes
Check if exact request-->
node_affinity_exact{memflags<br>&<br>MEMF_exact?}
--Not exact: Fall back to<br>round-robin allocation-->
alloc_round_robin

node_affinity--Success: page-->Success

alloc_round_robin{" Fall back to
round-robin
allocation"}
--Success: page-->
Success(Success: Return the page)

click get_free_buddy
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L855-L1116
" _blank
```
85 changes: 85 additions & 0 deletions doc/content/lib/xen/get_free_buddy.md
@@ -0,0 +1,85 @@
---
title: get_free_buddy()
description: Find free memory based on the given flags and, optionally, a domain
mermaid:
force: true
---
## Overview

[get_free_buddy()](https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L855-L1116) is
[called](https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L1005)
from [alloc_heap_pages()](xc_domain_populate_physmap#alloc_heap_pages)
to find a free page in the most suitable location for a memory allocation.

It finds memory depending on the given flags and domain:

- Optionally prefer to allocate from a passed NUMA node
- Optionally allocate from the domain's next affine NUMA node (round-robin)
- Optionally fail if the preferred NUMA allocation did not succeed
- Optionally allocate from not-yet-scrubbed memory
- Optionally allocate from the given range of memory zones
- Fall back to allocating from the next NUMA node on the system (round-robin)

## Input parameters

- `struct domain`
- Zones to allocate from (`zone_hi` down to `zone_lo`)
- Page order (size of the page)
- populate_physmap() starts with 1GB pages and falls back to 2MB and 4k pages.
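
These parameters map onto the function's C prototype. For reference, as
declared in `xen/common/page_alloc.c` at the commit linked above:

```c
/* Prototype of get_free_buddy() (xen/common/page_alloc.c), shown here
 * for reference only. */
static struct page_info *get_free_buddy(unsigned int zone_lo,
                                        unsigned int zone_hi,
                                        unsigned int order,
                                        unsigned int memflags,
                                        const struct domain *d);
```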

## Allocation strategy

Its first attempt is to find a page of matching page order
on the requested NUMA node(s).

If this is not successful, it looks for higher page orders that can be split,
and if that fails too, it lowers the zone until it reaches `zone_lo`.

It does not normally use unscrubbed pages, but when `memflags` contain
`MEMF_no_scrub`, it uses `check_and_stop_scrub(pg)` to claim dirty 4k
pages instead of breaking higher-order pages.

If this fails, it checks whether other NUMA nodes should be tried.
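
Taken together, the search can be pictured as a nested scan. Below is a
simplified structural sketch, not verbatim Xen code: `candidate_nodes`
stands in for the node-selection logic of the following subsections,
`heap()` and `page_list_remove_head()` are the allocator's internal
accessors, and the scrub handling is omitted:

```c
/* Structural sketch of the search order in get_free_buddy(); not
 * verbatim Xen code. */
static struct page_info *search_sketch(unsigned int zone_lo,
                                       unsigned int zone_hi,
                                       unsigned int order,
                                       nodemask_t candidate_nodes)
{
    struct page_info *pg;
    unsigned int zone, j;
    nodeid_t node;

    /* For each candidate NUMA node, scan zones from zone_hi down to
     * zone_lo; in each zone, take the smallest free buddy of at least
     * the requested order (bigger buddies are split later, in
     * alloc_heap_pages()). */
    for_each_node_mask ( node, candidate_nodes )
        for ( zone = zone_hi; ; zone-- )
        {
            for ( j = order; j <= MAX_ORDER; j++ )
                if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                    return pg;

            if ( zone == zone_lo )  /* lowest requested zone reached */
                break;
        }

    return NULL;  /* nothing matched the requested zones and nodes */
}
```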

### Exact NUMA allocation (on request, e.g. for vNUMA)

For vNUMA domains, for example, the calling functions pass one specific
NUMA node and also set `MEMF_exact_node` to make sure that memory is
allocated only from this NUMA node.
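
As a sketch of the caller's side, assuming a hypothetical physical node
`pnode` chosen by the vNUMA layout (`MEMF_node()` and `MEMF_exact_node`
are the flags from `xen/include/xen/mm.h`):

```c
/* Sketch: request memory strictly from one NUMA node, as a vNUMA-aware
 * caller would. MEMF_node() encodes the preferred node into memflags;
 * MEMF_exact_node turns that preference into a hard requirement. */
unsigned int memflags = MEMF_node(pnode) | MEMF_exact_node;
struct page_info *pg = alloc_domheap_pages(d, order, memflags);

if ( pg == NULL )
    return -ENOMEM;  /* no free memory on 'pnode': no fallback is tried */
```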

If no NUMA node was passed or the allocation from it failed, and
`MEMF_exact_node` was not set in `memflags`, the function falls back
to NUMA-affine allocation.

### NUMA-affine allocation

For local NUMA memory allocation, the domain should have one or more NUMA nodes
in its `struct domain->node_affinity` field when this function is called.

This happens as part of
[NUMA placement](../../../xenopsd/walkthroughs/VM.build/Domain.build/#numa-placement),
which writes the planned vCPU affinity of the domain's vCPUs to the
XenStore. [xenguest](../../../xenopsd/walkthroughs/VM.build/xenguest)
reads it and updates the vCPU affinities of the domain's vCPUs in Xen,
which in turn, by default (when `domain->auto_node_affinity` is active),
also updates the `struct domain->node_affinity` field.

Note: If `node_affinity` contains multiple NUMA nodes, this step
allocates round-robin from the node after the one the domain last
allocated from.

If this allocation does not succeed, the function falls back to host-wide round-robin allocation.

### Host-wide round-robin allocation

When the domain's `node_affinity` is not set, or allocating from it did
not succeed, and `MEMF_exact_node` was not passed in `memflags`, all
remaining NUMA nodes are attempted in a round-robin way: each subsequent
call uses the next NUMA node after the node that the domain last
allocated memory from.
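
A minimal sketch of this stepping, using Xen's `nodemask` helpers
(`prev` stands for the node recorded from the domain's previous
allocation; simplified from the real loop):

```c
/* Sketch, not verbatim Xen code: advance round-robin through the
 * candidate nodes, wrapping around at the end of the mask. */
static nodeid_t next_rr_node(nodeid_t prev, const nodemask_t *candidates)
{
    nodeid_t node = next_node(prev, *candidates);  /* node after 'prev' */

    if ( node == MAX_NUMNODES )           /* ran past the last set node */
        node = first_node(*candidates);   /* wrap to the first set node */
    return node;
}
```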

## Flowchart

This flowchart shows an overview of the decision chain of `get_free_buddy()`:

{{% include "get_free_buddy-flowchart.md" %}}
53 changes: 53 additions & 0 deletions doc/content/lib/xen/populate_physmap-chart.md
@@ -0,0 +1,53 @@
---
title: Simplified flowchart of populate_physmap()
hidden: true
---

```mermaid
flowchart LR

subgraph hypercall handlers
populate_physmap("<tt>populate_physmap()</tt>
One call for each memory
range (extent)")
end


subgraph "Xen buddy allocator:"

populate_physmap
--> alloc_domheap_pages("<tt>alloc_domheap_pages()</tt>
Assign allocated pages to
the domain")

alloc_domheap_pages
--> alloc_heap_pages("<tt>alloc_heap_pages()</tt>
If needed: split high-order
pages into smaller buddies,
and scrub dirty pages")
--> get_free_buddy("<tt>get_free_buddy()</tt>
If requested: Allocate from a
preferred/exact NUMA node
and/or from
unscrubbed memory
")

end

click populate_physmap
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/memory.c#L159-L314
" _blank

click alloc_domheap_pages
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L2641-L2697
" _blank

click get_free_buddy
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L855-L958
" _blank

click alloc_heap_pages
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L967-L1116
" _blank

```
124 changes: 124 additions & 0 deletions doc/content/lib/xen/populate_physmap-dataflow.md
@@ -0,0 +1,124 @@
---
title: Flowchart for the populate_physmap hypercall
hidden: true
---
```mermaid
flowchart TD

subgraph XenCtrl
xc_domain_populate_physmap["<tt>xc_domain_populate_physmap()"]
xc_domain_populate_physmap_exact["<tt>xc_domain_populate_physmap_exact()"]
end

subgraph Xen

%% sub-subgraph from memory_op() to populate_node() and back

xc_domain_populate_physmap & xc_domain_populate_physmap_exact
<--reservation,<br>and for preempt:<br>nr_start/nr_done-->
memory_op("<tt>memory_op(XENMEM_populate_physmap)")

memory_op
--struct xen_memory_reservation-->
construct_memop_from_reservation("<tt>construct_memop_from_reservation()")
--struct<br>xen_memory_reservation->mem_flags-->
propagate_node("<tt>propagate_node()")
--_struct<br>memop_args->memflags_-->
construct_memop_from_reservation
--_struct memop_args_-->
memory_op<--struct memop_args *:
struct domain *,
List of extent base addrs,
Number of extents,
Size of each extent (extent_order),
Allocation flags (memflags)-->
populate_physmap[["<tt>populate_physmap()"]]
<-.domain, extent base addrs, extent size, memflags, nr_start and nr_done.->
populate_physmap_loop--if memflags & MEMF_populate_on_demand -->guest_physmap_mark_populate_on_demand("
<tt>guest_physmap_mark_populate_on_demand()")
populate_physmap_loop@{ label: "While extents to populate,
and not asked to preempt,
for each extent left to do:", shape: notch-pent }
--domain, order, memflags-->
alloc_domheap_pages("<tt>alloc_domheap_pages()")
--zone_lo, zone_hi, order, memflags, domain-->
alloc_heap_pages
--zone_lo, zone_hi, order, memflags, domain-->
get_free_buddy("<tt>get_free_buddy()")
--_page_info_
-->alloc_heap_pages
--if no page-->
no_scrub("<tt>get_free_buddy(MEMF_no_scrub)</tt>
(honored only when order==0)")
--_dirty 4k page_
-->alloc_heap_pages
<--_dirty 4k page_-->
scrub_one_page("<tt>scrub_one_page()")
alloc_heap_pages("<tt>alloc_heap_pages()</tt>
(also splits higher-order pages
into smaller buddies if needed)")
--_page_info_
-->alloc_domheap_pages
--page_info, order, domain, memflags-->assign_page("<tt>assign_page()")
assign_page
--page_info, nr_mfns, domain, memflags-->
assign_pages("<tt>assign_pages()")
--domain, nr_mfns-->
domain_adjust_tot_pages("<tt>domain_adjust_tot_pages()")
alloc_domheap_pages
--_page_info_-->
populate_physmap_loop
--page(gpfn, mfn, extent_order)-->
guest_physmap_add_page("<tt>guest_physmap_add_page()")

populate_physmap--nr_done, preempted-->memory_op
end

click memory_op
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/memory.c#L1409-L1425
" _blank

click construct_memop_from_reservation
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/memory.c#L1022-L1071
" _blank

click propagate_node
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/memory.c#L524-L547
" _blank

click populate_physmap
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/memory.c#L159-L314
" _blank

click populate_physmap_loop
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/memory.c#L197-L304
" _blank

click guest_physmap_mark_populate_on_demand
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L210-220
" _blank

click guest_physmap_add_page
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L296
" _blank

click alloc_domheap_pages
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L2641-L2697
" _blank

click alloc_heap_pages
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L967-L1116
" _blank

click get_free_buddy
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L855-L958
" _blank

click assign_page
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L2540-L2633
" _blank

click assign_pages
"https://github.com/xen-project/xen/blob/e16acd80/xen/common/page_alloc.c#L2635-L2639
" _blank
```
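
For orientation, a minimal sketch of the toolstack side of this chart:
how a caller such as xenguest could populate one extent through
libxenctrl (`xch`, `domid`, and the frame number are hypothetical;
error handling is elided):

```c
#include <xenctrl.h>

/* Sketch: populate a single 2 MiB extent (order 9) at guest frame gpfn.
 * xc_domain_populate_physmap_exact() wraps XENMEM_populate_physmap and
 * returns an error unless all requested extents were populated. */
xen_pfn_t gpfn = 0x100000;  /* hypothetical guest page frame number */
int rc = xc_domain_populate_physmap_exact(xch, domid,
                                          1,  /* nr_extents */
                                          9,  /* extent_order: 2 MiB */
                                          0,  /* mem_flags */
                                          &gpfn);
```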