//! When updating KVM IRQ routing via ioctl(KVM_SET_GSI_ROUTING), all interrupts of the virtual
//! machine must be updated together. The [KvmIrqRouting](struct.KvmIrqRouting.html)
//! structure maintains the global interrupt routing table.
//!
//! The way KVM-based VMMs manage interrupts deserves proper documentation. From the KVM
//! hypervisor side, three mechanisms are provided to support injecting interrupts into guests:
//! 1) Irqfd. When data is written to an irqfd, it triggers KVM to inject an interrupt into the
//!    guest.
//! 2) Irq routing. Irq routing determines the way an irq is injected into the guest.
//! 3) Signal MSI. The VMM can inject an MSI interrupt into the guest by issuing the
//!    KVM_SIGNAL_MSI ioctl.
//!
//! Most VMMs use irqfd + irq routing for interrupt injection, so we will focus on this mode.
//! The flow to enable interrupt injection is:
//! 1) the VMM creates an irqfd
//! 2) the VMM invokes KVM_IRQFD to bind the irqfd to an interrupt source
//! 3) the VMM invokes KVM_SET_GSI_ROUTING to configure the way the interrupt is injected into
//!    the guest
//! 4) the device backend driver writes to the irqfd
//! 5) an interrupt is injected into the guest
//!
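//! The five-step flow above can be sketched as a pure-Rust simulation. This is only an
//! illustration: `SimRouter` and its methods are hypothetical stand-ins for the KVM_IRQFD and
//! KVM_SET_GSI_ROUTING ioctls, and the "guest" is just a vector of delivered interrupt vectors.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Simulated GSI routing table plus a "guest" that records injected interrupts.
//! struct SimRouter {
//!     routes: HashMap<u32, u32>, // gsi -> guest interrupt vector
//!     injected: Vec<u32>,        // vectors delivered to the simulated guest
//! }
//!
//! impl SimRouter {
//!     fn new() -> Self {
//!         SimRouter { routes: HashMap::new(), injected: Vec::new() }
//!     }
//!
//!     /// Step 3: the KVM_SET_GSI_ROUTING equivalent.
//!     fn set_route(&mut self, gsi: u32, vector: u32) {
//!         self.routes.insert(gsi, vector);
//!     }
//!
//!     /// Steps 4-5: a write to the bound irqfd triggers injection along the route.
//!     fn write_irqfd(&mut self, gsi: u32) {
//!         if let Some(&vector) = self.routes.get(&gsi) {
//!             self.injected.push(vector);
//!         }
//!     }
//! }
//!
//! fn main() {
//!     let mut router = SimRouter::new(); // steps 1-2 (irqfd creation/binding) elided
//!     router.set_route(24, 0x30);        // step 3: route GSI 24 to vector 0x30
//!     router.write_irqfd(24);            // step 4: device backend writes to the irqfd
//!     assert_eq!(router.injected, vec![0x30]); // step 5: interrupt reached the guest
//! }
//! ```
//!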
//! So far so good, right? Let's move on to mask/unmask/get_pending_state. That's the really
//! tough part. To support mask/unmask/get_pending_state, we must have a way to break the
//! interrupt delivery chain and maintain the pending state. Let's see how each VMM implements it.
//! - Firecracker. It's very simple: it doesn't support mask/unmask/get_pending_state at all.
//! - Cloud Hypervisor. It builds the interrupt delivery path as:
//!     vhost-backend-driver -> EventFd -> CLH -> Irqfd -> Irqrouting -> Guest OS
//!   It also maintains a masked/pending flag pair for each interrupt. When masking an interrupt,
//!   it sets the masked flag and removes the IrqRouting entry for the interrupt.
//!   The CLH design has two shortcomings:
//!   - it's inefficient for the hot interrupt delivery path.
//!   - it may lose in-flight interrupts after removing the IRQ routing entry for an interrupt,
//!     due to irqfd implementation details. Buy me a cup of coffee if you want to know the
//!     details.
//! - Qemu. Qemu has a smart design, which supports:
//!   - A fast path: driver -> irqfd -> Irqrouting -> Guest OS
//!   - A slow path: driver -> eventfd -> qemu -> irqfd -> Irqrouting -> Guest OS
//!   When masking an interrupt, it switches from the fast path to the slow path, and vice versa
//!   when unmasking an interrupt.
//! - Dragonball V1. It doesn't support mask/unmask/get_pending_state at all. We have also
//!   enhanced the Virtio MMIO spec, so we can use the fast path:
//!   driver -> irqfd -> Irqrouting -> Guest OS.
//! - Dragonball V2. When enabling PCI device passthrough, mask/unmask/get_pending_state is a
//!   must to support PCI MSI/MSI-X. Unlike the Qemu fast path/slow path design, Dragonball V2
//!   implements mask/unmask/get_pending_state with the fast path only. It works as follows:
//!   1) When masking an interrupt, unbind the irqfd from the interrupt by KVM_IRQFD. After that,
//!      writes to the irqfd no longer trigger injection, and the irqfd accumulates a count of
//!      the subsequent write operations.
//!   2) When unmasking an interrupt, bind the irqfd to the interrupt again by KVM_IRQFD. On
//!      rebinding, an interrupt is injected into the guest if the irqfd has a non-zero count.
//!   3) When getting the pending state, peek the count of the irqfd. The irqfd doesn't support
//!      peek, so simulate a peek by reading the count and writing it back.
//!   By this design, we use the irqfd count to maintain the interrupt pending state and
//!   auto-inject pending interrupts on rebinding, so we don't need to maintain a pending status
//!   bit ourselves.
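//!
//! The count-based scheme above can be sketched as a pure-Rust simulation. This is only an
//! illustration: `MsiInterrupt` is hypothetical, and a plain `u64` field stands in for the
//! kernel eventfd counter; the real implementation binds and unbinds via the KVM_IRQFD ioctl.
//!
//! ```rust
//! /// A simulated MSI interrupt using the irqfd count as its pending state.
//! struct MsiInterrupt {
//!     irqfd_count: u64, // eventfd counter: accumulates writes while unbound
//!     bound: bool,      // whether the irqfd is currently bound to the interrupt
//!     injected: u32,    // interrupts actually delivered to the "guest"
//! }
//!
//! impl MsiInterrupt {
//!     fn new() -> Self {
//!         MsiInterrupt { irqfd_count: 0, bound: true, injected: 0 }
//!     }
//!
//!     /// A device backend writes to the irqfd.
//!     fn write(&mut self) {
//!         if self.bound {
//!             self.injected += 1;    // fast path: injected immediately
//!         } else {
//!             self.irqfd_count += 1; // masked: the eventfd just accumulates the count
//!         }
//!     }
//!
//!     /// Step 1: mask by unbinding the irqfd from the interrupt.
//!     fn mask(&mut self) {
//!         self.bound = false;
//!     }
//!
//!     /// Step 2: unmask by rebinding; a non-zero count auto-injects an interrupt.
//!     fn unmask(&mut self) {
//!         self.bound = true;
//!         if self.irqfd_count > 0 {
//!             self.irqfd_count = 0;
//!             self.injected += 1;
//!         }
//!     }
//!
//!     /// Step 3: peek the pending state by reading the count and writing it back.
//!     fn get_pending_state(&mut self) -> bool {
//!         let count = self.irqfd_count; // a real eventfd read drains the counter...
//!         self.irqfd_count = count;     // ...so write the value back to preserve it
//!         count != 0
//!     }
//! }
//!
//! fn main() {
//!     let mut irq = MsiInterrupt::new();
//!     irq.write();
//!     assert_eq!(irq.injected, 1);      // unmasked: delivered immediately
//!     irq.mask();
//!     irq.write();
//!     assert!(irq.get_pending_state()); // masked: the write left a pending count
//!     irq.unmask();
//!     assert_eq!(irq.injected, 2);      // rebinding auto-injected the pending interrupt
//!     assert!(!irq.get_pending_state());
//! }
//! ```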
//!
//! Why does Qemu need a slow path while Dragonball V2 doesn't?
//! Qemu needs to support a broad range of guest OSes and all kinds of device drivers, and some
//! legacy device drivers mask/unmask the interrupt while handling each interrupt.
//! For Dragonball, we don't expect guest device drivers to exhibit such behavior, so we treat
//! mask/unmask/get_pending_state as cold paths. We optimize for the hot interrupt delivery path
//! and avoid the complexity of introducing a slow path. The penalty is that get_pending_state()
//! is much more expensive.

use std::collections::HashMap;
use std::io::{Error, ErrorKind};