
Commit 746e537

Support mask/unmask/get_pending_state
Mask/unmask/get_pending_state is needed to support PCI MSI/MSI-X when enabling PCI device passthrough. Also document the overall design of the interrupt system.

Signed-off-by: Liu Jiang <[email protected]>
1 parent c5b1b5f commit 746e537


5 files changed, +158 -2 lines changed

src/interrupt/kvm/legacy_irq.rs

+49-1
@@ -11,6 +11,7 @@ use super::*;
 use kvm_bindings::{
     KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE, KVM_IRQ_ROUTING_IRQCHIP,
 };
+use vmm_sys_util::eventfd::EFD_NONBLOCK;

 /// Maximum number of legacy interrupts supported.
 pub const MAX_LEGACY_IRQS: u32 = 24;
@@ -40,7 +41,7 @@ impl LegacyIrq {
         Ok(LegacyIrq {
             base,
             vmfd,
-            irqfd: EventFd::new(0)?,
+            irqfd: EventFd::new(EFD_NONBLOCK)?,
         })
     }

@@ -155,6 +156,53 @@ impl InterruptSourceGroup for LegacyIrq {
         }
         self.irqfd.write(1)
     }
+
+    fn mask(&self, index: InterruptIndex) -> Result<()> {
+        if index > 1 {
+            return Err(std::io::Error::from_raw_os_error(libc::EINVAL));
+        }
+
+        let irqfd = &self.irqfd;
+        self.vmfd
+            .unregister_irqfd(irqfd, self.base + index)
+            .map_err(from_sys_util_errno)?;
+
+        Ok(())
+    }
+
+    fn unmask(&self, index: InterruptIndex) -> Result<()> {
+        if index > 1 {
+            return Err(std::io::Error::from_raw_os_error(libc::EINVAL));
+        }
+
+        let irqfd = &self.irqfd;
+        self.vmfd
+            .register_irqfd(irqfd, self.base + index)
+            .map_err(from_sys_util_errno)?;
+
+        Ok(())
+    }
+
+    fn get_pending_state(&self, index: InterruptIndex) -> bool {
+        if index > 1 {
+            return false;
+        }
+
+        // Peek at the EventFd count by reading it and writing it back.
+        // The irqfd must be in NON-BLOCKING mode.
+        let irqfd = &self.irqfd;
+        match irqfd.read() {
+            Err(_) => false,
+            Ok(count) => {
+                if count != 0 && irqfd.write(count).is_err() {
+                    // Hope the caller will handle the pending state correctly,
+                    // then no interrupt will be lost.
+                    //panic!("really no way to recover here!!!!");
+                }
+                count != 0
+            }
+        }
+    }
 }

 #[cfg(test)]
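Because `get_pending_state` peeks by reading the counter and writing it back, the irqfd has to be created with `EFD_NONBLOCK`: a blocking eventfd `read` on a zero counter would hang the caller instead of reporting "nothing pending". The sketch below models that behavior with a hypothetical in-memory `SimEventFd` (not the `vmm_sys_util` type) so it runs without KVM; `ErrorKind::WouldBlock` stands in for `EAGAIN`, and `peek_pending` is a hypothetical helper mirroring the commit's logic.

```rust
use std::cell::Cell;
use std::io::{Error, ErrorKind, Result};

/// Hypothetical in-memory model of a non-blocking eventfd (EFD_NONBLOCK).
/// `read` returns the whole counter and resets it to zero; on a zero
/// counter it fails with `WouldBlock` (EAGAIN) instead of blocking.
struct SimEventFd {
    count: Cell<u64>,
}

impl SimEventFd {
    fn new() -> Self {
        SimEventFd { count: Cell::new(0) }
    }

    /// Add `value` to the counter, failing when the counter would
    /// exceed u64::MAX - 1 (the eventfd maximum).
    fn write(&self, value: u64) -> Result<()> {
        match self.count.get().checked_add(value) {
            Some(next) if next < u64::MAX => {
                self.count.set(next);
                Ok(())
            }
            _ => Err(Error::from(ErrorKind::WouldBlock)),
        }
    }

    /// Return the counter and reset it to zero.
    fn read(&self) -> Result<u64> {
        match self.count.replace(0) {
            0 => Err(Error::from(ErrorKind::WouldBlock)),
            n => Ok(n),
        }
    }
}

/// Report whether an interrupt is pending without consuming it:
/// read the counter, then write the same value back.
fn peek_pending(fd: &SimEventFd) -> bool {
    match fd.read() {
        // Counter was zero: nothing pending.
        Err(_) => false,
        Ok(count) => {
            // Restore the count so the interrupt is still delivered later.
            // Like the commit, the sketch ignores a failed write-back.
            let _ = fd.write(count);
            true
        }
    }
}
```

Since `read` resets the counter before the write-back, any write from a device backend that lands in between is simply added on top, so no signal is dropped in this model.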

src/interrupt/kvm/mod.rs

+55
@@ -6,6 +6,61 @@
 //! When updating KVM IRQ routing by ioctl(KVM_SET_GSI_ROUTING), all interrupts of the virtual
 //! machine must be updated together. The [KvmIrqRouting](struct.KvmIrqRouting.html)
 //! structure maintains the global interrupt routing table.
+//!
+//! The way KVM-based VMMs manage interrupts deserves good documentation.
+//! On the hypervisor side, KVM provides three mechanisms for injecting interrupts into
+//! guests:
+//! 1) Irqfd. When data is written to an irqfd, KVM injects an interrupt into the guest.
+//! 2) IRQ routing. IRQ routing determines how an IRQ is injected into the guest.
+//! 3) Signal MSI. The VMM can inject an MSI into the guest by issuing the KVM_SIGNAL_MSI ioctl.
+//!
+//! Most VMMs use irqfd + IRQ routing for interrupt injection, so we focus on this mode.
+//! The flow to enable interrupt injection is:
+//! 1) The VMM creates an irqfd.
+//! 2) The VMM invokes KVM_IRQFD to bind the irqfd to an interrupt source.
+//! 3) The VMM invokes KVM_SET_GSI_ROUTING to configure how the interrupt is injected into the guest.
+//! 4) The device backend driver writes to the irqfd.
+//! 5) An interrupt is injected into the guest.
+//!
+//! So far so good, right? Now for mask/unmask/get_pending_state, which is the really tough
+//! part. To support mask/unmask/get_pending_state, we need a way to break the interrupt
+//! delivery chain and to maintain the pending state. Here is how each VMM handles it:
+//! - Firecracker. Very simple: it doesn't support mask/unmask/get_pending_state at all.
+//! - Cloud Hypervisor (CLH). It builds the interrupt delivery path as:
+//!     vhost-backend-driver -> EventFd -> CLH -> Irqfd -> IrqRouting -> Guest OS
+//!   It also maintains a masked/pending pair for each interrupt. When masking an interrupt, it
+//!   sets the masked flag and removes the IRQ routing entry for that interrupt.
+//!   The CLH design has two shortcomings:
+//!   - It's inefficient on the hot interrupt delivery path.
+//!   - It may lose in-flight interrupts after removing the IRQ routing entry for an interrupt, due
+//!     to irqfd implementation details. Buy me a cup of coffee if you want to know the details.
+//! - Qemu. Qemu has a smart design, which supports:
+//!   - A fast path: driver -> irqfd -> IrqRouting -> Guest OS
+//!   - A slow path: driver -> eventfd -> Qemu -> irqfd -> IrqRouting -> Guest OS
+//!   When masking an interrupt, it switches from the fast path to the slow path, and back again
+//!   when unmasking the interrupt.
+//! - Dragonball V1. It doesn't support mask/unmask/get_pending_state at all, and we have also
+//!   enhanced the Virtio MMIO spec, so we can use the fast path: driver -> irqfd -> IrqRouting -> Guest OS.
+//! - Dragonball V2. When enabling PCI device passthrough, mask/unmask/get_pending_state is a must
+//!   to support PCI MSI/MSI-X. Unlike Qemu's fast path/slow path design, Dragonball V2 implements
+//!   mask/unmask/get_pending_state with the fast path only. It works as follows:
+//!   1) When masking an interrupt, unbind the irqfd from the interrupt via KVM_IRQFD. After that,
+//!      writes to the irqfd no longer trigger injection, and the irqfd accumulates the count of
+//!      subsequent writes.
+//!   2) When unmasking an interrupt, bind the irqfd to the interrupt again via KVM_IRQFD. On
+//!      rebinding, an interrupt is injected into the guest if the irqfd has a non-zero count.
+//!   3) When getting the pending state, peek at the irqfd count. The irqfd doesn't support
+//!      peeking, so simulate it by reading the count and writing the same value back.
+//!   With this design, the irqfd count holds the interrupt pending state, and pending interrupts
+//!   are auto-injected on rebinding, so there is no need to maintain a separate pending bit.
+//!
+//! Why does Qemu need a slow path while Dragonball V2 doesn't?
+//! Qemu needs to support a broad range of guest OSes and all kinds of device drivers, and some
+//! legacy device drivers mask/unmask the interrupt while handling each interrupt.
+//! For Dragonball, we don't expect guest device drivers to exhibit such behavior, so we treat
+//! mask/unmask/get_pending_state as a cold path. We optimize for the hot interrupt delivery path
+//! and avoid the complexity of introducing a slow path. The penalty is that get_pending_state()
+//! is much more expensive.

 use std::collections::HashMap;
 use std::io::{Error, ErrorKind};
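The five-step irqfd + IRQ routing flow documented above can be modeled in a few lines. This is a toy sketch, not the KVM or `kvm-ioctls` API: `SimVm`, `Route`, `bind_irqfd`, `set_gsi_routing` and `write_irqfd` are hypothetical stand-ins for KVM_IRQFD, KVM_SET_GSI_ROUTING and an irqfd write, irqfds are plain integer ids, and the irqfd counter is not modeled. Note that `set_gsi_routing` replaces the whole table, mirroring the rule that all interrupts of the VM are updated together.

```rust
use std::collections::HashMap;

/// Hypothetical routing entry: how an interrupt on a GSI reaches the guest.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Route {
    /// Pin on an emulated interrupt controller (e.g. an IOAPIC pin).
    Irqchip { pin: u32 },
    /// MSI message described by its address/data pair.
    Msi { address: u64, data: u32 },
}

/// Toy model of the irqfd + IRQ routing path (not the KVM API).
/// Irqfds are represented by plain integer ids (step 1).
#[derive(Default)]
struct SimVm {
    routing: HashMap<u32, Route>, // GSI -> route, as set by KVM_SET_GSI_ROUTING
    irqfds: HashMap<u32, u32>,    // irqfd id -> GSI, as bound by KVM_IRQFD
    injected: Vec<Route>,         // interrupts delivered to the "guest"
}

impl SimVm {
    /// Step 2: bind an irqfd to a GSI.
    fn bind_irqfd(&mut self, irqfd: u32, gsi: u32) {
        self.irqfds.insert(irqfd, gsi);
    }

    /// Step 3: install the routing table. Like KVM_SET_GSI_ROUTING, this
    /// replaces the whole table, so every route must be passed each time.
    fn set_gsi_routing(&mut self, table: &[(u32, Route)]) {
        self.routing = table.iter().copied().collect();
    }

    /// Steps 4-5: a backend writes to the irqfd and the GSI's route decides
    /// what is injected. Returns false if nothing was injected.
    fn write_irqfd(&mut self, irqfd: u32) -> bool {
        let route = self
            .irqfds
            .get(&irqfd)
            .and_then(|gsi| self.routing.get(gsi))
            .copied();
        match route {
            Some(r) => {
                self.injected.push(r);
                true
            }
            None => false,
        }
    }
}
```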

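For comparison with the Dragonball V2 approach, the Qemu fast path/slow path switch described in the comment above can be sketched as a small state machine. This is a simplified model built only from that description, not Qemu's actual code; `QemuStyleVector`, `Path`, `signal` and the single `pending` flag are hypothetical. It shows the trade-off the comment describes: `get_pending_state` is a cheap flag read, at the cost of routing masked interrupts through the VMM.

```rust
/// Which path a vector's interrupts take in this simplified model.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Path {
    /// driver -> irqfd -> IRQ routing -> guest; the VMM is not involved.
    Fast,
    /// driver -> eventfd -> VMM -> irqfd -> IRQ routing -> guest.
    Slow,
}

/// Hypothetical per-vector state of a Qemu-style VMM.
struct QemuStyleVector {
    path: Path,
    pending: bool, // pending bit kept by the VMM while masked
    injected: u32, // interrupts delivered to the guest
}

impl QemuStyleVector {
    fn new() -> Self {
        QemuStyleVector { path: Path::Fast, pending: false, injected: 0 }
    }

    /// Mask: divert the device's eventfd to the VMM (slow path).
    fn mask(&mut self) {
        self.path = Path::Slow;
    }

    /// Unmask: switch back to the fast path and deliver anything latched.
    fn unmask(&mut self) {
        self.path = Path::Fast;
        if self.pending {
            self.pending = false;
            self.injected += 1;
        }
    }

    /// The device backend signals an interrupt.
    fn signal(&mut self) {
        match self.path {
            Path::Fast => self.injected += 1,
            // The VMM sees the event itself; the vector is masked, so latch it.
            Path::Slow => self.pending = true,
        }
    }

    /// Cheap: just a flag lookup in VMM memory.
    fn get_pending_state(&self) -> bool {
        self.pending
    }
}
```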
src/interrupt/kvm/msi_generic.rs

+2-1
@@ -4,6 +4,7 @@
 //! Helper utilities for handling MSI interrupts.

 use kvm_bindings::{kvm_irq_routing_entry, KVM_IRQ_ROUTING_MSI};
+use vmm_sys_util::eventfd::EFD_NONBLOCK;

 use super::*;

@@ -15,7 +16,7 @@ pub(super) struct MsiConfig {
 impl MsiConfig {
     pub(super) fn new() -> Self {
         MsiConfig {
-            irqfd: EventFd::new(0).unwrap(),
+            irqfd: EventFd::new(EFD_NONBLOCK).unwrap(),
             config: Mutex::new(Default::default()),
         }
     }

src/interrupt/kvm/msi_irq.rs

+47
@@ -141,6 +141,53 @@ impl InterruptSourceGroup for MsiIrq {
         let msi_config = &self.msi_configs[index as usize];
         msi_config.irqfd.write(1)
     }
+
+    fn mask(&self, index: InterruptIndex) -> Result<()> {
+        if index >= self.count {
+            return Err(std::io::Error::from_raw_os_error(libc::EINVAL));
+        }
+
+        let irqfd = &self.msi_configs[index as usize].irqfd;
+        self.vmfd
+            .unregister_irqfd(irqfd, self.base + index)
+            .map_err(from_sys_util_errno)?;
+
+        Ok(())
+    }
+
+    fn unmask(&self, index: InterruptIndex) -> Result<()> {
+        if index >= self.count {
+            return Err(std::io::Error::from_raw_os_error(libc::EINVAL));
+        }
+
+        let irqfd = &self.msi_configs[index as usize].irqfd;
+        self.vmfd
+            .register_irqfd(irqfd, self.base + index)
+            .map_err(from_sys_util_errno)?;
+
+        Ok(())
+    }
+
+    fn get_pending_state(&self, index: InterruptIndex) -> bool {
+        if index >= self.count {
+            return false;
+        }
+
+        // Peek at the EventFd count by reading it and writing it back.
+        // The irqfd must be in NON-BLOCKING mode.
+        let irqfd = &self.msi_configs[index as usize].irqfd;
+        match irqfd.read() {
+            Err(_) => false,
+            Ok(count) => {
+                if count != 0 && irqfd.write(count).is_err() {
+                    // Hope the caller will handle the pending state correctly,
+                    // then no interrupt will be lost.
+                    //panic!("really no way to recover here!!!!");
+                }
+                count != 0
+            }
+        }
+    }
 }

 #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
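The MSI group follows the Dragonball V2 scheme documented in `kvm/mod.rs`: masking unbinds a vector's irqfd, the irqfd counter doubles as the pending bit, and unmasking rebinds it. The sketch below models that per vector without KVM; `Vector`, `MsiGroup` and the `bound` flag are hypothetical stand-ins for an irqfd and its KVM_IRQFD registration, and `ErrorKind::InvalidInput` is used because that is the kind Rust's `std::io` assigns to an `EINVAL` OS error.

```rust
use std::cell::Cell;
use std::io::{Error, ErrorKind, Result};

/// Hypothetical MSI vector: an eventfd-like counter plus a flag telling
/// whether the irqfd is currently bound to its GSI.
struct Vector {
    count: Cell<u64>,    // writes accumulated while unbound
    bound: Cell<bool>,   // true = fast path active
    injected: Cell<u32>, // interrupts delivered to the guest
}

impl Vector {
    fn new() -> Self {
        Vector { count: Cell::new(0), bound: Cell::new(true), injected: Cell::new(0) }
    }

    /// The device backend writes to the irqfd.
    fn write(&self) {
        if self.bound.get() {
            // Bound: KVM consumes the write and injects right away.
            self.injected.set(self.injected.get() + 1);
        } else {
            // Unbound (masked): the counter just accumulates.
            self.count.set(self.count.get() + 1);
        }
    }
}

/// Hypothetical group of MSI vectors using mask-by-unbinding.
struct MsiGroup {
    vectors: Vec<Vector>,
}

impl MsiGroup {
    fn new(count: usize) -> Self {
        MsiGroup { vectors: (0..count).map(|_| Vector::new()).collect() }
    }

    /// Out-of-range indices are rejected, like `index >= self.count` (EINVAL).
    fn vector(&self, index: u32) -> Result<&Vector> {
        self.vectors
            .get(index as usize)
            .ok_or_else(|| Error::from(ErrorKind::InvalidInput))
    }

    /// Mask: unbind the irqfd; later writes only bump the counter.
    fn mask(&self, index: u32) -> Result<()> {
        self.vector(index)?.bound.set(false);
        Ok(())
    }

    /// Unmask: rebind; a non-zero counter triggers an injection.
    /// Model simplification: inject once and clear the counter.
    fn unmask(&self, index: u32) -> Result<()> {
        let v = self.vector(index)?;
        v.bound.set(true);
        if v.count.replace(0) != 0 {
            v.injected.set(v.injected.get() + 1);
        }
        Ok(())
    }

    /// Pending means "counter is non-zero". The real code has to peek by
    /// reading and writing back; here the counter is directly visible.
    fn get_pending_state(&self, index: u32) -> bool {
        self.vector(index).map(|v| v.count.get() != 0).unwrap_or(false)
    }
}
```

In this model the hot path (a bound vector) never touches the VMM; all the extra work lands in `mask`, `unmask` and `get_pending_state`, which the design treats as cold paths.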

src/interrupt/mod.rs

+5
@@ -208,6 +208,11 @@ pub trait InterruptSourceGroup: Send + Sync {
         // To accommodate this, we can have a no-op here.
         Ok(())
     }
+
+    /// Check whether the interrupt at `index` is pending.
+    fn get_pending_state(&self, _index: InterruptIndex) -> bool {
+        false
+    }
 }

 #[cfg(feature = "kvm-irq")]
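Because `get_pending_state` is added as a provided method with a default body, existing `InterruptSourceGroup` implementors keep compiling and report "not pending", while the KVM legacy and MSI groups override it. A reduced sketch of that pattern follows; this trait has only the one method (the real trait has others and requires `Send + Sync`), `InterruptIndex` is assumed to be `u32`, and `NoPendingSupport` and `TrackedGroup` are hypothetical implementors.

```rust
/// Assumed to be u32 for this sketch.
type InterruptIndex = u32;

trait InterruptSourceGroup {
    /// Default: the group cannot report pending interrupts.
    fn get_pending_state(&self, _index: InterruptIndex) -> bool {
        false
    }
}

/// A group that doesn't track pending state relies on the default.
struct NoPendingSupport;
impl InterruptSourceGroup for NoPendingSupport {}

/// A group that does track it overrides the default.
struct TrackedGroup {
    pending: Vec<bool>,
}

impl InterruptSourceGroup for TrackedGroup {
    fn get_pending_state(&self, index: InterruptIndex) -> bool {
        // Out-of-range indices report "not pending", as the KVM groups do.
        self.pending.get(index as usize).copied().unwrap_or(false)
    }
}
```

One consequence of the `false` default is that a caller cannot tell "not pending" apart from "pending state not supported".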
