This repository was archived by the owner on Oct 3, 2024. It is now read-only.

Windows 10 VM running BlueIris has igfx driver crash every few days. #228

@bheikes1

Greetings all,

I'm looking for hints as to what might be wrong with my setup. I have a Windows 10 VM running BlueIris that started exhibiting igfx driver crashes approximately a month ago. Before that, the system was stable, with uptimes of several months and no issues.

Host system:
Proxmox 7.4-3
Kernels recently used: 6.2, 6.1, 5.19, 5.15, 5.13
Intel Xeon E-2186G, 128 GB RAM, Nvidia T1000, LSI HBA

VMs:
Ubuntu 22.04 running Pi-hole, no issues noted
TrueNAS Core, has the LSI HBA passed through, no issues noted
Ubuntu 22.04 running Portainer, has the Nvidia T1000 passed through, no issues noted
Windows 10 22H2, has the Intel iGPU (P630) passed through (GVT-d), igfx driver crashes every few days

This setup has been in place for approximately a year with virtually no issues until about a month ago (March 8th, from my notes). In the last week or so, I've worked my way through Linux kernels 5.19, 6.1, and 6.2, and also tried GVT-g to see if I could stop the igfx driver crashes. With GVT-g, when the crash happens the VM stops responding completely and causes issues on the host as well, necessitating a host reboot. With GVT-d, only the VM needs to be rebooted.

Under the 6.1 and 6.2 (and perhaps 5.19) kernels using GVT-g, I get syslog entries (on the host) like this when a crash happens:

Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 00000000315456ba guest entry 0xffffffffffffffff type 9.
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail to flush post shadow
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail to dispatch workload, skip
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 00000000315456ba guest entry 0xffffffffffffffff type 9.
Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6c000
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 00000000315456ba guest entry 0xffffffffffffffff type 9.
Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6c008
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 00000000315456ba guest entry 0xffffffffffffffff type 9.
Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6c010
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 00000000315456ba guest entry 0xffffffffffffffff type 9.
Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6c018
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 0000000000000000 guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: shadow page 00000000315456ba guest entry 0xffffffffffffffff type 9.
Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6c020

and

Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 13 kernel messages
Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6c948
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 17 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 15 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages
Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 13 kernel messages
Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6ca80
Mar 15 16:03:14 pve systemd-journald[1702]: Missed 13 kernel messages
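
In case it helps anyone compare against their own logs, here is a rough Python sketch for condensing a flood like the one above into counts per message type, plus the number of kernel messages journald reports as dropped. The regular expressions are only inferred from the lines quoted in this post, and the names `summarize_gvt` and `SAMPLE_GVT` are made up for illustration; they are not part of any GVT or Proxmox tooling.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt in this post (an assumption,
# not an official log format specification).
MISSED = re.compile(r"systemd-journald\[\d+\]: Missed (\d+) kernel messages")
GVT = re.compile(r"kernel: gvt: (?:vgpu \d+: )?(.*)$")
HEX_VALUE = re.compile(r"\b(?:0x)?[0-9a-f]{6,}\b")


def summarize_gvt(lines):
    """Return (Counter of normalized gvt messages, total missed messages)."""
    kinds = Counter()
    missed = 0
    for line in lines:
        line = line.strip()
        m = MISSED.search(line)
        if m:
            missed += int(m.group(1))
            continue
        g = GVT.search(line)
        if g:
            # Replace addresses with a placeholder so repeats group together.
            key = HEX_VALUE.sub("<hex>", g.group(1)).rstrip(".")
            kinds[key] += 1
    return kinds, missed


# Four lines copied from the excerpt above.
SAMPLE_GVT = [
    "Mar 15 16:03:14 pve systemd-journald[1702]: Missed 11 kernel messages",
    "Mar 15 16:03:14 pve kernel: gvt: vgpu 1: fail: spt 00000000315456ba guest entry 0xffffffffffffffff type 9",
    "Mar 15 16:03:14 pve systemd-journald[1702]: Missed 13 kernel messages",
    "Mar 15 16:03:14 pve kernel: gvt: guest page write error, gpa 4df6c948",
]

if __name__ == "__main__":
    kinds, missed = summarize_gvt(SAMPLE_GVT)
    for message, count in kinds.most_common():
        print(count, message)
    print("missed kernel messages:", missed)
```

To run it against a real log, the lines of the host's syslog file could be passed to `summarize_gvt` in place of `SAMPLE_GVT`.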

Under 6.2 and 6.1 using GVT-d, I get messages like this when a crash happens:

Mar 26 07:20:45 pve kernel: DMAR: DRHD: handling fault status reg 3
Mar 26 07:20:45 pve kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xffffb8024c046000 [fault reason 0x07] Next page table ptr is invalid
Mar 29 12:08:47 pve kernel: DMAR: DRHD: handling fault status reg 3
Mar 29 12:08:47 pve kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xffff8004014b4000 [fault reason 0x07] Next page table ptr is invalid
Mar 31 05:48:36 pve kernel: DMAR: DRHD: handling fault status reg 3
Mar 31 05:48:36 pve kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xffff800417686000 [fault reason 0x07] Next page table ptr is invalid
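
For tracking how often these faults occur across kernels, a small Python sketch like the following could pull the timestamp, device, fault address, and fault reason out of each DMAR fault line. The pattern is an assumption based only on the three faults quoted above and may not match other kernel versions; `parse_dmar_faults` and `SAMPLE_DMAR` are illustrative names.

```python
import re

# Pattern inferred from the DMAR fault lines quoted in this post (an
# assumption; other kernels may format these messages differently).
DMAR_FAULT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: DMAR: "
    r"\[(?P<kind>[^\]]+)\] Request device \[(?P<device>[^\]]+)\] "
    r"fault addr (?P<addr>0x[0-9a-f]+) "
    r"\[fault reason (?P<reason>0x[0-9a-f]+)\] (?P<text>.*)$"
)


def parse_dmar_faults(lines):
    """Return one dict per DMAR fault line; non-matching lines are skipped."""
    faults = []
    for line in lines:
        m = DMAR_FAULT.match(line.strip())
        if m:
            faults.append(m.groupdict())
    return faults


# The six lines from the excerpt above.
SAMPLE_DMAR = [
    "Mar 26 07:20:45 pve kernel: DMAR: DRHD: handling fault status reg 3",
    "Mar 26 07:20:45 pve kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xffffb8024c046000 [fault reason 0x07] Next page table ptr is invalid",
    "Mar 29 12:08:47 pve kernel: DMAR: DRHD: handling fault status reg 3",
    "Mar 29 12:08:47 pve kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xffff8004014b4000 [fault reason 0x07] Next page table ptr is invalid",
    "Mar 31 05:48:36 pve kernel: DMAR: DRHD: handling fault status reg 3",
    "Mar 31 05:48:36 pve kernel: DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xffff800417686000 [fault reason 0x07] Next page table ptr is invalid",
]

if __name__ == "__main__":
    for fault in parse_dmar_faults(SAMPLE_DMAR):
        print(fault["stamp"], fault["device"], fault["addr"], fault["reason"])
```

In the sample, every fault comes from device 00:02.0 (the passed-through iGPU) with fault reason 0x07, a few days apart, which matches the "every few days" crash interval.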

I'm trying out older kernels now (currently 5.13) to see if there is any appreciable difference. I realize that I am running quite a complicated system and might be bumping up against an edge case.

Any thoughts?
