pci: do a msi rearm on init

Submitted by Thierry Reding on Nov. 24, 2017, 2:23 p.m.


Message ID 20171124142328.GA19273@ulmo
State New
Series "pci: do a msi rearm on init" ( rev: 2 ) in Nouveau


Commit Message

Thierry Reding Nov. 24, 2017, 2:23 p.m.
On Fri, Nov 24, 2017 at 03:08:25PM +0100, Karol Herbst wrote:
> On Fri, Nov 24, 2017 at 3:02 PM, Thierry Reding
> <thierry.reding@gmail.com> wrote:
> > On Fri, Nov 24, 2017 at 03:56:26AM +0100, Karol Herbst wrote:
> >> On my GP107 when I load nouveau after unloading it, for some reason the
> >> GPU stopped sending or the CPU stopped receiving interrupts if MSI was
> >> enabled.
> >
> > I suppose this could happen if the GPU raises an interrupt after the
> > driver's already called free_irq() on it, and hence the driver can't
> > rearm itself in the interrupt handler.
> >
> > This possibly points to a bug somewhere (the GPU should be completely
> > idle by the time free_irq() is called), but this seems like a valid
> > thing to do at initialization in any case to avoid relying on the prior
> > owner of the device to always behave properly.
> >
> Yeah, this makes sense. What I am wondering is why this isn't a
> bigger problem. Maybe it is just due to those changes in the Pascal
> interrupt handler, and this is a Pascal-only problem?

Yeah, this could be some kind of race that's only triggering on Pascal.

Comparing with the nvgpu driver it seems like the MSI interrupt should
be rearmed only after all interrupts have been processed, while Nouveau
currently rearms before processing interrupts (though after masking the
interrupts). I'm not very familiar with all of this, but perhaps Pascal
has some interrupts that Nouveau doesn't mask and therefore might race.

Perhaps something like this would help:

--- >8 ---
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
index b1b1f3626b96..0b3b802c26df 100644

> Anyway, the Nvidia driver seems to do it once at load time as well,
> so I was quite sure we could simply do it this way and be sure that we
> are able to use the GPU from any state.

I think it's totally fine to apply as-is and leave it to further
investigation what Nouveau needs to do to properly uninitialize the
device. Like you said it can always happen that somebody else leaves
the GPU in some undefined state, in which case it's good to always
do this at initialization.
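
To make the suspected failure mode concrete, here is a minimal user-space
sketch of it. This is only a model under stated assumptions, not Nouveau
code: struct fake_gpu and every fake_* function are hypothetical names, and
the model assumes that the device sends at most one MSI per arming and that
only the driver's interrupt handler (or its init path) re-arms it.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in Nouveau. */
struct fake_gpu {
	bool msi_armed;     /* device may send its next MSI only when true */
	bool handler_bound; /* models request_irq() / free_irq() */
	int handled;        /* interrupts serviced by the driver */
};

/* Device side: try to signal an interrupt to the CPU. */
void fake_gpu_raise(struct fake_gpu *gpu)
{
	if (!gpu->msi_armed)
		return;                 /* not armed: nothing is sent */
	gpu->msi_armed = false;         /* assumption: one MSI per arming */
	if (gpu->handler_bound) {
		gpu->handled++;         /* handler services the interrupt */
		gpu->msi_armed = true;  /* ...and re-arms MSI */
	}
	/* No handler bound: the MSI is consumed and nobody re-arms. */
}

void fake_driver_load(struct fake_gpu *gpu, bool rearm_on_init)
{
	gpu->handler_bound = true;
	if (rearm_on_init)
		gpu->msi_armed = true;  /* the behaviour the patch adds */
}

void fake_driver_unload(struct fake_gpu *gpu)
{
	gpu->handler_bound = false;
}

/*
 * Load, unload, let one stray interrupt fire while no handler is bound,
 * reload, then raise n interrupts. Returns how many were serviced.
 */
int fake_reload_scenario(bool rearm_on_init, int n)
{
	struct fake_gpu gpu = { .msi_armed = true };
	int i;

	fake_driver_load(&gpu, false);
	fake_driver_unload(&gpu);
	fake_gpu_raise(&gpu);           /* stray interrupt after free_irq() */
	fake_driver_load(&gpu, rearm_on_init);
	for (i = 0; i < n; i++)
		fake_gpu_raise(&gpu);
	return gpu.handled;
}
```

In this model, fake_reload_scenario(false, 3) services none of the three
interrupts because the stray one left MSI un-armed, while
fake_reload_scenario(true, 3) services all three. Whether real hardware
behaves exactly like this is part of what still needs investigating.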


Patch

--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
@@ -72,10 +72,10 @@  nvkm_pci_intr(int irq, void *arg)
        struct nvkm_device *device = pci->subdev.device;
        bool handled = false;
-       if (pci->msi)
-               pci->func->msi_rearm(pci);
        nvkm_mc_intr(device, &handled);
+       if (pci->msi)
+               pci->func->msi_rearm(pci);
        return handled ? IRQ_HANDLED : IRQ_NONE;

--- >8 ---