fifo/gk104: kick channel upon removal

Submitted by Alexandre Courbot on March 1, 2016, 7:59 a.m.

Details

Message ID 1456819145-3141-1-git-send-email-acourbot@nvidia.com
State New
Series "fifo/gk104: kick channel upon removal" ( rev: 1 ) in Nouveau

Commit Message

Alexandre Courbot March 1, 2016, 7:59 a.m.
A channel may still be processed by the PBDMA even after removal, unless
it is properly kicked. Some chips are more sensitive to this than others,
with GM20B triggering the issue very easily (the PBDMA will try to fetch
methods from the previously-removed channel after a new one is added).

Make sure this cannot happen by kicking the channel right after it is
disabled, and before the new runlist is submitted.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Second attempt at this - the first version was reverted after causing a
regression: https://lists.freedesktop.org/archives/nouveau/2015-August/021842.html

I am pretty confident this one will pass though: we are now kicking the channel
only if it has a chance of being scheduled at that time, and in a path from which
we *will* access the FIFO registers after the kick, so if nothing complained until
then, kicking the channel should be inconsequential.

Eric, since you reported the regression on the first version, would you mind trying
this one and giving us your Tested-by? I have tested it on dGPU but unfortunately
could not set it up in an optimus-like manner, with power management kicking in.

 drm/nouveau/nvkm/engine/fifo/gpfifogk104.c | 1 +
 1 file changed, 1 insertion(+)

Patch

diff --git a/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c b/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c
index 2e1df01bd928..8b4a5e01829c 100644
--- a/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c
+++ b/drm/nouveau/nvkm/engine/fifo/gpfifogk104.c
@@ -154,6 +154,7 @@  gk104_fifo_gpfifo_fini(struct nvkm_fifo_chan *base)
 	if (!list_empty(&chan->head)) {
 		gk104_fifo_runlist_remove(fifo, chan);
 		nvkm_mask(device, 0x800004 + coff, 0x00000800, 0x00000800);
+		gk104_fifo_gpfifo_kick(chan);
 		gk104_fifo_runlist_commit(fifo, chan->engine);
 	}
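The ordering this patch enforces (remove from the runlist, disable, kick, then commit the new runlist) can be illustrated with a small standalone toy model. This is only a sketch under stated assumptions: every `toy_*` name below is hypothetical and none of it is nouveau code or a description of real PBDMA behavior. In particular, it assumes that committing a runlist does not by itself unload the channel the PBDMA is currently holding.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CHAN 8
#define TOY_NO_CHAN  (-1)

/* Hypothetical toy state; these names are not part of nouveau. */
struct toy_fifo {
	bool on_runlist[TOY_MAX_CHAN]; /* software copy of the runlist */
	bool enabled[TOY_MAX_CHAN];    /* per-channel enable flag */
	int  pbdma_chid;               /* channel loaded on the PBDMA, or TOY_NO_CHAN */
	int  commits;                  /* number of runlist submissions */
};

static void toy_fifo_init(struct toy_fifo *f)
{
	for (int i = 0; i < TOY_MAX_CHAN; i++) {
		f->on_runlist[i] = false;
		f->enabled[i] = false;
	}
	f->pbdma_chid = TOY_NO_CHAN;
	f->commits = 0;
}

/* Add a channel to the runlist and let the PBDMA start working on it. */
static void toy_chan_start(struct toy_fifo *f, int chid)
{
	assert(chid >= 0 && chid < TOY_MAX_CHAN);
	f->on_runlist[chid] = true;
	f->enabled[chid] = true;
	f->pbdma_chid = chid;
}

/* Kick: make the PBDMA drop the channel if it is the one currently loaded. */
static void toy_chan_kick(struct toy_fifo *f, int chid)
{
	if (f->pbdma_chid == chid)
		f->pbdma_chid = TOY_NO_CHAN;
}

/* Model assumption: submitting a runlist does not unload the PBDMA. */
static void toy_runlist_commit(struct toy_fifo *f)
{
	f->commits++;
}

/* Tear down a channel, optionally kicking it before the runlist commit. */
static void toy_chan_fini(struct toy_fifo *f, int chid, bool kick)
{
	assert(chid >= 0 && chid < TOY_MAX_CHAN);
	if (!f->on_runlist[chid])
		return; /* never scheduled: nothing to remove or kick */
	f->on_runlist[chid] = false; /* remove from the runlist */
	f->enabled[chid] = false;    /* disable the channel */
	if (kick)
		toy_chan_kick(f, chid);  /* evict it before the new runlist goes in */
	toy_runlist_commit(f);
}

/* True if the PBDMA still holds a channel that is no longer on the runlist. */
static bool toy_pbdma_is_stale(const struct toy_fifo *f)
{
	return f->pbdma_chid != TOY_NO_CHAN && !f->on_runlist[f->pbdma_chid];
}
```

In this model, calling `toy_chan_fini()` with `kick` set to `false` leaves `toy_pbdma_is_stale()` returning true, which corresponds to the situation the commit message describes. With `kick` set to `true` the PBDMA no longer references the removed channel by the time the runlist is committed.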
 

Comments

On Tue, Mar 01, 2016 at 04:59:05PM +0900, Alexandre Courbot wrote:
> 
> Eric, since you reported the regression on the first version, would you mind trying
> this one and giving us your Tested-by? I have tested it on dGPU but unfortunately
> could not set it up in an optimus-like manner, with power management kicking in.
> 

Thanks.  I've tested Linux 4.5-rc6 with your patch applied, and I've yet to
encounter the crash after about half a dozen reboots.  So it seems good so far,
but I'll keep running it and will let you know if I notice a problem.

On Wed, Mar 2, 2016 at 12:53 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> On Tue, Mar 01, 2016 at 04:59:05PM +0900, Alexandre Courbot wrote:
>>
>> Eric, since you reported the regression on the first version, would you mind trying
>> this one and giving us your Tested-by? I have tested it on dGPU but unfortunately
>> could not set it up in an optimus-like manner, with power management kicking in.
>>
>
> Thanks.  I've tested Linux 4.5-rc6 with your patch applied, and I've yet to
> encounter the crash after about half a dozen reboots.  So it seems good so far,
> but I'll keep running it and will let you know if I notice a problem.

Great news, this one was bothering me for some time. Would be grateful
if you could let us know whether things are still going well after
further testing.

On Wed, Mar 2, 2016 at 6:07 PM, Alexandre Courbot <gnurou@gmail.com> wrote:
> On Wed, Mar 2, 2016 at 12:53 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
>> On Tue, Mar 01, 2016 at 04:59:05PM +0900, Alexandre Courbot wrote:
>>>
>>> Eric, since you reported the regression on the first version, would you mind trying
>>> this one and giving us your Tested-by? I have tested it on dGPU but unfortunately
>>> could not set it up in an optimus-like manner, with power management kicking in.
>>>
>>
>> Thanks.  I've tested Linux 4.5-rc6 with your patch applied, and I've yet to
>> encounter the crash after about half a dozen reboots.  So it seems good so far,
>> but I'll keep running it and will let you know if I notice a problem.
>
> Great news, this one was bothering me for some time. Would be grateful
> if you could let us know whether things are still going well after
> further testing.

Hi Eric, just to follow up - are you still happy with this patch?

On Mon, Mar 07, 2016 at 11:10:16AM +0900, Alexandre Courbot wrote:
> 
> Hi Eric, just to follow up - are you still happy with this patch?

Yes, I still haven't had any problems with it.

On Mon, Mar 7, 2016 at 11:32 AM, Eric Biggers <ebiggers3@gmail.com> wrote:
> On Mon, Mar 07, 2016 at 11:10:16AM +0900, Alexandre Courbot wrote:
>>
>> Hi Eric, just to follow up - are you still happy with this patch?
>
> Yes, I still haven't had any problems with it.

Awesome. Clear to add your Tested-by?

On Mon, Mar 07, 2016 at 11:39:18AM +0900, Alexandre Courbot wrote:
> On Mon, Mar 7, 2016 at 11:32 AM, Eric Biggers <ebiggers3@gmail.com> wrote:
> > On Mon, Mar 07, 2016 at 11:10:16AM +0900, Alexandre Courbot wrote:
> >>
> >> Hi Eric, just to follow up - are you still happy with this patch?
> >
> > Yes, I still haven't had any problems with it.
> 
> Awesome. Clear to add your Tested-by?

Yes, that's fine with me.