[v2] drm/i915: Also perform gpu reset under execlist mode.

Submitted by Wang, Zhi A on July 6, 2015, 8:23 p.m.

Details

Message ID 559AE3B1.9000300@intel.com
State New
Headers show

Not browsing as part of any series.

Commit Message

Wang, Zhi A July 6, 2015, 8:23 p.m.
Hi Chris:

How about this one? :)

base object refcount
                  * will be 2 (+1 from object creation and +1 from 
do_switch()).
                  * i915_gem_context_fini() will be called after 
gpu_idle() has switched


于 07/07/15 19:17, Chris Wilson 写道:
> On Tue, Jul 07, 2015 at 03:38:37AM +0800, Zhi Wang wrote:
>> Hi Chris:
>>      Thanks for the comments! I can understand that we're concerned
>> about regressions, so this is why I think put this reset in module
>> unload path looks much safer. For safety, maybe we should only reset
>> GPU perhaps only when GEN >= 6? That looks much easier and safer,
>> also combine execlist reset and power context reset.
>>
>> Or we just add this before i915_uncore_fini() inside
>> i915_driver_unload()? This way looks much safer?
>>
>> How about this one?
>
> No, if we are just targetting execlists, then disabling it in
> cleanup_ringbuffers as before is the cleanest (as that is the opposite
> stage to where we enable them).
>
> The reset in i915_driver_unload() is preferred to replace all the resets
> required during unload. It is safe to move the context reset here as we
> do not disturb the GTT state between unpining the context and here.
> Making it conditional on gen>=5 is probably a good first step.
> -Chris
>

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_dma.c 
b/drivers/gpu/drm/i915/i915_dma.c
index c5349fa..013039e 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1134,6 +1134,21 @@  int i915_driver_unload(struct drm_device *dev)

         i915_global_gtt_cleanup(dev);

+       /*
+        * The only known way to stop the gpu from accessing the hw 
context in
+        * graphics memory space is to reset it. Do this as the very last
+        * operation to avoid confusing other code, leading to spurious 
errors.
+        *
+        * Besides, we also need to restore HW workload submission mode back
+        * to default mode when shutdown i915.
+        *
+        * It makes i915 module loading/unloading be able to switch between
+        * different workload submission mode on gen8+. And according to 
B-spec,
+        * the only way to reset HW workload submission mode to default 
mod is GPU reset.
+        */
+       if (INTEL_INFO(dev)->gen >= 5)
+               intel_gpu_reset(dev);
+
         intel_uncore_fini(dev);
         if (dev_priv->regs != NULL)
                 pci_iounmap(dev->pdev, dev_priv->regs);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index a7e58a8..376ee6b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -373,11 +373,6 @@  void i915_gem_context_fini(struct drm_device *dev)
         int i;

         if (dctx->legacy_hw_ctx.rcs_state) {
-               /* The only known way to stop the gpu from accessing the 
hw context is
-                * to reset it. Do this as the very last operation to 
avoid confusing
-                * other code, leading to spurious errors. */
-               intel_gpu_reset(dev);
-
                 /* When default context is created and switched to, 

Comments

Thanks for the comments!:) I'm wondering that why the GEN>=5 is needed 
here. Is there any HW problems in GEN>=5 when driver wants to do GPU 
reset? I'm sorry that I don't know much about GEN<6...Just want to know 
more about the story:)

于 07/07/15 20:00, Chris Wilson 写道:
> On Tue, Jul 07, 2015 at 04:23:13AM +0800, Zhi Wang wrote:
>> Hi Chris:
>>
>> How about this one? :)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c
>> b/drivers/gpu/drm/i915/i915_dma.c
>> index c5349fa..013039e 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1134,6 +1134,21 @@ int i915_driver_unload(struct drm_device *dev)
>>
>>          i915_global_gtt_cleanup(dev);
>>
>> +       /*
>> +        * The only known way to stop the gpu from accessing the hw
>> context in
>> +        * graphics memory space is to reset it. Do this as the very last
>> +        * operation to avoid confusing other code, leading to
>> spurious errors.
>> +        *
>> +        * Besides, we also need to restore HW workload submission mode back
>> +        * to default mode when shutdown i915.
>> +        *
>> +        * It makes i915 module loading/unloading be able to switch between
>> +        * different workload submission mode on gen8+. And
>> according to B-spec,
>> +        * the only way to reset HW workload submission mode to
>> default mod is GPU reset.
>> +        */
>> +       if (INTEL_INFO(dev)->gen >= 5)
>> +               intel_gpu_reset(dev);
>> +
>>          intel_uncore_fini(dev);
>>          if (dev_priv->regs != NULL)
>>                  pci_iounmap(dev->pdev, dev_priv->regs);
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
>> b/drivers/gpu/drm/i915/i915_gem_context.c
>> index a7e58a8..376ee6b 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -373,11 +373,6 @@ void i915_gem_context_fini(struct drm_device *dev)
>>          int i;
>>
>>          if (dctx->legacy_hw_ctx.rcs_state) {
>> -               /* The only known way to stop the gpu from accessing
>> the hw context is
>> -                * to reset it. Do this as the very last operation
>> to avoid confusing
>> -                * other code, leading to spurious errors. */
>> -               intel_gpu_reset(dev);
>> -
>>                  /* When default context is created and switched to,
>> base object refcount
>>                   * will be 2 (+1 from object creation and +1 from
>> do_switch()).
>>                   * i915_gem_context_fini() will be called after
>> gpu_idle() has switched
>
> I'd be happy to r-b that. Having a verbose comment explaining why we put
> the shotgun to our head is definitely required, thanks.
> -Chris
>
Thanks! I had question about power context.

I was curious about HW context and once dumped the GEM context object on 
GEN6 before.

According to B-spec, Gen Graphics » BSpec » 3D-Media-GPGPU Engine » 
Registers in Render Engine » Render Logical Context Data » Register 
State Context [SNB+] » Register State Context [HSW] »

I thought the context should contain all the register regions in the 
list include the dark yellow marked registers as power context 
registers, but I saw the content of the context I got began with main 
context, LRI signature 110010bf. .. It's weird.

I don't know if the context saved by MI_SET_CONTEXT is just the power 
context in B-spec?If yes, why doest it looks like that?

Thanks,
Zhi.

于 07/07/15 20:10, Chris Wilson 写道:
> On Tue, Jul 07, 2015 at 04:32:53AM +0800, Zhi Wang wrote:
>> Thanks for the comments!:) I'm wondering that why the GEN>=5 is
>> needed here. Is there any HW problems in GEN>=5 when driver wants to
>> do GPU reset? I'm sorry that I don't know much about GEN<6...Just
>> want to know more about the story:)
>
> gen5 is about the earliest that I know the hw stated to get smart and
> doing automatic power saving by writing to pinned buffers. Before gen5
> our reset hasn't been very reliable, and only recently have we had code
> that is meant to be able to reset the gpu on those gen - hence my
> reservation about doing so unconditionally.
> -Chris
>
On Tue, Jul 07, 2015 at 04:23:13AM +0800, Zhi Wang wrote:
> Hi Chris:
> 
> How about this one? :)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c
> b/drivers/gpu/drm/i915/i915_dma.c
> index c5349fa..013039e 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1134,6 +1134,21 @@ int i915_driver_unload(struct drm_device *dev)
> 
>         i915_global_gtt_cleanup(dev);
> 
> +       /*
> +        * The only known way to stop the gpu from accessing the hw
> context in
> +        * graphics memory space is to reset it. Do this as the very last
> +        * operation to avoid confusing other code, leading to
> spurious errors.
> +        *
> +        * Besides, we also need to restore HW workload submission mode back
> +        * to default mode when shutdown i915.
> +        *
> +        * It makes i915 module loading/unloading be able to switch between
> +        * different workload submission mode on gen8+. And
> according to B-spec,
> +        * the only way to reset HW workload submission mode to
> default mod is GPU reset.
> +        */
> +       if (INTEL_INFO(dev)->gen >= 5)
> +               intel_gpu_reset(dev);
> +
>         intel_uncore_fini(dev);
>         if (dev_priv->regs != NULL)
>                 pci_iounmap(dev->pdev, dev_priv->regs);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index a7e58a8..376ee6b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -373,11 +373,6 @@ void i915_gem_context_fini(struct drm_device *dev)
>         int i;
> 
>         if (dctx->legacy_hw_ctx.rcs_state) {
> -               /* The only known way to stop the gpu from accessing
> the hw context is
> -                * to reset it. Do this as the very last operation
> to avoid confusing
> -                * other code, leading to spurious errors. */
> -               intel_gpu_reset(dev);
> -
>                 /* When default context is created and switched to,
> base object refcount
>                  * will be 2 (+1 from object creation and +1 from
> do_switch()).
>                  * i915_gem_context_fini() will be called after
> gpu_idle() has switched

I'd be happy to r-b that. Having a verbose comment explaining why we put
the shotgun to our head is definitely required, thanks.
-Chris
On Tue, Jul 07, 2015 at 04:32:53AM +0800, Zhi Wang wrote:
> Thanks for the comments!:) I'm wondering that why the GEN>=5 is
> needed here. Is there any HW problems in GEN>=5 when driver wants to
> do GPU reset? I'm sorry that I don't know much about GEN<6...Just
> want to know more about the story:)

gen5 is about the earliest that I know the hw stated to get smart and
doing automatic power saving by writing to pinned buffers. Before gen5
our reset hasn't been very reliable, and only recently have we had code
that is meant to be able to reset the gpu on those gen - hence my
reservation about doing so unconditionally.
-Chris