[v2,3/4] drm/i915: Discard previous atomic state on resume if connectors change

Submitted by cpaul@redhat.com on May 12, 2016, 9:47 p.m.

Details

Message ID 1463089659-6625-1-git-send-email-cpaul@redhat.com
State New
Series "Important MST fixes for 4.6" ( rev: 3 ) in Intel GFX

Commit Message

cpaul@redhat.com May 12, 2016, 9:47 p.m.
If an MST device is disconnected while the machine is suspended, the
number of connectors will change as well after we call
intel_dp_mst_resume(). This means that any previous atomic state we had
before suspending is no longer valid, since it'll still be pointing to
missing connectors. We need to check for this before committing the
state, otherwise we'll kernel panic on resume whenever any MST
display was disconnected before we started resuming:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper]
Call Trace:
 [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915]
 [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0
 [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm]
 [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90
 [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm]
 [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915]
 [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90
 [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915]
 [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915]
 [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0
 [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190
 [<ffffffff814da455>] device_resume+0xd5/0x1f0
 [<ffffffff814da58d>] async_resume+0x1d/0x50
 [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150
 [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0
 [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0
 [<ffffffff810ad038>] worker_thread+0x48/0x4e0
 [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0
 [<ffffffff810b3794>] kthread+0xe4/0x100
 [<ffffffff81742672>] ret_from_fork+0x22/0x50
 [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200

Changes since v1:
  - Move drm_atomic_state_free() call down so we're holding the
    appropriate locks when destroying the atomic state
Changes since v2:
  - Check that state != NULL before we start accessing its members

This fix is only required for 4.6 and below. David Airlie's patchseries
for 4.7 to add connector reference counting provides a more proper fix
for this.

Upstream fix: b164d31f50b2923a7a92c2a40cb46973a6ba8c36
Cc: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
---
 drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 182f849..e49a313 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15959,6 +15959,18 @@  void intel_display_resume(struct drm_device *dev)
 retry:
 	ret = drm_modeset_lock_all_ctx(dev, &ctx);
 
+	/*
+	 * With MST, the number of connectors can change between suspend and
+	 * resume, which means that the state we want to restore might now be
+	 * impossible to use since it'll be pointing to non-existent
+	 * connectors.
+	 */
+	if (ret == 0 && state &&
+	    state->num_connector != dev->mode_config.num_connector) {
+		drm_atomic_state_free(state);
+		state = NULL;
+	}
+
 	if (ret == 0 && !setup) {
 		setup = true;
 

Comments

On Thu, May 12, 2016 at 05:47:39PM -0400, Lyude wrote:
> If an MST device is disconnected while the machine is suspended, the
> number of connectors will change as well after we call
> intel_dp_mst_resume(). This means that any previous atomic state we had
> before suspending is no longer valid, since it'll still be pointing to
> missing connectors. We need to check for this before committing the
> state, otherwise we'll kernel panic on resume whenever any MST
> display was disconnected before we started resuming:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffffa01588ef>] drm_atomic_helper_check_modeset+0x29f/0xb40 [drm_kms_helper]
> Call Trace:
>  [<ffffffffa02354f4>] intel_atomic_check+0x34/0x1180 [i915]
>  [<ffffffff810e6c3f>] ? mark_held_locks+0x6f/0xa0
>  [<ffffffff810e6d99>] ? trace_hardirqs_on_caller+0x129/0x1b0
>  [<ffffffffa00ff1d2>] drm_atomic_check_only+0x192/0x620 [drm]
>  [<ffffffff813ee001>] ? pci_pm_thaw+0x21/0x90
>  [<ffffffffa00ff677>] drm_atomic_commit+0x17/0x60 [drm]
>  [<ffffffffa023e0ad>] intel_display_resume+0xbd/0x160 [i915]
>  [<ffffffff813ee070>] ? pci_pm_thaw+0x90/0x90
>  [<ffffffffa01b60d8>] i915_drm_resume+0xd8/0x160 [i915]
>  [<ffffffffa01b6185>] i915_pm_resume+0x25/0x30 [i915]
>  [<ffffffff813ee0d4>] pci_pm_resume+0x64/0xa0
>  [<ffffffff814d9ea0>] dpm_run_callback+0x90/0x190
>  [<ffffffff814da455>] device_resume+0xd5/0x1f0
>  [<ffffffff814da58d>] async_resume+0x1d/0x50
>  [<ffffffff810b6718>] async_run_entry_fn+0x48/0x150
>  [<ffffffff810acc19>] process_one_work+0x1e9/0x5c0
>  [<ffffffff810acb96>] ? process_one_work+0x166/0x5c0
>  [<ffffffff810ad038>] worker_thread+0x48/0x4e0
>  [<ffffffff810acff0>] ? process_one_work+0x5c0/0x5c0
>  [<ffffffff810b3794>] kthread+0xe4/0x100
>  [<ffffffff81742672>] ret_from_fork+0x22/0x50
>  [<ffffffff810b36b0>] ? kthread_create_on_node+0x200/0x200
> 
> Changes since v1:
>   - Move drm_atomic_state_free() call down so we're holding the
>     appropriate locks when destroying the atomic state
> Changes since v2:
>   - Check that state != NULL before we start accessing its members
> 
> This fix is only required for 4.6 and below. David Airlie's patchseries
> for 4.7 to add connector reference counting provides a more proper fix
> for this.
> 
> Upstream fix: b164d31f50b2923a7a92c2a40cb46973a6ba8c36
> Cc: stable@vger.kernel.org
> Signed-off-by: Lyude <cpaul@redhat.com>

Same caveat: this is for stable only, as duct tape; the real fix in 4.7 is
too big to backport.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (for stable kernels
only)
> ---
>  drivers/gpu/drm/i915/intel_display.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 182f849..e49a313 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -15959,6 +15959,18 @@ void intel_display_resume(struct drm_device *dev)
>  retry:
>  	ret = drm_modeset_lock_all_ctx(dev, &ctx);
>  
> +	/*
> +	 * With MST, the number of connectors can change between suspend and
> +	 * resume, which means that the state we want to restore might now be
> +	 * impossible to use since it'll be pointing to non-existent
> +	 * connectors.
> +	 */
> +	if (ret == 0 && state &&
> +	    state->num_connector != dev->mode_config.num_connector) {
> +		drm_atomic_state_free(state);
> +		state = NULL;
> +	}
> +
>  	if (ret == 0 && !setup) {
>  		setup = true;
>  
> -- 
> 2.5.5
>