drm: Don't race connector registration

Submitted by Daniel Vetter on Jan. 30, 2017, 9:12 a.m.

Details

Message ID 20170130091226.vclqcmpjdm2m5mj6@phenom.ffwll.local
State New
Series "drm: Don't race connector registration" ( rev: 2 ) in DRI devel


Commit Message

Daniel Vetter Jan. 30, 2017, 9:12 a.m.
On Thu, Jan 26, 2017 at 12:34:29PM -0800, Dave Hansen wrote:
> On 01/25/2017 07:38 AM, Daniel Vetter wrote:
> > On Wed, Jan 25, 2017 at 07:20:45AM -0800, Dave Hansen wrote:
> >> On 01/24/2017 10:21 PM, Daniel Vetter wrote:
> >>> Hi Dave,
> >>>
> >>> Still waiting for your testing results on this one here ...
> >>
> >> It's definitely stable with that patch applied.  No more crashes.
> >>
> >> But, it's also definitely having difficulty re-probing to find the
> >> monitor that's attached to the dock in some cases.  Whatever is going on
> >> isn't fixed by poking it with xrandr.
> > 
> > Is this new? When exactly does this happen? Does the mst sink connector no
> > longer show up, or is the connected/disconnected status all wrong?
> 
> It's hard to say whether it's new or not.  I *think* it worked better
> before, but it also crashed pretty often, so it's hard to say.

Ok, I guess that's good enough to push at least the crash fix forward.

> And, yeah, I think it just gets the connected status wrong.  The
> connector is still there.

Hm, I thought I replied here but I didn't:
- Is this just after boot (and then the connector is stuck forever), or
  starts to happen after suspend/resume, or some other system change like
  that? Or do they just crop up eventually?

- Does this only happen once the connector is destroyed? Please trace
  intel_dp_destroy_mst_connector with something like the patch below:



- If it's not that then something in intel_dp_mst_detect (well, its
  helper implementation drm_dp_mst_detect_port) is probably going wrong.

Cheers, Daniel

Patch

diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
index 38e3ca2f6f18..24ac2d1ce3ad 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -502,6 +502,8 @@  static void intel_dp_destroy_mst_connector(struct drm_dp_mst_topology_mgr *mgr,
 
 	drm_connector_unregister(connector);
 
+	printk("mst connector getting destroyed: %s\n", connector->name);
+
 	/* need to nuke the connector */
 	drm_modeset_lock_all(dev);
 	intel_connector_remove_from_fbdev(intel_connector);

Comments

On 01/30/2017 01:12 AM, Daniel Vetter wrote:
> On Thu, Jan 26, 2017 at 12:34:29PM -0800, Dave Hansen wrote:
...
>> And, yeah, I think it just gets the connected status wrong.  The
>> connector is still there.
> 
> Hm, I thought I replied here but I didn't:
> - Is this just after boot (and then the connector is stuck forever), or
>   starts to happen after suspend/resume, or some other system change like
>   that? Or do they just crop up eventually?

The most consistent thing I do to screw it up is switch systems on my
DVI KVM switch.  When I switch back to the system in question, it
doesn't seem to notice the condition.  The connectors eventually show up
with random combinations of switching to the console (ctrl-alt-f1) and
back, running xrandr, or running gnome-control-panel and opening the
Displays applet.

I haven't been able to discern any pattern to it.  Sometimes just
running xrandr fixes it.  Sometimes just opening the control panel.
Others, I have to do it several times.

I don't think it shows up if I just leave it for a while.

> - Does this only happen once the connector is destroyed? Please trace
>   intel_dp_destroy_mst_connector with something like:

I'll see if I can gather that.

On Mon, Jan 30, 2017 at 08:43:17AM -0800, Dave Hansen wrote:
> On 01/30/2017 01:12 AM, Daniel Vetter wrote:
> > On Thu, Jan 26, 2017 at 12:34:29PM -0800, Dave Hansen wrote:
> ...
> >> And, yeah, I think it just gets the connected status wrong.  The
> >> connector is still there.
> > 
> > Hm, I thought I replied here but I didn't:
> > - Is this just after boot (and then the connector is stuck forever), or
> >   starts to happen after suspend/resume, or some other system change like
> >   that? Or do they just crop up eventually?
> 
> The most consistent thing I do to screw it up is switch systems on my
> DVI KVM switch.  When I switch back to the system in question, it
> doesn't seem to notice the condition.  The connectors eventually show up
> with random combinations of switching to the console (ctrl-alt-f1) and
> back, running xrandr, or running gnome-control-panel and opening the
> Displays applet.

Hm, so is this a dp mst kvm switch (i.e. do the connectors get
hot-added/removed when you plug/unplug that thing)? Or just some other
non-mst switch? I was working under the assumption that this is mst still,
but I've never seen an mst kvm switch.

> I haven't been able to discern any pattern to it.  Sometimes just
> running xrandr fixes it.  Sometimes just opening the control panel.
> Others, I have to do it several times.
> 
> I don't think it shows up if I just leave it for a while.
> 
> > - Does this only happen once the connector is destroyed? Please trace
> >   intel_dp_destroy_mst_connector with something like:
> 
> I'll see if I can gather that.

If it's not mst, then don't bother with this for obvious reasons :-)
-Daniel

I added some printk()s all over and gathered a bit more information
about what's going on.  It looks like the display doesn't work until the
drm connector code cleans up the *old* connector.  For some reason, it
isn't motivated to do that until I go to the console and back.

In this case, the display was connected to DP-4.
intel_dp_destroy_mst_connector() got called on it when I switched away,
but drm_connector_cleanup() did not get called.  Upon switching back
DP-3/5/6 get created.  One of these *eventually* ends up being
"enabled", but is not now.  When I switch over to the console,
drm_connector_cleanup() finally gets called on the old connector: DP-4
and I can switch back to X and I see one of DP-3/5/6 enabled and working.
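
The pattern described above (a connector that gets
intel_dp_destroy_mst_connector() but no matching drm_connector_cleanup())
can be picked out of a log mechanically. The following is only a rough
sketch: the function name find_zombies is made up for illustration, and it
assumes the ad-hoc printk() format used in this debugging session, which a
stock kernel does not emit.

```sh
#!/bin/sh
# find_zombies (hypothetical helper): read a kernel log on stdin and print
# "<address> <name>" for every connector that was destroyed but never
# cleaned up. Relies on the debugging printk() format quoted in this thread.
find_zombies() {
    awk '
    # Destroy line: remember the connector address and its name.
    /intel_dp_destroy_mst_connector\(\) connector:/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "connector:") addr = $(i + 1)
            if ($i == "name:")      name = $(i + 1)
        }
        pending[addr] = name
    }
    # Cleanup line that carries an address: the connector is really gone.
    /drm_connector_cleanup\([0-9a-f]+\)/ {
        if (match($0, /drm_connector_cleanup\([0-9a-f]+\)/)) {
            # Skip the 22-character "drm_connector_cleanup(" prefix and
            # drop the closing parenthesis.
            addr = substr($0, RSTART + 22, RLENGTH - 23)
            delete pending[addr]
        }
    }
    END {
        for (a in pending) print a, pending[a]
    }' | sort
}

# Possible usage on a kernel carrying the same printk()s:
#   dmesg | find_zombies
```

Fed the "switch to other system" snippet below, this should print only
"ffff8801ade47000 DP-4", since DP-3 and DP-6 each have a matching cleanup
line.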

Here are some snippets of dmesg interspersed with what I was doing:

Push DVI switch button to switch to other system:

> [ 6824.562838] drm_dp_destroy_port() kfree(ffff8801ade46800)
> [ 6824.563164] drm_dp_destroy_connector_work() port: ffff8801ade42000 connector: ffff8801ade46000
> [ 6824.563178] intel_dp_destroy_mst_connector() connector: ffff8801ade46000 name: DP-3 &name: ffff8801ade46048 intel_connector: ffff8801ade46000
> [ 6824.563186] drm_sysfs_connector_remove() connector: ffff8801ade46000 kdev: ffff8801a941b400
> [ 6824.571556] drm_connector_cleanup(ffff8801ade46000)::329 connector->registered: 0 cpu: 3
> [ 6824.571570] drm_connector_cleanup() kfree() connector->name: 'DP-3' &name: ffff8801ade46048
> [ 6824.571581] drm_dp_free_mst_port() kfree port: ffff8801ade42000
> [ 6824.571587] drm_dp_destroy_connector_work() port: ffff8801ade42800 connector: ffff8801ade47000
> [ 6824.571594] intel_dp_destroy_mst_connector() connector: ffff8801ade47000 name: DP-4 &name: ffff8801ade47048 intel_connector: ffff8801ade47000
> [ 6824.571601] drm_sysfs_connector_remove() connector: ffff8801ade47000 kdev: ffff8801a941a000
> [ 6824.571915] drm_dp_free_mst_port() kfree port: ffff8801ade42800
> [ 6824.571925] drm_dp_destroy_connector_work() port: ffff8801ade40800 connector: ffff8801ade43000
> [ 6824.571934] intel_dp_destroy_mst_connector() connector: ffff8801ade43000 name: DP-6 &name: ffff8801ade43048 intel_connector: ffff8801ade43000
> [ 6824.571943] drm_sysfs_connector_remove() connector: ffff8801ade43000 kdev: ffff8801a9419800
> [ 6824.572091] drm_connector_cleanup(ffff8801ade43000)::329 connector->registered: 0 cpu: 3
> [ 6824.572101] drm_connector_cleanup() kfree() connector->name: 'DP-6' &name: ffff8801ade43048
> [ 6824.572110] drm_dp_free_mst_branch_device() kfree mstb: ffff88030ac22600
> [ 6824.572117] drm_dp_free_mst_port() kfree port: ffff8801ade40800

Push button to switch back:

> [ 6837.349693] drm_connector_init() connector->name: 'DP-3' &name: ffff88040231d848
> [ 6837.349894] drm_sysfs_connector_add() connector: ffff88040231d800 kdev: ffff8801ae99f400
> [ 6837.352786] drm_connector_init() connector->name: 'DP-5' &name: ffff880402318048
> [ 6837.352951] drm_sysfs_connector_add() connector: ffff880402318000 kdev: ffff8801ae99c000
> [ 6837.353036] drm_connector_init() connector->name: 'DP-6' &name: ffff88040d37f048
> [ 6837.353154] drm_sysfs_connector_add() connector: ffff88040d37f000 kdev: ffff8801ae99ec00

I can type into the X session, but both screens are blank.  When I press
Ctrl-Alt-F2, I get:

> [ 6850.494310] drm_connector_cleanup(ffff8801ade47000)::329 connector->registered: 0 cpu: 1
> [ 6850.494314] drm_connector_cleanup() kfree() connector->name: 'DP-4' &name: ffff8801ade47048

Now I can switch back to X and everything is OK again.

On Tue, Jan 31, 2017 at 04:27:14PM -0800, Dave Hansen wrote:
> I added some printk()s all over and gathered a bit more information
> about what's going on.  It looks like the display doesn't work until the
> drm connector code cleans up the *old* connector.  For some reason, it
> isn't motivated to do that until I go to the console and back.
> 
> In this case, the display was connected to DP-4.
> intel_dp_destroy_mst_connector() got called on it when I switched away,
> but drm_connector_cleanup() did not get called.  Upon switching back
> DP-3/5/6 get created.  One of these *eventually* ends up being
> "enabled", but is not now.  When I switch over to the console,
> drm_connector_cleanup() finally gets called on the old connector: DP-4
> and I can switch back to X and I see one of DP-3/5/6 enabled and working.
> 
> Here are some snippets of dmesg interspersed with what I was doing:

Ok, so the delayed deleting seems to be involved in the bug (and we only
do that since we recently introduced refcounting for hotplugged
connectors). The question is who's getting confused, either kernel or X.
To figure this out, next time things are out of sync, please compare the
output of

$ xrandr

with what's reported in /sys/class/drm/*/status:

$ grep . /sys/class/drm/card0-DP-*/status
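
If doing that comparison by eye gets tedious, something along these lines
might help. It is an untested-on-real-hardware sketch: compare_status is a
made-up name, it takes a saved copy of the xrandr output plus the sysfs
directory as arguments, and it assumes X uses the same connector names as
the kernel (DP-4 and so on), which depends on the X driver in use.

```sh
#!/bin/sh
# compare_status XRANDR_FILE SYSFS_DIR (hypothetical helper)
# Print every DP connector whose state differs between a saved xrandr
# listing and the kernel's sysfs status files.
compare_status() {
    xrandr_out=$1
    sysfs_dir=$2

    # Pass 1: connectors the kernel exposes in sysfs.
    for f in "$sysfs_dir"/card0-DP-*/status; do
        [ -r "$f" ] || continue          # glob matched nothing
        dir=${f%/status}
        name=${dir##*/card0-}            # e.g. "DP-4"
        kernel=$(cat "$f")
        x=$(awk -v n="$name" '$1 == n { print $2; exit }' "$xrandr_out")
        [ -n "$x" ] || x="missing"
        [ "$kernel" = "$x" ] || echo "$name: kernel=$kernel xrandr=$x"
    done

    # Pass 2: connectors X still lists that have no sysfs entry any more.
    awk '$2 == "connected" || $2 == "disconnected" { print $1 }' "$xrandr_out" |
    while read -r name; do
        case $name in DP-*) ;; *) continue ;; esac
        [ -d "$sysfs_dir/card0-$name" ] ||
            echo "$name: xrandr only (no sysfs entry)"
    done
}

# Possible usage:
#   xrandr > /tmp/xrandr.txt
#   compare_status /tmp/xrandr.txt /sys/class/drm
```

No output would mean both sides agree. Any line printed names a connector
where the two views differ.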

Another question: What desktop are you using, and if you unplug a screen,
does that generally reconfigure the desktop size to disable that output? The
zombie connector only sticks around as long as someone is still using it
in the screen configuration. As soon as the reconfiguration has happened,
it should go away. You can test this by manually disabling the output when
it's stuck as on:

$ xrandr --output DP-4 --off

That should result in the delayed cleanup happening when you look at dmesg
afterwards.

Thanks, Daniel