drm: Don't race connector registration

Submitted by Daniel Vetter on Jan. 12, 2017, 4:15 p.m.

Details

Message ID 1484237756-2720-1-git-send-email-daniel.vetter@ffwll.ch
State New
Headers show
Series "drm: Don't race connector registration" ( rev: 1 ) in DRI devel

Not browsing as part of any series.

Commit Message

Daniel Vetter Jan. 12, 2017, 4:15 p.m.
I was under the misconception that the sysfs dev stuff can be fully
set up, and then registered all in one step with device_add. That's
true for properties and property groups, but not for parents and child
devices. Those must be fully registered before you can register a
child.

Add a bit of tracking to make sure that asynchronous mst connector
hotplugging gets this right. For consistency we rely upon the implicit
barriers of the connector->mutex, which is taken anyway, to ensure
that at least either the connector or device registration call will
work out.

Mildly tested since I can't reliably reproduce this on my mst box
here.

Reported-by: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/drm_connector.c | 3 +++
 drivers/gpu/drm/drm_drv.c       | 4 ++++
 include/drm/drmP.h              | 1 +
 3 files changed, 8 insertions(+)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index c75ab242f907..5999cb83d05b 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -378,6 +378,9 @@  int drm_connector_register(struct drm_connector *connector)
 {
 	int ret = 0;
 
+	if (!connector->dev->registered)
+		return 0;
+
 	mutex_lock(&connector->mutex);
 	if (connector->registered)
 		goto unlock;
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 7e24103c47f1..cad6df626678 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -749,6 +749,8 @@  int drm_dev_register(struct drm_device *dev, unsigned long flags)
 	if (ret)
 		goto err_minors;
 
+	dev->registered = true;
+
 	if (dev->driver->load) {
 		ret = dev->driver->load(dev, flags);
 		if (ret)
@@ -796,6 +798,8 @@  void drm_dev_unregister(struct drm_device *dev)
 
 	drm_lastclose(dev);
 
+	dev->registered = false;
+
 	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		drm_modeset_unregister_all(dev);
 
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index c537c278a4be..ec105c339347 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -518,6 +518,7 @@  struct drm_device {
 	struct drm_minor *control;		/**< Control node */
 	struct drm_minor *primary;		/**< Primary node */
 	struct drm_minor *render;		/**< Render node */
+	bool registered;
 
 	/* currently active master for this device. Protected by master_mutex */
 	struct drm_master *master;

Comments

On Thu, Jan 12, 2017 at 05:15:56PM +0100, Daniel Vetter wrote:
> I was under the misconception that the sysfs dev stuff can be fully
> set up, and then registered all in one step with device_add. That's
> true for properties and property groups, but not for parents and child
> devices. Those must be fully registered before you can register a
> child.
> 
> Add a bit of tracking to make sure that asynchronous mst connector
> hotplugging gets this right. For consistency we rely upon the implicit
> barriers of the connector->mutex, which is taken anyway, to ensure
> that at least either the connector or device registration call will
> work out.
> 
> Mildly tested since I can't reliably reproduce this on my mst box
> here.
> 
> Reported-by: Dave Hansen <dave.hansen@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Cc: stable@vger.kernel.org

> ---
>  drivers/gpu/drm/drm_connector.c | 3 +++
>  drivers/gpu/drm/drm_drv.c       | 4 ++++
>  include/drm/drmP.h              | 1 +
>  3 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
> index c75ab242f907..5999cb83d05b 100644
> --- a/drivers/gpu/drm/drm_connector.c
> +++ b/drivers/gpu/drm/drm_connector.c
> @@ -378,6 +378,9 @@ int drm_connector_register(struct drm_connector *connector)
>  {
>  	int ret = 0;
>  
> +	if (!connector->dev->registered)
> +		return 0;
> +
>  	mutex_lock(&connector->mutex);
>  	if (connector->registered)
>  		goto unlock;
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 7e24103c47f1..cad6df626678 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -749,6 +749,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
>  	if (ret)
>  		goto err_minors;
>  
> +	dev->registered = true;
> +
>  	if (dev->driver->load) {
>  		ret = dev->driver->load(dev, flags);
>  		if (ret)
> @@ -796,6 +798,8 @@ void drm_dev_unregister(struct drm_device *dev)
>  
>  	drm_lastclose(dev);
>  
> +	dev->registered = false;
> +
>  	if (drm_core_check_feature(dev, DRIVER_MODESET))
>  		drm_modeset_unregister_all(dev);
>  
> diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> index c537c278a4be..ec105c339347 100644
> --- a/include/drm/drmP.h
> +++ b/include/drm/drmP.h
> @@ -518,6 +518,7 @@ struct drm_device {
>  	struct drm_minor *control;		/**< Control node */
>  	struct drm_minor *primary;		/**< Primary node */
>  	struct drm_minor *render;		/**< Render node */
> +	bool registered;
>  
>  	/* currently active master for this device. Protected by master_mutex */
>  	struct drm_master *master;
> -- 
> 2.5.5
>
On Thu, Jan 12, 2017 at 05:15:56PM +0100, Daniel Vetter wrote:
> I was under the misconception that the sysfs dev stuff can be fully
> set up, and then registered all in one step with device_add. That's
> true for properties and property groups, but not for parents and child
> devices. Those must be fully registered before you can register a
> child.
> 
> Add a bit of tracking to make sure that asynchronous mst connector
> hotplugging gets this right. For consistency we rely upon the implicit
> barriers of the connector->mutex, which is taken anyway, to ensure
> that at least either the connector or device registration call will
> work out.
> 
> Mildly tested since I can't reliably reproduce this on my mst box
> here.

Hmm, right it should prevent the oops on load. I think we need to
control the async dp-mst better and stop it running until we have the
device registered, and so if we use in the interim, plonk a WARN_ON on
top for -next and try and fix it for realz. (Probably a
drm_dp_mst_mgr_register() hook?)

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Hi Dave,

Still waiting for your testing results on this one here ...
-Daniel

On Thu, Jan 12, 2017 at 05:17:21PM +0100, Daniel Vetter wrote:
> On Thu, Jan 12, 2017 at 05:15:56PM +0100, Daniel Vetter wrote:
> > I was under the misconception that the sysfs dev stuff can be fully
> > set up, and then registered all in one step with device_add. That's
> > true for properties and property groups, but not for parents and child
> > devices. Those must be fully registered before you can register a
> > child.
> > 
> > Add a bit of tracking to make sure that asynchronous mst connector
> > hotplugging gets this right. For consistency we rely upon the implicit
> > barriers of the connector->mutex, which is taken anyway, to ensure
> > that at least either the connector or device registration call will
> > work out.
> > 
> > Mildly tested since I can't reliably reproduce this on my mst box
> > here.
> > 
> > Reported-by: Dave Hansen <dave.hansen@intel.com>
> > Cc: Dave Hansen <dave.hansen@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> Cc: stable@vger.kernel.org
> 
> > ---
> >  drivers/gpu/drm/drm_connector.c | 3 +++
> >  drivers/gpu/drm/drm_drv.c       | 4 ++++
> >  include/drm/drmP.h              | 1 +
> >  3 files changed, 8 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
> > index c75ab242f907..5999cb83d05b 100644
> > --- a/drivers/gpu/drm/drm_connector.c
> > +++ b/drivers/gpu/drm/drm_connector.c
> > @@ -378,6 +378,9 @@ int drm_connector_register(struct drm_connector *connector)
> >  {
> >  	int ret = 0;
> >  
> > +	if (!connector->dev->registered)
> > +		return 0;
> > +
> >  	mutex_lock(&connector->mutex);
> >  	if (connector->registered)
> >  		goto unlock;
> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > index 7e24103c47f1..cad6df626678 100644
> > --- a/drivers/gpu/drm/drm_drv.c
> > +++ b/drivers/gpu/drm/drm_drv.c
> > @@ -749,6 +749,8 @@ int drm_dev_register(struct drm_device *dev, unsigned long flags)
> >  	if (ret)
> >  		goto err_minors;
> >  
> > +	dev->registered = true;
> > +
> >  	if (dev->driver->load) {
> >  		ret = dev->driver->load(dev, flags);
> >  		if (ret)
> > @@ -796,6 +798,8 @@ void drm_dev_unregister(struct drm_device *dev)
> >  
> >  	drm_lastclose(dev);
> >  
> > +	dev->registered = false;
> > +
> >  	if (drm_core_check_feature(dev, DRIVER_MODESET))
> >  		drm_modeset_unregister_all(dev);
> >  
> > diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> > index c537c278a4be..ec105c339347 100644
> > --- a/include/drm/drmP.h
> > +++ b/include/drm/drmP.h
> > @@ -518,6 +518,7 @@ struct drm_device {
> >  	struct drm_minor *control;		/**< Control node */
> >  	struct drm_minor *primary;		/**< Primary node */
> >  	struct drm_minor *render;		/**< Render node */
> > +	bool registered;
> >  
> >  	/* currently active master for this device. Protected by master_mutex */
> >  	struct drm_master *master;
> > -- 
> > 2.5.5
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
On 01/24/2017 10:21 PM, Daniel Vetter wrote:
> Hi Dave,
> 
> Still waiting for your testing results on this one here ...

It's definitely stable with that patch applied.  No more crashes.

But, it's also definitely having difficulty re-probing to find the
monitor that's attached to the dock in some cases.  Whatever is going on
isn't fixed by poking it with xrandr.
On Wed, Jan 25, 2017 at 07:20:45AM -0800, Dave Hansen wrote:
> On 01/24/2017 10:21 PM, Daniel Vetter wrote:
> > Hi Dave,
> > 
> > Still waiting for your testing results on this one here ...
> 
> It's definitely stable with that patch applied.  No more crashes.
> 
> But, it's also definitely having difficulty re-probing to find the
> monitor that's attached to the dock in some cases.  Whatever is going on
> isn't fixed by poking it with xrandr.

Is this new? When exactly does this happen? Does the mst sink connector no
longer show up, or is the connected/disconnected status all wrong?
-Daniel
On 01/25/2017 07:38 AM, Daniel Vetter wrote:
> On Wed, Jan 25, 2017 at 07:20:45AM -0800, Dave Hansen wrote:
>> On 01/24/2017 10:21 PM, Daniel Vetter wrote:
>>> Hi Dave,
>>>
>>> Still waiting for your testing results on this one here ...
>>
>> It's definitely stable with that patch applied.  No more crashes.
>>
>> But, it's also definitely having difficulty re-probing to find the
>> monitor that's attached to the dock in some cases.  Whatever is going on
>> isn't fixed by poking it with xrandr.
> 
> Is this new? When exactly does this happen? Does the mst sink connector no
> longer show up, or is the connected/disconnected status all wrong?

It's hard to say whether it's new or not.  I *think* it worked better
before, but it also crashed pretty often, so it's hard to say.

And, yeah, I think it just gets the connected status wrong.  The
connector is still there.