[01/11] drm: Add devm_drm_dev_init/register

Submitted by Noralf Trønnes on Jan. 21, 2019, 12:21 p.m.

Details

Message ID eee7b04d-956f-5bc5-e4bd-4030be67f5cb@tronnes.org
State New
Headers show
Series "drm/tinydrm: Remove tinydrm_device" ( rev: 2 ) in DRI devel

Not browsing as part of any series.

Commit Message

Noralf Trønnes Jan. 21, 2019, 12:21 p.m.
Den 21.01.2019 10.55, skrev Daniel Vetter:
> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
>>> This adds resource managed (devres) versions of drm_dev_init() and
>>> drm_dev_register().
>>>
>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
>>> fbdev emulation as well.
>>>
>>> devm_drm_dev_register() isn't exported since there are no users.
>>>
>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
>>

<snip>

>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>> index 381581b01d48..12129772be45 100644
>>> --- a/drivers/gpu/drm/drm_drv.c
>>> +++ b/drivers/gpu/drm/drm_drv.c
>>> @@ -36,6 +36,7 @@
>>>  
>>>  #include <drm/drm_client.h>
>>>  #include <drm/drm_drv.h>
>>> +#include <drm/drm_fb_helper.h>
>>>  #include <drm/drmP.h>
>>>  
>>>  #include "drm_crtc_internal.h"
>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
>>>  }
>>>  EXPORT_SYMBOL(drm_dev_unregister);
>>>  
>>> +static void devm_drm_dev_init_release(void *data)
>>> +{
>>> +	drm_dev_put(data);
> 
> We need drm_dev_unplug() here, or this isn't safe.

This function is only used to cover the error path if probe fails before
devm_drm_dev_register() is called. devm_drm_dev_register_release() is
the one that calls unplug. There are comments about this in the functions.

> 
>>> +}
>>> +
>>> +/**
>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
>>> + * @parent: Parent device object
>>> + * @dev: DRM device
>>> + * @driver: DRM driver
>>> + *
>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
>>> + * automatically released on driver detach. You must supply a
> 
> I think a bit more clarity here would be good:
> 
> "... automatically released on driver unbind by callind drm_dev_unplug()."
> 
>>> + * &drm_driver.release callback to control the finalization explicitly.
> 
> I think a loud warning for these is in order:
> 
> "WARNING:
> 
> "In generally it is unsafe to use devm functions for drm structures
> because the lifetimes of &drm_device and the underlying &device do not
> match. This here works because it doesn't immediately free anything, but
> only calls drm_dev_unplug(), which internally decrements the &drm_device
> refcount through drm_dev_put().
> 
> "All other drm structures must still be explicitly released in the
> &drm_driver.release callback."
> 
> While thinking about this I just realized that with this design we have no
> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> kinds of things will leak badly (connectors, fb, ...), but there's no
> place to call it:
> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
>   drm_dev_unregister in there must be called _before_ we start to shut
>   down anything.
> - drm_driver.release is way too late.
> 
> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> already gone), but for a driver unload/unbind it would be nice if this
> happens automatically and in the right order.
> 
> So not sure what to do here really.

How about this change: (it breaks the rule of pulling helpers into the
core, so maybe we should put the devm_ functions into the simple KMS
helper instead?)

to see
@@ -378,11 +370,32 @@ void drm_dev_unplug(struct drm_device *dev)

        drm_dev_unregister(dev);

+       if (shutdown)
+               drm_kms_helper_poll_fini(dev);
+
        mutex_lock(&drm_global_mutex);
-       if (dev->open_count == 0)
+       if (dev->open_count == 0) {
+               if (shutdown)
+                       drm_atomic_helper_shutdown(dev);
                drm_dev_put(dev);
+       }
        mutex_unlock(&drm_global_mutex);
 }
+
+/**
+ * drm_dev_unplug - unplug a DRM device
+ * @dev: DRM device
+ *
+ * This unplugs a hotpluggable DRM device, which makes it inaccessible to
+ * userspace operations. Entry-points can use drm_dev_enter() and
+ * drm_dev_exit() to protect device resources in a race free manner. This
+ * essentially unregisters the device like drm_dev_unregister(), but can be
+ * called while there are still open users of @dev.
+ */
+void drm_dev_unplug(struct drm_device *dev)
+{
+       __drm_dev_unplug(dev, false);
+}
 EXPORT_SYMBOL(drm_dev_unplug);

 /*
@@ -920,7 +933,7 @@ EXPORT_SYMBOL(devm_drm_dev_init);

 static void devm_drm_dev_register_release(void *data)
 {
-       drm_dev_unplug(data);
+       __drm_dev_unplug(data, true);
 }

 static int devm_drm_dev_register(struct drm_device *dev)


I realised that we should take a ref on the parent device so it can be
accessed by the DRM_DEV_ functions after unplug.


Noralf.


> 
>>> + *
>>> + * Note: This function must be used together with
>>> + * devm_drm_dev_register_with_fbdev().
>>> + *
>>> + * RETURNS:
>>> + * 0 on success, or error code on failure.
>>> + */
>>> +int devm_drm_dev_init(struct device *parent,
>>> +		      struct drm_device *dev,
>>> +		      struct drm_driver *driver)
>>> +{
>>> +	int ret;
>>> +
>>> +	if (WARN_ON(!parent || !driver->release))
>>> +		return -EINVAL;
>>> +
>>> +	ret = drm_dev_init(dev, driver, parent);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/*
>>> +	 * This is a temporary release action that is used if probing fails
>>> +	 * before devm_drm_dev_register() is called.
>>> +	 */
>>> +	ret = devm_add_action(parent, devm_drm_dev_init_release, dev);
>>> +	if (ret)
>>> +		devm_drm_dev_init_release(dev);
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL(devm_drm_dev_init);
>>> +
>>> +static void devm_drm_dev_register_release(void *data)
>>> +{
>>> +	drm_dev_unplug(data);
>>> +}
>>> +
>>> +static int devm_drm_dev_register(struct drm_device *dev)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = drm_dev_register(dev, 0);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/*
>>> +	 * This has now served it's purpose, remove it to not mess up ref
>>> +	 * counting.
>>> +	 */
>>> +	devm_remove_action(dev->dev, devm_drm_dev_init_release, dev);
>>> +
>>> +	ret = devm_add_action(dev->dev, devm_drm_dev_register_release, dev);
>>> +	if (ret)
>>> +		devm_drm_dev_register_release(dev);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +/**
>>> + * devm_drm_dev_register_with_fbdev - Resource managed drm_dev_register()
>>> + *                                    including generic fbdev emulation
>>> + * @dev: DRM device to register
>>> + * @fbdev_bpp: Preferred bits per pixel for fbdev (optional)
>>> + *
>>> + * Managed drm_dev_register() that also calls drm_fbdev_generic_setup().
>>> + * The DRM device registered with this function is automatically unregistered on
>>> + * driver detach using drm_dev_unplug().
>>> + *
>>> + * Note: This function must be used together with devm_drm_dev_init().
>>> + *
>>> + * For testing driver detach can be triggered manually by writing to the driver
>>> + * 'unbind' file.
>>> + *
>>> + * RETURNS:
>>> + * 0 on success, negative error code on failure.
>>> + */
>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
>>> +				     unsigned int fbdev_bpp)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = devm_drm_dev_register(dev);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	drm_fbdev_generic_setup(dev, fbdev_bpp);
>>> +
>>> +	return 0;
>>> +}
>>> +EXPORT_SYMBOL(devm_drm_dev_register_with_fbdev);
>>> +
>>>  /**
>>>   * drm_dev_set_unique - Set the unique name of a DRM device
>>>   * @dev: device of which to set the unique name
>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>> index 35af23f5fa0d..c3f0477f2e7f 100644
>>> --- a/include/drm/drm_drv.h
>>> +++ b/include/drm/drm_drv.h
>>> @@ -628,6 +628,12 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
>>>  int drm_dev_register(struct drm_device *dev, unsigned long flags);
>>>  void drm_dev_unregister(struct drm_device *dev);
>>>  
>>> +int devm_drm_dev_init(struct device *parent,
>>> +		      struct drm_device *dev,
>>> +		      struct drm_driver *driver);
>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
>>> +				     unsigned int fbdev_bpp);
>>> +
>>>  void drm_dev_get(struct drm_device *dev);
>>>  void drm_dev_put(struct drm_device *dev);
>>>  void drm_put_dev(struct drm_device *dev);
>>> -- 
>>> 2.20.1
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
>

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 12129772be45..7ed9550baff6 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -34,7 +34,9 @@ 
 #include <linux/slab.h>
 #include <linux/srcu.h>

+#include <drm/drm_atomic_helper.h>
 #include <drm/drm_client.h>
+#include <drm/drm_crtc_helper.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_fb_helper.h>
 #include <drm/drmP.h>
@@ -355,17 +357,7 @@  void drm_dev_exit(int idx)
 }
 EXPORT_SYMBOL(drm_dev_exit);

-/**
- * drm_dev_unplug - unplug a DRM device
- * @dev: DRM device
- *
- * This unplugs a hotpluggable DRM device, which makes it inaccessible to
- * userspace operations. Entry-points can use drm_dev_enter() and
- * drm_dev_exit() to protect device resources in a race free manner. This
- * essentially unregisters the device like drm_dev_unregister(), but can be
- * called while there are still open users of @dev.
- */
-void drm_dev_unplug(struct drm_device *dev)
+static void __drm_dev_unplug(struct drm_device *dev, bool shutdown)
 {
        /*
         * After synchronizing any critical read section is guaranteed

Comments

On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> 
> 
> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> >> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> >>> This adds resource managed (devres) versions of drm_dev_init() and
> >>> drm_dev_register().
> >>>
> >>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> >>> fbdev emulation as well.
> >>>
> >>> devm_drm_dev_register() isn't exported since there are no users.
> >>>
> >>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> >>
> 
> <snip>
> 
> >>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>> index 381581b01d48..12129772be45 100644
> >>> --- a/drivers/gpu/drm/drm_drv.c
> >>> +++ b/drivers/gpu/drm/drm_drv.c
> >>> @@ -36,6 +36,7 @@
> >>>  
> >>>  #include <drm/drm_client.h>
> >>>  #include <drm/drm_drv.h>
> >>> +#include <drm/drm_fb_helper.h>
> >>>  #include <drm/drmP.h>
> >>>  
> >>>  #include "drm_crtc_internal.h"
> >>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>  }
> >>>  EXPORT_SYMBOL(drm_dev_unregister);
> >>>  
> >>> +static void devm_drm_dev_init_release(void *data)
> >>> +{
> >>> +	drm_dev_put(data);
> > 
> > We need drm_dev_unplug() here, or this isn't safe.
> 
> This function is only used to cover the error path if probe fails before
> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> the one that calls unplug. There are comments about this in the functions.

I think I get a prize for being ignorant and blind :-/

> 
> > 
> >>> +}
> >>> +
> >>> +/**
> >>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> >>> + * @parent: Parent device object
> >>> + * @dev: DRM device
> >>> + * @driver: DRM driver
> >>> + *
> >>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> >>> + * automatically released on driver detach. You must supply a
> > 
> > I think a bit more clarity here would be good:
> > 
> > "... automatically released on driver unbind by callind drm_dev_unplug()."
> > 
> >>> + * &drm_driver.release callback to control the finalization explicitly.
> > 
> > I think a loud warning for these is in order:
> > 
> > "WARNING:
> > 
> > "In generally it is unsafe to use devm functions for drm structures
> > because the lifetimes of &drm_device and the underlying &device do not
> > match. This here works because it doesn't immediately free anything, but
> > only calls drm_dev_unplug(), which internally decrements the &drm_device
> > refcount through drm_dev_put().
> > 
> > "All other drm structures must still be explicitly released in the
> > &drm_driver.release callback."
> > 
> > While thinking about this I just realized that with this design we have no
> > good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > kinds of things will leak badly (connectors, fb, ...), but there's no
> > place to call it:
> > - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> >   drm_dev_unregister in there must be called _before_ we start to shut
> >   down anything.
> > - drm_driver.release is way too late.
> > 
> > Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > already gone), but for a driver unload/unbind it would be nice if this
> > happens automatically and in the right order.
> > 
> > So not sure what to do here really.
> 
> How about this change: (it breaks the rule of pulling helpers into the
> core, so maybe we should put the devm_ functions into the simple KMS
> helper instead?)

Yeah smells a bit much like midlayer ... What would work is having a pile
more devm_ helper functions, so that we onion-unwrap everything correctly,
and in the right order. So:

- devm_drm_dev_init (always does a drm_dev_put())

- devm_drm_poll_enable (shuts down the poll helper with a devm action)

- devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)

- devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
  can call drm_dev_unplug() unconditionally).

We'd need to make sure some of the cleanup actions dtrt when the device is
gone, but I think we can achieve that by liberally sprinkling
drm_dev_enter/exit over them, e.g. the the cleanup action for
drm_mode_config_reset would be:

{
	if (drm_dev_enter())
		return;

	drm_atomic_helper_shutdown();

	drm_dev_exit();
}

This would be analog to your shutdown parameter below.

Essentially I think we can only do the drm_dev_unplug autocleanup in a
given driver from devm_drm_dev_register iff all the cleanup actions have
been devm-ified, and there's nothing left in it's unbind callback. Because
if anything is left in its unbind callback that's a bug, since
drm_dev_unregister() (called through drm_dev_unplug) is the very first (or
at least one of the very first, before we start cleanup up) functions that
need to be called.
-Daniel

> 
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 12129772be45..7ed9550baff6 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -34,7 +34,9 @@
>  #include <linux/slab.h>
>  #include <linux/srcu.h>
> 
> +#include <drm/drm_atomic_helper.h>
>  #include <drm/drm_client.h>
> +#include <drm/drm_crtc_helper.h>
>  #include <drm/drm_drv.h>
>  #include <drm/drm_fb_helper.h>
>  #include <drm/drmP.h>
> @@ -355,17 +357,7 @@ void drm_dev_exit(int idx)
>  }
>  EXPORT_SYMBOL(drm_dev_exit);
> 
> -/**
> - * drm_dev_unplug - unplug a DRM device
> - * @dev: DRM device
> - *
> - * This unplugs a hotpluggable DRM device, which makes it inaccessible to
> - * userspace operations. Entry-points can use drm_dev_enter() and
> - * drm_dev_exit() to protect device resources in a race free manner. This
> - * essentially unregisters the device like drm_dev_unregister(), but can be
> - * called while there are still open users of @dev.
> - */
> -void drm_dev_unplug(struct drm_device *dev)
> +static void __drm_dev_unplug(struct drm_device *dev, bool shutdown)
>  {
>         /*
>          * After synchronizing any critical read section is guaranteed
> to see
> @@ -378,11 +370,32 @@ void drm_dev_unplug(struct drm_device *dev)
> 
>         drm_dev_unregister(dev);
> 
> +       if (shutdown)
> +               drm_kms_helper_poll_fini(dev);
> +
>         mutex_lock(&drm_global_mutex);
> -       if (dev->open_count == 0)
> +       if (dev->open_count == 0) {
> +               if (shutdown)
> +                       drm_atomic_helper_shutdown(dev);
>                 drm_dev_put(dev);
> +       }
>         mutex_unlock(&drm_global_mutex);
>  }
> +
> +/**
> + * drm_dev_unplug - unplug a DRM device
> + * @dev: DRM device
> + *
> + * This unplugs a hotpluggable DRM device, which makes it inaccessible to
> + * userspace operations. Entry-points can use drm_dev_enter() and
> + * drm_dev_exit() to protect device resources in a race free manner. This
> + * essentially unregisters the device like drm_dev_unregister(), but can be
> + * called while there are still open users of @dev.
> + */
> +void drm_dev_unplug(struct drm_device *dev)
> +{
> +       __drm_dev_unplug(dev, false);
> +}
>  EXPORT_SYMBOL(drm_dev_unplug);
> 
>  /*
> @@ -920,7 +933,7 @@ EXPORT_SYMBOL(devm_drm_dev_init);
> 
>  static void devm_drm_dev_register_release(void *data)
>  {
> -       drm_dev_unplug(data);
> +       __drm_dev_unplug(data, true);
>  }
> 
>  static int devm_drm_dev_register(struct drm_device *dev)
> 
> 
> I realised that we should take a ref on the parent device so it can be
> accessed by the DRM_DEV_ functions after unplug.
> 
> 
> Noralf.
> 
> 
> > 
> >>> + *
> >>> + * Note: This function must be used together with
> >>> + * devm_drm_dev_register_with_fbdev().
> >>> + *
> >>> + * RETURNS:
> >>> + * 0 on success, or error code on failure.
> >>> + */
> >>> +int devm_drm_dev_init(struct device *parent,
> >>> +		      struct drm_device *dev,
> >>> +		      struct drm_driver *driver)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	if (WARN_ON(!parent || !driver->release))
> >>> +		return -EINVAL;
> >>> +
> >>> +	ret = drm_dev_init(dev, driver, parent);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	/*
> >>> +	 * This is a temporary release action that is used if probing fails
> >>> +	 * before devm_drm_dev_register() is called.
> >>> +	 */
> >>> +	ret = devm_add_action(parent, devm_drm_dev_init_release, dev);
> >>> +	if (ret)
> >>> +		devm_drm_dev_init_release(dev);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +EXPORT_SYMBOL(devm_drm_dev_init);
> >>> +
> >>> +static void devm_drm_dev_register_release(void *data)
> >>> +{
> >>> +	drm_dev_unplug(data);
> >>> +}
> >>> +
> >>> +static int devm_drm_dev_register(struct drm_device *dev)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	ret = drm_dev_register(dev, 0);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	/*
> >>> +	 * This has now served it's purpose, remove it to not mess up ref
> >>> +	 * counting.
> >>> +	 */
> >>> +	devm_remove_action(dev->dev, devm_drm_dev_init_release, dev);
> >>> +
> >>> +	ret = devm_add_action(dev->dev, devm_drm_dev_register_release, dev);
> >>> +	if (ret)
> >>> +		devm_drm_dev_register_release(dev);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +/**
> >>> + * devm_drm_dev_register_with_fbdev - Resource managed drm_dev_register()
> >>> + *                                    including generic fbdev emulation
> >>> + * @dev: DRM device to register
> >>> + * @fbdev_bpp: Preferred bits per pixel for fbdev (optional)
> >>> + *
> >>> + * Managed drm_dev_register() that also calls drm_fbdev_generic_setup().
> >>> + * The DRM device registered with this function is automatically unregistered on
> >>> + * driver detach using drm_dev_unplug().
> >>> + *
> >>> + * Note: This function must be used together with devm_drm_dev_init().
> >>> + *
> >>> + * For testing driver detach can be triggered manually by writing to the driver
> >>> + * 'unbind' file.
> >>> + *
> >>> + * RETURNS:
> >>> + * 0 on success, negative error code on failure.
> >>> + */
> >>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
> >>> +				     unsigned int fbdev_bpp)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	ret = devm_drm_dev_register(dev);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	drm_fbdev_generic_setup(dev, fbdev_bpp);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +EXPORT_SYMBOL(devm_drm_dev_register_with_fbdev);
> >>> +
> >>>  /**
> >>>   * drm_dev_set_unique - Set the unique name of a DRM device
> >>>   * @dev: device of which to set the unique name
> >>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> >>> index 35af23f5fa0d..c3f0477f2e7f 100644
> >>> --- a/include/drm/drm_drv.h
> >>> +++ b/include/drm/drm_drv.h
> >>> @@ -628,6 +628,12 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
> >>>  int drm_dev_register(struct drm_device *dev, unsigned long flags);
> >>>  void drm_dev_unregister(struct drm_device *dev);
> >>>  
> >>> +int devm_drm_dev_init(struct device *parent,
> >>> +		      struct drm_device *dev,
> >>> +		      struct drm_driver *driver);
> >>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
> >>> +				     unsigned int fbdev_bpp);
> >>> +
> >>>  void drm_dev_get(struct drm_device *dev);
> >>>  void drm_dev_put(struct drm_device *dev);
> >>>  void drm_put_dev(struct drm_device *dev);
> >>> -- 
> >>> 2.20.1
> >>>
> >>> _______________________________________________
> >>> dri-devel mailing list
> >>> dri-devel@lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>
> >> -- 
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
> >
Den 22.01.2019 10.32, skrev Daniel Vetter:
> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 21.01.2019 10.55, skrev Daniel Vetter:
>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
>>>>> This adds resource managed (devres) versions of drm_dev_init() and
>>>>> drm_dev_register().
>>>>>
>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
>>>>> fbdev emulation as well.
>>>>>
>>>>> devm_drm_dev_register() isn't exported since there are no users.
>>>>>
>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
>>>>
>>
>> <snip>
>>
>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>> index 381581b01d48..12129772be45 100644
>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>> @@ -36,6 +36,7 @@
>>>>>  
>>>>>  #include <drm/drm_client.h>
>>>>>  #include <drm/drm_drv.h>
>>>>> +#include <drm/drm_fb_helper.h>
>>>>>  #include <drm/drmP.h>
>>>>>  
>>>>>  #include "drm_crtc_internal.h"
>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
>>>>>  }
>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
>>>>>  
>>>>> +static void devm_drm_dev_init_release(void *data)
>>>>> +{
>>>>> +	drm_dev_put(data);
>>>
>>> We need drm_dev_unplug() here, or this isn't safe.
>>
>> This function is only used to cover the error path if probe fails before
>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
>> the one that calls unplug. There are comments about this in the functions.
> 
> I think I get a prize for being ignorant and blind :-/
> 
>>
>>>
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
>>>>> + * @parent: Parent device object
>>>>> + * @dev: DRM device
>>>>> + * @driver: DRM driver
>>>>> + *
>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
>>>>> + * automatically released on driver detach. You must supply a
>>>
>>> I think a bit more clarity here would be good:
>>>
>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
>>>
>>>>> + * &drm_driver.release callback to control the finalization explicitly.
>>>
>>> I think a loud warning for these is in order:
>>>
>>> "WARNING:
>>>
>>> "In generally it is unsafe to use devm functions for drm structures
>>> because the lifetimes of &drm_device and the underlying &device do not
>>> match. This here works because it doesn't immediately free anything, but
>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
>>> refcount through drm_dev_put().
>>>
>>> "All other drm structures must still be explicitly released in the
>>> &drm_driver.release callback."
>>>
>>> While thinking about this I just realized that with this design we have no
>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
>>> kinds of things will leak badly (connectors, fb, ...), but there's no
>>> place to call it:
>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
>>>   drm_dev_unregister in there must be called _before_ we start to shut
>>>   down anything.
>>> - drm_driver.release is way too late.
>>>
>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
>>> already gone), but for a driver unload/unbind it would be nice if this
>>> happens automatically and in the right order.
>>>
>>> So not sure what to do here really.
>>
>> How about this change: (it breaks the rule of pulling helpers into the
>> core, so maybe we should put the devm_ functions into the simple KMS
>> helper instead?)
> 
> Yeah smells a bit much like midlayer ... What would work is having a pile
> more devm_ helper functions, so that we onion-unwrap everything correctly,
> and in the right order. So:
> 
> - devm_drm_dev_init (always does a drm_dev_put())
> 
> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> 
> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> 
> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
>   can call drm_dev_unplug() unconditionally).
> 

Beautiful! I really like this, it's very flexible.

Where should devm_drm_mode_config_reset() live? It will pull in the
atomic helper...

> We'd need to make sure some of the cleanup actions dtrt when the device is
> gone, but I think we can achieve that by liberally sprinkling
> drm_dev_enter/exit over them, e.g. the the cleanup action for
> drm_mode_config_reset would be:
> 
> {
> 	if (drm_dev_enter())
> 		return;
> 
> 	drm_atomic_helper_shutdown();
> 
> 	drm_dev_exit();
> }
> 

drm_dev_enter() can only be used to check whether the drm_device is
registered or not, it doesn't say anything about the state of the parent
device.

All we know is that the device is being unbound from the driver, we
don't know if it's the device that's being removed or if it's the driver
that's unregistered.

I have looked at the various call chains:

driver_unregister ->
    bus_remove_driver ->
        driver_detach ->
            device_release_driver_internal

device_unregister ->
    device_del ->
        bus_remove_device ->
            device_release_driver ->
                device_release_driver_internal

sysfs: unbind_store ->
    device_release_driver ->
        device_release_driver_internal

The only way I've found to differentiate between these in a cleanup
action is that device_del() uses the bus notifier to signal
BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
notifier could be used to set a drm_device->parent_removed flag.

Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
everything get disabled when userspace closes? It does in my tinydrm
world :-)

Noralf.


> This would be analog to your shutdown parameter below.
> 
> Essentially I think we can only do the drm_dev_unplug autocleanup in a
> given driver from devm_drm_dev_register iff all the cleanup actions have
> been devm-ified, and there's nothing left in it's unbind callback. Because
> if anything is left in its unbind callback that's a bug, since
> drm_dev_unregister() (called through drm_dev_unplug) is the very first (or
> at least one of the very first, before we start cleanup up) functions that
> need to be called.
> -Daniel
> 
>>
>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>> index 12129772be45..7ed9550baff6 100644
>> --- a/drivers/gpu/drm/drm_drv.c
>> +++ b/drivers/gpu/drm/drm_drv.c
>> @@ -34,7 +34,9 @@
>>  #include <linux/slab.h>
>>  #include <linux/srcu.h>
>>
>> +#include <drm/drm_atomic_helper.h>
>>  #include <drm/drm_client.h>
>> +#include <drm/drm_crtc_helper.h>
>>  #include <drm/drm_drv.h>
>>  #include <drm/drm_fb_helper.h>
>>  #include <drm/drmP.h>
>> @@ -355,17 +357,7 @@ void drm_dev_exit(int idx)
>>  }
>>  EXPORT_SYMBOL(drm_dev_exit);
>>
>> -/**
>> - * drm_dev_unplug - unplug a DRM device
>> - * @dev: DRM device
>> - *
>> - * This unplugs a hotpluggable DRM device, which makes it inaccessible to
>> - * userspace operations. Entry-points can use drm_dev_enter() and
>> - * drm_dev_exit() to protect device resources in a race free manner. This
>> - * essentially unregisters the device like drm_dev_unregister(), but can be
>> - * called while there are still open users of @dev.
>> - */
>> -void drm_dev_unplug(struct drm_device *dev)
>> +static void __drm_dev_unplug(struct drm_device *dev, bool shutdown)
>>  {
>>         /*
>>          * After synchronizing any critical read section is guaranteed
>> to see
>> @@ -378,11 +370,32 @@ void drm_dev_unplug(struct drm_device *dev)
>>
>>         drm_dev_unregister(dev);
>>
>> +       if (shutdown)
>> +               drm_kms_helper_poll_fini(dev);
>> +
>>         mutex_lock(&drm_global_mutex);
>> -       if (dev->open_count == 0)
>> +       if (dev->open_count == 0) {
>> +               if (shutdown)
>> +                       drm_atomic_helper_shutdown(dev);
>>                 drm_dev_put(dev);
>> +       }
>>         mutex_unlock(&drm_global_mutex);
>>  }
>> +
>> +/**
>> + * drm_dev_unplug - unplug a DRM device
>> + * @dev: DRM device
>> + *
>> + * This unplugs a hotpluggable DRM device, which makes it inaccessible to
>> + * userspace operations. Entry-points can use drm_dev_enter() and
>> + * drm_dev_exit() to protect device resources in a race free manner. This
>> + * essentially unregisters the device like drm_dev_unregister(), but can be
>> + * called while there are still open users of @dev.
>> + */
>> +void drm_dev_unplug(struct drm_device *dev)
>> +{
>> +       __drm_dev_unplug(dev, false);
>> +}
>>  EXPORT_SYMBOL(drm_dev_unplug);
>>
>>  /*
>> @@ -920,7 +933,7 @@ EXPORT_SYMBOL(devm_drm_dev_init);
>>
>>  static void devm_drm_dev_register_release(void *data)
>>  {
>> -       drm_dev_unplug(data);
>> +       __drm_dev_unplug(data, true);
>>  }
>>
>>  static int devm_drm_dev_register(struct drm_device *dev)
>>
>>
>> I realised that we should take a ref on the parent device so it can be
>> accessed by the DRM_DEV_ functions after unplug.
>>
>>
>> Noralf.
>>
>>
>>>
>>>>> + *
>>>>> + * Note: This function must be used together with
>>>>> + * devm_drm_dev_register_with_fbdev().
>>>>> + *
>>>>> + * RETURNS:
>>>>> + * 0 on success, or error code on failure.
>>>>> + */
>>>>> +int devm_drm_dev_init(struct device *parent,
>>>>> +		      struct drm_device *dev,
>>>>> +		      struct drm_driver *driver)
>>>>> +{
>>>>> +	int ret;
>>>>> +
>>>>> +	if (WARN_ON(!parent || !driver->release))
>>>>> +		return -EINVAL;
>>>>> +
>>>>> +	ret = drm_dev_init(dev, driver, parent);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	/*
>>>>> +	 * This is a temporary release action that is used if probing fails
>>>>> +	 * before devm_drm_dev_register() is called.
>>>>> +	 */
>>>>> +	ret = devm_add_action(parent, devm_drm_dev_init_release, dev);
>>>>> +	if (ret)
>>>>> +		devm_drm_dev_init_release(dev);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL(devm_drm_dev_init);
>>>>> +
>>>>> +static void devm_drm_dev_register_release(void *data)
>>>>> +{
>>>>> +	drm_dev_unplug(data);
>>>>> +}
>>>>> +
>>>>> +static int devm_drm_dev_register(struct drm_device *dev)
>>>>> +{
>>>>> +	int ret;
>>>>> +
>>>>> +	ret = drm_dev_register(dev, 0);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	/*
>>>>> +	 * This has now served it's purpose, remove it to not mess up ref
>>>>> +	 * counting.
>>>>> +	 */
>>>>> +	devm_remove_action(dev->dev, devm_drm_dev_init_release, dev);
>>>>> +
>>>>> +	ret = devm_add_action(dev->dev, devm_drm_dev_register_release, dev);
>>>>> +	if (ret)
>>>>> +		devm_drm_dev_register_release(dev);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * devm_drm_dev_register_with_fbdev - Resource managed drm_dev_register()
>>>>> + *                                    including generic fbdev emulation
>>>>> + * @dev: DRM device to register
>>>>> + * @fbdev_bpp: Preferred bits per pixel for fbdev (optional)
>>>>> + *
>>>>> + * Managed drm_dev_register() that also calls drm_fbdev_generic_setup().
>>>>> + * The DRM device registered with this function is automatically unregistered on
>>>>> + * driver detach using drm_dev_unplug().
>>>>> + *
>>>>> + * Note: This function must be used together with devm_drm_dev_init().
>>>>> + *
>>>>> + * For testing driver detach can be triggered manually by writing to the driver
>>>>> + * 'unbind' file.
>>>>> + *
>>>>> + * RETURNS:
>>>>> + * 0 on success, negative error code on failure.
>>>>> + */
>>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
>>>>> +				     unsigned int fbdev_bpp)
>>>>> +{
>>>>> +	int ret;
>>>>> +
>>>>> +	ret = devm_drm_dev_register(dev);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	drm_fbdev_generic_setup(dev, fbdev_bpp);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>> +EXPORT_SYMBOL(devm_drm_dev_register_with_fbdev);
>>>>> +
>>>>>  /**
>>>>>   * drm_dev_set_unique - Set the unique name of a DRM device
>>>>>   * @dev: device of which to set the unique name
>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>>> index 35af23f5fa0d..c3f0477f2e7f 100644
>>>>> --- a/include/drm/drm_drv.h
>>>>> +++ b/include/drm/drm_drv.h
>>>>> @@ -628,6 +628,12 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
>>>>>  int drm_dev_register(struct drm_device *dev, unsigned long flags);
>>>>>  void drm_dev_unregister(struct drm_device *dev);
>>>>>  
>>>>> +int devm_drm_dev_init(struct device *parent,
>>>>> +		      struct drm_device *dev,
>>>>> +		      struct drm_driver *driver);
>>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
>>>>> +				     unsigned int fbdev_bpp);
>>>>> +
>>>>>  void drm_dev_get(struct drm_device *dev);
>>>>>  void drm_dev_put(struct drm_device *dev);
>>>>>  void drm_put_dev(struct drm_device *dev);
>>>>> -- 
>>>>> 2.20.1
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>> -- 
>>>> Daniel Vetter
>>>> Software Engineer, Intel Corporation
>>>> http://blog.ffwll.ch
>>>
>
On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
>
>
>
> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> >>
> >>
> >> Den 21.01.2019 10.55, skrev Daniel Vetter:
> >>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> >>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> >>>>> This adds resource managed (devres) versions of drm_dev_init() and
> >>>>> drm_dev_register().
> >>>>>
> >>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> >>>>> fbdev emulation as well.
> >>>>>
> >>>>> devm_drm_dev_register() isn't exported since there are no users.
> >>>>>
> >>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> >>>>
> >>
> >> <snip>
> >>
> >>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>>> index 381581b01d48..12129772be45 100644
> >>>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>>> @@ -36,6 +36,7 @@
> >>>>>
> >>>>>  #include <drm/drm_client.h>
> >>>>>  #include <drm/drm_drv.h>
> >>>>> +#include <drm/drm_fb_helper.h>
> >>>>>  #include <drm/drmP.h>
> >>>>>
> >>>>>  #include "drm_crtc_internal.h"
> >>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>>>  }
> >>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> >>>>>
> >>>>> +static void devm_drm_dev_init_release(void *data)
> >>>>> +{
> >>>>> + drm_dev_put(data);
> >>>
> >>> We need drm_dev_unplug() here, or this isn't safe.
> >>
> >> This function is only used to cover the error path if probe fails before
> >> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> >> the one that calls unplug. There are comments about this in the functions.
> >
> > I think I get a prize for being ignorant and blind :-/
> >
> >>
> >>>
> >>>>> +}
> >>>>> +
> >>>>> +/**
> >>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> >>>>> + * @parent: Parent device object
> >>>>> + * @dev: DRM device
> >>>>> + * @driver: DRM driver
> >>>>> + *
> >>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> >>>>> + * automatically released on driver detach. You must supply a
> >>>
> >>> I think a bit more clarity here would be good:
> >>>
> >>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> >>>
> >>>>> + * &drm_driver.release callback to control the finalization explicitly.
> >>>
> >>> I think a loud warning for these is in order:
> >>>
> >>> "WARNING:
> >>>
> >>> "In generally it is unsafe to use devm functions for drm structures
> >>> because the lifetimes of &drm_device and the underlying &device do not
> >>> match. This here works because it doesn't immediately free anything, but
> >>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> >>> refcount through drm_dev_put().
> >>>
> >>> "All other drm structures must still be explicitly released in the
> >>> &drm_driver.release callback."
> >>>
> >>> While thinking about this I just realized that with this design we have no
> >>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> >>> kinds of things will leak badly (connectors, fb, ...), but there's no
> >>> place to call it:
> >>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> >>>   drm_dev_unregister in there must be called _before_ we start to shut
> >>>   down anything.
> >>> - drm_driver.release is way too late.
> >>>
> >>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> >>> already gone), but for a driver unload/unbind it would be nice if this
> >>> happens automatically and in the right order.
> >>>
> >>> So not sure what to do here really.
> >>
> >> How about this change: (it breaks the rule of pulling helpers into the
> >> core, so maybe we should put the devm_ functions into the simple KMS
> >> helper instead?)
> >
> > Yeah smells a bit much like midlayer ... What would work is having a pile
> > more devm_ helper functions, so that we onion-unwrap everything correctly,
> > and in the right order. So:
> >
> > - devm_drm_dev_init (always does a drm_dev_put())
> >
> > - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> >
> > - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> >
> > - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> >   can call drm_dev_unplug() unconditionally).
> >
>
> Beautiful! I really like this, it's very flexible.
>
> Where should devm_drm_mode_config_reset() live? It will pull in the
> atomic helper...

I think a new drm_devm.c helper would be nice for all this stuff.
Especially since you can't freely mix devm-based setup/cleanup with
normal cleanup I think it'd be good to have it all together in one
place. And perhaps even a code example in the DOC: overview.

> > We'd need to make sure some of the cleanup actions dtrt when the device is
> > gone, but I think we can achieve that by liberally sprinkling
> > drm_dev_enter/exit over them, e.g. the the cleanup action for
> > drm_mode_config_reset would be:
> >
> > {
> >       if (drm_dev_enter())
> >               return;
> >
> >       drm_atomic_helper_shutdown();
> >
> >       drm_dev_exit();
> > }
> >
>
> drm_dev_enter() can only be used to check whether the drm_device is
> registered or not, it doesn't say anything about the state of the parent
> device.
>
> All we know is that the device is being unbound from the driver, we
> don't know if it's the device that's being removed or if it's the driver
> that's unregistered.

You're right, both paths will have called drm_dev_unplug by then.
Silly me. I really liked my idea :-)

> I have looked at the various call chains:
>
> driver_unregister ->
>     bus_remove_driver ->
>         driver_detach ->
>             device_release_driver_internal
>
> device_unregister ->
>     device_del ->
>         bus_remove_device ->
>             device_release_driver ->
>                 device_release_driver_internal
>
> sysfs: unbind_store ->
>     device_release_driver ->
>         device_release_driver_internal
>
> The only way I've found to differentiate between these in a cleanup
> action is that device_del() uses the bus notifier to signal
> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> notifier could be used to set a drm_device->parent_removed flag.

Hm, this might upset Greg KH's code taste ... maybe there's a better
way to do this, but best to prototype a patch with this, send it to
him and ask how to :-)

> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> everything get disabled when userspace closes? It does in my tinydrm
> world :-)

Iirc fbdev/fbcon can result in leaks ... at least we've had patches
where drivers leaked drm_connector and drm_framebuffer objects, and
they've been fixed by calling drm_atomic_helper_shutdown() in the
unload path. Maybe this is cargo-culting, but it goes way back to
pre-atomic, where drivers called drm_helper_force_disable_all().

If you try to move the fbcon to your tinydrm drivers (con2fb is
apparently the cmdline tool you need, never tried it, I only switch
the kernel's console between fbcon and dummycon and back, not what
fbcon drivers itself), then I think you should be able to reproduce.
And maybe you have a better idea how to deal with this all.

Note also that there's been proposals floating around to only close an
drm_framebuffer, not also remove it (like the current RMFB ioctl
does), with that closing userspace would not necessarily lead to a
full cleanup.

Another thing (which doesn't apply to drm_simple_display_pipe drivers)
is if you have the display on, but no planes showing (i.e. all black).
Then all the fbs will be cleaned up, but drm_connector will be
leaking. That's a case where you need drm_atomic_helper_shutdown()
even if fbcon/fbdev isn't even enabled.

Cheers, Daniel


> > This would be analog to your shutdown parameter below.
> >
> > Essentially I think we can only do the drm_dev_unplug autocleanup in a
> > given driver from devm_drm_dev_register iff all the cleanup actions have
> > been devm-ified, and there's nothing left in it's unbind callback. Because
> > if anything is left in its unbind callback that's a bug, since
> > drm_dev_unregister() (called through drm_dev_unplug) is the very first (or
> > at least one of the very first, before we start cleanup up) functions that
> > need to be called.
> > -Daniel
> >
> >>
> >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >> index 12129772be45..7ed9550baff6 100644
> >> --- a/drivers/gpu/drm/drm_drv.c
> >> +++ b/drivers/gpu/drm/drm_drv.c
> >> @@ -34,7 +34,9 @@
> >>  #include <linux/slab.h>
> >>  #include <linux/srcu.h>
> >>
> >> +#include <drm/drm_atomic_helper.h>
> >>  #include <drm/drm_client.h>
> >> +#include <drm/drm_crtc_helper.h>
> >>  #include <drm/drm_drv.h>
> >>  #include <drm/drm_fb_helper.h>
> >>  #include <drm/drmP.h>
> >> @@ -355,17 +357,7 @@ void drm_dev_exit(int idx)
> >>  }
> >>  EXPORT_SYMBOL(drm_dev_exit);
> >>
> >> -/**
> >> - * drm_dev_unplug - unplug a DRM device
> >> - * @dev: DRM device
> >> - *
> >> - * This unplugs a hotpluggable DRM device, which makes it inaccessible to
> >> - * userspace operations. Entry-points can use drm_dev_enter() and
> >> - * drm_dev_exit() to protect device resources in a race free manner. This
> >> - * essentially unregisters the device like drm_dev_unregister(), but can be
> >> - * called while there are still open users of @dev.
> >> - */
> >> -void drm_dev_unplug(struct drm_device *dev)
> >> +static void __drm_dev_unplug(struct drm_device *dev, bool shutdown)
> >>  {
> >>         /*
> >>          * After synchronizing any critical read section is guaranteed
> >> to see
> >> @@ -378,11 +370,32 @@ void drm_dev_unplug(struct drm_device *dev)
> >>
> >>         drm_dev_unregister(dev);
> >>
> >> +       if (shutdown)
> >> +               drm_kms_helper_poll_fini(dev);
> >> +
> >>         mutex_lock(&drm_global_mutex);
> >> -       if (dev->open_count == 0)
> >> +       if (dev->open_count == 0) {
> >> +               if (shutdown)
> >> +                       drm_atomic_helper_shutdown(dev);
> >>                 drm_dev_put(dev);
> >> +       }
> >>         mutex_unlock(&drm_global_mutex);
> >>  }
> >> +
> >> +/**
> >> + * drm_dev_unplug - unplug a DRM device
> >> + * @dev: DRM device
> >> + *
> >> + * This unplugs a hotpluggable DRM device, which makes it inaccessible to
> >> + * userspace operations. Entry-points can use drm_dev_enter() and
> >> + * drm_dev_exit() to protect device resources in a race free manner. This
> >> + * essentially unregisters the device like drm_dev_unregister(), but can be
> >> + * called while there are still open users of @dev.
> >> + */
> >> +void drm_dev_unplug(struct drm_device *dev)
> >> +{
> >> +       __drm_dev_unplug(dev, false);
> >> +}
> >>  EXPORT_SYMBOL(drm_dev_unplug);
> >>
> >>  /*
> >> @@ -920,7 +933,7 @@ EXPORT_SYMBOL(devm_drm_dev_init);
> >>
> >>  static void devm_drm_dev_register_release(void *data)
> >>  {
> >> -       drm_dev_unplug(data);
> >> +       __drm_dev_unplug(data, true);
> >>  }
> >>
> >>  static int devm_drm_dev_register(struct drm_device *dev)
> >>
> >>
> >> I realised that we should take a ref on the parent device so it can be
> >> accessed by the DRM_DEV_ functions after unplug.
> >>
> >>
> >> Noralf.
> >>
> >>
> >>>
> >>>>> + *
> >>>>> + * Note: This function must be used together with
> >>>>> + * devm_drm_dev_register_with_fbdev().
> >>>>> + *
> >>>>> + * RETURNS:
> >>>>> + * 0 on success, or error code on failure.
> >>>>> + */
> >>>>> +int devm_drm_dev_init(struct device *parent,
> >>>>> +               struct drm_device *dev,
> >>>>> +               struct drm_driver *driver)
> >>>>> +{
> >>>>> + int ret;
> >>>>> +
> >>>>> + if (WARN_ON(!parent || !driver->release))
> >>>>> +         return -EINVAL;
> >>>>> +
> >>>>> + ret = drm_dev_init(dev, driver, parent);
> >>>>> + if (ret)
> >>>>> +         return ret;
> >>>>> +
> >>>>> + /*
> >>>>> +  * This is a temporary release action that is used if probing fails
> >>>>> +  * before devm_drm_dev_register() is called.
> >>>>> +  */
> >>>>> + ret = devm_add_action(parent, devm_drm_dev_init_release, dev);
> >>>>> + if (ret)
> >>>>> +         devm_drm_dev_init_release(dev);
> >>>>> +
> >>>>> + return ret;
> >>>>> +}
> >>>>> +EXPORT_SYMBOL(devm_drm_dev_init);
> >>>>> +
> >>>>> +static void devm_drm_dev_register_release(void *data)
> >>>>> +{
> >>>>> + drm_dev_unplug(data);
> >>>>> +}
> >>>>> +
> >>>>> +static int devm_drm_dev_register(struct drm_device *dev)
> >>>>> +{
> >>>>> + int ret;
> >>>>> +
> >>>>> + ret = drm_dev_register(dev, 0);
> >>>>> + if (ret)
> >>>>> +         return ret;
> >>>>> +
> >>>>> + /*
> >>>>> +  * This has now served it's purpose, remove it to not mess up ref
> >>>>> +  * counting.
> >>>>> +  */
> >>>>> + devm_remove_action(dev->dev, devm_drm_dev_init_release, dev);
> >>>>> +
> >>>>> + ret = devm_add_action(dev->dev, devm_drm_dev_register_release, dev);
> >>>>> + if (ret)
> >>>>> +         devm_drm_dev_register_release(dev);
> >>>>> +
> >>>>> + return ret;
> >>>>> +}
> >>>>> +
> >>>>> +/**
> >>>>> + * devm_drm_dev_register_with_fbdev - Resource managed drm_dev_register()
> >>>>> + *                                    including generic fbdev emulation
> >>>>> + * @dev: DRM device to register
> >>>>> + * @fbdev_bpp: Preferred bits per pixel for fbdev (optional)
> >>>>> + *
> >>>>> + * Managed drm_dev_register() that also calls drm_fbdev_generic_setup().
> >>>>> + * The DRM device registered with this function is automatically unregistered on
> >>>>> + * driver detach using drm_dev_unplug().
> >>>>> + *
> >>>>> + * Note: This function must be used together with devm_drm_dev_init().
> >>>>> + *
> >>>>> + * For testing driver detach can be triggered manually by writing to the driver
> >>>>> + * 'unbind' file.
> >>>>> + *
> >>>>> + * RETURNS:
> >>>>> + * 0 on success, negative error code on failure.
> >>>>> + */
> >>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
> >>>>> +                              unsigned int fbdev_bpp)
> >>>>> +{
> >>>>> + int ret;
> >>>>> +
> >>>>> + ret = devm_drm_dev_register(dev);
> >>>>> + if (ret)
> >>>>> +         return ret;
> >>>>> +
> >>>>> + drm_fbdev_generic_setup(dev, fbdev_bpp);
> >>>>> +
> >>>>> + return 0;
> >>>>> +}
> >>>>> +EXPORT_SYMBOL(devm_drm_dev_register_with_fbdev);
> >>>>> +
> >>>>>  /**
> >>>>>   * drm_dev_set_unique - Set the unique name of a DRM device
> >>>>>   * @dev: device of which to set the unique name
> >>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> >>>>> index 35af23f5fa0d..c3f0477f2e7f 100644
> >>>>> --- a/include/drm/drm_drv.h
> >>>>> +++ b/include/drm/drm_drv.h
> >>>>> @@ -628,6 +628,12 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
> >>>>>  int drm_dev_register(struct drm_device *dev, unsigned long flags);
> >>>>>  void drm_dev_unregister(struct drm_device *dev);
> >>>>>
> >>>>> +int devm_drm_dev_init(struct device *parent,
> >>>>> +               struct drm_device *dev,
> >>>>> +               struct drm_driver *driver);
> >>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
> >>>>> +                              unsigned int fbdev_bpp);
> >>>>> +
> >>>>>  void drm_dev_get(struct drm_device *dev);
> >>>>>  void drm_dev_put(struct drm_device *dev);
> >>>>>  void drm_put_dev(struct drm_device *dev);
> >>>>> --
> >>>>> 2.20.1
> >>>>>
> >>>>> _______________________________________________
> >>>>> dri-devel mailing list
> >>>>> dri-devel@lists.freedesktop.org
> >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>>
> >>>> --
> >>>> Daniel Vetter
> >>>> Software Engineer, Intel Corporation
> >>>> http://blog.ffwll.ch
> >>>
> >
Den 22.01.2019 20.30, skrev Daniel Vetter:
> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
>>
>>
>>
>> Den 22.01.2019 10.32, skrev Daniel Vetter:
>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
>>>>
>>>>
>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
>>>>>>> drm_dev_register().
>>>>>>>
>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
>>>>>>> fbdev emulation as well.
>>>>>>>
>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
>>>>>>>
>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
>>>>>>
>>>>
>>>> <snip>
>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>>>> index 381581b01d48..12129772be45 100644
>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>>>> @@ -36,6 +36,7 @@
>>>>>>>
>>>>>>>  #include <drm/drm_client.h>
>>>>>>>  #include <drm/drm_drv.h>
>>>>>>> +#include <drm/drm_fb_helper.h>
>>>>>>>  #include <drm/drmP.h>
>>>>>>>
>>>>>>>  #include "drm_crtc_internal.h"
>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
>>>>>>>
>>>>>>> +static void devm_drm_dev_init_release(void *data)
>>>>>>> +{
>>>>>>> + drm_dev_put(data);
>>>>>
>>>>> We need drm_dev_unplug() here, or this isn't safe.
>>>>
>>>> This function is only used to cover the error path if probe fails before
>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
>>>> the one that calls unplug. There are comments about this in the functions.
>>>
>>> I think I get a prize for being ignorant and blind :-/
>>>
>>>>
>>>>>
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
>>>>>>> + * @parent: Parent device object
>>>>>>> + * @dev: DRM device
>>>>>>> + * @driver: DRM driver
>>>>>>> + *
>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
>>>>>>> + * automatically released on driver detach. You must supply a
>>>>>
>>>>> I think a bit more clarity here would be good:
>>>>>
>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
>>>>>
>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
>>>>>
>>>>> I think a loud warning for these is in order:
>>>>>
>>>>> "WARNING:
>>>>>
>>>>> "In generally it is unsafe to use devm functions for drm structures
>>>>> because the lifetimes of &drm_device and the underlying &device do not
>>>>> match. This here works because it doesn't immediately free anything, but
>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
>>>>> refcount through drm_dev_put().
>>>>>
>>>>> "All other drm structures must still be explicitly released in the
>>>>> &drm_driver.release callback."
>>>>>
>>>>> While thinking about this I just realized that with this design we have no
>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
>>>>> place to call it:
>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
>>>>>   down anything.
>>>>> - drm_driver.release is way too late.
>>>>>
>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
>>>>> already gone), but for a driver unload/unbind it would be nice if this
>>>>> happens automatically and in the right order.
>>>>>
>>>>> So not sure what to do here really.
>>>>
>>>> How about this change: (it breaks the rule of pulling helpers into the
>>>> core, so maybe we should put the devm_ functions into the simple KMS
>>>> helper instead?)
>>>
>>> Yeah smells a bit much like midlayer ... What would work is having a pile
>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
>>> and in the right order. So:
>>>
>>> - devm_drm_dev_init (always does a drm_dev_put())
>>>
>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
>>>
>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
>>>
>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
>>>   can call drm_dev_unplug() unconditionally).
>>>
>>
>> Beautiful! I really like this, it's very flexible.
>>
>> Where should devm_drm_mode_config_reset() live? It will pull in the
>> atomic helper...
> 
> I think a new drm_devm.c helper would be nice for all this stuff.
> Especially since you can't freely mix devm-based setup/cleanup with
> normal cleanup I think it'd be good to have it all together in one
> place. And perhaps even a code example in the DOC: overview.
> 
>>> We'd need to make sure some of the cleanup actions dtrt when the device is
>>> gone, but I think we can achieve that by liberally sprinkling
>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
>>> drm_mode_config_reset would be:
>>>
>>> {
>>>       if (drm_dev_enter())
>>>               return;
>>>
>>>       drm_atomic_helper_shutdown();
>>>
>>>       drm_dev_exit();
>>> }
>>>
>>
>> drm_dev_enter() can only be used to check whether the drm_device is
>> registered or not, it doesn't say anything about the state of the parent
>> device.
>>
>> All we know is that the device is being unbound from the driver, we
>> don't know if it's the device that's being removed or if it's the driver
>> that's unregistered.
> 
> You're right, both paths will have called drm_dev_unplug by then.
> Silly me. I really liked my idea :-)
> 
>> I have looked at the various call chains:
>>
>> driver_unregister ->
>>     bus_remove_driver ->
>>         driver_detach ->
>>             device_release_driver_internal
>>
>> device_unregister ->
>>     device_del ->
>>         bus_remove_device ->
>>             device_release_driver ->
>>                 device_release_driver_internal
>>
>> sysfs: unbind_store ->
>>     device_release_driver ->
>>         device_release_driver_internal
>>
>> The only way I've found to differentiate between these in a cleanup
>> action is that device_del() uses the bus notifier to signal
>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
>> notifier could be used to set a drm_device->parent_removed flag.
> 
> Hm, this might upset Greg KH's code taste ... maybe there's a better
> way to do this, but best to prototype a patch with this, send it to
> him and ask how to :-)
> 

I'll leave this to the one that needs it. The tinydrm drivers doesn't
need to touch hw after DRM unregister.

>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
>> everything get disabled when userspace closes? It does in my tinydrm
>> world :-)
> 
> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> where drivers leaked drm_connector and drm_framebuffer objects, and
> they've been fixed by calling drm_atomic_helper_shutdown() in the
> unload path. Maybe this is cargo-culting, but it goes way back to
> pre-atomic, where drivers called drm_helper_force_disable_all().
> 
> If you try to move the fbcon to your tinydrm drivers (con2fb is
> apparently the cmdline tool you need, never tried it, I only switch
> the kernel's console between fbcon and dummycon and back, not what
> fbcon drivers itself), then I think you should be able to reproduce.
> And maybe you have a better idea how to deal with this all.
> 
> Note also that there's been proposals floating around to only close an
> drm_framebuffer, not also remove it (like the current RMFB ioctl
> does), with that closing userspace would not necessarily lead to a
> full cleanup.
> 
> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> is if you have the display on, but no planes showing (i.e. all black).
> Then all the fbs will be cleaned up, but drm_connector will be
> leaking. That's a case where you need drm_atomic_helper_shutdown()
> even if fbcon/fbdev isn't even enabled.
> 

Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
tinydrm. DRM userspace disables the pipe on close and the generic fbdev
emulation also releases everything.
Even so, maybe I should use devm_drm_mode_config_reset() after all to
keep drivers uniform, to avoid confusion: why doesn't he use it?

Noralf.

> Cheers, Daniel
> 
> 
>>> This would be analog to your shutdown parameter below.
>>>
>>> Essentially I think we can only do the drm_dev_unplug autocleanup in a
>>> given driver from devm_drm_dev_register iff all the cleanup actions have
>>> been devm-ified, and there's nothing left in it's unbind callback. Because
>>> if anything is left in its unbind callback that's a bug, since
>>> drm_dev_unregister() (called through drm_dev_unplug) is the very first (or
>>> at least one of the very first, before we start cleanup up) functions that
>>> need to be called.
>>> -Daniel
>>>
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>> index 12129772be45..7ed9550baff6 100644
>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>> @@ -34,7 +34,9 @@
>>>>  #include <linux/slab.h>
>>>>  #include <linux/srcu.h>
>>>>
>>>> +#include <drm/drm_atomic_helper.h>
>>>>  #include <drm/drm_client.h>
>>>> +#include <drm/drm_crtc_helper.h>
>>>>  #include <drm/drm_drv.h>
>>>>  #include <drm/drm_fb_helper.h>
>>>>  #include <drm/drmP.h>
>>>> @@ -355,17 +357,7 @@ void drm_dev_exit(int idx)
>>>>  }
>>>>  EXPORT_SYMBOL(drm_dev_exit);
>>>>
>>>> -/**
>>>> - * drm_dev_unplug - unplug a DRM device
>>>> - * @dev: DRM device
>>>> - *
>>>> - * This unplugs a hotpluggable DRM device, which makes it inaccessible to
>>>> - * userspace operations. Entry-points can use drm_dev_enter() and
>>>> - * drm_dev_exit() to protect device resources in a race free manner. This
>>>> - * essentially unregisters the device like drm_dev_unregister(), but can be
>>>> - * called while there are still open users of @dev.
>>>> - */
>>>> -void drm_dev_unplug(struct drm_device *dev)
>>>> +static void __drm_dev_unplug(struct drm_device *dev, bool shutdown)
>>>>  {
>>>>         /*
>>>>          * After synchronizing any critical read section is guaranteed
>>>> to see
>>>> @@ -378,11 +370,32 @@ void drm_dev_unplug(struct drm_device *dev)
>>>>
>>>>         drm_dev_unregister(dev);
>>>>
>>>> +       if (shutdown)
>>>> +               drm_kms_helper_poll_fini(dev);
>>>> +
>>>>         mutex_lock(&drm_global_mutex);
>>>> -       if (dev->open_count == 0)
>>>> +       if (dev->open_count == 0) {
>>>> +               if (shutdown)
>>>> +                       drm_atomic_helper_shutdown(dev);
>>>>                 drm_dev_put(dev);
>>>> +       }
>>>>         mutex_unlock(&drm_global_mutex);
>>>>  }
>>>> +
>>>> +/**
>>>> + * drm_dev_unplug - unplug a DRM device
>>>> + * @dev: DRM device
>>>> + *
>>>> + * This unplugs a hotpluggable DRM device, which makes it inaccessible to
>>>> + * userspace operations. Entry-points can use drm_dev_enter() and
>>>> + * drm_dev_exit() to protect device resources in a race free manner. This
>>>> + * essentially unregisters the device like drm_dev_unregister(), but can be
>>>> + * called while there are still open users of @dev.
>>>> + */
>>>> +void drm_dev_unplug(struct drm_device *dev)
>>>> +{
>>>> +       __drm_dev_unplug(dev, false);
>>>> +}
>>>>  EXPORT_SYMBOL(drm_dev_unplug);
>>>>
>>>>  /*
>>>> @@ -920,7 +933,7 @@ EXPORT_SYMBOL(devm_drm_dev_init);
>>>>
>>>>  static void devm_drm_dev_register_release(void *data)
>>>>  {
>>>> -       drm_dev_unplug(data);
>>>> +       __drm_dev_unplug(data, true);
>>>>  }
>>>>
>>>>  static int devm_drm_dev_register(struct drm_device *dev)
>>>>
>>>>
>>>> I realised that we should take a ref on the parent device so it can be
>>>> accessed by the DRM_DEV_ functions after unplug.
>>>>
>>>>
>>>> Noralf.
>>>>
>>>>
>>>>>
>>>>>>> + *
>>>>>>> + * Note: This function must be used together with
>>>>>>> + * devm_drm_dev_register_with_fbdev().
>>>>>>> + *
>>>>>>> + * RETURNS:
>>>>>>> + * 0 on success, or error code on failure.
>>>>>>> + */
>>>>>>> +int devm_drm_dev_init(struct device *parent,
>>>>>>> +               struct drm_device *dev,
>>>>>>> +               struct drm_driver *driver)
>>>>>>> +{
>>>>>>> + int ret;
>>>>>>> +
>>>>>>> + if (WARN_ON(!parent || !driver->release))
>>>>>>> +         return -EINVAL;
>>>>>>> +
>>>>>>> + ret = drm_dev_init(dev, driver, parent);
>>>>>>> + if (ret)
>>>>>>> +         return ret;
>>>>>>> +
>>>>>>> + /*
>>>>>>> +  * This is a temporary release action that is used if probing fails
>>>>>>> +  * before devm_drm_dev_register() is called.
>>>>>>> +  */
>>>>>>> + ret = devm_add_action(parent, devm_drm_dev_init_release, dev);
>>>>>>> + if (ret)
>>>>>>> +         devm_drm_dev_init_release(dev);
>>>>>>> +
>>>>>>> + return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(devm_drm_dev_init);
>>>>>>> +
>>>>>>> +static void devm_drm_dev_register_release(void *data)
>>>>>>> +{
>>>>>>> + drm_dev_unplug(data);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int devm_drm_dev_register(struct drm_device *dev)
>>>>>>> +{
>>>>>>> + int ret;
>>>>>>> +
>>>>>>> + ret = drm_dev_register(dev, 0);
>>>>>>> + if (ret)
>>>>>>> +         return ret;
>>>>>>> +
>>>>>>> + /*
>>>>>>> +  * This has now served it's purpose, remove it to not mess up ref
>>>>>>> +  * counting.
>>>>>>> +  */
>>>>>>> + devm_remove_action(dev->dev, devm_drm_dev_init_release, dev);
>>>>>>> +
>>>>>>> + ret = devm_add_action(dev->dev, devm_drm_dev_register_release, dev);
>>>>>>> + if (ret)
>>>>>>> +         devm_drm_dev_register_release(dev);
>>>>>>> +
>>>>>>> + return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * devm_drm_dev_register_with_fbdev - Resource managed drm_dev_register()
>>>>>>> + *                                    including generic fbdev emulation
>>>>>>> + * @dev: DRM device to register
>>>>>>> + * @fbdev_bpp: Preferred bits per pixel for fbdev (optional)
>>>>>>> + *
>>>>>>> + * Managed drm_dev_register() that also calls drm_fbdev_generic_setup().
>>>>>>> + * The DRM device registered with this function is automatically unregistered on
>>>>>>> + * driver detach using drm_dev_unplug().
>>>>>>> + *
>>>>>>> + * Note: This function must be used together with devm_drm_dev_init().
>>>>>>> + *
>>>>>>> + * For testing driver detach can be triggered manually by writing to the driver
>>>>>>> + * 'unbind' file.
>>>>>>> + *
>>>>>>> + * RETURNS:
>>>>>>> + * 0 on success, negative error code on failure.
>>>>>>> + */
>>>>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
>>>>>>> +                              unsigned int fbdev_bpp)
>>>>>>> +{
>>>>>>> + int ret;
>>>>>>> +
>>>>>>> + ret = devm_drm_dev_register(dev);
>>>>>>> + if (ret)
>>>>>>> +         return ret;
>>>>>>> +
>>>>>>> + drm_fbdev_generic_setup(dev, fbdev_bpp);
>>>>>>> +
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(devm_drm_dev_register_with_fbdev);
>>>>>>> +
>>>>>>>  /**
>>>>>>>   * drm_dev_set_unique - Set the unique name of a DRM device
>>>>>>>   * @dev: device of which to set the unique name
>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>>>>> index 35af23f5fa0d..c3f0477f2e7f 100644
>>>>>>> --- a/include/drm/drm_drv.h
>>>>>>> +++ b/include/drm/drm_drv.h
>>>>>>> @@ -628,6 +628,12 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
>>>>>>>  int drm_dev_register(struct drm_device *dev, unsigned long flags);
>>>>>>>  void drm_dev_unregister(struct drm_device *dev);
>>>>>>>
>>>>>>> +int devm_drm_dev_init(struct device *parent,
>>>>>>> +               struct drm_device *dev,
>>>>>>> +               struct drm_driver *driver);
>>>>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
>>>>>>> +                              unsigned int fbdev_bpp);
>>>>>>> +
>>>>>>>  void drm_dev_get(struct drm_device *dev);
>>>>>>>  void drm_dev_put(struct drm_device *dev);
>>>>>>>  void drm_put_dev(struct drm_device *dev);
>>>>>>> --
>>>>>>> 2.20.1
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>
>>>>>> --
>>>>>> Daniel Vetter
>>>>>> Software Engineer, Intel Corporation
>>>>>> http://blog.ffwll.ch
>>>>>
>>>
> 
> 
>
On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> 
> 
> Den 22.01.2019 20.30, skrev Daniel Vetter:
> > On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> >>
> >>
> >>
> >> Den 22.01.2019 10.32, skrev Daniel Vetter:
> >>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> >>>>
> >>>>
> >>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> >>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> >>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> >>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> >>>>>>> drm_dev_register().
> >>>>>>>
> >>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> >>>>>>> fbdev emulation as well.
> >>>>>>>
> >>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> >>>>>>>
> >>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> >>>>>>
> >>>>
> >>>> <snip>
> >>>>
> >>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>>>>> index 381581b01d48..12129772be45 100644
> >>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>>>>> @@ -36,6 +36,7 @@
> >>>>>>>
> >>>>>>>  #include <drm/drm_client.h>
> >>>>>>>  #include <drm/drm_drv.h>
> >>>>>>> +#include <drm/drm_fb_helper.h>
> >>>>>>>  #include <drm/drmP.h>
> >>>>>>>
> >>>>>>>  #include "drm_crtc_internal.h"
> >>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> >>>>>>>
> >>>>>>> +static void devm_drm_dev_init_release(void *data)
> >>>>>>> +{
> >>>>>>> + drm_dev_put(data);
> >>>>>
> >>>>> We need drm_dev_unplug() here, or this isn't safe.
> >>>>
> >>>> This function is only used to cover the error path if probe fails before
> >>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> >>>> the one that calls unplug. There are comments about this in the functions.
> >>>
> >>> I think I get a prize for being ignorant and blind :-/
> >>>
> >>>>
> >>>>>
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/**
> >>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> >>>>>>> + * @parent: Parent device object
> >>>>>>> + * @dev: DRM device
> >>>>>>> + * @driver: DRM driver
> >>>>>>> + *
> >>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> >>>>>>> + * automatically released on driver detach. You must supply a
> >>>>>
> >>>>> I think a bit more clarity here would be good:
> >>>>>
> >>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> >>>>>
> >>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> >>>>>
> >>>>> I think a loud warning for these is in order:
> >>>>>
> >>>>> "WARNING:
> >>>>>
> >>>>> "In generally it is unsafe to use devm functions for drm structures
> >>>>> because the lifetimes of &drm_device and the underlying &device do not
> >>>>> match. This here works because it doesn't immediately free anything, but
> >>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> >>>>> refcount through drm_dev_put().
> >>>>>
> >>>>> "All other drm structures must still be explicitly released in the
> >>>>> &drm_driver.release callback."
> >>>>>
> >>>>> While thinking about this I just realized that with this design we have no
> >>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> >>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> >>>>> place to call it:
> >>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> >>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> >>>>>   down anything.
> >>>>> - drm_driver.release is way too late.
> >>>>>
> >>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> >>>>> already gone), but for a driver unload/unbind it would be nice if this
> >>>>> happens automatically and in the right order.
> >>>>>
> >>>>> So not sure what to do here really.
> >>>>
> >>>> How about this change: (it breaks the rule of pulling helpers into the
> >>>> core, so maybe we should put the devm_ functions into the simple KMS
> >>>> helper instead?)
> >>>
> >>> Yeah smells a bit much like midlayer ... What would work is having a pile
> >>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> >>> and in the right order. So:
> >>>
> >>> - devm_drm_dev_init (always does a drm_dev_put())
> >>>
> >>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> >>>
> >>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> >>>
> >>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> >>>   can call drm_dev_unplug() unconditionally).
> >>>
> >>
> >> Beautiful! I really like this, it's very flexible.
> >>
> >> Where should devm_drm_mode_config_reset() live? It will pull in the
> >> atomic helper...
> > 
> > I think a new drm_devm.c helper would be nice for all this stuff.
> > Especially since you can't freely mix devm-based setup/cleanup with
> > normal cleanup I think it'd be good to have it all together in one
> > place. And perhaps even a code example in the DOC: overview.
> > 
> >>> We'd need to make sure some of the cleanup actions dtrt when the device is
> >>> gone, but I think we can achieve that by liberally sprinkling
> >>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> >>> drm_mode_config_reset would be:
> >>>
> >>> {
> >>>       if (drm_dev_enter())
> >>>               return;
> >>>
> >>>       drm_atomic_helper_shutdown();
> >>>
> >>>       drm_dev_exit();
> >>> }
> >>>
> >>
> >> drm_dev_enter() can only be used to check whether the drm_device is
> >> registered or not, it doesn't say anything about the state of the parent
> >> device.
> >>
> >> All we know is that the device is being unbound from the driver, we
> >> don't know if it's the device that's being removed or if it's the driver
> >> that's unregistered.
> > 
> > You're right, both paths will have called drm_dev_unplug by then.
> > Silly me. I really liked my idea :-)
> > 
> >> I have looked at the various call chains:
> >>
> >> driver_unregister ->
> >>     bus_remove_driver ->
> >>         driver_detach ->
> >>             device_release_driver_internal
> >>
> >> device_unregister ->
> >>     device_del ->
> >>         bus_remove_device ->
> >>             device_release_driver ->
> >>                 device_release_driver_internal
> >>
> >> sysfs: unbind_store ->
> >>     device_release_driver ->
> >>         device_release_driver_internal
> >>
> >> The only way I've found to differentiate between these in a cleanup
> >> action is that device_del() uses the bus notifier to signal
> >> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> >> notifier could be used to set a drm_device->parent_removed flag.
> > 
> > Hm, this might upset Greg KH's code taste ... maybe there's a better
> > way to do this, but best to prototype a patch with this, send it to
> > him and ask how to :-)
> > 
> 
> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> need to touch hw after DRM unregister.
> 
> >> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> >> everything get disabled when userspace closes? It does in my tinydrm
> >> world :-)
> > 
> > Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > where drivers leaked drm_connector and drm_framebuffer objects, and
> > they've been fixed by calling drm_atomic_helper_shutdown() in the
> > unload path. Maybe this is cargo-culting, but it goes way back to
> > pre-atomic, where drivers called drm_helper_force_disable_all().
> > 
> > If you try to move the fbcon to your tinydrm drivers (con2fb is
> > apparently the cmdline tool you need, never tried it, I only switch
> > the kernel's console between fbcon and dummycon and back, not what
> > fbcon drivers itself), then I think you should be able to reproduce.
> > And maybe you have a better idea how to deal with this all.
> > 
> > Note also that there's been proposals floating around to only close an
> > drm_framebuffer, not also remove it (like the current RMFB ioctl
> > does), with that closing userspace would not necessarily lead to a
> > full cleanup.
> > 
> > Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > is if you have the display on, but no planes showing (i.e. all black).
> > Then all the fbs will be cleaned up, but drm_connector will be
> > leaking. That's a case where you need drm_atomic_helper_shutdown()
> > even if fbcon/fbdev isn't even enabled.
> > 
> 
> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> emulation also releases everything.
> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> keep drivers uniform, to avoid confusion: why doesn't he use it?

Hm maybe there is an official way to solve this, pulling in Greg+Rafael.

Super short summary: We want to start using devm actions to clean up drm
drivers. Here's the problem:
- For a driver unload/unbind without hotunplug, we want to properly clean
  up the hardware and shut it all down.

- But if the device is unplugged already, that's probably not the best
  idea, and we only want to clean up the kernel's resources/allocations.

What's the recommendation here? I see a few options:

- Make sure everything can deal with this properly. Hotunplug can happen
  anytime, so there's a race no matter what.

- Check with the device model whether the struct device is disappearing or
  whether we're just dealing with a driver unbind (no idea how to do
  that), and act accordingly.

- Fundamental question: Touching the hw from devm actions, is that ok? If
  not, then the pretty nifty plan laid out in this thread wont work.

- Something completely different?

Thanks, Daniel

> 
> Noralf.
> 
> > Cheers, Daniel
> > 
> > 
> >>> This would be analog to your shutdown parameter below.
> >>>
> >>> Essentially I think we can only do the drm_dev_unplug autocleanup in a
> >>> given driver from devm_drm_dev_register iff all the cleanup actions have
> >>> been devm-ified, and there's nothing left in it's unbind callback. Because
> >>> if anything is left in its unbind callback that's a bug, since
> >>> drm_dev_unregister() (called through drm_dev_unplug) is the very first (or
> >>> at least one of the very first, before we start cleanup up) functions that
> >>> need to be called.
> >>> -Daniel
> >>>
> >>>>
> >>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>> index 12129772be45..7ed9550baff6 100644
> >>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>> @@ -34,7 +34,9 @@
> >>>>  #include <linux/slab.h>
> >>>>  #include <linux/srcu.h>
> >>>>
> >>>> +#include <drm/drm_atomic_helper.h>
> >>>>  #include <drm/drm_client.h>
> >>>> +#include <drm/drm_crtc_helper.h>
> >>>>  #include <drm/drm_drv.h>
> >>>>  #include <drm/drm_fb_helper.h>
> >>>>  #include <drm/drmP.h>
> >>>> @@ -355,17 +357,7 @@ void drm_dev_exit(int idx)
> >>>>  }
> >>>>  EXPORT_SYMBOL(drm_dev_exit);
> >>>>
> >>>> -/**
> >>>> - * drm_dev_unplug - unplug a DRM device
> >>>> - * @dev: DRM device
> >>>> - *
> >>>> - * This unplugs a hotpluggable DRM device, which makes it inaccessible to
> >>>> - * userspace operations. Entry-points can use drm_dev_enter() and
> >>>> - * drm_dev_exit() to protect device resources in a race free manner. This
> >>>> - * essentially unregisters the device like drm_dev_unregister(), but can be
> >>>> - * called while there are still open users of @dev.
> >>>> - */
> >>>> -void drm_dev_unplug(struct drm_device *dev)
> >>>> +static void __drm_dev_unplug(struct drm_device *dev, bool shutdown)
> >>>>  {
> >>>>         /*
> >>>>          * After synchronizing any critical read section is guaranteed
> >>>> to see
> >>>> @@ -378,11 +370,32 @@ void drm_dev_unplug(struct drm_device *dev)
> >>>>
> >>>>         drm_dev_unregister(dev);
> >>>>
> >>>> +       if (shutdown)
> >>>> +               drm_kms_helper_poll_fini(dev);
> >>>> +
> >>>>         mutex_lock(&drm_global_mutex);
> >>>> -       if (dev->open_count == 0)
> >>>> +       if (dev->open_count == 0) {
> >>>> +               if (shutdown)
> >>>> +                       drm_atomic_helper_shutdown(dev);
> >>>>                 drm_dev_put(dev);
> >>>> +       }
> >>>>         mutex_unlock(&drm_global_mutex);
> >>>>  }
> >>>> +
> >>>> +/**
> >>>> + * drm_dev_unplug - unplug a DRM device
> >>>> + * @dev: DRM device
> >>>> + *
> >>>> + * This unplugs a hotpluggable DRM device, which makes it inaccessible to
> >>>> + * userspace operations. Entry-points can use drm_dev_enter() and
> >>>> + * drm_dev_exit() to protect device resources in a race free manner. This
> >>>> + * essentially unregisters the device like drm_dev_unregister(), but can be
> >>>> + * called while there are still open users of @dev.
> >>>> + */
> >>>> +void drm_dev_unplug(struct drm_device *dev)
> >>>> +{
> >>>> +       __drm_dev_unplug(dev, false);
> >>>> +}
> >>>>  EXPORT_SYMBOL(drm_dev_unplug);
> >>>>
> >>>>  /*
> >>>> @@ -920,7 +933,7 @@ EXPORT_SYMBOL(devm_drm_dev_init);
> >>>>
> >>>>  static void devm_drm_dev_register_release(void *data)
> >>>>  {
> >>>> -       drm_dev_unplug(data);
> >>>> +       __drm_dev_unplug(data, true);
> >>>>  }
> >>>>
> >>>>  static int devm_drm_dev_register(struct drm_device *dev)
> >>>>
> >>>>
> >>>> I realised that we should take a ref on the parent device so it can be
> >>>> accessed by the DRM_DEV_ functions after unplug.
> >>>>
> >>>>
> >>>> Noralf.
> >>>>
> >>>>
> >>>>>
> >>>>>>> + *
> >>>>>>> + * Note: This function must be used together with
> >>>>>>> + * devm_drm_dev_register_with_fbdev().
> >>>>>>> + *
> >>>>>>> + * RETURNS:
> >>>>>>> + * 0 on success, or error code on failure.
> >>>>>>> + */
> >>>>>>> +int devm_drm_dev_init(struct device *parent,
> >>>>>>> +               struct drm_device *dev,
> >>>>>>> +               struct drm_driver *driver)
> >>>>>>> +{
> >>>>>>> + int ret;
> >>>>>>> +
> >>>>>>> + if (WARN_ON(!parent || !driver->release))
> >>>>>>> +         return -EINVAL;
> >>>>>>> +
> >>>>>>> + ret = drm_dev_init(dev, driver, parent);
> >>>>>>> + if (ret)
> >>>>>>> +         return ret;
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> +  * This is a temporary release action that is used if probing fails
> >>>>>>> +  * before devm_drm_dev_register() is called.
> >>>>>>> +  */
> >>>>>>> + ret = devm_add_action(parent, devm_drm_dev_init_release, dev);
> >>>>>>> + if (ret)
> >>>>>>> +         devm_drm_dev_init_release(dev);
> >>>>>>> +
> >>>>>>> + return ret;
> >>>>>>> +}
> >>>>>>> +EXPORT_SYMBOL(devm_drm_dev_init);
> >>>>>>> +
> >>>>>>> +static void devm_drm_dev_register_release(void *data)
> >>>>>>> +{
> >>>>>>> + drm_dev_unplug(data);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static int devm_drm_dev_register(struct drm_device *dev)
> >>>>>>> +{
> >>>>>>> + int ret;
> >>>>>>> +
> >>>>>>> + ret = drm_dev_register(dev, 0);
> >>>>>>> + if (ret)
> >>>>>>> +         return ret;
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> +  * This has now served it's purpose, remove it to not mess up ref
> >>>>>>> +  * counting.
> >>>>>>> +  */
> >>>>>>> + devm_remove_action(dev->dev, devm_drm_dev_init_release, dev);
> >>>>>>> +
> >>>>>>> + ret = devm_add_action(dev->dev, devm_drm_dev_register_release, dev);
> >>>>>>> + if (ret)
> >>>>>>> +         devm_drm_dev_register_release(dev);
> >>>>>>> +
> >>>>>>> + return ret;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/**
> >>>>>>> + * devm_drm_dev_register_with_fbdev - Resource managed drm_dev_register()
> >>>>>>> + *                                    including generic fbdev emulation
> >>>>>>> + * @dev: DRM device to register
> >>>>>>> + * @fbdev_bpp: Preferred bits per pixel for fbdev (optional)
> >>>>>>> + *
> >>>>>>> + * Managed drm_dev_register() that also calls drm_fbdev_generic_setup().
> >>>>>>> + * The DRM device registered with this function is automatically unregistered on
> >>>>>>> + * driver detach using drm_dev_unplug().
> >>>>>>> + *
> >>>>>>> + * Note: This function must be used together with devm_drm_dev_init().
> >>>>>>> + *
> >>>>>>> + * For testing driver detach can be triggered manually by writing to the driver
> >>>>>>> + * 'unbind' file.
> >>>>>>> + *
> >>>>>>> + * RETURNS:
> >>>>>>> + * 0 on success, negative error code on failure.
> >>>>>>> + */
> >>>>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
> >>>>>>> +                              unsigned int fbdev_bpp)
> >>>>>>> +{
> >>>>>>> + int ret;
> >>>>>>> +
> >>>>>>> + ret = devm_drm_dev_register(dev);
> >>>>>>> + if (ret)
> >>>>>>> +         return ret;
> >>>>>>> +
> >>>>>>> + drm_fbdev_generic_setup(dev, fbdev_bpp);
> >>>>>>> +
> >>>>>>> + return 0;
> >>>>>>> +}
> >>>>>>> +EXPORT_SYMBOL(devm_drm_dev_register_with_fbdev);
> >>>>>>> +
> >>>>>>>  /**
> >>>>>>>   * drm_dev_set_unique - Set the unique name of a DRM device
> >>>>>>>   * @dev: device of which to set the unique name
> >>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> >>>>>>> index 35af23f5fa0d..c3f0477f2e7f 100644
> >>>>>>> --- a/include/drm/drm_drv.h
> >>>>>>> +++ b/include/drm/drm_drv.h
> >>>>>>> @@ -628,6 +628,12 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
> >>>>>>>  int drm_dev_register(struct drm_device *dev, unsigned long flags);
> >>>>>>>  void drm_dev_unregister(struct drm_device *dev);
> >>>>>>>
> >>>>>>> +int devm_drm_dev_init(struct device *parent,
> >>>>>>> +               struct drm_device *dev,
> >>>>>>> +               struct drm_driver *driver);
> >>>>>>> +int devm_drm_dev_register_with_fbdev(struct drm_device *dev,
> >>>>>>> +                              unsigned int fbdev_bpp);
> >>>>>>> +
> >>>>>>>  void drm_dev_get(struct drm_device *dev);
> >>>>>>>  void drm_dev_put(struct drm_device *dev);
> >>>>>>>  void drm_put_dev(struct drm_device *dev);
> >>>>>>> --
> >>>>>>> 2.20.1
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> dri-devel mailing list
> >>>>>>> dri-devel@lists.freedesktop.org
> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>>>>
> >>>>>> --
> >>>>>> Daniel Vetter
> >>>>>> Software Engineer, Intel Corporation
> >>>>>> http://blog.ffwll.ch
> >>>>>
> >>>
> > 
> > 
> >
On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> > 
> > 
> > Den 22.01.2019 20.30, skrev Daniel Vetter:
> > > On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> > >>
> > >>
> > >>
> > >> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > >>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> > >>>>
> > >>>>
> > >>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > >>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> > >>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> > >>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> > >>>>>>> drm_dev_register().
> > >>>>>>>
> > >>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> > >>>>>>> fbdev emulation as well.
> > >>>>>>>
> > >>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> > >>>>>>
> > >>>>
> > >>>> <snip>
> > >>>>
> > >>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > >>>>>>> index 381581b01d48..12129772be45 100644
> > >>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> > >>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> > >>>>>>> @@ -36,6 +36,7 @@
> > >>>>>>>
> > >>>>>>>  #include <drm/drm_client.h>
> > >>>>>>>  #include <drm/drm_drv.h>
> > >>>>>>> +#include <drm/drm_fb_helper.h>
> > >>>>>>>  #include <drm/drmP.h>
> > >>>>>>>
> > >>>>>>>  #include "drm_crtc_internal.h"
> > >>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> > >>>>>>>  }
> > >>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> > >>>>>>>
> > >>>>>>> +static void devm_drm_dev_init_release(void *data)
> > >>>>>>> +{
> > >>>>>>> + drm_dev_put(data);
> > >>>>>
> > >>>>> We need drm_dev_unplug() here, or this isn't safe.
> > >>>>
> > >>>> This function is only used to cover the error path if probe fails before
> > >>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> > >>>> the one that calls unplug. There are comments about this in the functions.
> > >>>
> > >>> I think I get a prize for being ignorant and blind :-/
> > >>>
> > >>>>
> > >>>>>
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>> +/**
> > >>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> > >>>>>>> + * @parent: Parent device object
> > >>>>>>> + * @dev: DRM device
> > >>>>>>> + * @driver: DRM driver
> > >>>>>>> + *
> > >>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> > >>>>>>> + * automatically released on driver detach. You must supply a
> > >>>>>
> > >>>>> I think a bit more clarity here would be good:
> > >>>>>
> > >>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> > >>>>>
> > >>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> > >>>>>
> > >>>>> I think a loud warning for these is in order:
> > >>>>>
> > >>>>> "WARNING:
> > >>>>>
> > >>>>> "In generally it is unsafe to use devm functions for drm structures
> > >>>>> because the lifetimes of &drm_device and the underlying &device do not
> > >>>>> match. This here works because it doesn't immediately free anything, but
> > >>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> > >>>>> refcount through drm_dev_put().
> > >>>>>
> > >>>>> "All other drm structures must still be explicitly released in the
> > >>>>> &drm_driver.release callback."
> > >>>>>
> > >>>>> While thinking about this I just realized that with this design we have no
> > >>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > >>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> > >>>>> place to call it:
> > >>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> > >>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> > >>>>>   down anything.
> > >>>>> - drm_driver.release is way too late.
> > >>>>>
> > >>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > >>>>> already gone), but for a driver unload/unbind it would be nice if this
> > >>>>> happens automatically and in the right order.
> > >>>>>
> > >>>>> So not sure what to do here really.
> > >>>>
> > >>>> How about this change: (it breaks the rule of pulling helpers into the
> > >>>> core, so maybe we should put the devm_ functions into the simple KMS
> > >>>> helper instead?)
> > >>>
> > >>> Yeah smells a bit much like midlayer ... What would work is having a pile
> > >>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> > >>> and in the right order. So:
> > >>>
> > >>> - devm_drm_dev_init (always does a drm_dev_put())
> > >>>
> > >>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> > >>>
> > >>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> > >>>
> > >>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> > >>>   can call drm_dev_unplug() unconditionally).
> > >>>
> > >>
> > >> Beautiful! I really like this, it's very flexible.
> > >>
> > >> Where should devm_drm_mode_config_reset() live? It will pull in the
> > >> atomic helper...
> > > 
> > > I think a new drm_devm.c helper would be nice for all this stuff.
> > > Especially since you can't freely mix devm-based setup/cleanup with
> > > normal cleanup I think it'd be good to have it all together in one
> > > place. And perhaps even a code example in the DOC: overview.
> > > 
> > >>> We'd need to make sure some of the cleanup actions dtrt when the device is
> > >>> gone, but I think we can achieve that by liberally sprinkling
> > >>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> > >>> drm_mode_config_reset would be:
> > >>>
> > >>> {
> > >>>       if (drm_dev_enter())
> > >>>               return;
> > >>>
> > >>>       drm_atomic_helper_shutdown();
> > >>>
> > >>>       drm_dev_exit();
> > >>> }
> > >>>
> > >>
> > >> drm_dev_enter() can only be used to check whether the drm_device is
> > >> registered or not, it doesn't say anything about the state of the parent
> > >> device.
> > >>
> > >> All we know is that the device is being unbound from the driver, we
> > >> don't know if it's the device that's being removed or if it's the driver
> > >> that's unregistered.
> > > 
> > > You're right, both paths will have called drm_dev_unplug by then.
> > > Silly me. I really liked my idea :-)
> > > 
> > >> I have looked at the various call chains:
> > >>
> > >> driver_unregister ->
> > >>     bus_remove_driver ->
> > >>         driver_detach ->
> > >>             device_release_driver_internal
> > >>
> > >> device_unregister ->
> > >>     device_del ->
> > >>         bus_remove_device ->
> > >>             device_release_driver ->
> > >>                 device_release_driver_internal
> > >>
> > >> sysfs: unbind_store ->
> > >>     device_release_driver ->
> > >>         device_release_driver_internal
> > >>
> > >> The only way I've found to differentiate between these in a cleanup
> > >> action is that device_del() uses the bus notifier to signal
> > >> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> > >> notifier could be used to set a drm_device->parent_removed flag.
> > > 
> > > Hm, this might upset Greg KH's code taste ... maybe there's a better
> > > way to do this, but best to prototype a patch with this, send it to
> > > him and ask how to :-)
> > > 
> > 
> > I'll leave this to the one that needs it. The tinydrm drivers doesn't
> > need to touch hw after DRM unregister.
> > 
> > >> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> > >> everything get disabled when userspace closes? It does in my tinydrm
> > >> world :-)
> > > 
> > > Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > > where drivers leaked drm_connector and drm_framebuffer objects, and
> > > they've been fixed by calling drm_atomic_helper_shutdown() in the
> > > unload path. Maybe this is cargo-culting, but it goes way back to
> > > pre-atomic, where drivers called drm_helper_force_disable_all().
> > > 
> > > If you try to move the fbcon to your tinydrm drivers (con2fb is
> > > apparently the cmdline tool you need, never tried it, I only switch
> > > the kernel's console between fbcon and dummycon and back, not what
> > > fbcon drivers itself), then I think you should be able to reproduce.
> > > And maybe you have a better idea how to deal with this all.
> > > 
> > > Note also that there's been proposals floating around to only close an
> > > drm_framebuffer, not also remove it (like the current RMFB ioctl
> > > does), with that closing userspace would not necessarily lead to a
> > > full cleanup.
> > > 
> > > Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > > is if you have the display on, but no planes showing (i.e. all black).
> > > Then all the fbs will be cleaned up, but drm_connector will be
> > > leaking. That's a case where you need drm_atomic_helper_shutdown()
> > > even if fbcon/fbdev isn't even enabled.
> > > 
> > 
> > Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> > tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> > emulation also releases everything.
> > Even so, maybe I should use devm_drm_mode_config_reset() after all to
> > keep drivers uniform, to avoid confusion: why doesn't he use it?
> 
> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> 
> Super short summary: We want to start using devm actions to clean up drm
> drivers. Here's the problem:
> - For a driver unload/unbind without hotunplug, we want to properly clean
>   up the hardware and shut it all down.

Then do it on probe/disconnect.

> - But if the device is unplugged already, that's probably not the best
>   idea, and we only want to clean up the kernel's resources/allocations.

Again, probe/disconnect will be called either way.

But as you note, memory will NOT be freed by the devm stuff if you
manually unbind a driver from a device.

So don't touch hardware there, it's not going to work :)

> What's the recommendation here? I see a few options:
> 
> - Make sure everything can deal with this properly. Hotunplug can happen
>   anytime, so there's a race no matter what.

Yes.

> - Check with the device model whether the struct device is disappearing or
>   whether we're just dealing with a driver unbind (no idea how to do
>   that), and act accordingly.

You don't know that, sorry.  Just do any hardware stuff on disconnect.
Assuming your hardware is still present :)

> - Fundamental question: Touching the hw from devm actions, is that ok? If
>   not, then the pretty nifty plan laid out in this thread wont work.

Nope, that's not going to work, the device could either be long gone, or
you will not be called due to unbind happening from userspace.

But really, unbind from userspace is very very rare, it's a developer
thing mostly.  Oh and a virtual driver thing, but those people are crazy
:)

> - Something completely different?

Do it in disconnect :)

thanks,

greg k-h
On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> > On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> > >
> > >
> > > Den 22.01.2019 20.30, skrev Daniel Vetter:
> > > > On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> > > >>
> > > >>
> > > >>
> > > >> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > > >>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> > > >>>>
> > > >>>>
> > > >>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > > >>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> > > >>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> > > >>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> > > >>>>>>> drm_dev_register().
> > > >>>>>>>
> > > >>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> > > >>>>>>> fbdev emulation as well.
> > > >>>>>>>
> > > >>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> > > >>>>>>>
> > > >>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> > > >>>>>>
> > > >>>>
> > > >>>> <snip>
> > > >>>>
> > > >>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > >>>>>>> index 381581b01d48..12129772be45 100644
> > > >>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> > > >>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> > > >>>>>>> @@ -36,6 +36,7 @@
> > > >>>>>>>
> > > >>>>>>>  #include <drm/drm_client.h>
> > > >>>>>>>  #include <drm/drm_drv.h>
> > > >>>>>>> +#include <drm/drm_fb_helper.h>
> > > >>>>>>>  #include <drm/drmP.h>
> > > >>>>>>>
> > > >>>>>>>  #include "drm_crtc_internal.h"
> > > >>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> > > >>>>>>>  }
> > > >>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> > > >>>>>>>
> > > >>>>>>> +static void devm_drm_dev_init_release(void *data)
> > > >>>>>>> +{
> > > >>>>>>> + drm_dev_put(data);
> > > >>>>>
> > > >>>>> We need drm_dev_unplug() here, or this isn't safe.
> > > >>>>
> > > >>>> This function is only used to cover the error path if probe fails before
> > > >>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> > > >>>> the one that calls unplug. There are comments about this in the functions.
> > > >>>
> > > >>> I think I get a prize for being ignorant and blind :-/
> > > >>>
> > > >>>>
> > > >>>>>
> > > >>>>>>> +}
> > > >>>>>>> +
> > > >>>>>>> +/**
> > > >>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> > > >>>>>>> + * @parent: Parent device object
> > > >>>>>>> + * @dev: DRM device
> > > >>>>>>> + * @driver: DRM driver
> > > >>>>>>> + *
> > > >>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> > > >>>>>>> + * automatically released on driver detach. You must supply a
> > > >>>>>
> > > >>>>> I think a bit more clarity here would be good:
> > > >>>>>
> > > >>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> > > >>>>>
> > > >>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> > > >>>>>
> > > >>>>> I think a loud warning for these is in order:
> > > >>>>>
> > > >>>>> "WARNING:
> > > >>>>>
> > > >>>>> "In generally it is unsafe to use devm functions for drm structures
> > > >>>>> because the lifetimes of &drm_device and the underlying &device do not
> > > >>>>> match. This here works because it doesn't immediately free anything, but
> > > >>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> > > >>>>> refcount through drm_dev_put().
> > > >>>>>
> > > >>>>> "All other drm structures must still be explicitly released in the
> > > >>>>> &drm_driver.release callback."
> > > >>>>>
> > > >>>>> While thinking about this I just realized that with this design we have no
> > > >>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > > >>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> > > >>>>> place to call it:
> > > >>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> > > >>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> > > >>>>>   down anything.
> > > >>>>> - drm_driver.release is way too late.
> > > >>>>>
> > > >>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > > >>>>> already gone), but for a driver unload/unbind it would be nice if this
> > > >>>>> happens automatically and in the right order.
> > > >>>>>
> > > >>>>> So not sure what to do here really.
> > > >>>>
> > > >>>> How about this change: (it breaks the rule of pulling helpers into the
> > > >>>> core, so maybe we should put the devm_ functions into the simple KMS
> > > >>>> helper instead?)
> > > >>>
> > > >>> Yeah smells a bit much like midlayer ... What would work is having a pile
> > > >>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> > > >>> and in the right order. So:
> > > >>>
> > > >>> - devm_drm_dev_init (always does a drm_dev_put())
> > > >>>
> > > >>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> > > >>>
> > > >>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> > > >>>
> > > >>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> > > >>>   can call drm_dev_unplug() unconditionally).
> > > >>>
> > > >>
> > > >> Beautiful! I really like this, it's very flexible.
> > > >>
> > > >> Where should devm_drm_mode_config_reset() live? It will pull in the
> > > >> atomic helper...
> > > >
> > > > I think a new drm_devm.c helper would be nice for all this stuff.
> > > > Especially since you can't freely mix devm-based setup/cleanup with
> > > > normal cleanup I think it'd be good to have it all together in one
> > > > place. And perhaps even a code example in the DOC: overview.
> > > >
> > > >>> We'd need to make sure some of the cleanup actions dtrt when the device is
> > > >>> gone, but I think we can achieve that by liberally sprinkling
> > > >>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> > > >>> drm_mode_config_reset would be:
> > > >>>
> > > >>> {
> > > >>>       if (drm_dev_enter())
> > > >>>               return;
> > > >>>
> > > >>>       drm_atomic_helper_shutdown();
> > > >>>
> > > >>>       drm_dev_exit();
> > > >>> }
> > > >>>
> > > >>
> > > >> drm_dev_enter() can only be used to check whether the drm_device is
> > > >> registered or not, it doesn't say anything about the state of the parent
> > > >> device.
> > > >>
> > > >> All we know is that the device is being unbound from the driver, we
> > > >> don't know if it's the device that's being removed or if it's the driver
> > > >> that's unregistered.
> > > >
> > > > You're right, both paths will have called drm_dev_unplug by then.
> > > > Silly me. I really liked my idea :-)
> > > >
> > > >> I have looked at the various call chains:
> > > >>
> > > >> driver_unregister ->
> > > >>     bus_remove_driver ->
> > > >>         driver_detach ->
> > > >>             device_release_driver_internal
> > > >>
> > > >> device_unregister ->
> > > >>     device_del ->
> > > >>         bus_remove_device ->
> > > >>             device_release_driver ->
> > > >>                 device_release_driver_internal
> > > >>
> > > >> sysfs: unbind_store ->
> > > >>     device_release_driver ->
> > > >>         device_release_driver_internal
> > > >>
> > > >> The only way I've found to differentiate between these in a cleanup
> > > >> action is that device_del() uses the bus notifier to signal
> > > >> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> > > >> notifier could be used to set a drm_device->parent_removed flag.
> > > >
> > > > Hm, this might upset Greg KH's code taste ... maybe there's a better
> > > > way to do this, but best to prototype a patch with this, send it to
> > > > him and ask how to :-)
> > > >
> > >
> > > I'll leave this to the one that needs it. The tinydrm drivers doesn't
> > > need to touch hw after DRM unregister.
> > >
> > > >> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> > > >> everything get disabled when userspace closes? It does in my tinydrm
> > > >> world :-)
> > > >
> > > > Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > > > where drivers leaked drm_connector and drm_framebuffer objects, and
> > > > they've been fixed by calling drm_atomic_helper_shutdown() in the
> > > > unload path. Maybe this is cargo-culting, but it goes way back to
> > > > pre-atomic, where drivers called drm_helper_force_disable_all().
> > > >
> > > > If you try to move the fbcon to your tinydrm drivers (con2fb is
> > > > apparently the cmdline tool you need, never tried it, I only switch
> > > > the kernel's console between fbcon and dummycon and back, not what
> > > > fbcon drivers itself), then I think you should be able to reproduce.
> > > > And maybe you have a better idea how to deal with this all.
> > > >
> > > > Note also that there's been proposals floating around to only close an
> > > > drm_framebuffer, not also remove it (like the current RMFB ioctl
> > > > does), with that closing userspace would not necessarily lead to a
> > > > full cleanup.
> > > >
> > > > Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > > > is if you have the display on, but no planes showing (i.e. all black).
> > > > Then all the fbs will be cleaned up, but drm_connector will be
> > > > leaking. That's a case where you need drm_atomic_helper_shutdown()
> > > > even if fbcon/fbdev isn't even enabled.
> > > >
> > >
> > > Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> > > tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> > > emulation also releases everything.
> > > Even so, maybe I should use devm_drm_mode_config_reset() after all to
> > > keep drivers uniform, to avoid confusion: why doesn't he use it?
> >
> > Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> >
> > Super short summary: We want to start using devm actions to clean up drm
> > drivers. Here's the problem:
> > - For a driver unload/unbind without hotunplug, we want to properly clean
> >   up the hardware and shut it all down.
>
> Then do it on probe/disconnect.
>
> > - But if the device is unplugged already, that's probably not the best
> >   idea, and we only want to clean up the kernel's resources/allocations.
>
> Again, probe/disconnect will be called either way.
>
> But as you note, memory will NOT be freed by the devm stuff if you
> manually unbind a driver from a device.
>
> So don't touch hardware there, it's not going to work :)
>
> > What's the recommendation here? I see a few options:
> >
> > - Make sure everything can deal with this properly. Hotunplug can happen
> >   anytime, so there's a race no matter what.
>
> Yes.
>
> > - Check with the device model whether the struct device is disappearing or
> >   whether we're just dealing with a driver unbind (no idea how to do
> >   that), and act accordingly.
>
> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> Assuming your hardware is still present :)
>
> > - Fundamental question: Touching the hw from devm actions, is that ok? If
> >   not, then the pretty nifty plan laid out in this thread wont work.
>
> Nope, that's not going to work, the device could either be long gone, or
> you will not be called due to unbind happening from userspace.
>
> But really, unbind from userspace is very very rare, it's a developer
> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> :)
>
> > - Something completely different?
>
> Do it in disconnect :)

Ah, I forgot to mention the important constraint :-) disconnect/unbind
should be the following sequence:

1. Unregister all the userspace interfaces (there's a lot of them) and
make sure all the pending ioctls are done so that from now on
userspace sees lots of -EIO (in case it still has fd open, which is
going to be the normal for hotunplug.

2. Shut down hw and all ongoing operations (only relevant for unbind,
but needs to be able to cope with sudden hotunplug on top anyway).

3. Clean up the kernel mess and release everything.

Probe is exactly the other way round, so would perfectly fit into the
devm onion cleanup. See in the commented earlier replies above how
that would match in details, but tldr; if we have to do 2. in
disconnect, then we also have to do 1. in disconnected, and only doing
3. through devm is almost not worth the bother. But if we could do all
three through devm then simple drivers wouldn't even need any
disconnect/unbind callback at all. That's our motivation for trying to
come up with an answer that's not "do it in disconnect". "do it in
disconnect" is how we do it all today already.

Yes we're trying to make tiny drivers even smaller, we have enough
nowadays that this stuff would be worth it :-)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Den 24.01.2019 18.57, skrev Daniel Vetter:
> On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>>
>> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
>>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
>>>>
>>>>
>>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
>>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
>>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
>>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
>>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
>>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
>>>>>>>>>>> drm_dev_register().
>>>>>>>>>>>
>>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
>>>>>>>>>>> fbdev emulation as well.
>>>>>>>>>>>
>>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> <snip>
>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>> index 381581b01d48..12129772be45 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>> @@ -36,6 +36,7 @@
>>>>>>>>>>>
>>>>>>>>>>>  #include <drm/drm_client.h>
>>>>>>>>>>>  #include <drm/drm_drv.h>
>>>>>>>>>>> +#include <drm/drm_fb_helper.h>
>>>>>>>>>>>  #include <drm/drmP.h>
>>>>>>>>>>>
>>>>>>>>>>>  #include "drm_crtc_internal.h"
>>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
>>>>>>>>>>>  }
>>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
>>>>>>>>>>>
>>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
>>>>>>>>>>> +{
>>>>>>>>>>> + drm_dev_put(data);
>>>>>>>>>
>>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
>>>>>>>>
>>>>>>>> This function is only used to cover the error path if probe fails before
>>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
>>>>>>>> the one that calls unplug. There are comments about this in the functions.
>>>>>>>
>>>>>>> I think I get a prize for being ignorant and blind :-/
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +/**
>>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
>>>>>>>>>>> + * @parent: Parent device object
>>>>>>>>>>> + * @dev: DRM device
>>>>>>>>>>> + * @driver: DRM driver
>>>>>>>>>>> + *
>>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
>>>>>>>>>>> + * automatically released on driver detach. You must supply a
>>>>>>>>>
>>>>>>>>> I think a bit more clarity here would be good:
>>>>>>>>>
>>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
>>>>>>>>>
>>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
>>>>>>>>>
>>>>>>>>> I think a loud warning for these is in order:
>>>>>>>>>
>>>>>>>>> "WARNING:
>>>>>>>>>
>>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
>>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
>>>>>>>>> match. This here works because it doesn't immediately free anything, but
>>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
>>>>>>>>> refcount through drm_dev_put().
>>>>>>>>>
>>>>>>>>> "All other drm structures must still be explicitly released in the
>>>>>>>>> &drm_driver.release callback."
>>>>>>>>>
>>>>>>>>> While thinking about this I just realized that with this design we have no
>>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
>>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
>>>>>>>>> place to call it:
>>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
>>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
>>>>>>>>>   down anything.
>>>>>>>>> - drm_driver.release is way too late.
>>>>>>>>>
>>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
>>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
>>>>>>>>> happens automatically and in the right order.
>>>>>>>>>
>>>>>>>>> So not sure what to do here really.
>>>>>>>>
>>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
>>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
>>>>>>>> helper instead?)
>>>>>>>
>>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
>>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
>>>>>>> and in the right order. So:
>>>>>>>
>>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
>>>>>>>
>>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
>>>>>>>
>>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
>>>>>>>
>>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
>>>>>>>   can call drm_dev_unplug() unconditionally).
>>>>>>>
>>>>>>
>>>>>> Beautiful! I really like this, it's very flexible.
>>>>>>
>>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
>>>>>> atomic helper...
>>>>>
>>>>> I think a new drm_devm.c helper would be nice for all this stuff.
>>>>> Especially since you can't freely mix devm-based setup/cleanup with
>>>>> normal cleanup I think it'd be good to have it all together in one
>>>>> place. And perhaps even a code example in the DOC: overview.
>>>>>
>>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
>>>>>>> gone, but I think we can achieve that by liberally sprinkling
>>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
>>>>>>> drm_mode_config_reset would be:
>>>>>>>
>>>>>>> {
>>>>>>>       if (drm_dev_enter())
>>>>>>>               return;
>>>>>>>
>>>>>>>       drm_atomic_helper_shutdown();
>>>>>>>
>>>>>>>       drm_dev_exit();
>>>>>>> }
>>>>>>>
>>>>>>
>>>>>> drm_dev_enter() can only be used to check whether the drm_device is
>>>>>> registered or not, it doesn't say anything about the state of the parent
>>>>>> device.
>>>>>>
>>>>>> All we know is that the device is being unbound from the driver, we
>>>>>> don't know if it's the device that's being removed or if it's the driver
>>>>>> that's unregistered.
>>>>>
>>>>> You're right, both paths will have called drm_dev_unplug by then.
>>>>> Silly me. I really liked my idea :-)
>>>>>
>>>>>> I have looked at the various call chains:
>>>>>>
>>>>>> driver_unregister ->
>>>>>>     bus_remove_driver ->
>>>>>>         driver_detach ->
>>>>>>             device_release_driver_internal
>>>>>>
>>>>>> device_unregister ->
>>>>>>     device_del ->
>>>>>>         bus_remove_device ->
>>>>>>             device_release_driver ->
>>>>>>                 device_release_driver_internal
>>>>>>
>>>>>> sysfs: unbind_store ->
>>>>>>     device_release_driver ->
>>>>>>         device_release_driver_internal
>>>>>>
>>>>>> The only way I've found to differentiate between these in a cleanup
>>>>>> action is that device_del() uses the bus notifier to signal
>>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
>>>>>> notifier could be used to set a drm_device->parent_removed flag.
>>>>>
>>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
>>>>> way to do this, but best to prototype a patch with this, send it to
>>>>> him and ask how to :-)
>>>>>
>>>>
>>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
>>>> need to touch hw after DRM unregister.
>>>>
>>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
>>>>>> everything get disabled when userspace closes? It does in my tinydrm
>>>>>> world :-)
>>>>>
>>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
>>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
>>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
>>>>> unload path. Maybe this is cargo-culting, but it goes way back to
>>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
>>>>>
>>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
>>>>> apparently the cmdline tool you need, never tried it, I only switch
>>>>> the kernel's console between fbcon and dummycon and back, not what
>>>>> fbcon drivers itself), then I think you should be able to reproduce.
>>>>> And maybe you have a better idea how to deal with this all.
>>>>>
>>>>> Note also that there's been proposals floating around to only close an
>>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
>>>>> does), with that closing userspace would not necessarily lead to a
>>>>> full cleanup.
>>>>>
>>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
>>>>> is if you have the display on, but no planes showing (i.e. all black).
>>>>> Then all the fbs will be cleaned up, but drm_connector will be
>>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
>>>>> even if fbcon/fbdev isn't even enabled.
>>>>>
>>>>
>>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
>>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
>>>> emulation also releases everything.
>>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
>>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
>>>
>>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
>>>
>>> Super short summary: We want to start using devm actions to clean up drm
>>> drivers. Here's the problem:
>>> - For a driver unload/unbind without hotunplug, we want to properly clean
>>>   up the hardware and shut it all down.
>>
>> Then do it on probe/disconnect.
>>
>>> - But if the device is unplugged already, that's probably not the best
>>>   idea, and we only want to clean up the kernel's resources/allocations.
>>
>> Again, probe/disconnect will be called either way.
>>
>> But as you note, memory will NOT be freed by the devm stuff if you
>> manually unbind a driver from a device.
>>
>> So don't touch hardware there, it's not going to work :)
>>
>>> What's the recommendation here? I see a few options:
>>>
>>> - Make sure everything can deal with this properly. Hotunplug can happen
>>>   anytime, so there's a race no matter what.
>>
>> Yes.
>>
>>> - Check with the device model whether the struct device is disappearing or
>>>   whether we're just dealing with a driver unbind (no idea how to do
>>>   that), and act accordingly.
>>
>> You don't know that, sorry.  Just do any hardware stuff on disconnect.
>> Assuming your hardware is still present :)
>>
>>> - Fundamental question: Touching the hw from devm actions, is that ok? If
>>>   not, then the pretty nifty plan laid out in this thread wont work.
>>
>> Nope, that's not going to work, the device could either be long gone, or
>> you will not be called due to unbind happening from userspace.
>>
>> But really, unbind from userspace is very very rare, it's a developer
>> thing mostly.  Oh and a virtual driver thing, but those people are crazy
>> :)
>>
>>> - Something completely different?
>>
>> Do it in disconnect :)
> 
> Ah, I forgot to mention the important constraint :-) disconnect/unbind
> should be the following sequence:
> 
> 1. Unregister all the userspace interfaces (there's a lot of them) and
> make sure all the pending ioctls are done so that from now on
> userspace sees lots of -EIO (in case it still has fd open, which is
> going to be the normal for hotunplug.
> 
> 2. Shut down hw and all ongoing operations (only relevant for unbind,
> but needs to be able to cope with sudden hotunplug on top anyway).
> 
> 3. Clean up the kernel mess and release everything.
> 
> Probe is exactly the other way round, so would perfectly fit into the
> devm onion cleanup. See in the commented earlier replies above how
> that would match in details, but tldr; if we have to do 2. in
> disconnect, then we also have to do 1. in disconnected, and only doing
> 3. through devm is almost not worth the bother. But if we could do all
> three through devm then simple drivers wouldn't even need any
> disconnect/unbind callback at all. That's our motivation for trying to
> come up with an answer that's not "do it in disconnect". "do it in
> disconnect" is how we do it all today already.
> 
> Yes we're trying to make tiny drivers even smaller, we have enough
> nowadays that this stuff would be worth it :-)
> 

I think a solution is to say that drivers that want to touch hw on
disconnect needs to use device_driver->remove to do that.

This is an example driver that doesn't need to touch hw because it's so
simple that userspace has disabled the pipeline:

static void drm_driver_release(struct drm_device *drm)
{
	drm_mode_config_cleanup(drm);
	drm_dev_fini(drm);
	kfree(drm);
}

static struct drm_driver drm_driver = {
	.release = drm_driver_release,
	/* ... */
};

static int driver_probe(struct device *dev)
{
	struct drm_device *drm;
	int ret;

	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
	if (!drm)
		return -ENOMEM;

	ret = devm_drm_dev_init(dev, drm, &drm_driver);
	if (ret) {
		kfree(drm);
		return ret;
	}

	drm_mode_config_init(drm);

	/* Aquire various resources, all managed by devres */

	drm_mode_config_reset(drm);

	return devm_drm_dev_register(drm);
}

struct device_driver driver = {
	.probe = driver_probe,
};


A driver that wants to touch hardware on disconnect, can look like this:

static void drm_driver_release(struct drm_device *drm)
{
	drm_mode_config_cleanup(drm);
	drm_dev_fini(drm);
	kfree(drm);
}

static struct drm_driver drm_driver = {
	.release = drm_driver_release,
	/* ... */
};

static int driver_probe(struct device *dev)
{
	struct drm_device *drm;
	int ret;

	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
	if (!drm)
		return -ENOMEM;

	ret = devm_drm_dev_init(dev, drm, &drm_driver);
	if (ret) {
		kfree(drm);
		return ret;
	}

	drm_mode_config_init(drm);

	/* Aquire various resources, all managed by devres */

	drm_mode_config_reset(drm);

	ret = drm_dev_register(drm);
	if (ret)
		return ret;

	drm_dev_get(dev); /* If using drm_dev_unplug() */

	dev_set_drvdata(dev, drm);

	return 0;
}

/* This function is called before devres_release_all() */
static int driver_remove(struct device *dev)
{
	struct drm_device *drm = dev_get_drvdata(dev);

	drm_dev_unplug(drm); OR drm_dev_unregister(drm);
	drm_atomic_helper_shutdown(drm)

	return 0;
}

struct device_driver driver = {
	.probe = driver_probe,
	.remove = driver_remove,
};


Noralf.
On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> 
> 
> Den 24.01.2019 18.57, skrev Daniel Vetter:
> > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >>
> >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> >>>>
> >>>>
> >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> >>>>>>>>>>> drm_dev_register().
> >>>>>>>>>>>
> >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> >>>>>>>>>>> fbdev emulation as well.
> >>>>>>>>>>>
> >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>> <snip>
> >>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> @@ -36,6 +36,7 @@
> >>>>>>>>>>>
> >>>>>>>>>>>  #include <drm/drm_client.h>
> >>>>>>>>>>>  #include <drm/drm_drv.h>
> >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> >>>>>>>>>>>  #include <drm/drmP.h>
> >>>>>>>>>>>
> >>>>>>>>>>>  #include "drm_crtc_internal.h"
> >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>>>>>  }
> >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> >>>>>>>>>>>
> >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> >>>>>>>>>>> +{
> >>>>>>>>>>> + drm_dev_put(data);
> >>>>>>>>>
> >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> >>>>>>>>
> >>>>>>>> This function is only used to cover the error path if probe fails before
> >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> >>>>>>>
> >>>>>>> I think I get a prize for being ignorant and blind :-/
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +/**
> >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> >>>>>>>>>>> + * @parent: Parent device object
> >>>>>>>>>>> + * @dev: DRM device
> >>>>>>>>>>> + * @driver: DRM driver
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> >>>>>>>>>
> >>>>>>>>> I think a bit more clarity here would be good:
> >>>>>>>>>
> >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> >>>>>>>>>
> >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> >>>>>>>>>
> >>>>>>>>> I think a loud warning for these is in order:
> >>>>>>>>>
> >>>>>>>>> "WARNING:
> >>>>>>>>>
> >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> >>>>>>>>> refcount through drm_dev_put().
> >>>>>>>>>
> >>>>>>>>> "All other drm structures must still be explicitly released in the
> >>>>>>>>> &drm_driver.release callback."
> >>>>>>>>>
> >>>>>>>>> While thinking about this I just realized that with this design we have no
> >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> >>>>>>>>> place to call it:
> >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> >>>>>>>>>   down anything.
> >>>>>>>>> - drm_driver.release is way too late.
> >>>>>>>>>
> >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> >>>>>>>>> happens automatically and in the right order.
> >>>>>>>>>
> >>>>>>>>> So not sure what to do here really.
> >>>>>>>>
> >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> >>>>>>>> helper instead?)
> >>>>>>>
> >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> >>>>>>> and in the right order. So:
> >>>>>>>
> >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> >>>>>>>
> >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> >>>>>>>
> >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> >>>>>>>
> >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> >>>>>>>   can call drm_dev_unplug() unconditionally).
> >>>>>>>
> >>>>>>
> >>>>>> Beautiful! I really like this, it's very flexible.
> >>>>>>
> >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> >>>>>> atomic helper...
> >>>>>
> >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> >>>>> normal cleanup I think it'd be good to have it all together in one
> >>>>> place. And perhaps even a code example in the DOC: overview.
> >>>>>
> >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> >>>>>>> drm_mode_config_reset would be:
> >>>>>>>
> >>>>>>> {
> >>>>>>>       if (drm_dev_enter())
> >>>>>>>               return;
> >>>>>>>
> >>>>>>>       drm_atomic_helper_shutdown();
> >>>>>>>
> >>>>>>>       drm_dev_exit();
> >>>>>>> }
> >>>>>>>
> >>>>>>
> >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> >>>>>> registered or not, it doesn't say anything about the state of the parent
> >>>>>> device.
> >>>>>>
> >>>>>> All we know is that the device is being unbound from the driver, we
> >>>>>> don't know if it's the device that's being removed or if it's the driver
> >>>>>> that's unregistered.
> >>>>>
> >>>>> You're right, both paths will have called drm_dev_unplug by then.
> >>>>> Silly me. I really liked my idea :-)
> >>>>>
> >>>>>> I have looked at the various call chains:
> >>>>>>
> >>>>>> driver_unregister ->
> >>>>>>     bus_remove_driver ->
> >>>>>>         driver_detach ->
> >>>>>>             device_release_driver_internal
> >>>>>>
> >>>>>> device_unregister ->
> >>>>>>     device_del ->
> >>>>>>         bus_remove_device ->
> >>>>>>             device_release_driver ->
> >>>>>>                 device_release_driver_internal
> >>>>>>
> >>>>>> sysfs: unbind_store ->
> >>>>>>     device_release_driver ->
> >>>>>>         device_release_driver_internal
> >>>>>>
> >>>>>> The only way I've found to differentiate between these in a cleanup
> >>>>>> action is that device_del() uses the bus notifier to signal
> >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> >>>>>
> >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> >>>>> way to do this, but best to prototype a patch with this, send it to
> >>>>> him and ask how to :-)
> >>>>>
> >>>>
> >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> >>>> need to touch hw after DRM unregister.
> >>>>
> >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> >>>>>> world :-)
> >>>>>
> >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> >>>>>
> >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> >>>>> apparently the cmdline tool you need, never tried it, I only switch
> >>>>> the kernel's console between fbcon and dummycon and back, not what
> >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> >>>>> And maybe you have a better idea how to deal with this all.
> >>>>>
> >>>>> Note also that there's been proposals floating around to only close an
> >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> >>>>> does), with that closing userspace would not necessarily lead to a
> >>>>> full cleanup.
> >>>>>
> >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> >>>>> is if you have the display on, but no planes showing (i.e. all black).
> >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> >>>>> even if fbcon/fbdev isn't even enabled.
> >>>>>
> >>>>
> >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> >>>> emulation also releases everything.
> >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> >>>
> >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> >>>
> >>> Super short summary: We want to start using devm actions to clean up drm
> >>> drivers. Here's the problem:
> >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> >>>   up the hardware and shut it all down.
> >>
> >> Then do it on probe/disconnect.
> >>
> >>> - But if the device is unplugged already, that's probably not the best
> >>>   idea, and we only want to clean up the kernel's resources/allocations.
> >>
> >> Again, probe/disconnect will be called either way.
> >>
> >> But as you note, memory will NOT be freed by the devm stuff if you
> >> manually unbind a driver from a device.
> >>
> >> So don't touch hardware there, it's not going to work :)
> >>
> >>> What's the recommendation here? I see a few options:
> >>>
> >>> - Make sure everything can deal with this properly. Hotunplug can happen
> >>>   anytime, so there's a race no matter what.
> >>
> >> Yes.
> >>
> >>> - Check with the device model whether the struct device is disappearing or
> >>>   whether we're just dealing with a driver unbind (no idea how to do
> >>>   that), and act accordingly.
> >>
> >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> >> Assuming your hardware is still present :)
> >>
> >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> >>>   not, then the pretty nifty plan laid out in this thread wont work.
> >>
> >> Nope, that's not going to work, the device could either be long gone, or
> >> you will not be called due to unbind happening from userspace.
> >>
> >> But really, unbind from userspace is very very rare, it's a developer
> >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> >> :)
> >>
> >>> - Something completely different?
> >>
> >> Do it in disconnect :)
> > 
> > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > should be the following sequence:
> > 
> > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > make sure all the pending ioctls are done so that from now on
> > userspace sees lots of -EIO (in case it still has fd open, which is
> > going to be the normal for hotunplug.
> > 
> > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > but needs to be able to cope with sudden hotunplug on top anyway).
> > 
> > 3. Clean up the kernel mess and release everything.
> > 
> > Probe is exactly the other way round, so would perfectly fit into the
> > devm onion cleanup. See in the commented earlier replies above how
> > that would match in details, but tldr; if we have to do 2. in
> > disconnect, then we also have to do 1. in disconnected, and only doing
> > 3. through devm is almost not worth the bother. But if we could do all
> > three through devm then simple drivers wouldn't even need any
> > disconnect/unbind callback at all. That's our motivation for trying to
> > come up with an answer that's not "do it in disconnect". "do it in
> > disconnect" is how we do it all today already.
> > 
> > Yes we're trying to make tiny drivers even smaller, we have enough
> > nowadays that this stuff would be worth it :-)
> > 
> 
> I think a solution is to say that drivers that want to touch hw on
> disconnect needs to use device_driver->remove to do that.

That's what I was trying to say above "remove" is what I was trying to
say is "disconnect".

thanks,

greg k-h
On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> 
> 
> Den 24.01.2019 18.57, skrev Daniel Vetter:
> > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >>
> >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> >>>>
> >>>>
> >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> >>>>>>>>>>> drm_dev_register().
> >>>>>>>>>>>
> >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> >>>>>>>>>>> fbdev emulation as well.
> >>>>>>>>>>>
> >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>> <snip>
> >>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> @@ -36,6 +36,7 @@
> >>>>>>>>>>>
> >>>>>>>>>>>  #include <drm/drm_client.h>
> >>>>>>>>>>>  #include <drm/drm_drv.h>
> >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> >>>>>>>>>>>  #include <drm/drmP.h>
> >>>>>>>>>>>
> >>>>>>>>>>>  #include "drm_crtc_internal.h"
> >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>>>>>  }
> >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> >>>>>>>>>>>
> >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> >>>>>>>>>>> +{
> >>>>>>>>>>> + drm_dev_put(data);
> >>>>>>>>>
> >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> >>>>>>>>
> >>>>>>>> This function is only used to cover the error path if probe fails before
> >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> >>>>>>>
> >>>>>>> I think I get a prize for being ignorant and blind :-/
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +/**
> >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> >>>>>>>>>>> + * @parent: Parent device object
> >>>>>>>>>>> + * @dev: DRM device
> >>>>>>>>>>> + * @driver: DRM driver
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> >>>>>>>>>
> >>>>>>>>> I think a bit more clarity here would be good:
> >>>>>>>>>
> >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> >>>>>>>>>
> >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> >>>>>>>>>
> >>>>>>>>> I think a loud warning for these is in order:
> >>>>>>>>>
> >>>>>>>>> "WARNING:
> >>>>>>>>>
> >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> >>>>>>>>> refcount through drm_dev_put().
> >>>>>>>>>
> >>>>>>>>> "All other drm structures must still be explicitly released in the
> >>>>>>>>> &drm_driver.release callback."
> >>>>>>>>>
> >>>>>>>>> While thinking about this I just realized that with this design we have no
> >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> >>>>>>>>> place to call it:
> >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> >>>>>>>>>   down anything.
> >>>>>>>>> - drm_driver.release is way too late.
> >>>>>>>>>
> >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> >>>>>>>>> happens automatically and in the right order.
> >>>>>>>>>
> >>>>>>>>> So not sure what to do here really.
> >>>>>>>>
> >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> >>>>>>>> helper instead?)
> >>>>>>>
> >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> >>>>>>> and in the right order. So:
> >>>>>>>
> >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> >>>>>>>
> >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> >>>>>>>
> >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> >>>>>>>
> >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> >>>>>>>   can call drm_dev_unplug() unconditionally).
> >>>>>>>
> >>>>>>
> >>>>>> Beautiful! I really like this, it's very flexible.
> >>>>>>
> >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> >>>>>> atomic helper...
> >>>>>
> >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> >>>>> normal cleanup I think it'd be good to have it all together in one
> >>>>> place. And perhaps even a code example in the DOC: overview.
> >>>>>
> >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> >>>>>>> drm_mode_config_reset would be:
> >>>>>>>
> >>>>>>> {
> >>>>>>>       if (drm_dev_enter())
> >>>>>>>               return;
> >>>>>>>
> >>>>>>>       drm_atomic_helper_shutdown();
> >>>>>>>
> >>>>>>>       drm_dev_exit();
> >>>>>>> }
> >>>>>>>
> >>>>>>
> >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> >>>>>> registered or not, it doesn't say anything about the state of the parent
> >>>>>> device.
> >>>>>>
> >>>>>> All we know is that the device is being unbound from the driver, we
> >>>>>> don't know if it's the device that's being removed or if it's the driver
> >>>>>> that's unregistered.
> >>>>>
> >>>>> You're right, both paths will have called drm_dev_unplug by then.
> >>>>> Silly me. I really liked my idea :-)
> >>>>>
> >>>>>> I have looked at the various call chains:
> >>>>>>
> >>>>>> driver_unregister ->
> >>>>>>     bus_remove_driver ->
> >>>>>>         driver_detach ->
> >>>>>>             device_release_driver_internal
> >>>>>>
> >>>>>> device_unregister ->
> >>>>>>     device_del ->
> >>>>>>         bus_remove_device ->
> >>>>>>             device_release_driver ->
> >>>>>>                 device_release_driver_internal
> >>>>>>
> >>>>>> sysfs: unbind_store ->
> >>>>>>     device_release_driver ->
> >>>>>>         device_release_driver_internal
> >>>>>>
> >>>>>> The only way I've found to differentiate between these in a cleanup
> >>>>>> action is that device_del() uses the bus notifier to signal
> >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> >>>>>
> >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> >>>>> way to do this, but best to prototype a patch with this, send it to
> >>>>> him and ask how to :-)
> >>>>>
> >>>>
> >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> >>>> need to touch hw after DRM unregister.
> >>>>
> >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> >>>>>> world :-)
> >>>>>
> >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> >>>>>
> >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> >>>>> apparently the cmdline tool you need, never tried it, I only switch
> >>>>> the kernel's console between fbcon and dummycon and back, not what
> >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> >>>>> And maybe you have a better idea how to deal with this all.
> >>>>>
> >>>>> Note also that there's been proposals floating around to only close an
> >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> >>>>> does), with that closing userspace would not necessarily lead to a
> >>>>> full cleanup.
> >>>>>
> >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> >>>>> is if you have the display on, but no planes showing (i.e. all black).
> >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> >>>>> even if fbcon/fbdev isn't even enabled.
> >>>>>
> >>>>
> >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> >>>> emulation also releases everything.
> >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> >>>
> >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> >>>
> >>> Super short summary: We want to start using devm actions to clean up drm
> >>> drivers. Here's the problem:
> >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> >>>   up the hardware and shut it all down.
> >>
> >> Then do it on probe/disconnect.
> >>
> >>> - But if the device is unplugged already, that's probably not the best
> >>>   idea, and we only want to clean up the kernel's resources/allocations.
> >>
> >> Again, probe/disconnect will be called either way.
> >>
> >> But as you note, memory will NOT be freed by the devm stuff if you
> >> manually unbind a driver from a device.
> >>
> >> So don't touch hardware there, it's not going to work :)
> >>
> >>> What's the recommendation here? I see a few options:
> >>>
> >>> - Make sure everything can deal with this properly. Hotunplug can happen
> >>>   anytime, so there's a race no matter what.
> >>
> >> Yes.
> >>
> >>> - Check with the device model whether the struct device is disappearing or
> >>>   whether we're just dealing with a driver unbind (no idea how to do
> >>>   that), and act accordingly.
> >>
> >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> >> Assuming your hardware is still present :)
> >>
> >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> >>>   not, then the pretty nifty plan laid out in this thread wont work.
> >>
> >> Nope, that's not going to work, the device could either be long gone, or
> >> you will not be called due to unbind happening from userspace.
> >>
> >> But really, unbind from userspace is very very rare, it's a developer
> >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> >> :)
> >>
> >>> - Something completely different?
> >>
> >> Do it in disconnect :)
> > 
> > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > should be the following sequence:
> > 
> > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > make sure all the pending ioctls are done so that from now on
> > userspace sees lots of -EIO (in case it still has fd open, which is
> > going to be the normal for hotunplug.
> > 
> > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > but needs to be able to cope with sudden hotunplug on top anyway).
> > 
> > 3. Clean up the kernel mess and release everything.
> > 
> > Probe is exactly the other way round, so would perfectly fit into the
> > devm onion cleanup. See in the commented earlier replies above how
> > that would match in details, but tldr; if we have to do 2. in
> > disconnect, then we also have to do 1. in disconnected, and only doing
> > 3. through devm is almost not worth the bother. But if we could do all
> > three through devm then simple drivers wouldn't even need any
> > disconnect/unbind callback at all. That's our motivation for trying to
> > come up with an answer that's not "do it in disconnect". "do it in
> > disconnect" is how we do it all today already.
> > 
> > Yes we're trying to make tiny drivers even smaller, we have enough
> > nowadays that this stuff would be worth it :-)
> > 
> 
> I think a solution is to say that drivers that want to touch hw on
> disconnect needs to use device_driver->remove to do that.
> 
> This is an example driver that doesn't need to touch hw because it's so
> simple that userspace has disabled the pipeline:
> 
> static void drm_driver_release(struct drm_device *drm)
> {
> 	drm_mode_config_cleanup(drm);
> 	drm_dev_fini(drm);
> 	kfree(drm);
> }
> 
> static struct drm_driver drm_driver = {
> 	.release = drm_driver_release,
> 	/* ... */
> };
> 
> static int driver_probe(struct device *dev)
> {
> 	struct drm_device *drm;
> 	int ret;
> 
> 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> 	if (!drm)
> 		return -ENOMEM;
> 
> 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
> 	if (ret) {
> 		kfree(drm);
> 		return ret;
> 	}
> 
> 	drm_mode_config_init(drm);
> 
> 	/* Aquire various resources, all managed by devres */
> 
> 	drm_mode_config_reset(drm);
> 
> 	return devm_drm_dev_register(drm);
> }
> 
> struct device_driver driver = {
> 	.probe = driver_probe,
> };
> 
> 
> A driver that wants to touch hardware on disconnect, can look like this:
> 
> static void drm_driver_release(struct drm_device *drm)
> {
> 	drm_mode_config_cleanup(drm);
> 	drm_dev_fini(drm);
> 	kfree(drm);
> }
> 
> static struct drm_driver drm_driver = {
> 	.release = drm_driver_release,
> 	/* ... */
> };
> 
> static int driver_probe(struct device *dev)
> {
> 	struct drm_device *drm;
> 	int ret;
> 
> 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> 	if (!drm)
> 		return -ENOMEM;
> 
> 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
> 	if (ret) {
> 		kfree(drm);
> 		return ret;
> 	}
> 
> 	drm_mode_config_init(drm);
> 
> 	/* Aquire various resources, all managed by devres */
> 
> 	drm_mode_config_reset(drm);
> 
> 	ret = drm_dev_register(drm);
> 	if (ret)
> 		return ret;
> 
> 	drm_dev_get(dev); /* If using drm_dev_unplug() */
> 
> 	dev_set_drvdata(dev, drm);
> 
> 	return 0;
> }
> 
> /* This function is called before devres_release_all() */
> static int driver_remove(struct device *dev)
> {
> 	struct drm_device *drm = dev_get_drvdata(dev);
> 
> 	drm_dev_unplug(drm); OR drm_dev_unregister(drm);
> 	drm_atomic_helper_shutdown(drm)
> 
> 	return 0;
> }
> 
> struct device_driver driver = {
> 	.probe = driver_probe,
> 	.remove = driver_remove,

That's exactly the pattern I'm trying to avoid, because imo your tiny
driver _also_ should do this. Or we realize that all the current drivers
doing drm_atomic_helper_shutdown are misguided, but I'm not really
understanding why.

Having a devm helper which cannot be used for some drivers due to
fundamental design issues is kinda not great, because it means everyone
will still use it, and shrug the bugs off as "not my problem". Which is
what's happening right now with all the devm_kzalloc we have in drm
drivers that all the get lifetim wrong. But because devm_ is convenient
everyone uses it, so the driver unload of most drivers is full of bugs.
-Daniel
Den 29.01.2019 17.50, skrev Daniel Vetter:
> On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 24.01.2019 18.57, skrev Daniel Vetter:
>>> On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>
>>>> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
>>>>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
>>>>>>
>>>>>>
>>>>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
>>>>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
>>>>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
>>>>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
>>>>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
>>>>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
>>>>>>>>>>>>> drm_dev_register().
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
>>>>>>>>>>>>> fbdev emulation as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <snip>
>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>>>> index 381581b01d48..12129772be45 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>>>> @@ -36,6 +36,7 @@
>>>>>>>>>>>>>
>>>>>>>>>>>>>  #include <drm/drm_client.h>
>>>>>>>>>>>>>  #include <drm/drm_drv.h>
>>>>>>>>>>>>> +#include <drm/drm_fb_helper.h>
>>>>>>>>>>>>>  #include <drm/drmP.h>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  #include "drm_crtc_internal.h"
>>>>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
>>>>>>>>>>>>>
>>>>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> + drm_dev_put(data);
>>>>>>>>>>>
>>>>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
>>>>>>>>>>
>>>>>>>>>> This function is only used to cover the error path if probe fails before
>>>>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
>>>>>>>>>> the one that calls unplug. There are comments about this in the functions.
>>>>>>>>>
>>>>>>>>> I think I get a prize for being ignorant and blind :-/
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
>>>>>>>>>>>>> + * @parent: Parent device object
>>>>>>>>>>>>> + * @dev: DRM device
>>>>>>>>>>>>> + * @driver: DRM driver
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
>>>>>>>>>>>>> + * automatically released on driver detach. You must supply a
>>>>>>>>>>>
>>>>>>>>>>> I think a bit more clarity here would be good:
>>>>>>>>>>>
>>>>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
>>>>>>>>>>>
>>>>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
>>>>>>>>>>>
>>>>>>>>>>> I think a loud warning for these is in order:
>>>>>>>>>>>
>>>>>>>>>>> "WARNING:
>>>>>>>>>>>
>>>>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
>>>>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
>>>>>>>>>>> match. This here works because it doesn't immediately free anything, but
>>>>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
>>>>>>>>>>> refcount through drm_dev_put().
>>>>>>>>>>>
>>>>>>>>>>> "All other drm structures must still be explicitly released in the
>>>>>>>>>>> &drm_driver.release callback."
>>>>>>>>>>>
>>>>>>>>>>> While thinking about this I just realized that with this design we have no
>>>>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
>>>>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
>>>>>>>>>>> place to call it:
>>>>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
>>>>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
>>>>>>>>>>>   down anything.
>>>>>>>>>>> - drm_driver.release is way too late.
>>>>>>>>>>>
>>>>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
>>>>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
>>>>>>>>>>> happens automatically and in the right order.
>>>>>>>>>>>
>>>>>>>>>>> So not sure what to do here really.
>>>>>>>>>>
>>>>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
>>>>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
>>>>>>>>>> helper instead?)
>>>>>>>>>
>>>>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
>>>>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
>>>>>>>>> and in the right order. So:
>>>>>>>>>
>>>>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
>>>>>>>>>
>>>>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
>>>>>>>>>
>>>>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
>>>>>>>>>
>>>>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
>>>>>>>>>   can call drm_dev_unplug() unconditionally).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Beautiful! I really like this, it's very flexible.
>>>>>>>>
>>>>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
>>>>>>>> atomic helper...
>>>>>>>
>>>>>>> I think a new drm_devm.c helper would be nice for all this stuff.
>>>>>>> Especially since you can't freely mix devm-based setup/cleanup with
>>>>>>> normal cleanup I think it'd be good to have it all together in one
>>>>>>> place. And perhaps even a code example in the DOC: overview.
>>>>>>>
>>>>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
>>>>>>>>> gone, but I think we can achieve that by liberally sprinkling
>>>>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
>>>>>>>>> drm_mode_config_reset would be:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>       if (drm_dev_enter())
>>>>>>>>>               return;
>>>>>>>>>
>>>>>>>>>       drm_atomic_helper_shutdown();
>>>>>>>>>
>>>>>>>>>       drm_dev_exit();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>
>>>>>>>> drm_dev_enter() can only be used to check whether the drm_device is
>>>>>>>> registered or not, it doesn't say anything about the state of the parent
>>>>>>>> device.
>>>>>>>>
>>>>>>>> All we know is that the device is being unbound from the driver, we
>>>>>>>> don't know if it's the device that's being removed or if it's the driver
>>>>>>>> that's unregistered.
>>>>>>>
>>>>>>> You're right, both paths will have called drm_dev_unplug by then.
>>>>>>> Silly me. I really liked my idea :-)
>>>>>>>
>>>>>>>> I have looked at the various call chains:
>>>>>>>>
>>>>>>>> driver_unregister ->
>>>>>>>>     bus_remove_driver ->
>>>>>>>>         driver_detach ->
>>>>>>>>             device_release_driver_internal
>>>>>>>>
>>>>>>>> device_unregister ->
>>>>>>>>     device_del ->
>>>>>>>>         bus_remove_device ->
>>>>>>>>             device_release_driver ->
>>>>>>>>                 device_release_driver_internal
>>>>>>>>
>>>>>>>> sysfs: unbind_store ->
>>>>>>>>     device_release_driver ->
>>>>>>>>         device_release_driver_internal
>>>>>>>>
>>>>>>>> The only way I've found to differentiate between these in a cleanup
>>>>>>>> action is that device_del() uses the bus notifier to signal
>>>>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
>>>>>>>> notifier could be used to set a drm_device->parent_removed flag.
>>>>>>>
>>>>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
>>>>>>> way to do this, but best to prototype a patch with this, send it to
>>>>>>> him and ask how to :-)
>>>>>>>
>>>>>>
>>>>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
>>>>>> need to touch hw after DRM unregister.
>>>>>>
>>>>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
>>>>>>>> everything get disabled when userspace closes? It does in my tinydrm
>>>>>>>> world :-)
>>>>>>>
>>>>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
>>>>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
>>>>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
>>>>>>> unload path. Maybe this is cargo-culting, but it goes way back to
>>>>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
>>>>>>>
>>>>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
>>>>>>> apparently the cmdline tool you need, never tried it, I only switch
>>>>>>> the kernel's console between fbcon and dummycon and back, not what
>>>>>>> fbcon drivers itself), then I think you should be able to reproduce.
>>>>>>> And maybe you have a better idea how to deal with this all.
>>>>>>>
>>>>>>> Note also that there's been proposals floating around to only close an
>>>>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
>>>>>>> does), with that closing userspace would not necessarily lead to a
>>>>>>> full cleanup.
>>>>>>>
>>>>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
>>>>>>> is if you have the display on, but no planes showing (i.e. all black).
>>>>>>> Then all the fbs will be cleaned up, but drm_connector will be
>>>>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
>>>>>>> even if fbcon/fbdev isn't even enabled.
>>>>>>>
>>>>>>
>>>>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
>>>>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
>>>>>> emulation also releases everything.
>>>>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
>>>>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
>>>>>
>>>>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
>>>>>
>>>>> Super short summary: We want to start using devm actions to clean up drm
>>>>> drivers. Here's the problem:
>>>>> - For a driver unload/unbind without hotunplug, we want to properly clean
>>>>>   up the hardware and shut it all down.
>>>>
>>>> Then do it on probe/disconnect.
>>>>
>>>>> - But if the device is unplugged already, that's probably not the best
>>>>>   idea, and we only want to clean up the kernel's resources/allocations.
>>>>
>>>> Again, probe/disconnect will be called either way.
>>>>
>>>> But as you note, memory will NOT be freed by the devm stuff if you
>>>> manually unbind a driver from a device.
>>>>
>>>> So don't touch hardware there, it's not going to work :)
>>>>
>>>>> What's the recommendation here? I see a few options:
>>>>>
>>>>> - Make sure everything can deal with this properly. Hotunplug can happen
>>>>>   anytime, so there's a race no matter what.
>>>>
>>>> Yes.
>>>>
>>>>> - Check with the device model whether the struct device is disappearing or
>>>>>   whether we're just dealing with a driver unbind (no idea how to do
>>>>>   that), and act accordingly.
>>>>
>>>> You don't know that, sorry.  Just do any hardware stuff on disconnect.
>>>> Assuming your hardware is still present :)
>>>>
>>>>> - Fundamental question: Touching the hw from devm actions, is that ok? If
>>>>>   not, then the pretty nifty plan laid out in this thread wont work.
>>>>
>>>> Nope, that's not going to work, the device could either be long gone, or
>>>> you will not be called due to unbind happening from userspace.
>>>>
>>>> But really, unbind from userspace is very very rare, it's a developer
>>>> thing mostly.  Oh and a virtual driver thing, but those people are crazy
>>>> :)
>>>>
>>>>> - Something completely different?
>>>>
>>>> Do it in disconnect :)
>>>
>>> Ah, I forgot to mention the important constraint :-) disconnect/unbind
>>> should be the following sequence:
>>>
>>> 1. Unregister all the userspace interfaces (there's a lot of them) and
>>> make sure all the pending ioctls are done so that from now on
>>> userspace sees lots of -EIO (in case it still has fd open, which is
>>> going to be the normal for hotunplug.
>>>
>>> 2. Shut down hw and all ongoing operations (only relevant for unbind,
>>> but needs to be able to cope with sudden hotunplug on top anyway).
>>>
>>> 3. Clean up the kernel mess and release everything.
>>>
>>> Probe is exactly the other way round, so would perfectly fit into the
>>> devm onion cleanup. See in the commented earlier replies above how
>>> that would match in details, but tldr; if we have to do 2. in
>>> disconnect, then we also have to do 1. in disconnected, and only doing
>>> 3. through devm is almost not worth the bother. But if we could do all
>>> three through devm then simple drivers wouldn't even need any
>>> disconnect/unbind callback at all. That's our motivation for trying to
>>> come up with an answer that's not "do it in disconnect". "do it in
>>> disconnect" is how we do it all today already.
>>>
>>> Yes we're trying to make tiny drivers even smaller, we have enough
>>> nowadays that this stuff would be worth it :-)
>>>
>>
>> I think a solution is to say that drivers that want to touch hw on
>> disconnect needs to use device_driver->remove to do that.
>>
>> This is an example driver that doesn't need to touch hw because it's so
>> simple that userspace has disabled the pipeline:
>>
>> static void drm_driver_release(struct drm_device *drm)
>> {
>> 	drm_mode_config_cleanup(drm);
>> 	drm_dev_fini(drm);
>> 	kfree(drm);
>> }
>>
>> static struct drm_driver drm_driver = {
>> 	.release = drm_driver_release,
>> 	/* ... */
>> };
>>
>> static int driver_probe(struct device *dev)
>> {
>> 	struct drm_device *drm;
>> 	int ret;
>>
>> 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
>> 	if (!drm)
>> 		return -ENOMEM;
>>
>> 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
>> 	if (ret) {
>> 		kfree(drm);
>> 		return ret;
>> 	}
>>
>> 	drm_mode_config_init(drm);
>>
>> 	/* Aquire various resources, all managed by devres */
>>
>> 	drm_mode_config_reset(drm);
>>
>> 	return devm_drm_dev_register(drm);
>> }
>>
>> struct device_driver driver = {
>> 	.probe = driver_probe,
>> };
>>
>>
>> A driver that wants to touch hardware on disconnect, can look like this:
>>
>> static void drm_driver_release(struct drm_device *drm)
>> {
>> 	drm_mode_config_cleanup(drm);
>> 	drm_dev_fini(drm);
>> 	kfree(drm);
>> }
>>
>> static struct drm_driver drm_driver = {
>> 	.release = drm_driver_release,
>> 	/* ... */
>> };
>>
>> static int driver_probe(struct device *dev)
>> {
>> 	struct drm_device *drm;
>> 	int ret;
>>
>> 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
>> 	if (!drm)
>> 		return -ENOMEM;
>>
>> 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
>> 	if (ret) {
>> 		kfree(drm);
>> 		return ret;
>> 	}
>>
>> 	drm_mode_config_init(drm);
>>
>> 	/* Aquire various resources, all managed by devres */
>>
>> 	drm_mode_config_reset(drm);
>>
>> 	ret = drm_dev_register(drm);
>> 	if (ret)
>> 		return ret;
>>
>> 	drm_dev_get(dev); /* If using drm_dev_unplug() */
>>
>> 	dev_set_drvdata(dev, drm);
>>
>> 	return 0;
>> }
>>
>> /* This function is called before devres_release_all() */
>> static int driver_remove(struct device *dev)
>> {
>> 	struct drm_device *drm = dev_get_drvdata(dev);
>>
>> 	drm_dev_unplug(drm); OR drm_dev_unregister(drm);
>> 	drm_atomic_helper_shutdown(drm)
>>
>> 	return 0;
>> }
>>
>> struct device_driver driver = {
>> 	.probe = driver_probe,
>> 	.remove = driver_remove,
> 
> That's exactly the pattern I'm trying to avoid, because imo your tiny
> driver _also_ should do this. Or we realize that all the current drivers
> doing drm_atomic_helper_shutdown are misguided, but I'm not really
> understanding why.
> 

I don't know where my head has been, the pipeline on tinydrm isn't
disabled on driver module unload. I've so preoccupied with ensuring that
device removal is working that I must have had some kind of tunnel vision.

The problem is that fbdev is restored on lastclose regardless of it
being in use or not.

I tried to address it in this patch:

drm/fb-helper/generic: Only restore when in use
https://patchwork.freedesktop.org/patch/263271/

> Having a devm helper which cannot be used for some drivers due to
> fundamental design issues is kinda not great, because it means everyone
> will still use it, and shrug the bugs off as "not my problem". Which is
> what's happening right now with all the devm_kzalloc we have in drm
> drivers that all the get lifetim wrong. But because devm_ is convenient
> everyone uses it, so the driver unload of most drivers is full of bugs.

Yes, it's starting to look like this devm idea isn't such a great thing
after all.

And it looks like dropping DRM devm can still give a slim driver:

static int driver_probe(struct device *dev)
{
	struct drm_device *drm;
	int ret;

	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
	if (!drm)
		return -ENOMEM;

	ret = drm_dev_init(dev, drm, &drm_driver);
	if (ret) {
		kfree(drm);
		return ret;
	}

	drm_mode_config_init(drm);

	/*
	 * Aquire various resources, all managed by devres
	 * or being released in drm_driver->release.
	 * goto err_put on failure
	 */

	drm_mode_config_reset(drm);

	ret = drm_dev_register(drm);
	if (ret)
		goto err_put;

	dev_set_drvdata(dev, drm);

	return 0;

err_put:
	drm_dev_put(dev);

	return ret
}

static int driver_remove(struct device *dev)
{
	struct drm_device *drm = dev_get_drvdata(dev);

	drm_atomic_helper_shutdown(drm)
	drm_dev_unplug(drm);

	return 0;
}

Noralf.
On Tue, Jan 29, 2019 at 05:50:05PM +0100, Daniel Vetter wrote:
> On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> > 
> > 
> > Den 24.01.2019 18.57, skrev Daniel Vetter:
> > > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >>
> > >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> > >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> > >>>>
> > >>>>
> > >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> > >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> > >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> > >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> > >>>>>>>>>>> drm_dev_register().
> > >>>>>>>>>>>
> > >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> > >>>>>>>>>>> fbdev emulation as well.
> > >>>>>>>>>>>
> > >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> <snip>
> > >>>>>>>>
> > >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> > >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> > >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> > >>>>>>>>>>> @@ -36,6 +36,7 @@
> > >>>>>>>>>>>
> > >>>>>>>>>>>  #include <drm/drm_client.h>
> > >>>>>>>>>>>  #include <drm/drm_drv.h>
> > >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> > >>>>>>>>>>>  #include <drm/drmP.h>
> > >>>>>>>>>>>
> > >>>>>>>>>>>  #include "drm_crtc_internal.h"
> > >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> > >>>>>>>>>>>  }
> > >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> > >>>>>>>>>>>
> > >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> > >>>>>>>>>>> +{
> > >>>>>>>>>>> + drm_dev_put(data);
> > >>>>>>>>>
> > >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> > >>>>>>>>
> > >>>>>>>> This function is only used to cover the error path if probe fails before
> > >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> > >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> > >>>>>>>
> > >>>>>>> I think I get a prize for being ignorant and blind :-/
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>> +}
> > >>>>>>>>>>> +
> > >>>>>>>>>>> +/**
> > >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> > >>>>>>>>>>> + * @parent: Parent device object
> > >>>>>>>>>>> + * @dev: DRM device
> > >>>>>>>>>>> + * @driver: DRM driver
> > >>>>>>>>>>> + *
> > >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> > >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> > >>>>>>>>>
> > >>>>>>>>> I think a bit more clarity here would be good:
> > >>>>>>>>>
> > >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> > >>>>>>>>>
> > >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> > >>>>>>>>>
> > >>>>>>>>> I think a loud warning for these is in order:
> > >>>>>>>>>
> > >>>>>>>>> "WARNING:
> > >>>>>>>>>
> > >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> > >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> > >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> > >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> > >>>>>>>>> refcount through drm_dev_put().
> > >>>>>>>>>
> > >>>>>>>>> "All other drm structures must still be explicitly released in the
> > >>>>>>>>> &drm_driver.release callback."
> > >>>>>>>>>
> > >>>>>>>>> While thinking about this I just realized that with this design we have no
> > >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> > >>>>>>>>> place to call it:
> > >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> > >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> > >>>>>>>>>   down anything.
> > >>>>>>>>> - drm_driver.release is way too late.
> > >>>>>>>>>
> > >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> > >>>>>>>>> happens automatically and in the right order.
> > >>>>>>>>>
> > >>>>>>>>> So not sure what to do here really.
> > >>>>>>>>
> > >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> > >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> > >>>>>>>> helper instead?)
> > >>>>>>>
> > >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> > >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> > >>>>>>> and in the right order. So:
> > >>>>>>>
> > >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> > >>>>>>>
> > >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> > >>>>>>>
> > >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> > >>>>>>>
> > >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> > >>>>>>>   can call drm_dev_unplug() unconditionally).
> > >>>>>>>
> > >>>>>>
> > >>>>>> Beautiful! I really like this, it's very flexible.
> > >>>>>>
> > >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> > >>>>>> atomic helper...
> > >>>>>
> > >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> > >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> > >>>>> normal cleanup I think it'd be good to have it all together in one
> > >>>>> place. And perhaps even a code example in the DOC: overview.
> > >>>>>
> > >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> > >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> > >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> > >>>>>>> drm_mode_config_reset would be:
> > >>>>>>>
> > >>>>>>> {
> > >>>>>>>       if (drm_dev_enter())
> > >>>>>>>               return;
> > >>>>>>>
> > >>>>>>>       drm_atomic_helper_shutdown();
> > >>>>>>>
> > >>>>>>>       drm_dev_exit();
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>
> > >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> > >>>>>> registered or not, it doesn't say anything about the state of the parent
> > >>>>>> device.
> > >>>>>>
> > >>>>>> All we know is that the device is being unbound from the driver, we
> > >>>>>> don't know if it's the device that's being removed or if it's the driver
> > >>>>>> that's unregistered.
> > >>>>>
> > >>>>> You're right, both paths will have called drm_dev_unplug by then.
> > >>>>> Silly me. I really liked my idea :-)
> > >>>>>
> > >>>>>> I have looked at the various call chains:
> > >>>>>>
> > >>>>>> driver_unregister ->
> > >>>>>>     bus_remove_driver ->
> > >>>>>>         driver_detach ->
> > >>>>>>             device_release_driver_internal
> > >>>>>>
> > >>>>>> device_unregister ->
> > >>>>>>     device_del ->
> > >>>>>>         bus_remove_device ->
> > >>>>>>             device_release_driver ->
> > >>>>>>                 device_release_driver_internal
> > >>>>>>
> > >>>>>> sysfs: unbind_store ->
> > >>>>>>     device_release_driver ->
> > >>>>>>         device_release_driver_internal
> > >>>>>>
> > >>>>>> The only way I've found to differentiate between these in a cleanup
> > >>>>>> action is that device_del() uses the bus notifier to signal
> > >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> > >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> > >>>>>
> > >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> > >>>>> way to do this, but best to prototype a patch with this, send it to
> > >>>>> him and ask how to :-)
> > >>>>>
> > >>>>
> > >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> > >>>> need to touch hw after DRM unregister.
> > >>>>
> > >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> > >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> > >>>>>> world :-)
> > >>>>>
> > >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> > >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> > >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> > >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> > >>>>>
> > >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> > >>>>> apparently the cmdline tool you need, never tried it, I only switch
> > >>>>> the kernel's console between fbcon and dummycon and back, not what
> > >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> > >>>>> And maybe you have a better idea how to deal with this all.
> > >>>>>
> > >>>>> Note also that there's been proposals floating around to only close an
> > >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> > >>>>> does), with that closing userspace would not necessarily lead to a
> > >>>>> full cleanup.
> > >>>>>
> > >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > >>>>> is if you have the display on, but no planes showing (i.e. all black).
> > >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> > >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> > >>>>> even if fbcon/fbdev isn't even enabled.
> > >>>>>
> > >>>>
> > >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> > >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> > >>>> emulation also releases everything.
> > >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> > >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> > >>>
> > >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> > >>>
> > >>> Super short summary: We want to start using devm actions to clean up drm
> > >>> drivers. Here's the problem:
> > >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> > >>>   up the hardware and shut it all down.
> > >>
> > >> Then do it on probe/disconnect.
> > >>
> > >>> - But if the device is unplugged already, that's probably not the best
> > >>>   idea, and we only want to clean up the kernel's resources/allocations.
> > >>
> > >> Again, probe/disconnect will be called either way.
> > >>
> > >> But as you note, memory will NOT be freed by the devm stuff if you
> > >> manually unbind a driver from a device.
> > >>
> > >> So don't touch hardware there, it's not going to work :)
> > >>
> > >>> What's the recommendation here? I see a few options:
> > >>>
> > >>> - Make sure everything can deal with this properly. Hotunplug can happen
> > >>>   anytime, so there's a race no matter what.
> > >>
> > >> Yes.
> > >>
> > >>> - Check with the device model whether the struct device is disappearing or
> > >>>   whether we're just dealing with a driver unbind (no idea how to do
> > >>>   that), and act accordingly.
> > >>
> > >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> > >> Assuming your hardware is still present :)
> > >>
> > >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> > >>>   not, then the pretty nifty plan laid out in this thread wont work.
> > >>
> > >> Nope, that's not going to work, the device could either be long gone, or
> > >> you will not be called due to unbind happening from userspace.
> > >>
> > >> But really, unbind from userspace is very very rare, it's a developer
> > >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> > >> :)
> > >>
> > >>> - Something completely different?
> > >>
> > >> Do it in disconnect :)
> > > 
> > > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > > should be the following sequence:
> > > 
> > > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > > make sure all the pending ioctls are done so that from now on
> > > userspace sees lots of -EIO (in case it still has fd open, which is
> > > going to be the normal for hotunplug.
> > > 
> > > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > > but needs to be able to cope with sudden hotunplug on top anyway).
> > > 
> > > 3. Clean up the kernel mess and release everything.
> > > 
> > > Probe is exactly the other way round, so would perfectly fit into the
> > > devm onion cleanup. See in the commented earlier replies above how
> > > that would match in details, but tldr; if we have to do 2. in
> > > disconnect, then we also have to do 1. in disconnected, and only doing
> > > 3. through devm is almost not worth the bother. But if we could do all
> > > three through devm then simple drivers wouldn't even need any
> > > disconnect/unbind callback at all. That's our motivation for trying to
> > > come up with an answer that's not "do it in disconnect". "do it in
> > > disconnect" is how we do it all today already.
> > > 
> > > Yes we're trying to make tiny drivers even smaller, we have enough
> > > nowadays that this stuff would be worth it :-)
> > > 
> > 
> > I think a solution is to say that drivers that want to touch hw on
> > disconnect needs to use device_driver->remove to do that.
> > 
> > This is an example driver that doesn't need to touch hw because it's so
> > simple that userspace has disabled the pipeline:
> > 
> > static void drm_driver_release(struct drm_device *drm)
> > {
> > 	drm_mode_config_cleanup(drm);
> > 	drm_dev_fini(drm);
> > 	kfree(drm);
> > }
> > 
> > static struct drm_driver drm_driver = {
> > 	.release = drm_driver_release,
> > 	/* ... */
> > };
> > 
> > static int driver_probe(struct device *dev)
> > {
> > 	struct drm_device *drm;
> > 	int ret;
> > 
> > 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > 	if (!drm)
> > 		return -ENOMEM;
> > 
> > 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > 	if (ret) {
> > 		kfree(drm);
> > 		return ret;
> > 	}
> > 
> > 	drm_mode_config_init(drm);
> > 
> > 	/* Aquire various resources, all managed by devres */
> > 
> > 	drm_mode_config_reset(drm);
> > 
> > 	return devm_drm_dev_register(drm);
> > }
> > 
> > struct device_driver driver = {
> > 	.probe = driver_probe,
> > };
> > 
> > 
> > A driver that wants to touch hardware on disconnect, can look like this:
> > 
> > static void drm_driver_release(struct drm_device *drm)
> > {
> > 	drm_mode_config_cleanup(drm);
> > 	drm_dev_fini(drm);
> > 	kfree(drm);
> > }
> > 
> > static struct drm_driver drm_driver = {
> > 	.release = drm_driver_release,
> > 	/* ... */
> > };
> > 
> > static int driver_probe(struct device *dev)
> > {
> > 	struct drm_device *drm;
> > 	int ret;
> > 
> > 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > 	if (!drm)
> > 		return -ENOMEM;
> > 
> > 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > 	if (ret) {
> > 		kfree(drm);
> > 		return ret;
> > 	}
> > 
> > 	drm_mode_config_init(drm);
> > 
> > 	/* Aquire various resources, all managed by devres */
> > 
> > 	drm_mode_config_reset(drm);
> > 
> > 	ret = drm_dev_register(drm);
> > 	if (ret)
> > 		return ret;
> > 
> > 	drm_dev_get(dev); /* If using drm_dev_unplug() */
> > 
> > 	dev_set_drvdata(dev, drm);
> > 
> > 	return 0;
> > }
> > 
> > /* This function is called before devres_release_all() */
> > static int driver_remove(struct device *dev)
> > {
> > 	struct drm_device *drm = dev_get_drvdata(dev);
> > 
> > 	drm_dev_unplug(drm); OR drm_dev_unregister(drm);
> > 	drm_atomic_helper_shutdown(drm)
> > 
> > 	return 0;
> > }
> > 
> > struct device_driver driver = {
> > 	.probe = driver_probe,
> > 	.remove = driver_remove,
> 
> That's exactly the pattern I'm trying to avoid, because imo your tiny
> driver _also_ should do this. Or we realize that all the current drivers
> doing drm_atomic_helper_shutdown are misguided, but I'm not really
> understanding why.
> 
> Having a devm helper which cannot be used for some drivers due to
> fundamental design issues is kinda not great, because it means everyone
> will still use it, and shrug the bugs off as "not my problem". Which is
> what's happening right now with all the devm_kzalloc we have in drm
> drivers that all the get lifetim wrong. But because devm_ is convenient
> everyone uses it, so the driver unload of most drivers is full of bugs.

driver "unload" should not be full of bugs, how would it?  Anything
created with devm_() will just be freed when the device really goes away
in the system, it shouldn't call back into the driver.

And yes, devm_() is a crutch, one that I really don't like, but I
understand the apeal.  And in 95% of the cases, it can work just fine as
no one ever does unbind/bind from userspace, and if they did, they would
never notice the memory issues.

sorry,

greg k-h
On Tue, Jan 29, 2019 at 6:36 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jan 29, 2019 at 05:50:05PM +0100, Daniel Vetter wrote:
> > On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> > >
> > >
> > > Den 24.01.2019 18.57, skrev Daniel Vetter:
> > > > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > >>
> > > >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> > > >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> > > >>>>
> > > >>>>
> > > >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> > > >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > > >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > > >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> > > >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> > > >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> > > >>>>>>>>>>> drm_dev_register().
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> > > >>>>>>>>>>> fbdev emulation as well.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> <snip>
> > > >>>>>>>>
> > > >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> > > >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> > > >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> > > >>>>>>>>>>> @@ -36,6 +36,7 @@
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>  #include <drm/drm_client.h>
> > > >>>>>>>>>>>  #include <drm/drm_drv.h>
> > > >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> > > >>>>>>>>>>>  #include <drm/drmP.h>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>  #include "drm_crtc_internal.h"
> > > >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> > > >>>>>>>>>>>  }
> > > >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> > > >>>>>>>>>>> +{
> > > >>>>>>>>>>> + drm_dev_put(data);
> > > >>>>>>>>>
> > > >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> > > >>>>>>>>
> > > >>>>>>>> This function is only used to cover the error path if probe fails before
> > > >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> > > >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> > > >>>>>>>
> > > >>>>>>> I think I get a prize for being ignorant and blind :-/
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>>> +}
> > > >>>>>>>>>>> +
> > > >>>>>>>>>>> +/**
> > > >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> > > >>>>>>>>>>> + * @parent: Parent device object
> > > >>>>>>>>>>> + * @dev: DRM device
> > > >>>>>>>>>>> + * @driver: DRM driver
> > > >>>>>>>>>>> + *
> > > >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> > > >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> > > >>>>>>>>>
> > > >>>>>>>>> I think a bit more clarity here would be good:
> > > >>>>>>>>>
> > > >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> > > >>>>>>>>>
> > > >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> > > >>>>>>>>>
> > > >>>>>>>>> I think a loud warning for these is in order:
> > > >>>>>>>>>
> > > >>>>>>>>> "WARNING:
> > > >>>>>>>>>
> > > >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> > > >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> > > >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> > > >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> > > >>>>>>>>> refcount through drm_dev_put().
> > > >>>>>>>>>
> > > >>>>>>>>> "All other drm structures must still be explicitly released in the
> > > >>>>>>>>> &drm_driver.release callback."
> > > >>>>>>>>>
> > > >>>>>>>>> While thinking about this I just realized that with this design we have no
> > > >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > > >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> > > >>>>>>>>> place to call it:
> > > >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> > > >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> > > >>>>>>>>>   down anything.
> > > >>>>>>>>> - drm_driver.release is way too late.
> > > >>>>>>>>>
> > > >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > > >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> > > >>>>>>>>> happens automatically and in the right order.
> > > >>>>>>>>>
> > > >>>>>>>>> So not sure what to do here really.
> > > >>>>>>>>
> > > >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> > > >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> > > >>>>>>>> helper instead?)
> > > >>>>>>>
> > > >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> > > >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> > > >>>>>>> and in the right order. So:
> > > >>>>>>>
> > > >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> > > >>>>>>>
> > > >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> > > >>>>>>>
> > > >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> > > >>>>>>>
> > > >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> > > >>>>>>>   can call drm_dev_unplug() unconditionally).
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> Beautiful! I really like this, it's very flexible.
> > > >>>>>>
> > > >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> > > >>>>>> atomic helper...
> > > >>>>>
> > > >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> > > >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> > > >>>>> normal cleanup I think it'd be good to have it all together in one
> > > >>>>> place. And perhaps even a code example in the DOC: overview.
> > > >>>>>
> > > >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> > > >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> > > >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> > > >>>>>>> drm_mode_config_reset would be:
> > > >>>>>>>
> > > >>>>>>> {
> > > >>>>>>>       if (drm_dev_enter())
> > > >>>>>>>               return;
> > > >>>>>>>
> > > >>>>>>>       drm_atomic_helper_shutdown();
> > > >>>>>>>
> > > >>>>>>>       drm_dev_exit();
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> > > >>>>>> registered or not, it doesn't say anything about the state of the parent
> > > >>>>>> device.
> > > >>>>>>
> > > >>>>>> All we know is that the device is being unbound from the driver, we
> > > >>>>>> don't know if it's the device that's being removed or if it's the driver
> > > >>>>>> that's unregistered.
> > > >>>>>
> > > >>>>> You're right, both paths will have called drm_dev_unplug by then.
> > > >>>>> Silly me. I really liked my idea :-)
> > > >>>>>
> > > >>>>>> I have looked at the various call chains:
> > > >>>>>>
> > > >>>>>> driver_unregister ->
> > > >>>>>>     bus_remove_driver ->
> > > >>>>>>         driver_detach ->
> > > >>>>>>             device_release_driver_internal
> > > >>>>>>
> > > >>>>>> device_unregister ->
> > > >>>>>>     device_del ->
> > > >>>>>>         bus_remove_device ->
> > > >>>>>>             device_release_driver ->
> > > >>>>>>                 device_release_driver_internal
> > > >>>>>>
> > > >>>>>> sysfs: unbind_store ->
> > > >>>>>>     device_release_driver ->
> > > >>>>>>         device_release_driver_internal
> > > >>>>>>
> > > >>>>>> The only way I've found to differentiate between these in a cleanup
> > > >>>>>> action is that device_del() uses the bus notifier to signal
> > > >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> > > >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> > > >>>>>
> > > >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> > > >>>>> way to do this, but best to prototype a patch with this, send it to
> > > >>>>> him and ask how to :-)
> > > >>>>>
> > > >>>>
> > > >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> > > >>>> need to touch hw after DRM unregister.
> > > >>>>
> > > >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> > > >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> > > >>>>>> world :-)
> > > >>>>>
> > > >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > > >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> > > >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> > > >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> > > >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> > > >>>>>
> > > >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> > > >>>>> apparently the cmdline tool you need, never tried it, I only switch
> > > >>>>> the kernel's console between fbcon and dummycon and back, not what
> > > >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> > > >>>>> And maybe you have a better idea how to deal with this all.
> > > >>>>>
> > > >>>>> Note also that there's been proposals floating around to only close an
> > > >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> > > >>>>> does), with that closing userspace would not necessarily lead to a
> > > >>>>> full cleanup.
> > > >>>>>
> > > >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > > >>>>> is if you have the display on, but no planes showing (i.e. all black).
> > > >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> > > >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> > > >>>>> even if fbcon/fbdev isn't even enabled.
> > > >>>>>
> > > >>>>
> > > >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> > > >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> > > >>>> emulation also releases everything.
> > > >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> > > >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> > > >>>
> > > >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> > > >>>
> > > >>> Super short summary: We want to start using devm actions to clean up drm
> > > >>> drivers. Here's the problem:
> > > >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> > > >>>   up the hardware and shut it all down.
> > > >>
> > > >> Then do it on probe/disconnect.
> > > >>
> > > >>> - But if the device is unplugged already, that's probably not the best
> > > >>>   idea, and we only want to clean up the kernel's resources/allocations.
> > > >>
> > > >> Again, probe/disconnect will be called either way.
> > > >>
> > > >> But as you note, memory will NOT be freed by the devm stuff if you
> > > >> manually unbind a driver from a device.
> > > >>
> > > >> So don't touch hardware there, it's not going to work :)
> > > >>
> > > >>> What's the recommendation here? I see a few options:
> > > >>>
> > > >>> - Make sure everything can deal with this properly. Hotunplug can happen
> > > >>>   anytime, so there's a race no matter what.
> > > >>
> > > >> Yes.
> > > >>
> > > >>> - Check with the device model whether the struct device is disappearing or
> > > >>>   whether we're just dealing with a driver unbind (no idea how to do
> > > >>>   that), and act accordingly.
> > > >>
> > > >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> > > >> Assuming your hardware is still present :)
> > > >>
> > > >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> > > >>>   not, then the pretty nifty plan laid out in this thread wont work.
> > > >>
> > > >> Nope, that's not going to work, the device could either be long gone, or
> > > >> you will not be called due to unbind happening from userspace.
> > > >>
> > > >> But really, unbind from userspace is very very rare, it's a developer
> > > >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> > > >> :)
> > > >>
> > > >>> - Something completely different?
> > > >>
> > > >> Do it in disconnect :)
> > > >
> > > > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > > > should be the following sequence:
> > > >
> > > > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > > > make sure all the pending ioctls are done so that from now on
> > > > userspace sees lots of -EIO (in case it still has fd open, which is
> > > > going to be the normal for hotunplug.
> > > >
> > > > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > > > but needs to be able to cope with sudden hotunplug on top anyway).
> > > >
> > > > 3. Clean up the kernel mess and release everything.
> > > >
> > > > Probe is exactly the other way round, so would perfectly fit into the
> > > > devm onion cleanup. See in the commented earlier replies above how
> > > > that would match in details, but tldr; if we have to do 2. in
> > > > disconnect, then we also have to do 1. in disconnected, and only doing
> > > > 3. through devm is almost not worth the bother. But if we could do all
> > > > three through devm then simple drivers wouldn't even need any
> > > > disconnect/unbind callback at all. That's our motivation for trying to
> > > > come up with an answer that's not "do it in disconnect". "do it in
> > > > disconnect" is how we do it all today already.
> > > >
> > > > Yes we're trying to make tiny drivers even smaller, we have enough
> > > > nowadays that this stuff would be worth it :-)
> > > >
> > >
> > > I think a solution is to say that drivers that want to touch hw on
> > > disconnect needs to use device_driver->remove to do that.
> > >
> > > This is an example driver that doesn't need to touch hw because it's so
> > > simple that userspace has disabled the pipeline:
> > >
> > > static void drm_driver_release(struct drm_device *drm)
> > > {
> > >     drm_mode_config_cleanup(drm);
> > >     drm_dev_fini(drm);
> > >     kfree(drm);
> > > }
> > >
> > > static struct drm_driver drm_driver = {
> > >     .release = drm_driver_release,
> > >     /* ... */
> > > };
> > >
> > > static int driver_probe(struct device *dev)
> > > {
> > >     struct drm_device *drm;
> > >     int ret;
> > >
> > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > >     if (!drm)
> > >             return -ENOMEM;
> > >
> > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > >     if (ret) {
> > >             kfree(drm);
> > >             return ret;
> > >     }
> > >
> > >     drm_mode_config_init(drm);
> > >
> > >     /* Aquire various resources, all managed by devres */
> > >
> > >     drm_mode_config_reset(drm);
> > >
> > >     return devm_drm_dev_register(drm);
> > > }
> > >
> > > struct device_driver driver = {
> > >     .probe = driver_probe,
> > > };
> > >
> > >
> > > A driver that wants to touch hardware on disconnect, can look like this:
> > >
> > > static void drm_driver_release(struct drm_device *drm)
> > > {
> > >     drm_mode_config_cleanup(drm);
> > >     drm_dev_fini(drm);
> > >     kfree(drm);
> > > }
> > >
> > > static struct drm_driver drm_driver = {
> > >     .release = drm_driver_release,
> > >     /* ... */
> > > };
> > >
> > > static int driver_probe(struct device *dev)
> > > {
> > >     struct drm_device *drm;
> > >     int ret;
> > >
> > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > >     if (!drm)
> > >             return -ENOMEM;
> > >
> > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > >     if (ret) {
> > >             kfree(drm);
> > >             return ret;
> > >     }
> > >
> > >     drm_mode_config_init(drm);
> > >
> > >     /* Aquire various resources, all managed by devres */
> > >
> > >     drm_mode_config_reset(drm);
> > >
> > >     ret = drm_dev_register(drm);
> > >     if (ret)
> > >             return ret;
> > >
> > >     drm_dev_get(dev); /* If using drm_dev_unplug() */
> > >
> > >     dev_set_drvdata(dev, drm);
> > >
> > >     return 0;
> > > }
> > >
> > > /* This function is called before devres_release_all() */
> > > static int driver_remove(struct device *dev)
> > > {
> > >     struct drm_device *drm = dev_get_drvdata(dev);
> > >
> > >     drm_dev_unplug(drm); OR drm_dev_unregister(drm);
> > >     drm_atomic_helper_shutdown(drm)
> > >
> > >     return 0;
> > > }
> > >
> > > struct device_driver driver = {
> > >     .probe = driver_probe,
> > >     .remove = driver_remove,
> >
> > That's exactly the pattern I'm trying to avoid, because imo your tiny
> > driver _also_ should do this. Or we realize that all the current drivers
> > doing drm_atomic_helper_shutdown are misguided, but I'm not really
> > understanding why.
> >
> > Having a devm helper which cannot be used for some drivers due to
> > fundamental design issues is kinda not great, because it means everyone
> > will still use it, and shrug the bugs off as "not my problem". Which is
> > what's happening right now with all the devm_kzalloc we have in drm
> > drivers that all the get lifetim wrong. But because devm_ is convenient
> > everyone uses it, so the driver unload of most drivers is full of bugs.
>
> driver "unload" should not be full of bugs, how would it?  Anything
> created with devm_() will just be freed when the device really goes away
> in the system, it shouldn't call back into the driver.

The problem is when drivers use devm_ not to allocate hw resources and
related things, but structures for objects with other lifetimes. Like
open file descriptors shared with the world.

> And yes, devm_() is a crutch, one that I really don't like, but I
> understand the apeal.  And in 95% of the cases, it can work just fine as
> no one ever does unbind/bind from userspace, and if they did, they would
> never notice the memory issues.

Yeah, additionally people tend to not hotunplug gpus. So even there we
tend to get away with piles of bugs. And I'm kinda ok with piles of
bugs in driver unload code, as long as it's just driver unload code
and mostly gets the job done (i.e. doesn't blow up when a developer
first kills X and then unloads their driver).

But for adding new infrastructure in drm core/helpers that's a
different beast. I think we could pull something off, but not quite
there yet with the solution. Maybe we need something similar to devm,
but for drm uapi objects and stuff, adapted to our needs.

Cheers, Daniel
On Tue, Jan 29, 2019 at 07:10:55PM +0100, Daniel Vetter wrote:
> On Tue, Jan 29, 2019 at 6:36 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Jan 29, 2019 at 05:50:05PM +0100, Daniel Vetter wrote:
> > > On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> > > >
> > > >
> > > > Den 24.01.2019 18.57, skrev Daniel Vetter:
> > > > > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > >>
> > > > >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> > > > >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> > > > >>>>
> > > > >>>>
> > > > >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> > > > >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > > > >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > > > >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> > > > >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> > > > >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> > > > >>>>>>>>>>> drm_dev_register().
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> > > > >>>>>>>>>>> fbdev emulation as well.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> > > > >>>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> <snip>
> > > > >>>>>>>>
> > > > >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> > > > >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> > > > >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> > > > >>>>>>>>>>> @@ -36,6 +36,7 @@
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>  #include <drm/drm_client.h>
> > > > >>>>>>>>>>>  #include <drm/drm_drv.h>
> > > > >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> > > > >>>>>>>>>>>  #include <drm/drmP.h>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>  #include "drm_crtc_internal.h"
> > > > >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> > > > >>>>>>>>>>>  }
> > > > >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> > > > >>>>>>>>>>> +{
> > > > >>>>>>>>>>> + drm_dev_put(data);
> > > > >>>>>>>>>
> > > > >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> > > > >>>>>>>>
> > > > >>>>>>>> This function is only used to cover the error path if probe fails before
> > > > >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> > > > >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> > > > >>>>>>>
> > > > >>>>>>> I think I get a prize for being ignorant and blind :-/
> > > > >>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>>> +}
> > > > >>>>>>>>>>> +
> > > > >>>>>>>>>>> +/**
> > > > >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> > > > >>>>>>>>>>> + * @parent: Parent device object
> > > > >>>>>>>>>>> + * @dev: DRM device
> > > > >>>>>>>>>>> + * @driver: DRM driver
> > > > >>>>>>>>>>> + *
> > > > >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> > > > >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> > > > >>>>>>>>>
> > > > >>>>>>>>> I think a bit more clarity here would be good:
> > > > >>>>>>>>>
> > > > >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> > > > >>>>>>>>>
> > > > >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> > > > >>>>>>>>>
> > > > >>>>>>>>> I think a loud warning for these is in order:
> > > > >>>>>>>>>
> > > > >>>>>>>>> "WARNING:
> > > > >>>>>>>>>
> > > > >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> > > > >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> > > > >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> > > > >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> > > > >>>>>>>>> refcount through drm_dev_put().
> > > > >>>>>>>>>
> > > > >>>>>>>>> "All other drm structures must still be explicitly released in the
> > > > >>>>>>>>> &drm_driver.release callback."
> > > > >>>>>>>>>
> > > > >>>>>>>>> While thinking about this I just realized that with this design we have no
> > > > >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > > > >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> > > > >>>>>>>>> place to call it:
> > > > >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> > > > >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> > > > >>>>>>>>>   down anything.
> > > > >>>>>>>>> - drm_driver.release is way too late.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > > > >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> > > > >>>>>>>>> happens automatically and in the right order.
> > > > >>>>>>>>>
> > > > >>>>>>>>> So not sure what to do here really.
> > > > >>>>>>>>
> > > > >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> > > > >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> > > > >>>>>>>> helper instead?)
> > > > >>>>>>>
> > > > >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> > > > >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> > > > >>>>>>> and in the right order. So:
> > > > >>>>>>>
> > > > >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> > > > >>>>>>>
> > > > >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> > > > >>>>>>>
> > > > >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> > > > >>>>>>>
> > > > >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> > > > >>>>>>>   can call drm_dev_unplug() unconditionally).
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>> Beautiful! I really like this, it's very flexible.
> > > > >>>>>>
> > > > >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> > > > >>>>>> atomic helper...
> > > > >>>>>
> > > > >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> > > > >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> > > > >>>>> normal cleanup I think it'd be good to have it all together in one
> > > > >>>>> place. And perhaps even a code example in the DOC: overview.
> > > > >>>>>
> > > > >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> > > > >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> > > > >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> > > > >>>>>>> drm_mode_config_reset would be:
> > > > >>>>>>>
> > > > >>>>>>> {
> > > > >>>>>>>       if (drm_dev_enter())
> > > > >>>>>>>               return;
> > > > >>>>>>>
> > > > >>>>>>>       drm_atomic_helper_shutdown();
> > > > >>>>>>>
> > > > >>>>>>>       drm_dev_exit();
> > > > >>>>>>> }
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> > > > >>>>>> registered or not, it doesn't say anything about the state of the parent
> > > > >>>>>> device.
> > > > >>>>>>
> > > > >>>>>> All we know is that the device is being unbound from the driver, we
> > > > >>>>>> don't know if it's the device that's being removed or if it's the driver
> > > > >>>>>> that's unregistered.
> > > > >>>>>
> > > > >>>>> You're right, both paths will have called drm_dev_unplug by then.
> > > > >>>>> Silly me. I really liked my idea :-)
> > > > >>>>>
> > > > >>>>>> I have looked at the various call chains:
> > > > >>>>>>
> > > > >>>>>> driver_unregister ->
> > > > >>>>>>     bus_remove_driver ->
> > > > >>>>>>         driver_detach ->
> > > > >>>>>>             device_release_driver_internal
> > > > >>>>>>
> > > > >>>>>> device_unregister ->
> > > > >>>>>>     device_del ->
> > > > >>>>>>         bus_remove_device ->
> > > > >>>>>>             device_release_driver ->
> > > > >>>>>>                 device_release_driver_internal
> > > > >>>>>>
> > > > >>>>>> sysfs: unbind_store ->
> > > > >>>>>>     device_release_driver ->
> > > > >>>>>>         device_release_driver_internal
> > > > >>>>>>
> > > > >>>>>> The only way I've found to differentiate between these in a cleanup
> > > > >>>>>> action is that device_del() uses the bus notifier to signal
> > > > >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> > > > >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> > > > >>>>>
> > > > >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> > > > >>>>> way to do this, but best to prototype a patch with this, send it to
> > > > >>>>> him and ask how to :-)
> > > > >>>>>
> > > > >>>>
> > > > >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> > > > >>>> need to touch hw after DRM unregister.
> > > > >>>>
> > > > >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> > > > >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> > > > >>>>>> world :-)
> > > > >>>>>
> > > > >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > > > >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> > > > >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> > > > >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> > > > >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> > > > >>>>>
> > > > >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> > > > >>>>> apparently the cmdline tool you need, never tried it, I only switch
> > > > >>>>> the kernel's console between fbcon and dummycon and back, not what
> > > > >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> > > > >>>>> And maybe you have a better idea how to deal with this all.
> > > > >>>>>
> > > > >>>>> Note also that there's been proposals floating around to only close an
> > > > >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> > > > >>>>> does), with that closing userspace would not necessarily lead to a
> > > > >>>>> full cleanup.
> > > > >>>>>
> > > > >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > > > >>>>> is if you have the display on, but no planes showing (i.e. all black).
> > > > >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> > > > >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> > > > >>>>> even if fbcon/fbdev isn't even enabled.
> > > > >>>>>
> > > > >>>>
> > > > >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> > > > >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> > > > >>>> emulation also releases everything.
> > > > >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> > > > >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> > > > >>>
> > > > >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> > > > >>>
> > > > >>> Super short summary: We want to start using devm actions to clean up drm
> > > > >>> drivers. Here's the problem:
> > > > >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> > > > >>>   up the hardware and shut it all down.
> > > > >>
> > > > >> Then do it on probe/disconnect.
> > > > >>
> > > > >>> - But if the device is unplugged already, that's probably not the best
> > > > >>>   idea, and we only want to clean up the kernel's resources/allocations.
> > > > >>
> > > > >> Again, probe/disconnect will be called either way.
> > > > >>
> > > > >> But as you note, memory will NOT be freed by the devm stuff if you
> > > > >> manually unbind a driver from a device.
> > > > >>
> > > > >> So don't touch hardware there, it's not going to work :)
> > > > >>
> > > > >>> What's the recommendation here? I see a few options:
> > > > >>>
> > > > >>> - Make sure everything can deal with this properly. Hotunplug can happen
> > > > >>>   anytime, so there's a race no matter what.
> > > > >>
> > > > >> Yes.
> > > > >>
> > > > >>> - Check with the device model whether the struct device is disappearing or
> > > > >>>   whether we're just dealing with a driver unbind (no idea how to do
> > > > >>>   that), and act accordingly.
> > > > >>
> > > > >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> > > > >> Assuming your hardware is still present :)
> > > > >>
> > > > >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> > > > >>>   not, then the pretty nifty plan laid out in this thread wont work.
> > > > >>
> > > > >> Nope, that's not going to work, the device could either be long gone, or
> > > > >> you will not be called due to unbind happening from userspace.
> > > > >>
> > > > >> But really, unbind from userspace is very very rare, it's a developer
> > > > >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> > > > >> :)
> > > > >>
> > > > >>> - Something completely different?
> > > > >>
> > > > >> Do it in disconnect :)
> > > > >
> > > > > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > > > > should be the following sequence:
> > > > >
> > > > > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > > > > make sure all the pending ioctls are done so that from now on
> > > > > userspace sees lots of -EIO (in case it still has fd open, which is
> > > > > going to be the normal for hotunplug.
> > > > >
> > > > > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > > > > but needs to be able to cope with sudden hotunplug on top anyway).
> > > > >
> > > > > 3. Clean up the kernel mess and release everything.
> > > > >
> > > > > Probe is exactly the other way round, so would perfectly fit into the
> > > > > devm onion cleanup. See in the commented earlier replies above how
> > > > > that would match in details, but tldr; if we have to do 2. in
> > > > > disconnect, then we also have to do 1. in disconnected, and only doing
> > > > > 3. through devm is almost not worth the bother. But if we could do all
> > > > > three through devm then simple drivers wouldn't even need any
> > > > > disconnect/unbind callback at all. That's our motivation for trying to
> > > > > come up with an answer that's not "do it in disconnect". "do it in
> > > > > disconnect" is how we do it all today already.
> > > > >
> > > > > Yes we're trying to make tiny drivers even smaller, we have enough
> > > > > nowadays that this stuff would be worth it :-)
> > > > >
> > > >
> > > > I think a solution is to say that drivers that want to touch hw on
> > > > disconnect needs to use device_driver->remove to do that.
> > > >
> > > > This is an example driver that doesn't need to touch hw because it's so
> > > > simple that userspace has disabled the pipeline:
> > > >
> > > > static void drm_driver_release(struct drm_device *drm)
> > > > {
> > > >     drm_mode_config_cleanup(drm);
> > > >     drm_dev_fini(drm);
> > > >     kfree(drm);
> > > > }
> > > >
> > > > static struct drm_driver drm_driver = {
> > > >     .release = drm_driver_release,
> > > >     /* ... */
> > > > };
> > > >
> > > > static int driver_probe(struct device *dev)
> > > > {
> > > >     struct drm_device *drm;
> > > >     int ret;
> > > >
> > > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > > >     if (!drm)
> > > >             return -ENOMEM;
> > > >
> > > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > > >     if (ret) {
> > > >             kfree(drm);
> > > >             return ret;
> > > >     }
> > > >
> > > >     drm_mode_config_init(drm);
> > > >
> > > >     /* Aquire various resources, all managed by devres */
> > > >
> > > >     drm_mode_config_reset(drm);
> > > >
> > > >     return devm_drm_dev_register(drm);
> > > > }
> > > >
> > > > struct device_driver driver = {
> > > >     .probe = driver_probe,
> > > > };
> > > >
> > > >
> > > > A driver that wants to touch hardware on disconnect, can look like this:
> > > >
> > > > static void drm_driver_release(struct drm_device *drm)
> > > > {
> > > >     drm_mode_config_cleanup(drm);
> > > >     drm_dev_fini(drm);
> > > >     kfree(drm);
> > > > }
> > > >
> > > > static struct drm_driver drm_driver = {
> > > >     .release = drm_driver_release,
> > > >     /* ... */
> > > > };
> > > >
> > > > static int driver_probe(struct device *dev)
> > > > {
> > > >     struct drm_device *drm;
> > > >     int ret;
> > > >
> > > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > > >     if (!drm)
> > > >             return -ENOMEM;
> > > >
> > > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > > >     if (ret) {
> > > >             kfree(drm);
> > > >             return ret;
> > > >     }
> > > >
> > > >     drm_mode_config_init(drm);
> > > >
> > > >     /* Aquire various resources, all managed by devres */
> > > >
> > > >     drm_mode_config_reset(drm);
> > > >
> > > >     ret = drm_dev_register(drm);
> > > >     if (ret)
> > > >             return ret;
> > > >
> > > >     drm_dev_get(dev); /* If using drm_dev_unplug() */
> > > >
> > > >     dev_set_drvdata(dev, drm);
> > > >
> > > >     return 0;
> > > > }
> > > >
> > > > /* This function is called before devres_release_all() */
> > > > static int driver_remove(struct device *dev)
> > > > {
> > > >     struct drm_device *drm = dev_get_drvdata(dev);
> > > >
> > > >     drm_dev_unplug(drm); OR drm_dev_unregister(drm);
> > > >     drm_atomic_helper_shutdown(drm)
> > > >
> > > >     return 0;
> > > > }
> > > >
> > > > struct device_driver driver = {
> > > >     .probe = driver_probe,
> > > >     .remove = driver_remove,
> > >
> > > That's exactly the pattern I'm trying to avoid, because imo your tiny
> > > driver _also_ should do this. Or we realize that all the current drivers
> > > doing drm_atomic_helper_shutdown are misguided, but I'm not really
> > > understanding why.
> > >
> > > Having a devm helper which cannot be used for some drivers due to
> > > fundamental design issues is kinda not great, because it means everyone
> > > will still use it, and shrug the bugs off as "not my problem". Which is
> > > what's happening right now with all the devm_kzalloc we have in drm
> > > drivers that all the get lifetim wrong. But because devm_ is convenient
> > > everyone uses it, so the driver unload of most drivers is full of bugs.
> >
> > driver "unload" should not be full of bugs, how would it?  Anything
> > created with devm_() will just be freed when the device really goes away
> > in the system, it shouldn't call back into the driver.
> 
> The problem is when drivers use devm_ not to allocate hw resources and
> related things, but structures for objects with other lifetimes. Like
> open file descriptors shared with the world.

And irqs, which bites everyone in the end.  You have to be careful here,
never tie a devm allocation to an object with another reference count,
that's just a bug.

greg k-h
On Tue, Jan 29, 2019 at 8:27 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jan 29, 2019 at 07:10:55PM +0100, Daniel Vetter wrote:
> > On Tue, Jan 29, 2019 at 6:36 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Tue, Jan 29, 2019 at 05:50:05PM +0100, Daniel Vetter wrote:
> > > > On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> > > > >
> > > > >
> > > > > Den 24.01.2019 18.57, skrev Daniel Vetter:
> > > > > > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > > > > >>
> > > > > >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> > > > > >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> > > > > >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf@tronnes.org> wrote:
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> > > > > >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> > > > > >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> > > > > >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> > > > > >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> > > > > >>>>>>>>>>> drm_dev_register().
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> > > > > >>>>>>>>>>> fbdev emulation as well.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> <snip>
> > > > > >>>>>>>>
> > > > > >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > > >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> > > > > >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> > > > > >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> > > > > >>>>>>>>>>> @@ -36,6 +36,7 @@
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>  #include <drm/drm_client.h>
> > > > > >>>>>>>>>>>  #include <drm/drm_drv.h>
> > > > > >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> > > > > >>>>>>>>>>>  #include <drm/drmP.h>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>  #include "drm_crtc_internal.h"
> > > > > >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> > > > > >>>>>>>>>>>  }
> > > > > >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> > > > > >>>>>>>>>>> +{
> > > > > >>>>>>>>>>> + drm_dev_put(data);
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> > > > > >>>>>>>>
> > > > > >>>>>>>> This function is only used to cover the error path if probe fails before
> > > > > >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> > > > > >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> > > > > >>>>>>>
> > > > > >>>>>>> I think I get a prize for being ignorant and blind :-/
> > > > > >>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>>> +}
> > > > > >>>>>>>>>>> +
> > > > > >>>>>>>>>>> +/**
> > > > > >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> > > > > >>>>>>>>>>> + * @parent: Parent device object
> > > > > >>>>>>>>>>> + * @dev: DRM device
> > > > > >>>>>>>>>>> + * @driver: DRM driver
> > > > > >>>>>>>>>>> + *
> > > > > >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> > > > > >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> I think a bit more clarity here would be good:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> I think a loud warning for these is in order:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> "WARNING:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> > > > > >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> > > > > >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> > > > > >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> > > > > >>>>>>>>> refcount through drm_dev_put().
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> "All other drm structures must still be explicitly released in the
> > > > > >>>>>>>>> &drm_driver.release callback."
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> While thinking about this I just realized that with this design we have no
> > > > > >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> > > > > >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> > > > > >>>>>>>>> place to call it:
> > > > > >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> > > > > >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> > > > > >>>>>>>>>   down anything.
> > > > > >>>>>>>>> - drm_driver.release is way too late.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> > > > > >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> > > > > >>>>>>>>> happens automatically and in the right order.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> So not sure what to do here really.
> > > > > >>>>>>>>
> > > > > >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> > > > > >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> > > > > >>>>>>>> helper instead?)
> > > > > >>>>>>>
> > > > > >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> > > > > >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> > > > > >>>>>>> and in the right order. So:
> > > > > >>>>>>>
> > > > > >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> > > > > >>>>>>>
> > > > > >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> > > > > >>>>>>>
> > > > > >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> > > > > >>>>>>>
> > > > > >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> > > > > >>>>>>>   can call drm_dev_unplug() unconditionally).
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>> Beautiful! I really like this, it's very flexible.
> > > > > >>>>>>
> > > > > >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> > > > > >>>>>> atomic helper...
> > > > > >>>>>
> > > > > >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> > > > > >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> > > > > >>>>> normal cleanup I think it'd be good to have it all together in one
> > > > > >>>>> place. And perhaps even a code example in the DOC: overview.
> > > > > >>>>>
> > > > > >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> > > > > >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> > > > > >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> > > > > >>>>>>> drm_mode_config_reset would be:
> > > > > >>>>>>>
> > > > > >>>>>>> {
> > > > > >>>>>>>       if (drm_dev_enter())
> > > > > >>>>>>>               return;
> > > > > >>>>>>>
> > > > > >>>>>>>       drm_atomic_helper_shutdown();
> > > > > >>>>>>>
> > > > > >>>>>>>       drm_dev_exit();
> > > > > >>>>>>> }
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> > > > > >>>>>> registered or not, it doesn't say anything about the state of the parent
> > > > > >>>>>> device.
> > > > > >>>>>>
> > > > > >>>>>> All we know is that the device is being unbound from the driver, we
> > > > > >>>>>> don't know if it's the device that's being removed or if it's the driver
> > > > > >>>>>> that's unregistered.
> > > > > >>>>>
> > > > > >>>>> You're right, both paths will have called drm_dev_unplug by then.
> > > > > >>>>> Silly me. I really liked my idea :-)
> > > > > >>>>>
> > > > > >>>>>> I have looked at the various call chains:
> > > > > >>>>>>
> > > > > >>>>>> driver_unregister ->
> > > > > >>>>>>     bus_remove_driver ->
> > > > > >>>>>>         driver_detach ->
> > > > > >>>>>>             device_release_driver_internal
> > > > > >>>>>>
> > > > > >>>>>> device_unregister ->
> > > > > >>>>>>     device_del ->
> > > > > >>>>>>         bus_remove_device ->
> > > > > >>>>>>             device_release_driver ->
> > > > > >>>>>>                 device_release_driver_internal
> > > > > >>>>>>
> > > > > >>>>>> sysfs: unbind_store ->
> > > > > >>>>>>     device_release_driver ->
> > > > > >>>>>>         device_release_driver_internal
> > > > > >>>>>>
> > > > > >>>>>> The only way I've found to differentiate between these in a cleanup
> > > > > >>>>>> action is that device_del() uses the bus notifier to signal
> > > > > >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> > > > > >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> > > > > >>>>>
> > > > > >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> > > > > >>>>> way to do this, but best to prototype a patch with this, send it to
> > > > > >>>>> him and ask how to :-)
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> > > > > >>>> need to touch hw after DRM unregister.
> > > > > >>>>
> > > > > >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> > > > > >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> > > > > >>>>>> world :-)
> > > > > >>>>>
> > > > > >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> > > > > >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> > > > > >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> > > > > >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> > > > > >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> > > > > >>>>>
> > > > > >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> > > > > >>>>> apparently the cmdline tool you need, never tried it, I only switch
> > > > > >>>>> the kernel's console between fbcon and dummycon and back, not what
> > > > > >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> > > > > >>>>> And maybe you have a better idea how to deal with this all.
> > > > > >>>>>
> > > > > >>>>> Note also that there's been proposals floating around to only close an
> > > > > >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> > > > > >>>>> does), with that closing userspace would not necessarily lead to a
> > > > > >>>>> full cleanup.
> > > > > >>>>>
> > > > > >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> > > > > >>>>> is if you have the display on, but no planes showing (i.e. all black).
> > > > > >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> > > > > >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> > > > > >>>>> even if fbcon/fbdev isn't even enabled.
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> > > > > >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> > > > > >>>> emulation also releases everything.
> > > > > >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> > > > > >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> > > > > >>>
> > > > > >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> > > > > >>>
> > > > > >>> Super short summary: We want to start using devm actions to clean up drm
> > > > > >>> drivers. Here's the problem:
> > > > > >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> > > > > >>>   up the hardware and shut it all down.
> > > > > >>
> > > > > >> Then do it on probe/disconnect.
> > > > > >>
> > > > > >>> - But if the device is unplugged already, that's probably not the best
> > > > > >>>   idea, and we only want to clean up the kernel's resources/allocations.
> > > > > >>
> > > > > >> Again, probe/disconnect will be called either way.
> > > > > >>
> > > > > >> But as you note, memory will NOT be freed by the devm stuff if you
> > > > > >> manually unbind a driver from a device.
> > > > > >>
> > > > > >> So don't touch hardware there, it's not going to work :)
> > > > > >>
> > > > > >>> What's the recommendation here? I see a few options:
> > > > > >>>
> > > > > >>> - Make sure everything can deal with this properly. Hotunplug can happen
> > > > > >>>   anytime, so there's a race no matter what.
> > > > > >>
> > > > > >> Yes.
> > > > > >>
> > > > > >>> - Check with the device model whether the struct device is disappearing or
> > > > > >>>   whether we're just dealing with a driver unbind (no idea how to do
> > > > > >>>   that), and act accordingly.
> > > > > >>
> > > > > >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> > > > > >> Assuming your hardware is still present :)
> > > > > >>
> > > > > >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> > > > > >>>   not, then the pretty nifty plan laid out in this thread wont work.
> > > > > >>
> > > > > >> Nope, that's not going to work, the device could either be long gone, or
> > > > > >> you will not be called due to unbind happening from userspace.
> > > > > >>
> > > > > >> But really, unbind from userspace is very very rare, it's a developer
> > > > > >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> > > > > >> :)
> > > > > >>
> > > > > >>> - Something completely different?
> > > > > >>
> > > > > >> Do it in disconnect :)
> > > > > >
> > > > > > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > > > > > should be the following sequence:
> > > > > >
> > > > > > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > > > > > make sure all the pending ioctls are done so that from now on
> > > > > > userspace sees lots of -EIO (in case it still has fd open, which is
> > > > > > going to be the normal for hotunplug.
> > > > > >
> > > > > > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > > > > > but needs to be able to cope with sudden hotunplug on top anyway).
> > > > > >
> > > > > > 3. Clean up the kernel mess and release everything.
> > > > > >
> > > > > > Probe is exactly the other way round, so would perfectly fit into the
> > > > > > devm onion cleanup. See in the commented earlier replies above how
> > > > > > that would match in details, but tldr; if we have to do 2. in
> > > > > > disconnect, then we also have to do 1. in disconnected, and only doing
> > > > > > 3. through devm is almost not worth the bother. But if we could do all
> > > > > > three through devm then simple drivers wouldn't even need any
> > > > > > disconnect/unbind callback at all. That's our motivation for trying to
> > > > > > come up with an answer that's not "do it in disconnect". "do it in
> > > > > > disconnect" is how we do it all today already.
> > > > > >
> > > > > > Yes we're trying to make tiny drivers even smaller, we have enough
> > > > > > nowadays that this stuff would be worth it :-)
> > > > > >
> > > > >
> > > > > I think a solution is to say that drivers that want to touch hw on
> > > > > disconnect needs to use device_driver->remove to do that.
> > > > >
> > > > > This is an example driver that doesn't need to touch hw because it's so
> > > > > simple that userspace has disabled the pipeline:
> > > > >
> > > > > static void drm_driver_release(struct drm_device *drm)
> > > > > {
> > > > >     drm_mode_config_cleanup(drm);
> > > > >     drm_dev_fini(drm);
> > > > >     kfree(drm);
> > > > > }
> > > > >
> > > > > static struct drm_driver drm_driver = {
> > > > >     .release = drm_driver_release,
> > > > >     /* ... */
> > > > > };
> > > > >
> > > > > static int driver_probe(struct device *dev)
> > > > > {
> > > > >     struct drm_device *drm;
> > > > >     int ret;
> > > > >
> > > > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > > > >     if (!drm)
> > > > >             return -ENOMEM;
> > > > >
> > > > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > > > >     if (ret) {
> > > > >             kfree(drm);
> > > > >             return ret;
> > > > >     }
> > > > >
> > > > >     drm_mode_config_init(drm);
> > > > >
> > > > >     /* Aquire various resources, all managed by devres */
> > > > >
> > > > >     drm_mode_config_reset(drm);
> > > > >
> > > > >     return devm_drm_dev_register(drm);
> > > > > }
> > > > >
> > > > > struct device_driver driver = {
> > > > >     .probe = driver_probe,
> > > > > };
> > > > >
> > > > >
> > > > > A driver that wants to touch hardware on disconnect, can look like this:
> > > > >
> > > > > static void drm_driver_release(struct drm_device *drm)
> > > > > {
> > > > >     drm_mode_config_cleanup(drm);
> > > > >     drm_dev_fini(drm);
> > > > >     kfree(drm);
> > > > > }
> > > > >
> > > > > static struct drm_driver drm_driver = {
> > > > >     .release = drm_driver_release,
> > > > >     /* ... */
> > > > > };
> > > > >
> > > > > static int driver_probe(struct device *dev)
> > > > > {
> > > > >     struct drm_device *drm;
> > > > >     int ret;
> > > > >
> > > > >     drm = kzalloc(sizeof(*drm), GFP_KERNEL);
> > > > >     if (!drm)
> > > > >             return -ENOMEM;
> > > > >
> > > > >     ret = devm_drm_dev_init(dev, drm, &drm_driver);
> > > > >     if (ret) {
> > > > >             kfree(drm);
> > > > >             return ret;
> > > > >     }
> > > > >
> > > > >     drm_mode_config_init(drm);
> > > > >
> > > > >     /* Aquire various resources, all managed by devres */
> > > > >
> > > > >     drm_mode_config_reset(drm);
> > > > >
> > > > >     ret = drm_dev_register(drm);
> > > > >     if (ret)
> > > > >             return ret;
> > > > >
> > > > >     drm_dev_get(dev); /* If using drm_dev_unplug() */
> > > > >
> > > > >     dev_set_drvdata(dev, drm);
> > > > >
> > > > >     return 0;
> > > > > }
> > > > >
> > > > > /* This function is called before devres_release_all() */
> > > > > static int driver_remove(struct device *dev)
> > > > > {
> > > > >     struct drm_device *drm = dev_get_drvdata(dev);
> > > > >
> > > > >     drm_dev_unplug(drm); OR drm_dev_unregister(drm);
> > > > >     drm_atomic_helper_shutdown(drm)
> > > > >
> > > > >     return 0;
> > > > > }
> > > > >
> > > > > struct device_driver driver = {
> > > > >     .probe = driver_probe,
> > > > >     .remove = driver_remove,
> > > >
> > > > That's exactly the pattern I'm trying to avoid, because imo your tiny
> > > > driver _also_ should do this. Or we realize that all the current drivers
> > > > doing drm_atomic_helper_shutdown are misguided, but I'm not really
> > > > understanding why.
> > > >
> > > > Having a devm helper which cannot be used for some drivers due to
> > > > fundamental design issues is kinda not great, because it means everyone
> > > > will still use it, and shrug the bugs off as "not my problem". Which is
> > > > what's happening right now with all the devm_kzalloc we have in drm
> > > > drivers that all the get lifetim wrong. But because devm_ is convenient
> > > > everyone uses it, so the driver unload of most drivers is full of bugs.
> > >
> > > driver "unload" should not be full of bugs, how would it?  Anything
> > > created with devm_() will just be freed when the device really goes away
> > > in the system, it shouldn't call back into the driver.
> >
> > The problem is when drivers use devm_ not to allocate hw resources and
> > related things, but structures for objects with other lifetimes. Like
> > open file descriptors shared with the world.
>
> And irqs, which bites everyone in the end.  You have to be careful here,
> never tie a devm allocation to an object with another reference count,
> that's just a bug.

The classic "I forgot to shut down the interrupt before releasing
driver structures and now the irq handler oopsed" or something more
sinister? gpus definitely needs lots of interrupt stuff, so if that is
fundamentally broken then our devm ideas won't work out at all. And
yes the idea is to register devm actions which call drm_dev_put() and
stuff like that, so people stop freeing stuff to early through the
wrong devm functions because devm_kmalloc is less typing :-)
On Wed, Jan 30, 2019 at 12:14:46AM +0100, Daniel Vetter wrote:
> On Tue, Jan 29, 2019 at 8:27 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Tue, Jan 29, 2019 at 07:10:55PM +0100, Daniel Vetter wrote:
> > > The problem is when drivers use devm_ not to allocate hw resources and
> > > related things, but structures for objects with other lifetimes. Like
> > > open file descriptors shared with the world.
> >
> > And irqs, which bites everyone in the end.  You have to be careful here,
> > never tie a devm allocation to an object with another reference count,
> > that's just a bug.
> 
> The classic "I forgot to shut down the interrupt before releasing
> driver structures and now the irq handler oopsed" or something more
> sinister?

The classic is always the best, and most common :)

> gpus definitely needs lots of interrupt stuff, so if that is
> fundamentally broken then our devm ideas won't work out at all.

You better not be using devm for your irqs right now then.

good luck!

greg k-h