[v2,1/3] drm/atomic: Add a generic infrastructure to track underrun errors

Submitted by Boris Brezillon on Oct. 25, 2018, 12:45 p.m.

Details

Message ID 20181025124546.22145-2-boris.brezillon@bootlin.com
State New
Headers show
Series "drm/vc4: Add a load tracker" ( rev: 1 ) in DRI devel

Browsing this patch as part of:
"drm/vc4: Add a load tracker" rev 1 in DRI devel
<< prev patch [1/3] next patch >>

Commit Message

Boris Brezillon Oct. 25, 2018, 12:45 p.m.
Display controllers usually provide a lot features like overlay planes,
hardware scalers, hardware rotations, ...
Most of the time those features work just fine when enabled
independently, but activating all of them at the same time tend to
show other limitations like the limited memory bus or scaler
bandwidth.

This sort of bandwidth estimation is hard to get right, and we
sometimes fail to predict that a specific workload will not pass,
which will most likely result in underrun errors somewhere in the
display pipeline (most of the time at the CRTC level).

This path aims at making underrun error tracking generic and exposing
underrun errors to userspace through a debugfs file.

This way, CI tools like IGT, can try to enable a bunch of features and
if such underrun errors are reported after the settings have been
accepted and applied by the KMS infrastructure. When that happens it
means the load tracking algorithm needs some tweaking/fixing.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
---
Changes in v2:
- New patch
---
 drivers/gpu/drm/drm_atomic.c        | 12 +++++++
 drivers/gpu/drm/drm_atomic_helper.c |  4 +++
 include/drm/drm_atomic_helper.h     | 53 +++++++++++++++++++++++++++++
 include/drm/drm_device.h            | 15 ++++++++
 include/drm/drm_mode_config.h       | 26 ++++++++++++++
 5 files changed, 110 insertions(+)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 0e27d45feba9..350348961ab3 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -1179,9 +1179,21 @@  static int drm_state_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int drm_underrun_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	drm_printf(&p, "%c\n", atomic_read(&dev->underrun) ? 'Y' : 'N');
+
+	return 0;
+}
+
 /* any use in debugfs files to dump individual planes/crtc/etc? */
 static const struct drm_info_list drm_atomic_debugfs_list[] = {
 	{"state", drm_state_info, 0},
+	{"underrun", drm_underrun_info, 0 },
 };
 
 int drm_atomic_debugfs_init(struct drm_minor *minor)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 6f66777dca4b..9956c2928c5b 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1456,6 +1456,8 @@  void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
 {
 	struct drm_device *dev = old_state->dev;
 
+	drm_atomic_helper_underrun_stop(dev);
+
 	drm_atomic_helper_commit_modeset_disables(dev, old_state);
 
 	drm_atomic_helper_commit_planes(dev, old_state, 0);
@@ -1466,6 +1468,8 @@  void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
 
 	drm_atomic_helper_commit_hw_done(old_state);
 
+	drm_atomic_helper_underrun_start(dev);
+
 	drm_atomic_helper_wait_for_vblanks(dev, old_state);
 
 	drm_atomic_helper_cleanup_planes(dev, old_state);
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 25ca0097563e..f712144dd26d 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -221,4 +221,57 @@  drm_atomic_plane_disabling(struct drm_plane_state *old_plane_state,
 	return old_plane_state->crtc && !new_plane_state->crtc;
 }
 
+/**
+ * drm_atomic_helper_underrun_set() - Set the underrun bit
+ * @dev: device the underrun happened on
+ *
+ * This function should be called when an underrun happens after an update
+ * has been committed to the device.
+ * Underrun interrupts are masked to avoid flooding the CPU with useless
+ * interrupts.
+ */
+static inline void drm_atomic_helper_underrun_set(struct drm_device *dev)
+{
+	atomic_set(&dev->underrun, 1);
+
+	if (dev->mode_config.funcs->atomic_mask_underrun)
+		dev->mode_config.funcs->atomic_mask_underrun(dev);
+}
+
+/**
+ * drm_atomic_helper_underrun_stop() - Stop generating underrun events
+ * @dev: device we want to stop underrun on
+ *
+ * This function should be called in the atomic commit path, just before
+ * committing changes to the hardware.
+ * The underrun interrupt is simply masked prior to getting underrun events
+ * while we are applying the new config.
+ */
+static inline void drm_atomic_helper_underrun_stop(struct drm_device *dev)
+{
+	if (dev->mode_config.funcs->atomic_mask_underrun)
+		dev->mode_config.funcs->atomic_mask_underrun(dev);
+}
+
+/**
+ * drm_atomic_helper_underrun_start() - Start listening to underrun events
+ * @dev: device we want to start underrun on
+ *
+ * This function should be called in the atomic commit path, just after
+ * the changes have been committed to the hardware.
+ * Any pending underrun events are cleared before unmasking the interrupt to
+ * guarantee that further underrun events are really caused by the new
+ * configuration.
+ */
+static inline void drm_atomic_helper_underrun_start(struct drm_device *dev)
+{
+	atomic_set(&dev->underrun, 0);
+
+	if (dev->mode_config.funcs->atomic_clear_underrun)
+		dev->mode_config.funcs->atomic_clear_underrun(dev);
+
+	if (dev->mode_config.funcs->atomic_unmask_underrun)
+		dev->mode_config.funcs->atomic_unmask_underrun(dev);
+}
+
 #endif /* DRM_ATOMIC_HELPER_H_ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 42411b3ea0c8..59c63e03a776 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -231,6 +231,21 @@  struct drm_device {
 	 * Set by drm_fb_helper_init() and cleared by drm_fb_helper_fini().
 	 */
 	struct drm_fb_helper *fb_helper;
+
+	/**
+	 * @underrun:
+	 *
+	 * Set to 1 when an underrun error happened since the last atomic
+	 * commit, 0 otherwise.
+	 * This is particularly useful to detect when a specific modeset is too
+	 * demanding in term of memory bandwidth or other kind limitations that
+	 * are hard to guess at atomic check time.
+	 * Drivers should only set underrun to 1 (using
+	 * drm_atomic_helper_underrun_set()) when an underrun interrupt occurs
+	 * and clear if from their atomic commit hook (using
+	 * drm_atomic_helper_underrun_{stop,start}()).
+	 */
+	atomic_t underrun;
 };
 
 #endif
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 8d540cdae57d..28b83e295fd2 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -325,6 +325,32 @@  struct drm_mode_config_funcs {
 	 * &drm_private_state and &drm_private_obj.
 	 */
 	void (*atomic_state_free)(struct drm_atomic_state *state);
+
+	/**
+	 * @atomic_mask_underrun:
+	 *
+	 * This is an optional hook called when the core needs to disable/mask
+	 * the underrun interrupt. It's used by drm_atomic_helper_underrun_set()
+	 * and drm_atomic_helper_underrun_stop().
+	 */
+	void (*atomic_mask_underrun)(struct drm_device *dev);
+
+	/**
+	 * @atomic_clear_underrun:
+	 *
+	 * This is an optional hook called when the core needs to clear pending
+	 * underrun events. It's used by drm_atomic_helper_underrun_start() to
+	 * before unmask the underrun interrupt.
+	 */
+	void (*atomic_clear_underrun)(struct drm_device *dev);
+
+	/**
+	 * @atomic_unmask_underrun:
+	 *
+	 * This is an optional hook called when the core needs to unmask
+	 * underrun interrupts. It's used by drm_atomic_helper_underrun_start().
+	 */
+	void (*atomic_unmask_underrun)(struct drm_device *dev);
 };
 
 /**

Comments

On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> Display controllers usually provide a lot features like overlay planes,
> hardware scalers, hardware rotations, ...
> Most of the time those features work just fine when enabled
> independently, but activating all of them at the same time tend to
> show other limitations like the limited memory bus or scaler
> bandwidth.
> 
> This sort of bandwidth estimation is hard to get right, and we
> sometimes fail to predict that a specific workload will not pass,
> which will most likely result in underrun errors somewhere in the
> display pipeline (most of the time at the CRTC level).
> 
> This path aims at making underrun error tracking generic and exposing
> underrun errors to userspace through a debugfs file.
> 
> This way, CI tools like IGT, can try to enable a bunch of features and
> if such underrun errors are reported after the settings have been
> accepted and applied by the KMS infrastructure. When that happens it
> means the load tracking algorithm needs some tweaking/fixing.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> ---
> Changes in v2:
> - New patch

Hm, not what I thought off. This is definitely not going to be reusable
for i915, the only other driver with underrun reporting, since we have
tons of underrun sources (not just one). And we need to mask/unmask them
individually.

It's also midlayer mistake, since you're stuff your callbacks into the
drm_mode_config_funcs core structure (which should only be used for uapi
interfaces, not helper stuff).

What I had in mind is just a simple function which spews into dmesg (so
that i915 could use it) and increments the underrun counter in debugfs.
Everything else needs to be handled in drivers I think. E.g.

drm_mode_config_report_underrun(struct drm_device *dev,
				const char *source);

and then rolling it out for i915 too (otherwise there's really no point in
the common infrastructure).

Cheers, Daniel

> ---
>  drivers/gpu/drm/drm_atomic.c        | 12 +++++++
>  drivers/gpu/drm/drm_atomic_helper.c |  4 +++
>  include/drm/drm_atomic_helper.h     | 53 +++++++++++++++++++++++++++++
>  include/drm/drm_device.h            | 15 ++++++++
>  include/drm/drm_mode_config.h       | 26 ++++++++++++++
>  5 files changed, 110 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> index 0e27d45feba9..350348961ab3 100644
> --- a/drivers/gpu/drm/drm_atomic.c
> +++ b/drivers/gpu/drm/drm_atomic.c
> @@ -1179,9 +1179,21 @@ static int drm_state_info(struct seq_file *m, void *data)
>  	return 0;
>  }
>  
> +static int drm_underrun_info(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_printer p = drm_seq_file_printer(m);
> +
> +	drm_printf(&p, "%c\n", atomic_read(&dev->underrun) ? 'Y' : 'N');
> +
> +	return 0;
> +}
> +
>  /* any use in debugfs files to dump individual planes/crtc/etc? */
>  static const struct drm_info_list drm_atomic_debugfs_list[] = {
>  	{"state", drm_state_info, 0},
> +	{"underrun", drm_underrun_info, 0 },
>  };
>  
>  int drm_atomic_debugfs_init(struct drm_minor *minor)
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 6f66777dca4b..9956c2928c5b 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1456,6 +1456,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
>  {
>  	struct drm_device *dev = old_state->dev;
>  
> +	drm_atomic_helper_underrun_stop(dev);
> +
>  	drm_atomic_helper_commit_modeset_disables(dev, old_state);
>  
>  	drm_atomic_helper_commit_planes(dev, old_state, 0);
> @@ -1466,6 +1468,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
>  
>  	drm_atomic_helper_commit_hw_done(old_state);
>  
> +	drm_atomic_helper_underrun_start(dev);
> +
>  	drm_atomic_helper_wait_for_vblanks(dev, old_state);
>  
>  	drm_atomic_helper_cleanup_planes(dev, old_state);
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 25ca0097563e..f712144dd26d 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -221,4 +221,57 @@ drm_atomic_plane_disabling(struct drm_plane_state *old_plane_state,
>  	return old_plane_state->crtc && !new_plane_state->crtc;
>  }
>  
> +/**
> + * drm_atomic_helper_underrun_set() - Set the underrun bit
> + * @dev: device the underrun happened on
> + *
> + * This function should be called when an underrun happens after an update
> + * has been committed to the device.
> + * Underrun interrupts are masked to avoid flooding the CPU with useless
> + * interrupts.
> + */
> +static inline void drm_atomic_helper_underrun_set(struct drm_device *dev)
> +{
> +	atomic_set(&dev->underrun, 1);
> +
> +	if (dev->mode_config.funcs->atomic_mask_underrun)
> +		dev->mode_config.funcs->atomic_mask_underrun(dev);
> +}
> +
> +/**
> + * drm_atomic_helper_underrun_stop() - Stop generating underrun events
> + * @dev: device we want to stop underrun on
> + *
> + * This function should be called in the atomic commit path, just before
> + * committing changes to the hardware.
> + * The underrun interrupt is simply masked prior to getting underrun events
> + * while we are applying the new config.
> + */
> +static inline void drm_atomic_helper_underrun_stop(struct drm_device *dev)
> +{
> +	if (dev->mode_config.funcs->atomic_mask_underrun)
> +		dev->mode_config.funcs->atomic_mask_underrun(dev);
> +}
> +
> +/**
> + * drm_atomic_helper_underrun_start() - Start listening to underrun events
> + * @dev: device we want to start underrun on
> + *
> + * This function should be called in the atomic commit path, just after
> + * the changes have been committed to the hardware.
> + * Any pending underrun events are cleared before unmasking the interrupt to
> + * guarantee that further underrun events are really caused by the new
> + * configuration.
> + */
> +static inline void drm_atomic_helper_underrun_start(struct drm_device *dev)
> +{
> +	atomic_set(&dev->underrun, 0);
> +
> +	if (dev->mode_config.funcs->atomic_clear_underrun)
> +		dev->mode_config.funcs->atomic_clear_underrun(dev);
> +
> +	if (dev->mode_config.funcs->atomic_unmask_underrun)
> +		dev->mode_config.funcs->atomic_unmask_underrun(dev);
> +}
> +
>  #endif /* DRM_ATOMIC_HELPER_H_ */
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index 42411b3ea0c8..59c63e03a776 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -231,6 +231,21 @@ struct drm_device {
>  	 * Set by drm_fb_helper_init() and cleared by drm_fb_helper_fini().
>  	 */
>  	struct drm_fb_helper *fb_helper;
> +
> +	/**
> +	 * @underrun:
> +	 *
> +	 * Set to 1 when an underrun error happened since the last atomic
> +	 * commit, 0 otherwise.
> +	 * This is particularly useful to detect when a specific modeset is too
> +	 * demanding in term of memory bandwidth or other kind limitations that
> +	 * are hard to guess at atomic check time.
> +	 * Drivers should only set underrun to 1 (using
> +	 * drm_atomic_helper_underrun_set()) when an underrun interrupt occurs
> +	 * and clear if from their atomic commit hook (using
> +	 * drm_atomic_helper_underrun_{stop,start}()).
> +	 */
> +	atomic_t underrun;
>  };
>  
>  #endif
> diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
> index 8d540cdae57d..28b83e295fd2 100644
> --- a/include/drm/drm_mode_config.h
> +++ b/include/drm/drm_mode_config.h
> @@ -325,6 +325,32 @@ struct drm_mode_config_funcs {
>  	 * &drm_private_state and &drm_private_obj.
>  	 */
>  	void (*atomic_state_free)(struct drm_atomic_state *state);
> +
> +	/**
> +	 * @atomic_mask_underrun:
> +	 *
> +	 * This is an optional hook called when the core needs to disable/mask
> +	 * the underrun interrupt. It's used by drm_atomic_helper_underrun_set()
> +	 * and drm_atomic_helper_underrun_stop().
> +	 */
> +	void (*atomic_mask_underrun)(struct drm_device *dev);
> +
> +	/**
> +	 * @atomic_clear_underrun:
> +	 *
> +	 * This is an optional hook called when the core needs to clear pending
> +	 * underrun events. It's used by drm_atomic_helper_underrun_start() to
> +	 * before unmask the underrun interrupt.
> +	 */
> +	void (*atomic_clear_underrun)(struct drm_device *dev);
> +
> +	/**
> +	 * @atomic_unmask_underrun:
> +	 *
> +	 * This is an optional hook called when the core needs to unmask
> +	 * underrun interrupts. It's used by drm_atomic_helper_underrun_start().
> +	 */
> +	void (*atomic_unmask_underrun)(struct drm_device *dev);
>  };
>  
>  /**
> -- 
> 2.17.1
>
Hi Daniel,

On Fri, 26 Oct 2018 12:33:37 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > Display controllers usually provide a lot features like overlay planes,
> > hardware scalers, hardware rotations, ...
> > Most of the time those features work just fine when enabled
> > independently, but activating all of them at the same time tend to
> > show other limitations like the limited memory bus or scaler
> > bandwidth.
> > 
> > This sort of bandwidth estimation is hard to get right, and we
> > sometimes fail to predict that a specific workload will not pass,
> > which will most likely result in underrun errors somewhere in the
> > display pipeline (most of the time at the CRTC level).
> > 
> > This path aims at making underrun error tracking generic and exposing
> > underrun errors to userspace through a debugfs file.
> > 
> > This way, CI tools like IGT, can try to enable a bunch of features and
> > if such underrun errors are reported after the settings have been
> > accepted and applied by the KMS infrastructure. When that happens it
> > means the load tracking algorithm needs some tweaking/fixing.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > ---
> > Changes in v2:
> > - New patch  
> 
> Hm, not what I thought off. This is definitely not going to be reusable
> for i915, the only other driver with underrun reporting, since we have
> tons of underrun sources (not just one). And we need to mask/unmask them
> individually.

I agree it's not perfect, but I did it this way so that hopefully we'll
be able to use drm_atomic_helper_commit_tail() at some point instead of
having our own implementation.

If there's a way to add pre/post_commit() hooks (maybe they already
exist?) at various levels of the display pipeline that get called even
when the associated state has *not* changed, then I might be able to
place my mask/unmask/clear operations there. I really need that to
properly unmask the underrun interrupt, otherwise underruns might not be
detected after the first modeset if nothing changed in sub-parts of the
pipeline.

> 
> It's also midlayer mistake, since you're stuff your callbacks into the
> drm_mode_config_funcs core structure (which should only be used for uapi
> interfaces, not helper stuff).

Okay.

> 
> What I had in mind is just a simple function which spews into dmesg (so
> that i915 could use it) and increments the underrun counter in debugfs.
> Everything else needs to be handled in drivers I think. E.g.
> 
> drm_mode_config_report_underrun(struct drm_device *dev,
> 				const char *source);
> 
> and then rolling it out for i915 too (otherwise there's really no point in
> the common infrastructure).

Modifying the i915 driver was a bit premature since I didn't have your
feedback on the general approach. And given your review, I'm glad I
didn't start this conversion :-P.

Anyway, I was not convinced having a generic infrastructure to report
underrun errors was needed in the first place. The fact that these
errors are very HW-specific, and that data attached to such events
might be useful to understand what actually happened makes me think we
don't really need this generic underrun reporting helper.

Also note that IGT tests are likely to be HW specific too, because the
workload needed to trigger underrun errors is likely to depend on the
platform you're running on. And if you already have to write your own
tests, I don't see clear benefit in exposing underrun errors
generically.

Anyway, this is just my opinion, and if you think we actually need the
drm_mode_config_report_underrun() function, I'll implement it.

Regards,

Boris
On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> Hi Daniel,
> 
> On Fri, 26 Oct 2018 12:33:37 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > > Display controllers usually provide a lot features like overlay planes,
> > > hardware scalers, hardware rotations, ...
> > > Most of the time those features work just fine when enabled
> > > independently, but activating all of them at the same time tend to
> > > show other limitations like the limited memory bus or scaler
> > > bandwidth.
> > > 
> > > This sort of bandwidth estimation is hard to get right, and we
> > > sometimes fail to predict that a specific workload will not pass,
> > > which will most likely result in underrun errors somewhere in the
> > > display pipeline (most of the time at the CRTC level).
> > > 
> > > This path aims at making underrun error tracking generic and exposing
> > > underrun errors to userspace through a debugfs file.
> > > 
> > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > if such underrun errors are reported after the settings have been
> > > accepted and applied by the KMS infrastructure. When that happens it
> > > means the load tracking algorithm needs some tweaking/fixing.
> > > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > ---
> > > Changes in v2:
> > > - New patch  
> > 
> > Hm, not what I thought off. This is definitely not going to be reusable
> > for i915, the only other driver with underrun reporting, since we have
> > tons of underrun sources (not just one). And we need to mask/unmask them
> > individually.
> 
> I agree it's not perfect, but I did it this way so that hopefully we'll
> be able to use drm_atomic_helper_commit_tail() at some point instead of
> having our own implementation.

I expect every non-trivial driver to need to overwrite commit_tail. That's
the entire point for having created it. Exceptions are just the very, very
simple ones without anything fancy.

> If there's a way to add pre/post_commit() hooks (maybe they already
> exist?) at various levels of the display pipeline that get called even
> when the associated state has *not* changed, then I might be able to
> place my mask/unmask/clear operations there. I really need that to
> properly unmask the underrun interrupt, otherwise underruns might not be
> detected after the first modeset if nothing changed in sub-parts of the
> pipeline.

Just overwrite commit_tail. That's why it exists.

> > It's also midlayer mistake, since you're stuff your callbacks into the
> > drm_mode_config_funcs core structure (which should only be used for uapi
> > interfaces, not helper stuff).
> 
> Okay.
> 
> > 
> > What I had in mind is just a simple function which spews into dmesg (so
> > that i915 could use it) and increments the underrun counter in debugfs.
> > Everything else needs to be handled in drivers I think. E.g.
> > 
> > drm_mode_config_report_underrun(struct drm_device *dev,
> > 				const char *source);
> > 
> > and then rolling it out for i915 too (otherwise there's really no point in
> > the common infrastructure).
> 
> Modifying the i915 driver was a bit premature since I didn't have your
> feedback on the general approach. And given your review, I'm glad I
> didn't start this conversion :-P.
> 
> Anyway, I was not convinced having a generic infrastructure to report
> underrun errors was needed in the first place. The fact that these
> errors are very HW-specific, and that data attached to such events
> might be useful to understand what actually happened makes me think we
> don't really need this generic underrun reporting helper.
> 
> Also note that IGT tests are likely to be HW specific too, because the
> workload needed to trigger underrun errors is likely to depend on the
> platform you're running on. And if you already have to write your own
> tests, I don't see clear benefit in exposing underrun errors
> generically.

See my other reply in the earlier thread: That's why we're just dumping
errors into dmesg, and catching any such thing in our runners. We do not
have any specific underrun tests, we treat the entire igt test suite of
kms tests as our underrun test suite.

Extending that with more nasty corner cases is obviously great, and will
benefit everyone. But I don't think you want an igt testcase that looks
specifically for underruns. We do use crc to hunt for specific issues, but
again that's portable between drivers.

> Anyway, this is just my opinion, and if you think we actually need the
> drm_mode_config_report_underrun() function, I'll implement it.

I do think having some standard in how these errors are reported would be
useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.
-Daniel
On Fri, 26 Oct 2018 15:36:15 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> > Hi Daniel,
> > 
> > On Fri, 26 Oct 2018 12:33:37 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:  
> > > > Display controllers usually provide a lot features like overlay planes,
> > > > hardware scalers, hardware rotations, ...
> > > > Most of the time those features work just fine when enabled
> > > > independently, but activating all of them at the same time tend to
> > > > show other limitations like the limited memory bus or scaler
> > > > bandwidth.
> > > > 
> > > > This sort of bandwidth estimation is hard to get right, and we
> > > > sometimes fail to predict that a specific workload will not pass,
> > > > which will most likely result in underrun errors somewhere in the
> > > > display pipeline (most of the time at the CRTC level).
> > > > 
> > > > This path aims at making underrun error tracking generic and exposing
> > > > underrun errors to userspace through a debugfs file.
> > > > 
> > > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > > if such underrun errors are reported after the settings have been
> > > > accepted and applied by the KMS infrastructure. When that happens it
> > > > means the load tracking algorithm needs some tweaking/fixing.
> > > > 
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > ---
> > > > Changes in v2:
> > > > - New patch    
> > > 
> > > Hm, not what I thought off. This is definitely not going to be reusable
> > > for i915, the only other driver with underrun reporting, since we have
> > > tons of underrun sources (not just one). And we need to mask/unmask them
> > > individually.  
> > 
> > I agree it's not perfect, but I did it this way so that hopefully we'll
> > be able to use drm_atomic_helper_commit_tail() at some point instead of
> > having our own implementation.  
> 
> I expect every non-trivial driver to need to overwrite commit_tail. That's
> the entire point for having created it. Exceptions are just the very, very
> simple ones without anything fancy.

Again, just my opinion, but doing that means some drivers are left
behind when new stuff are added to the generic helpers, and sometimes
those new features might be of interest to the drivers that are not
directly using those generic helpers, except they usually don't even
know about it until one comes at looking at the generic helper again
and notice the additions.
Maybe I'm not following the dri-devel ML close enough, and I'm probably
the one to blame here, but I see a real benefit in having the maximum
number of drivers using the generic helpers. To be honest, with a few
modifications, I think the VC4 driver could entirely rely on those
generic helpers.

> 
> > If there's a way to add pre/post_commit() hooks (maybe they already
> > exist?) at various levels of the display pipeline that get called even
> > when the associated state has *not* changed, then I might be able to
> > place my mask/unmask/clear operations there. I really need that to
> > properly unmask the underrun interrupt, otherwise underruns might not be
> > detected after the first modeset if nothing changed in sub-parts of the
> > pipeline.  
> 
> Just overwrite commit_tail. That's why it exists.

And that's what we do already, but I was hoping that someday we would
be able to switch to the generic helper...

> 
> > > It's also midlayer mistake, since you're stuff your callbacks into the
> > > drm_mode_config_funcs core structure (which should only be used for uapi
> > > interfaces, not helper stuff).  
> > 
> > Okay.
> >   
> > > 
> > > What I had in mind is just a simple function which spews into dmesg (so
> > > that i915 could use it) and increments the underrun counter in debugfs.
> > > Everything else needs to be handled in drivers I think. E.g.
> > > 
> > > drm_mode_config_report_underrun(struct drm_device *dev,
> > > 				const char *source);
> > > 
> > > and then rolling it out for i915 too (otherwise there's really no point in
> > > the common infrastructure).  
> > 
> > Modifying the i915 driver was a bit premature since I didn't have your
> > feedback on the general approach. And given your review, I'm glad I
> > didn't start this conversion :-P.
> > 
> > Anyway, I was not convinced having a generic infrastructure to report
> > underrun errors was needed in the first place. The fact that these
> > errors are very HW-specific, and that data attached to such events
> > might be useful to understand what actually happened makes me think we
> > don't really need this generic underrun reporting helper.
> > 
> > Also note that IGT tests are likely to be HW specific too, because the
> > workload needed to trigger underrun errors is likely to depend on the
> > platform you're running on. And if you already have to write your own
> > tests, I don't see clear benefit in exposing underrun errors
> > generically.  
> 
> See my other reply in the earlier thread: That's why we're just dumping
> errors into dmesg, and catching any such thing in our runners. We do not
> have any specific underrun tests, we treat the entire igt test suite of
> kms tests as our underrun test suite.
> 
> Extending that with more nasty corner cases is obviously great, and will
> benefit everyone. But I don't think you want an igt testcase that looks
> specifically for underruns.

Actually, that's exactly what I wanted to do: use config that were
known to generate underrun errors and make sure they don't after adding
the HVS load tracker. That's also particularly useful to track
regressions in this portion of the code.

> We do use crc to hunt for specific issues, but
> again that's portable between drivers.

We don't use CRC yet, but there was plans to add tests using writeback
to compare the output image to what we expect, and Maxime also worked
on tests relying on the Chamelium board.

Still, knowing why the image is incorrect is useful, and I guess you
could have a wrong output that is not necessarily caused by an
underrun error. So I think having tests that explicly check for
underrun errors is a good thing.

> 
> > Anyway, this is just my opinion, and if you think we actually need the
> > drm_mode_config_report_underrun() function, I'll implement it.  
> 
> I do think having some standard in how these errors are reported would be
> useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.

I'll define such a macro.
On Fri, Oct 26, 2018 at 4:13 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
>
> On Fri, 26 Oct 2018 15:36:15 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> > > Hi Daniel,
> > >
> > > On Fri, 26 Oct 2018 12:33:37 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > > > > Display controllers usually provide a lot features like overlay planes,
> > > > > hardware scalers, hardware rotations, ...
> > > > > Most of the time those features work just fine when enabled
> > > > > independently, but activating all of them at the same time tend to
> > > > > show other limitations like the limited memory bus or scaler
> > > > > bandwidth.
> > > > >
> > > > > This sort of bandwidth estimation is hard to get right, and we
> > > > > sometimes fail to predict that a specific workload will not pass,
> > > > > which will most likely result in underrun errors somewhere in the
> > > > > display pipeline (most of the time at the CRTC level).
> > > > >
> > > > > This path aims at making underrun error tracking generic and exposing
> > > > > underrun errors to userspace through a debugfs file.
> > > > >
> > > > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > > > if such underrun errors are reported after the settings have been
> > > > > accepted and applied by the KMS infrastructure. When that happens it
> > > > > means the load tracking algorithm needs some tweaking/fixing.
> > > > >
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > ---
> > > > > Changes in v2:
> > > > > - New patch
> > > >
> > > > Hm, not what I thought off. This is definitely not going to be reusable
> > > > for i915, the only other driver with underrun reporting, since we have
> > > > tons of underrun sources (not just one). And we need to mask/unmask them
> > > > individually.
> > >
> > > I agree it's not perfect, but I did it this way so that hopefully we'll
> > > be able to use drm_atomic_helper_commit_tail() at some point instead of
> > > having our own implementation.
> >
> > I expect every non-trivial driver to need to overwrite commit_tail. That's
> > the entire point for having created it. Exceptions are just the very, very
> > simple ones without anything fancy.
>
> Again, just my opinion, but doing that means some drivers are left
> behind when new stuff are added to the generic helpers, and sometimes
> those new features might be of interest to the drivers that are not
> directly using those generic helpers, except they usually don't even
> know about it until one comes at looking at the generic helper again
> and notice the additions.
> Maybe I'm not following the dri-devel ML close enough, and I'm probably
> the one to blame here, but I see a real benefit in having the maximum
> number of drivers using the generic helpers. To be honest, with a few
> modifications, I think the VC4 driver could entirely rely on those
> generic helpers.

commit_tail commits the sw state to hw. Your hw will not magically
gain any new features, so there's really not much point in
automatically adding stuff to the overall hw commit flow. Some
exceptoins apply, and those get reviewed when the core change goes in.

Yes I spend ridiculous amounts of my time reading everyone else's
drivers. And I really don't want to make commit_tail 100% applicapable
to everyone, that just makes reviewing drivers harder (since the flow
isn't obvious by simply looking at commit_tail, instead you have to
hunt down feature flags and other stuff that control things).

This holds in general btw, and it's part of the no-midlayer design
paradigm. No huge blobs that work for everyone, instead make things
modular.

But if you have a concrete example where we added something, and a
driver should have used it but didn't, we'll need to look into that.
-Daniel

> > > If there's a way to add pre/post_commit() hooks (maybe they already
> > > exist?) at various levels of the display pipeline that get called even
> > > when the associated state has *not* changed, then I might be able to
> > > place my mask/unmask/clear operations there. I really need that to
> > > properly unmask the underrun interrupt, otherwise underruns might not be
> > > detected after the first modeset if nothing changed in sub-parts of the
> > > pipeline.
> >
> > Just overwrite commit_tail. That's why it exists.
>
> And that's what we do already, but I was hoping that someday we would
> be able to switch to the generic helper...
>
> >
> > > > It's also midlayer mistake, since you're stuff your callbacks into the
> > > > drm_mode_config_funcs core structure (which should only be used for uapi
> > > > interfaces, not helper stuff).
> > >
> > > Okay.
> > >
> > > >
> > > > What I had in mind is just a simple function which spews into dmesg (so
> > > > that i915 could use it) and increments the underrun counter in debugfs.
> > > > Everything else needs to be handled in drivers I think. E.g.
> > > >
> > > > drm_mode_config_report_underrun(struct drm_device *dev,
> > > >                           const char *source);
> > > >
> > > > and then rolling it out for i915 too (otherwise there's really no point in
> > > > the common infrastructure).
> > >
> > > Modifying the i915 driver was a bit premature since I didn't have your
> > > feedback on the general approach. And given your review, I'm glad I
> > > didn't start this conversion :-P.
> > >
> > > Anyway, I was not convinced having a generic infrastructure to report
> > > underrun errors was needed in the first place. The fact that these
> > > errors are very HW-specific, and that data attached to such events
> > > might be useful to understand what actually happened makes me think we
> > > don't really need this generic underrun reporting helper.
> > >
> > > Also note that IGT tests are likely to be HW specific too, because the
> > > workload needed to trigger underrun errors is likely to depend on the
> > > platform you're running on. And if you already have to write your own
> > > tests, I don't see clear benefit in exposing underrun errors
> > > generically.
> >
> > See my other reply in the earlier thread: That's why we're just dumping
> > errors into dmesg, and catching any such thing in our runners. We do not
> > have any specific underrun tests, we treat the entire igt test suite of
> > kms tests as our underrun test suite.
> >
> > Extending that with more nasty corner cases is obviously great, and will
> > benefit everyone. But I don't think you want an igt testcase that looks
> > specifically for underruns.
>
> Actually, that's exactly what I wanted to do: use config that were
> known to generate underrun errors and make sure they don't after adding
> the HVS load tracker. That's also particularly useful to track
> regressions in this portion of the code.
>
> > We do use crc to hunt for specific issues, but
> > again that's portable between drivers.
>
> We don't use CRC yet, but there was plans to add tests using writeback
> to compare the output image to what we expect, and Maxime also worked
> on tests relying on the Chamelium board.
>
> Still, knowing why the image is incorrect is useful, and I guess you
> could have a wrong output that is not necessarily caused by an
> underrun error. So I think having tests that explicly check for
> underrun errors is a good thing.
>
> >
> > > Anyway, this is just my opinion, and if you think we actually need the
> > > drm_mode_config_report_underrun() function, I'll implement it.
> >
> > I do think having some standard in how these errors are reported would be
> > useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.
>
> I'll define such a macro.

Function, if it does anything more than call DRM_ERROR. Just to avoid confusion.
-Daniel