[V2] drm/vkms: Avoid extra discount in the timestamp value

Submitted by Rodrigo Siqueira on June 5, 2019, 2:45 a.m.

Details

Message ID 20190605024543.pcsnkf74mmgfhtuh@smtp.gmail.com
State New
Headers show
Series "drm/vkms: Avoid extra discount in the timestamp value" ( rev: 2 ) in DRI devel

Not browsing as part of any series.

Commit Message

Rodrigo Siqueira June 5, 2019, 2:45 a.m.
After the commit def35e7c5926 ("drm/vkms: Bugfix extra vblank frame")
some of the crc tests started to fail in the vkms with the following
error:

 [drm:drm_crtc_add_crc_entry [drm]] *ERROR* Overflow of CRC buffer,
    userspace reads too slow.
 [drm] failed to queue vkms_crc_work_handle
 ...

The aforementioned commit fixed the extra vblank added by
`drm_crtc_arm_vblank_event()` which is invoked inside
`vkms_crtc_atomic_flush()` if the vblank event count was zero, otherwise
`drm_crtc_send_vblank_event()` is invoked. The fix was implemented in
`vkms_get_vblank_timestamp()` by subtracting one period from the current
timestamp, as the code snippet below illustrates:

 if (!in_vblank_irq)
  *vblank_time -= output->period_ns;

The above fix works well when `drm_crtc_arm_vblank_event()` is invoked.
However, it does not properly work when `drm_crtc_send_vblank_event()`
executes since it subtracts the correct timestamp, which it shouldn't.
In this case, the `drm_crtc_accurate_vblank_count()` function will
returns the wrong frame number, which generates the aforementioned
error. Such decrease in `get_vblank_timestamp()` produce a negative
number in the following calculation within `drm_update_vblank_count()`:

 u64 diff_ns = ktime_to_ns(ktime_sub(t_vblank, vblank->time));

After this operation, the DIV_ROUND_CLOSEST_ULL macro is invoked using
diff_ns with a negative number, which generates an undefined result;
therefore, the returned frame is a huge and incorrect number. Finally,
the code below is part of the `vkms_crc_work_handle()`, note that the
while loop depends on the returned value from
`drm_crtc_accurate_vblank_count()` which may cause the loop to take a
long time to finish in case of huge value.

 frame_end = drm_crtc_accurate_vblank_count(crtc);
 while (frame_start <= frame_end)
   drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);

This commit fixes this issue by checking if the vblank timestamp
corresponding to the current software vblank counter is equal to the
current vblank; if they are equal, it means that
`drm_crtc_send_vblank_event()` was invoked and vkms does not need to
discount the extra vblank, otherwise, `drm_crtc_arm_vblank_event()` was
executed and vkms have to discount the extra vblank. This fix made the
CRC tests work again whereas keep all tests from kms_flip working as
well.

V2: Update commit message

Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
Signed-off-by: Shayenne Moura <shayenneluzmoura@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_crtc.c | 4 ++++
 1 file changed, 4 insertions(+)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
index 7508815fac11..3ce60e66673e 100644
--- a/drivers/gpu/drm/vkms/vkms_crtc.c
+++ b/drivers/gpu/drm/vkms/vkms_crtc.c
@@ -74,9 +74,13 @@  bool vkms_get_vblank_timestamp(struct drm_device *dev, unsigned int pipe,
 {
 	struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev);
 	struct vkms_output *output = &vkmsdev->output;
+	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
 
 	*vblank_time = output->vblank_hrtimer.node.expires;
 
+	if (*vblank_time == vblank->time)
+		return true;
+
 	if (!in_vblank_irq)
 		*vblank_time -= output->period_ns;
 

Comments

On Tue, Jun 04, 2019 at 11:45:43PM -0300, Rodrigo Siqueira wrote:
> After the commit def35e7c5926 ("drm/vkms: Bugfix extra vblank frame")
> some of the crc tests started to fail in the vkms with the following
> error:
> 
>  [drm:drm_crtc_add_crc_entry [drm]] *ERROR* Overflow of CRC buffer,
>     userspace reads too slow.
>  [drm] failed to queue vkms_crc_work_handle
>  ...
> 
> The aforementioned commit fixed the extra vblank added by
> `drm_crtc_arm_vblank_event()` which is invoked inside
> `vkms_crtc_atomic_flush()` if the vblank event count was zero, otherwise
> `drm_crtc_send_vblank_event()` is invoked. The fix was implemented in
> `vkms_get_vblank_timestamp()` by subtracting one period from the current
> timestamp, as the code snippet below illustrates:
> 
>  if (!in_vblank_irq)
>   *vblank_time -= output->period_ns;
> 
> The above fix works well when `drm_crtc_arm_vblank_event()` is invoked.
> However, it does not properly work when `drm_crtc_send_vblank_event()`
> executes since it subtracts the correct timestamp, which it shouldn't.
> In this case, the `drm_crtc_accurate_vblank_count()` function will
> returns the wrong frame number, which generates the aforementioned
> error. Such decrease in `get_vblank_timestamp()` produce a negative
> number in the following calculation within `drm_update_vblank_count()`:
> 
>  u64 diff_ns = ktime_to_ns(ktime_sub(t_vblank, vblank->time));
> 
> After this operation, the DIV_ROUND_CLOSEST_ULL macro is invoked using
> diff_ns with a negative number, which generates an undefined result;
> therefore, the returned frame is a huge and incorrect number. Finally,
> the code below is part of the `vkms_crc_work_handle()`, note that the
> while loop depends on the returned value from
> `drm_crtc_accurate_vblank_count()` which may cause the loop to take a
> long time to finish in case of huge value.
> 
>  frame_end = drm_crtc_accurate_vblank_count(crtc);
>  while (frame_start <= frame_end)
>    drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32);
> 
> This commit fixes this issue by checking if the vblank timestamp
> corresponding to the current software vblank counter is equal to the
> current vblank; if they are equal, it means that
> `drm_crtc_send_vblank_event()` was invoked and vkms does not need to
> discount the extra vblank, otherwise, `drm_crtc_arm_vblank_event()` was
> executed and vkms have to discount the extra vblank. This fix made the
> CRC tests work again whereas keep all tests from kms_flip working as
> well.
> 
> V2: Update commit message
> 
> Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
> Signed-off-by: Shayenne Moura <shayenneluzmoura@gmail.com>

Thanks a lot for typing up this commit message. I'm still not following
what's going on (this stuff is tricky), but now I think I can at least ask
useful scenarios.

For my understanding: Things go wrong when in the function
vkms_crtc_atomic_flush() the call to drm_crtc_vblank_get() returns 0, and
we call drm_crtc_send_vblank_event() directly? I'm not 100% from your
description above whether that's the failure case where everything blows
up.

The other part I'm having a hard time understanding: If we are in the case
where we call drm_crtc_send_vblank_event() directly, how do we end up in
the vkms_get_vblank_timestamp().

From what I can see there's not caller to where we sample a new vblank ...
I think a full backtrace that hits the new condition you're adding to
understand when exactly we're hitting it would be perfect.

> ---
>  drivers/gpu/drm/vkms/vkms_crtc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c
> index 7508815fac11..3ce60e66673e 100644
> --- a/drivers/gpu/drm/vkms/vkms_crtc.c
> +++ b/drivers/gpu/drm/vkms/vkms_crtc.c
> @@ -74,9 +74,13 @@ bool vkms_get_vblank_timestamp(struct drm_device *dev, unsigned int pipe,
>  {
>  	struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev);
>  	struct vkms_output *output = &vkmsdev->output;
> +	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>  
>  	*vblank_time = output->vblank_hrtimer.node.expires;
>  
> +	if (*vblank_time == vblank->time)
> +		return true;

Ok, I think with the above I do now have a rough idea what's going wrong.
I think that would be a bug in the drm_vblank.c. Or at least it could be,
I think I first need to better understand still what's going on here to
decided that.

I have a bit a hunch this is fallout from our vblank fudging, but I guess
we'll see.

It's definitely clear that things blow up if the vblank time somehow goes
backwards, but I don't understand yet how that's possible.
-Daniel
> +
>  	if (!in_vblank_irq)
>  		*vblank_time -= output->period_ns;
>  
> -- 
> 2.21.0
>