st/dri: decrease input lag by syncing sooner in SwapBuffers

Submitted by Marek Olšák on April 26, 2019, 2:06 a.m.

Details

Message ID 20190426020625.25308-1-maraeo@gmail.com
State New
Headers show
Series "st/dri: decrease input lag by syncing sooner in SwapBuffers" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Marek Olšák April 26, 2019, 2:06 a.m.
From: Marek Olšák <marek.olsak@amd.com>

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
  (instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
    draw frame 2
    wait frame 0
    flush frame 2
    present frame 2

    The input lag is 2 frames.

Now:
    draw frame 2
    flush frame 2
    wait frame 1
    present frame 2

    The input lag is 1 frame. Flushing is done before waiting, because
    otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
---
 src/gallium/auxiliary/util/u_screen.c         |  2 +-
 src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
 2 files changed, 11 insertions(+), 11 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@  u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
    case PIPE_CAP_MAX_VARYINGS:
       return 8;
 
    case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
       return 0;
 
    case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
       return 0;
 
    case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
-      return 2;
+      return 1;
 
    case PIPE_CAP_DMABUF:
 #ifdef PIPE_OS_LINUX
       return 1;
 #else
       return 0;
 #endif
 
    default:
       unreachable("bad PIPE_CAP_*");
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c b/src/gallium/state_trackers/dri/dri_drawable.c
index 26bfdbecc53..c1de3bed9dd 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -555,33 +555,33 @@  dri_flush(__DRIcontext *cPriv,
        *
        * This pulls a fence off the throttling queue and waits for it if the
        * number of fences on the throttling queue has reached the desired
        * number.
        *
        * Then flushes to insert a fence at the current rendering position, and
        * pushes that fence on the queue. This requires that the st_context_iface
        * flush method returns a fence even if there are no commands to flush.
        */
       struct pipe_screen *screen = drawable->screen->base.screen;
-      struct pipe_fence_handle *fence;
+      struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
 
-      fence = swap_fences_pop_front(drawable);
-      if (fence) {
-         (void) screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
-         screen->fence_reference(screen, &fence, NULL);
-      }
+      st->flush(st, flush_flags, &new_fence);
 
-      st->flush(st, flush_flags, &fence);
+      oldest_fence = swap_fences_pop_front(drawable);
+      if (oldest_fence) {
+         screen->fence_finish(screen, NULL, oldest_fence, PIPE_TIMEOUT_INFINITE);
+         screen->fence_reference(screen, &oldest_fence, NULL);
+      }
 
-      if (fence) {
-         swap_fences_push_back(drawable, fence);
-         screen->fence_reference(screen, &fence, NULL);
+      if (new_fence) {
+         swap_fences_push_back(drawable, new_fence);
+         screen->fence_reference(screen, &new_fence, NULL);
       }
    }
    else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
       st->flush(st, flush_flags, NULL);
    }
 
    if (drawable) {
       drawable->flushing = FALSE;
    }
 

Comments

On 2019-04-26 4:06 a.m., Marek Olšák wrote:
> From: Marek Olšák <marek.olsak@amd.com>
> 
> It's done by:
> - decrease the number of frames in flight by 1
> - flush before throttling in SwapBuffers
>   (instead of wait-then-flush, do flush-then-wait)
> 
> The improvement is apparent with Unigine Heaven.
> 
> Previously:
>     draw frame 2
>     wait frame 0
>     flush frame 2
>     present frame 2
> 
>     The input lag is 2 frames.
> 
> Now:
>     draw frame 2
>     flush frame 2
>     wait frame 1
>     present frame 2
> 
>     The input lag is 1 frame. Flushing is done before waiting, because
>     otherwise the device would be idle after waiting.

Nice idea. Not sure offhand about all ramifications, but certainly worth
a go.


> Nine is affected because it also uses the pipe cap.
> ---
>  src/gallium/auxiliary/util/u_screen.c         |  2 +-
>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
> index 27f51e0898e..410f17421e6 100644
> --- a/src/gallium/auxiliary/util/u_screen.c
> +++ b/src/gallium/auxiliary/util/u_screen.c
> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
>     case PIPE_CAP_MAX_VARYINGS:
>        return 8;
>  
>     case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>        return 0;
>  
>     case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>        return 0;
>  
>     case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> -      return 2;
> +      return 1;

This might be slightly misleading, as there can still be two frames in
flight (on the GPU) at the same time. Might be better to leave this at 2
(so Nine isn't affected) and adjust its treatment in
src/gallium/state_trackers/dri/dri_drawable.c .
On 26/04/2019 10:08, Michel Dänzer wrote:
> On 2019-04-26 4:06 a.m., Marek Olšák wrote:
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> It's done by:
>> - decrease the number of frames in flight by 1
>> - flush before throttling in SwapBuffers
>>    (instead of wait-then-flush, do flush-then-wait)
>>
>> The improvement is apparent with Unigine Heaven.
>>
>> Previously:
>>      draw frame 2
>>      wait frame 0
>>      flush frame 2
>>      present frame 2
>>
>>      The input lag is 2 frames.
>>
>> Now:
>>      draw frame 2
>>      flush frame 2
>>      wait frame 1
>>      present frame 2
>>
>>      The input lag is 1 frame. Flushing is done before waiting, because
>>      otherwise the device would be idle after waiting.
> Nice idea. Not sure offhand about all ramifications, but certainly worth
> a go.
>
>
>> Nine is affected because it also uses the pipe cap.
>> ---
>>   src/gallium/auxiliary/util/u_screen.c         |  2 +-
>>   src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
>>   2 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
>> index 27f51e0898e..410f17421e6 100644
>> --- a/src/gallium/auxiliary/util/u_screen.c
>> +++ b/src/gallium/auxiliary/util/u_screen.c
>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
>>      case PIPE_CAP_MAX_VARYINGS:
>>         return 8;
>>   
>>      case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>         return 0;
>>   
>>      case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>         return 0;
>>   
>>      case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> -      return 2;
>> +      return 1;
> This might be slightly misleading, as there can still be two frames in
> flight (on the GPU) at the same time. Might be better to leave this at 2
> (so Nine isn't affected) and adjust its treatment in
> src/gallium/state_trackers/dri/dri_drawable.c .
>
>
Checking what gallium nine does currently, it seems we already do flush 
then wait,
however we call swap_fences_pop_front and swap_fences_push_back in the 
reverse order compared to your patch.
We compensate by taking PIPE_CAP_MAX_FRAMES_IN_FLIGHT + 1

In conclusion, with the proposed patch, gl and nine should have the same 
behaviour (and thus if gl benefits from a value of 1, nine should as well).
I haven't have noticed input lag, I guess I have to test on heaven if 
you see a difference.
How can I slow down my gpu to test that ? I use to use the 
/sys/kernel/debug/dri/0/ vars to force low dpm, but it doesn't seem to 
be possible anymore as the related files are gone (rx480) ?


Axel



On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák <maraeo@gmail.com> wrote:
>
> From: Marek Olšák <marek.olsak@amd.com>
>
> It's done by:
> - decrease the number of frames in flight by 1
> - flush before throttling in SwapBuffers
>   (instead of wait-then-flush, do flush-then-wait)
>
> The improvement is apparent with Unigine Heaven.
>
> Previously:
>     draw frame 2
>     wait frame 0
>     flush frame 2
>     present frame 2
>
>     The input lag is 2 frames.
>
> Now:
>     draw frame 2
>     flush frame 2
>     wait frame 1
>     present frame 2
>
>     The input lag is 1 frame. Flushing is done before waiting, because
>     otherwise the device would be idle after waiting.
>
> Nine is affected because it also uses the pipe cap.
> ---
>  src/gallium/auxiliary/util/u_screen.c         |  2 +-
>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
>  2 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
> index 27f51e0898e..410f17421e6 100644
> --- a/src/gallium/auxiliary/util/u_screen.c
> +++ b/src/gallium/auxiliary/util/u_screen.c
> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
>     case PIPE_CAP_MAX_VARYINGS:
>        return 8;
>
>     case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>        return 0;
>
>     case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>        return 0;
>
>     case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> -      return 2;
> +      return 1;

would it be safer to leave the current default and let drivers opt-in
to the lower # individually?  I guess triple buffering would still be
better for some of the smaller gpu's?

disclaimer: I haven't tested this myself or looked very closely at the
dri code.. so could be misunderstanding something..

BR,
-R

>
>     case PIPE_CAP_DMABUF:
>  #ifdef PIPE_OS_LINUX
>        return 1;
>  #else
>        return 0;
>  #endif
>
>     default:
>        unreachable("bad PIPE_CAP_*");
> diff --git a/src/gallium/state_trackers/dri/dri_drawable.c b/src/gallium/state_trackers/dri/dri_drawable.c
> index 26bfdbecc53..c1de3bed9dd 100644
> --- a/src/gallium/state_trackers/dri/dri_drawable.c
> +++ b/src/gallium/state_trackers/dri/dri_drawable.c
> @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
>         *
>         * This pulls a fence off the throttling queue and waits for it if the
>         * number of fences on the throttling queue has reached the desired
>         * number.
>         *
>         * Then flushes to insert a fence at the current rendering position, and
>         * pushes that fence on the queue. This requires that the st_context_iface
>         * flush method returns a fence even if there are no commands to flush.
>         */
>        struct pipe_screen *screen = drawable->screen->base.screen;
> -      struct pipe_fence_handle *fence;
> +      struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
>
> -      fence = swap_fences_pop_front(drawable);
> -      if (fence) {
> -         (void) screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
> -         screen->fence_reference(screen, &fence, NULL);
> -      }
> +      st->flush(st, flush_flags, &new_fence);
>
> -      st->flush(st, flush_flags, &fence);
> +      oldest_fence = swap_fences_pop_front(drawable);
> +      if (oldest_fence) {
> +         screen->fence_finish(screen, NULL, oldest_fence, PIPE_TIMEOUT_INFINITE);
> +         screen->fence_reference(screen, &oldest_fence, NULL);
> +      }
>
> -      if (fence) {
> -         swap_fences_push_back(drawable, fence);
> -         screen->fence_reference(screen, &fence, NULL);
> +      if (new_fence) {
> +         swap_fences_push_back(drawable, new_fence);
> +         screen->fence_reference(screen, &new_fence, NULL);
>        }
>     }
>     else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
>        st->flush(st, flush_flags, NULL);
>     }
>
>     if (drawable) {
>        drawable->flushing = FALSE;
>     }
>
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On 27/04/2019 18:13, Rob Clark wrote:
> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák <maraeo@gmail.com> wrote:
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> It's done by:
>> - decrease the number of frames in flight by 1
>> - flush before throttling in SwapBuffers
>>    (instead of wait-then-flush, do flush-then-wait)
>>
>> The improvement is apparent with Unigine Heaven.
>>
>> Previously:
>>      draw frame 2
>>      wait frame 0
>>      flush frame 2
>>      present frame 2
>>
>>      The input lag is 2 frames.
>>
>> Now:
>>      draw frame 2
>>      flush frame 2
>>      wait frame 1
>>      present frame 2
>>
>>      The input lag is 1 frame. Flushing is done before waiting, because
>>      otherwise the device would be idle after waiting.
>>
>> Nine is affected because it also uses the pipe cap.
>> ---
>>   src/gallium/auxiliary/util/u_screen.c         |  2 +-
>>   src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
>>   2 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
>> index 27f51e0898e..410f17421e6 100644
>> --- a/src/gallium/auxiliary/util/u_screen.c
>> +++ b/src/gallium/auxiliary/util/u_screen.c
>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
>>      case PIPE_CAP_MAX_VARYINGS:
>>         return 8;
>>
>>      case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>         return 0;
>>
>>      case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>         return 0;
>>
>>      case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> -      return 2;
>> +      return 1;
> would it be safer to leave the current default and let drivers opt-in
> to the lower # individually?  I guess triple buffering would still be
> better for some of the smaller gpu's?
>
> disclaimer: I haven't tested this myself or looked very closely at the
> dri code.. so could be misunderstanding something..
>
> BR,
> -R


I think I can shed some light on the issue to justify (or not) the change.

If we don't do throttling and the CPU renders frames at a faster rate 
than what the GPU can render,
the GPU can become quite late and cause huge frame lag.

The throttling involves forcing a (CPU) wait when a frame is presented 
if the 'x' previous images have not finished rendering.

The lower 'x', the lower the frame lag.

However if 'x' is too low (waiting current frame is rendered for 
example), the GPU can be idle until the CPU is flushing new commands.

Now there is a choice to be made for the value of 'x'. 1 or 2 are 
reasonable values.

if 'x' is 1, we wait the previous frame was rendered when we present the 
current frame. For '2' we wait the frame before.


Thus for smaller gpu's, a value of 1 is better than 2 as it is more 
affected by the frame lag (as frames take longer to render).


However if a game is rendering at a very unstable framerate (some frames 
needing more work than others), using a value of 2 is safer
to maximize performance. (As using a value of 1 would lead to wait if we 
get a frame that takes particularly long, as using 2 smooths that a bit)


I remember years ago I had extremely unstable fps when using catalyst on 
Portal for example. But I believe it is more a driver issue than a game 
issue.

If we assume games don't have unstable framerate, (which seems 
reasonable assumption), using 1 as default makes sense.


If one wants to test experimentally for regression, the ideal test case 
if when the GPU renders at about the same framerate as the CPU feeds it.



Axel



>
>>      case PIPE_CAP_DMABUF:
>>   #ifdef PIPE_OS_LINUX
>>         return 1;
>>   #else
>>         return 0;
>>   #endif
>>
>>      default:
>>         unreachable("bad PIPE_CAP_*");
>> diff --git a/src/gallium/state_trackers/dri/dri_drawable.c b/src/gallium/state_trackers/dri/dri_drawable.c
>> index 26bfdbecc53..c1de3bed9dd 100644
>> --- a/src/gallium/state_trackers/dri/dri_drawable.c
>> +++ b/src/gallium/state_trackers/dri/dri_drawable.c
>> @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
>>          *
>>          * This pulls a fence off the throttling queue and waits for it if the
>>          * number of fences on the throttling queue has reached the desired
>>          * number.
>>          *
>>          * Then flushes to insert a fence at the current rendering position, and
>>          * pushes that fence on the queue. This requires that the st_context_iface
>>          * flush method returns a fence even if there are no commands to flush.
>>          */
>>         struct pipe_screen *screen = drawable->screen->base.screen;
>> -      struct pipe_fence_handle *fence;
>> +      struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
>>
>> -      fence = swap_fences_pop_front(drawable);
>> -      if (fence) {
>> -         (void) screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
>> -         screen->fence_reference(screen, &fence, NULL);
>> -      }
>> +      st->flush(st, flush_flags, &new_fence);
>>
>> -      st->flush(st, flush_flags, &fence);
>> +      oldest_fence = swap_fences_pop_front(drawable);
>> +      if (oldest_fence) {
>> +         screen->fence_finish(screen, NULL, oldest_fence, PIPE_TIMEOUT_INFINITE);
>> +         screen->fence_reference(screen, &oldest_fence, NULL);
>> +      }
>>
>> -      if (fence) {
>> -         swap_fences_push_back(drawable, fence);
>> -         screen->fence_reference(screen, &fence, NULL);
>> +      if (new_fence) {
>> +         swap_fences_push_back(drawable, new_fence);
>> +         screen->fence_reference(screen, &new_fence, NULL);
>>         }
>>      }
>>      else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
>>         st->flush(st, flush_flags, NULL);
>>      }
>>
>>      if (drawable) {
>>         drawable->flushing = FALSE;
>>      }
>>
>> --
>> 2.17.1
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Sat, Apr 27, 2019 at 9:52 AM Axel Davy <davyaxel0@gmail.com> wrote:
>
> On 27/04/2019 18:13, Rob Clark wrote:
> > On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák <maraeo@gmail.com> wrote:
> >> From: Marek Olšák <marek.olsak@amd.com>
> >>
> >> It's done by:
> >> - decrease the number of frames in flight by 1
> >> - flush before throttling in SwapBuffers
> >>    (instead of wait-then-flush, do flush-then-wait)
> >>
> >> The improvement is apparent with Unigine Heaven.
> >>
> >> Previously:
> >>      draw frame 2
> >>      wait frame 0
> >>      flush frame 2
> >>      present frame 2
> >>
> >>      The input lag is 2 frames.
> >>
> >> Now:
> >>      draw frame 2
> >>      flush frame 2
> >>      wait frame 1
> >>      present frame 2
> >>
> >>      The input lag is 1 frame. Flushing is done before waiting, because
> >>      otherwise the device would be idle after waiting.
> >>
> >> Nine is affected because it also uses the pipe cap.
> >> ---
> >>   src/gallium/auxiliary/util/u_screen.c         |  2 +-
> >>   src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
> >>   2 files changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
> >> index 27f51e0898e..410f17421e6 100644
> >> --- a/src/gallium/auxiliary/util/u_screen.c
> >> +++ b/src/gallium/auxiliary/util/u_screen.c
> >> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
> >>      case PIPE_CAP_MAX_VARYINGS:
> >>         return 8;
> >>
> >>      case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
> >>         return 0;
> >>
> >>      case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
> >>         return 0;
> >>
> >>      case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
> >> -      return 2;
> >> +      return 1;
> > would it be safer to leave the current default and let drivers opt-in
> > to the lower # individually?  I guess triple buffering would still be
> > better for some of the smaller gpu's?
> >
> > disclaimer: I haven't tested this myself or looked very closely at the
> > dri code.. so could be misunderstanding something..
> >
> > BR,
> > -R
>
>
> I think I can shed some light on the issue to justify (or not) the change.
>
> If we don't do throttling and the CPU renders frames at a faster rate
> than what the GPU can render,
> the GPU can become quite late and cause huge frame lag.
>
> The throttling involves forcing a (CPU) wait when a frame is presented
> if the 'x' previous images have not finished rendering.
>
> The lower 'x', the lower the frame lag.
>
> However if 'x' is too low (waiting current frame is rendered for
> example), the GPU can be idle until the CPU is flushing new commands.
>
> Now there is a choice to be made for the value of 'x'. 1 or 2 are
> reasonable values.
>
> if 'x' is 1, we wait the previous frame was rendered when we present the
> current frame. For '2' we wait the frame before.
>
>
> Thus for smaller gpu's, a value of 1 is better than 2 as it is more
> affected by the frame lag (as frames take longer to render).
>

I get the latency aspect.. but my comment was more about latency vs
framerate (or maybe more cynically, about games vs benchmarks :-P)

BR,
-R

>
> However if a game is rendering at a very unstable framerate (some frames
> needing more work than others), using a value of 2 is safer
> to maximize performance. (As using a value of 1 would lead to wait if we
> get a frame that takes particularly long, as using 2 smooths that a bit)
>
>
> I remember years ago I had extremely unstable fps when using catalyst on
> Portal for example. But I believe it is more a driver issue than a game
> issue.
>
> If we assume games don't have unstable framerate, (which seems
> reasonable assumption), using 1 as default makes sense.
>
>
> If one wants to test experimentally for regression, the ideal test case
> if when the GPU renders at about the same framerate as the CPU feeds it.
>
>
>
> Axel
>
>
>
> >
> >>      case PIPE_CAP_DMABUF:
> >>   #ifdef PIPE_OS_LINUX
> >>         return 1;
> >>   #else
> >>         return 0;
> >>   #endif
> >>
> >>      default:
> >>         unreachable("bad PIPE_CAP_*");
> >> diff --git a/src/gallium/state_trackers/dri/dri_drawable.c b/src/gallium/state_trackers/dri/dri_drawable.c
> >> index 26bfdbecc53..c1de3bed9dd 100644
> >> --- a/src/gallium/state_trackers/dri/dri_drawable.c
> >> +++ b/src/gallium/state_trackers/dri/dri_drawable.c
> >> @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
> >>          *
> >>          * This pulls a fence off the throttling queue and waits for it if the
> >>          * number of fences on the throttling queue has reached the desired
> >>          * number.
> >>          *
> >>          * Then flushes to insert a fence at the current rendering position, and
> >>          * pushes that fence on the queue. This requires that the st_context_iface
> >>          * flush method returns a fence even if there are no commands to flush.
> >>          */
> >>         struct pipe_screen *screen = drawable->screen->base.screen;
> >> -      struct pipe_fence_handle *fence;
> >> +      struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
> >>
> >> -      fence = swap_fences_pop_front(drawable);
> >> -      if (fence) {
> >> -         (void) screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE);
> >> -         screen->fence_reference(screen, &fence, NULL);
> >> -      }
> >> +      st->flush(st, flush_flags, &new_fence);
> >>
> >> -      st->flush(st, flush_flags, &fence);
> >> +      oldest_fence = swap_fences_pop_front(drawable);
> >> +      if (oldest_fence) {
> >> +         screen->fence_finish(screen, NULL, oldest_fence, PIPE_TIMEOUT_INFINITE);
> >> +         screen->fence_reference(screen, &oldest_fence, NULL);
> >> +      }
> >>
> >> -      if (fence) {
> >> -         swap_fences_push_back(drawable, fence);
> >> -         screen->fence_reference(screen, &fence, NULL);
> >> +      if (new_fence) {
> >> +         swap_fences_push_back(drawable, new_fence);
> >> +         screen->fence_reference(screen, &new_fence, NULL);
> >>         }
> >>      }
> >>      else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
> >>         st->flush(st, flush_flags, NULL);
> >>      }
> >>
> >>      if (drawable) {
> >>         drawable->flushing = FALSE;
> >>      }
> >>
> >> --
> >> 2.17.1
> >>
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
On 27/04/2019 21:02, Rob Clark wrote:
> On Sat, Apr 27, 2019 at 9:52 AM Axel Davy <davyaxel0@gmail.com> wrote:
>> On 27/04/2019 18:13, Rob Clark wrote:
>>> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák <maraeo@gmail.com> wrote:
>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>
>>>> It's done by:
>>>> - decrease the number of frames in flight by 1
>>>> - flush before throttling in SwapBuffers
>>>>     (instead of wait-then-flush, do flush-then-wait)
>>>>
>>>> The improvement is apparent with Unigine Heaven.
>>>>
>>>> Previously:
>>>>       draw frame 2
>>>>       wait frame 0
>>>>       flush frame 2
>>>>       present frame 2
>>>>
>>>>       The input lag is 2 frames.
>>>>
>>>> Now:
>>>>       draw frame 2
>>>>       flush frame 2
>>>>       wait frame 1
>>>>       present frame 2
>>>>
>>>>       The input lag is 1 frame. Flushing is done before waiting, because
>>>>       otherwise the device would be idle after waiting.
>>>>
>>>> Nine is affected because it also uses the pipe cap.
>>>> ---
>>>>    src/gallium/auxiliary/util/u_screen.c         |  2 +-
>>>>    src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
>>>>    2 files changed, 11 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
>>>> index 27f51e0898e..410f17421e6 100644
>>>> --- a/src/gallium/auxiliary/util/u_screen.c
>>>> +++ b/src/gallium/auxiliary/util/u_screen.c
>>>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
>>>>       case PIPE_CAP_MAX_VARYINGS:
>>>>          return 8;
>>>>
>>>>       case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>>>          return 0;
>>>>
>>>>       case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>>>          return 0;
>>>>
>>>>       case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>>>> -      return 2;
>>>> +      return 1;
>>> would it be safer to leave the current default and let drivers opt-in
>>> to the lower # individually?  I guess triple buffering would still be
>>> better for some of the smaller gpu's?
>>>
>>> disclaimer: I haven't tested this myself or looked very closely at the
>>> dri code.. so could be misunderstanding something..
>>>
>>> BR,
>>> -R
>>
>> I think I can shed some light on the issue to justify (or not) the change.
>>
>> If we don't do throttling and the CPU renders frames at a faster rate
>> than what the GPU can render,
>> the GPU can become quite late and cause huge frame lag.
>>
>> The throttling involves forcing a (CPU) wait when a frame is presented
>> if the 'x' previous images have not finished rendering.
>>
>> The lower 'x', the lower the frame lag.
>>
>> However if 'x' is too low (waiting current frame is rendered for
>> example), the GPU can be idle until the CPU is flushing new commands.
>>
>> Now there is a choice to be made for the value of 'x'. 1 or 2 are
>> reasonable values.
>>
>> if 'x' is 1, we wait the previous frame was rendered when we present the
>> current frame. For '2' we wait the frame before.
>>
>>
>> Thus for smaller gpu's, a value of 1 is better than 2 as it is more
>> affected by the frame lag (as frames take longer to render).
>>
> I get the latency aspect.. but my comment was more about latency vs
> framerate (or maybe more cynically, about games vs benchmarks :-P)
>
> BR,
> -R


As long at the GPU has work to do, performance should be maximized.

However in the case I described below, if CPU and GPU render at about 
the same framerate and
the framerate has some variations (whether it being the GPU taking more 
time for one frame, or the CPU),
using more than 1 would give a bit better performance.


Axel



>
>> However if a game is rendering at a very unstable framerate (some frames
>> needing more work than others), using a value of 2 is safer
>> to maximize performance. (As using a value of 1 would lead to wait if we
>> get a frame that takes particularly long, as using 2 smooths that a bit)
>>
>>
>> I remember years ago I had extremely unstable fps when using catalyst on
>> Portal for example. But I believe it is more a driver issue than a game
>> issue.
>>
>> If we assume games don't have unstable framerate, (which seems
>> reasonable assumption), using 1 as default makes sense.
>>
>>
>> If one wants to test experimentally for regression, the ideal test case
>> if when the GPU renders at about the same framerate as the CPU feeds it.
>>
>>
>>
>> Axel
>>
>>
>>
>>

On 2019-04-27 6:13 p.m., Rob Clark wrote:
> On Thu, Apr 25, 2019 at 7:06 PM Marek Olšák <maraeo@gmail.com> wrote:
>>
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> It's done by:
>> - decrease the number of frames in flight by 1
>> - flush before throttling in SwapBuffers
>>   (instead of wait-then-flush, do flush-then-wait)
>>
>> The improvement is apparent with Unigine Heaven.
>>
>> Previously:
>>     draw frame 2
>>     wait frame 0
>>     flush frame 2
>>     present frame 2
>>
>>     The input lag is 2 frames.
>>
>> Now:
>>     draw frame 2
>>     flush frame 2
>>     wait frame 1
>>     present frame 2
>>
>>     The input lag is 1 frame. Flushing is done before waiting, because
>>     otherwise the device would be idle after waiting.
>>
>> Nine is affected because it also uses the pipe cap.
>> ---
>>  src/gallium/auxiliary/util/u_screen.c         |  2 +-
>>  src/gallium/state_trackers/dri/dri_drawable.c | 20 +++++++++----------
>>  2 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c
>> index 27f51e0898e..410f17421e6 100644
>> --- a/src/gallium/auxiliary/util/u_screen.c
>> +++ b/src/gallium/auxiliary/util/u_screen.c
>> @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen,
>>     case PIPE_CAP_MAX_VARYINGS:
>>        return 8;
>>
>>     case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
>>        return 0;
>>
>>     case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
>>        return 0;
>>
>>     case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
>> -      return 2;
>> +      return 1;
> 
> would it be safer to leave the current default and let drivers opt-in
> to the lower # individually?  I guess triple buffering would still be
> better for some of the smaller gpu's?

This patch doesn't prevent triple buffering. The application can still
prepare up to one frame worth of GPU commands before the GPU has
finished processing the commands of the previous frame (while the
pre-previous frame is being displayed).

I just think the term "in flight" should maybe be defined a bit better,
but it's not a blocker and could be done in a follow-up patch.