[Mesa-dev] i965: Use single dispatch mode as fallback to dual object mode when possible.

Submitted by Iago Toral Quiroga on July 1, 2014, 8:06 a.m.

Details

Message ID 1404202017-14930-1-git-send-email-itoral@igalia.com
State Superseded
Headers show

Not browsing as part of any series.

Commit Message

Iago Toral Quiroga July 1, 2014, 8:06 a.m.
Currently, when a geometry shader can't use dual object mode we fall back to
dual instance mode, however, when invocations == 1, single dispatch mode is
more performant and equally efficient in terms of register pressure.

Single dispatch mode requires that the driver can handle interleaving of
registers, but this is already supported (dual instance mode has the same
requirement).
---

There is a comment in the code suggesting that SINGLE dispatch mode could not
be used because the visitor and generator classes did not support interleaving
of general purpose registers so we should fall back to DUAL_ISNTANCE
when DUAL_OBJECT could not be used. However DUAL_INSTANCE also requires
interleaving, so unless I am missing something this does not seem to make
a lot of sense.

Indeed, SINGLE dispatch mode seems to work just fine: I forced SINGLE dispatch
mode with a batch of standalone programs I have that use geometry shaders and
they work ok. I also did a complete piglit test run with this patch applied and
observed no regressions (all this on IvyBridge).

The comment also suggests that SINGLE dispatch mode would have even less
register pressure than DUAL_INSTANCE, but given that both require interleaving
I guess they wold be the same in this regard.

According to the documentation, SINGLE dispatch mode is preferred when 
Invocations == 1 and DUAL_INSTANCE is preferred when Invocations > 1 to
optimize performance, so instead of always falling back to DUAL_INSTANCE mode
as we are doing now, or always falling back to SINGLE mode as the comment
suggested, I select between the two based on the Invocation count.

 src/mesa/drivers/dri/i965/brw_context.h           |  8 ++++---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 26 +++++++++++------------
 src/mesa/drivers/dri/i965/gen7_gs_state.c         |  4 +---
 src/mesa/drivers/dri/i965/gen8_gs_state.c         |  4 +---
 4 files changed, 20 insertions(+), 22 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index c68167f..4473f04 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -647,10 +647,12 @@  struct brw_gs_prog_data
    int invocations;
 
    /**
-    * True if the thread should be dispatched in DUAL_INSTANCE mode, false if
-    * it should be dispatched in DUAL_OBJECT mode.
+    * Dispatch mode, can be any of:
+    * GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
+    * GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
+    * GEN7_GS_DISPATCH_MODE_SINGLE
     */
-   bool dual_instanced_dispatch;
+   int dispatch_mode;
 };
 
 /** Number of texture sampler units */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index b245924..8e4d805 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -102,10 +102,11 @@  vec4_gs_visitor::setup_payload()
 {
    int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES];
 
-   /* If we are in dual instanced mode, then attributes are going to be
-    * interleaved, so one register contains two attribute slots.
+   /* If we are in dual instanced or single mode, then attributes are going
+    * to be interleaved, so one register contains two attribute slots.
     */
-   int attributes_per_reg = c->prog_data.dual_instanced_dispatch ? 2 : 1;
+   int attributes_per_reg =
+      c->prog_data.dispatch_mode == GEN7_GS_DISPATCH_MODE_DUAL_OBJECT ? 1 : 2;
 
    /* If a geometry shader tries to read from an input that wasn't written by
     * the vertex shader, that produces undefined results, but it shouldn't
@@ -130,8 +131,7 @@  vec4_gs_visitor::setup_payload()
 
    reg = setup_varying_inputs(reg, attribute_map, attributes_per_reg);
 
-   lower_attributes_to_hw_regs(attribute_map,
-                               c->prog_data.dual_instanced_dispatch);
+   lower_attributes_to_hw_regs(attribute_map, attributes_per_reg > 1);
 
    this->first_non_payload_grf = reg;
 }
@@ -647,7 +647,7 @@  brw_gs_emit(struct brw_context *brw,
     */
    if (c->prog_data.invocations <= 1 &&
        likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
-      c->prog_data.dual_instanced_dispatch = false;
+      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
 
       vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
       if (v.run()) {
@@ -659,15 +659,15 @@  brw_gs_emit(struct brw_context *brw,
 
    /* Either we failed to compile in DUAL_OBJECT mode (probably because it
     * would have required spilling) or DUAL_OBJECT mode is disabled.  So fall
-    * back to DUAL_INSTANCED mode, which consumes fewer registers.
+    * back to DUAL_INSTANCED or SINGLE mode, which consumes fewer registers.
     *
-    * FIXME: In an ideal world we'd fall back to SINGLE mode, which would
-    * allow us to interleave general purpose registers (resulting in even less
-    * likelihood of spilling).  But at the moment, the vec4 generator and
-    * visitor classes don't have the infrastructure to interleave general
-    * purpose registers, so DUAL_INSTANCED is the best we can do.
+    * SINGLE mode is more performant when invocations == 1 and DUAL_INSTANCE
+    * mode is more performant when invocations > 1.
     */
-   c->prog_data.dual_instanced_dispatch = true;
+   if (c->prog_data.invocations <= 1)
+      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_SINGLE;
+   else
+      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE;
 
    vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */);
    if (!v.run()) {
diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c b/src/mesa/drivers/dri/i965/gen7_gs_state.c
index 30dfa6b..cb970d5 100644
--- a/src/mesa/drivers/dri/i965/gen7_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c
@@ -145,9 +145,7 @@  upload_gs_state(struct brw_context *brw)
           GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) |
          ((brw->gs.prog_data->invocations - 1) <<
           GEN7_GS_INSTANCE_CONTROL_SHIFT) |
-         (brw->gs.prog_data->dual_instanced_dispatch ?
-          GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE :
-          GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) |
+         brw->gs.prog_data->dispatch_mode |
          GEN6_GS_STATISTICS_ENABLE |
          (brw->gs.prog_data->include_primitive_id ?
           GEN7_GS_INCLUDE_PRIMITIVE_ID : 0) |
diff --git a/src/mesa/drivers/dri/i965/gen8_gs_state.c b/src/mesa/drivers/dri/i965/gen8_gs_state.c
index a0f933c..1ac62e2 100644
--- a/src/mesa/drivers/dri/i965/gen8_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_gs_state.c
@@ -83,9 +83,7 @@  gen8_upload_gs_state(struct brw_context *brw)
       OUT_BATCH(((brw->max_gs_threads / 2 - 1) << HSW_GS_MAX_THREADS_SHIFT) |
                 (brw->gs.prog_data->control_data_header_size_hwords <<
                  GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) |
-                (brw->gs.prog_data->dual_instanced_dispatch ?
-                 GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE :
-                 GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) |
+                brw->gs.prog_data->dispatch_mode |
                 GEN6_GS_STATISTICS_ENABLE |
                 (brw->gs.prog_data->include_primitive_id ?
                  GEN7_GS_INCLUDE_PRIMITIVE_ID : 0) |