[mesa,v2] i965/gen8+: bo in state base address must be in 32-bit address range

Submitted by Michel Thierry on July 1, 2015, 3:28 p.m.

Details

Message ID 1435764490-12029-3-git-send-email-michel.thierry@intel.com
State New
Headers show

Not browsing as part of any series.

Commit Message

Michel Thierry July 1, 2015, 3:28 p.m.
Gen8+ supports 48-bit virtual addresses, but some objects must always be
allocated inside the 32-bit address range.

In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap or Intruction State Heap must be in a 32-bit range
(GSH / ISH), because the General State Offset and Instruction State Offset
are limited to 32-bits.

Use drm_intel_bo_emit_reloc_48bit when the 4GB limit is not necessary, and
the bo can be in the full address space.

This commit introduces a dependency of libdrm 2.4.63, which introduces the
drm_intel_bo_emit_reloc_48bit function.

v2: s/48baddress/48b_address/,
    Only use in OUT_RELOC64 cases, OUT_RELOC implies a 32-bit address offset
    is needed (Ben)

Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: mesa-dev@lists.freedesktop.org
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 configure.ac                                  |  2 +-
 src/mesa/drivers/dri/i965/gen8_misc_state.c   | 23 +++++++++++++++--------
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  6 +++---
 3 files changed, 19 insertions(+), 12 deletions(-)

Patch hide | download patch | download mbox

diff --git a/configure.ac b/configure.ac
index af61aa2..c92ca44 100644
--- a/configure.ac
+++ b/configure.ac
@@ -68,7 +68,7 @@  AC_SUBST([OSMESA_VERSION])
 dnl Versions for external dependencies
 LIBDRM_REQUIRED=2.4.38
 LIBDRM_RADEON_REQUIRED=2.4.56
-LIBDRM_INTEL_REQUIRED=2.4.60
+LIBDRM_INTEL_REQUIRED=2.4.63
 LIBDRM_NVVIEUX_REQUIRED=2.4.33
 LIBDRM_NOUVEAU_REQUIRED="2.4.33 libdrm >= 2.4.41"
 LIBDRM_FREEDRENO_REQUIRED=2.4.57
diff --git a/src/mesa/drivers/dri/i965/gen8_misc_state.c b/src/mesa/drivers/dri/i965/gen8_misc_state.c
index b20038e..5c8924d 100644
--- a/src/mesa/drivers/dri/i965/gen8_misc_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_misc_state.c
@@ -28,6 +28,11 @@ 
 
 /**
  * Define the base addresses which some state is referenced from.
+ *
+ * Use OUT_RELOC instead of OUT_RELOC64, because the General State
+ * Offset and Instruction State Offset are limited to 32-bits by
+ * hardware [and add OUT_BATCH(0) after each OUT_RELOC to complete
+ * the number of dwords needed for STATE_BASE_ADDRESS].
  */
 void gen8_upload_state_base_address(struct brw_context *brw)
 {
@@ -41,19 +46,21 @@  void gen8_upload_state_base_address(struct brw_context *brw)
    OUT_BATCH(0);
    OUT_BATCH(mocs_wb << 16);
    /* Surface state base address: */
-   OUT_RELOC64(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
-               mocs_wb << 4 | 1);
+   OUT_RELOC(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+             mocs_wb << 4 | 1);
+   OUT_BATCH(0);
    /* Dynamic state base address: */
-   OUT_RELOC64(brw->batch.bo,
-               I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
-               mocs_wb << 4 | 1);
+   OUT_RELOC(brw->batch.bo,
+             I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
+             mocs_wb << 4 | 1);
+   OUT_BATCH(0);
    /* Indirect object base address: MEDIA_OBJECT data */
    OUT_BATCH(mocs_wb << 4 | 1);
    OUT_BATCH(0);
    /* Instruction base address: shader kernels (incl. SIP) */
-   OUT_RELOC64(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
-               mocs_wb << 4 | 1);
-
+   OUT_RELOC(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
+             mocs_wb << 4 | 1);
+   OUT_BATCH(0);
    /* General state buffer size */
    OUT_BATCH(0xfffff001);
    /* Dynamic state buffer size */
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 54081a1..220a35b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -411,9 +411,9 @@  intel_batchbuffer_emit_reloc64(struct brw_context *brw,
                                uint32_t read_domains, uint32_t write_domain,
 			       uint32_t delta)
 {
-   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
-                                     buffer, delta,
-                                     read_domains, write_domain);
+   int ret = drm_intel_bo_emit_reloc_48bit(brw->batch.bo, 4*brw->batch.used,
+                                           buffer, delta,
+                                           read_domains, write_domain);
    assert(ret == 0);
    (void) ret;
 

Comments

On Wed, Jul 01, 2015 at 04:28:10PM +0100, Michel Thierry wrote:
> Gen8+ supports 48-bit virtual addresses, but some objects must always be
> allocated inside the 32-bit address range.
> 
> In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
> General State Heap or Intruction State Heap must be in a 32-bit range
> (GSH / ISH), because the General State Offset and Instruction State Offset
> are limited to 32-bits.
> 
> Use drm_intel_bo_emit_reloc_48bit when the 4GB limit is not necessary, and
> the bo can be in the full address space.
> 
> This commit introduces a dependency of libdrm 2.4.63, which introduces the
> drm_intel_bo_emit_reloc_48bit function.
> 
> v2: s/48baddress/48b_address/,
>     Only use in OUT_RELOC64 cases, OUT_RELOC implies a 32-bit address offset
>     is needed (Ben)
> 
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: mesa-dev@lists.freedesktop.org
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  configure.ac                                  |  2 +-
>  src/mesa/drivers/dri/i965/gen8_misc_state.c   | 23 +++++++++++++++--------
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c |  6 +++---
>  3 files changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index af61aa2..c92ca44 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -68,7 +68,7 @@ AC_SUBST([OSMESA_VERSION])
>  dnl Versions for external dependencies
>  LIBDRM_REQUIRED=2.4.38
>  LIBDRM_RADEON_REQUIRED=2.4.56
> -LIBDRM_INTEL_REQUIRED=2.4.60
> +LIBDRM_INTEL_REQUIRED=2.4.63
>  LIBDRM_NVVIEUX_REQUIRED=2.4.33
>  LIBDRM_NOUVEAU_REQUIRED="2.4.33 libdrm >= 2.4.41"
>  LIBDRM_FREEDRENO_REQUIRED=2.4.57
> diff --git a/src/mesa/drivers/dri/i965/gen8_misc_state.c b/src/mesa/drivers/dri/i965/gen8_misc_state.c
> index b20038e..5c8924d 100644
> --- a/src/mesa/drivers/dri/i965/gen8_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_misc_state.c
> @@ -28,6 +28,11 @@
>  
>  /**
>   * Define the base addresses which some state is referenced from.
> + *
> + * Use OUT_RELOC instead of OUT_RELOC64, because the General State
> + * Offset and Instruction State Offset are limited to 32-bits by
> + * hardware [and add OUT_BATCH(0) after each OUT_RELOC to complete
> + * the number of dwords needed for STATE_BASE_ADDRESS].
>   */
>  void gen8_upload_state_base_address(struct brw_context *brw)
>  {
> @@ -41,19 +46,21 @@ void gen8_upload_state_base_address(struct brw_context *brw)
>     OUT_BATCH(0);
>     OUT_BATCH(mocs_wb << 16);
>     /* Surface state base address: */
> -   OUT_RELOC64(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
> -               mocs_wb << 4 | 1);
> +   OUT_RELOC(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
> +             mocs_wb << 4 | 1);
> +   OUT_BATCH(0);
>     /* Dynamic state base address: */
> -   OUT_RELOC64(brw->batch.bo,
> -               I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
> -               mocs_wb << 4 | 1);
> +   OUT_RELOC(brw->batch.bo,
> +             I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
> +             mocs_wb << 4 | 1);
> +   OUT_BATCH(0);

Note this is in general dangerous since it lies to the kernel about the
value written into the batch corresponding to the 64bit relocation.
(Aliasing a high object here isn't much of an issue since the relocation
will be forced by having to rebind the buffer low.)

Personally I would have stuck with the OUT_RELOC64 so that any future
cut'n'paste didn't have a subtle bug and that the wa was clearly
indicated by clearing the execobject flag for brw->batch.bo and
brw->cache.bo.

>     /* Indirect object base address: MEDIA_OBJECT data */
>     OUT_BATCH(mocs_wb << 4 | 1);
>     OUT_BATCH(0);
>     /* Instruction base address: shader kernels (incl. SIP) */
> -   OUT_RELOC64(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
> -               mocs_wb << 4 | 1);
> -
> +   OUT_RELOC(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
> +             mocs_wb << 4 | 1);
> +   OUT_BATCH(0);
>     /* General state buffer size */
>     OUT_BATCH(0xfffff001);
>     /* Dynamic state buffer size */
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index 54081a1..220a35b 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -411,9 +411,9 @@ intel_batchbuffer_emit_reloc64(struct brw_context *brw,
>                                 uint32_t read_domains, uint32_t write_domain,
>  			       uint32_t delta)
>  {
> -   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
> -                                     buffer, delta,
> -                                     read_domains, write_domain);
> +   int ret = drm_intel_bo_emit_reloc_48bit(brw->batch.bo, 4*brw->batch.used,
> +                                           buffer, delta,
> +                                           read_domains, write_domain);

I would have just exposed setting the flag on the execobject.  That way you
still have existing userspace safe by default, can set a
bufmgr-level flag to enable 48bit support by default and then
individually turn off 48bit support for the couple of buffers that
require it.

Just my 2c because I have seen what trouble lying to the kernel about
relocation values can cause and would rather not have that in the
interface by design.
-Chris
On 7/2/2015 8:21 AM, Chris Wilson wrote:
> On Wed, Jul 01, 2015 at 04:28:10PM +0100, Michel Thierry wrote:
>> Gen8+ supports 48-bit virtual addresses, but some objects must always be
>> allocated inside the 32-bit address range.
>>
>>      OUT_BATCH(0);
>>      OUT_BATCH(mocs_wb << 16);
>>      /* Surface state base address: */
>> -   OUT_RELOC64(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
>> -               mocs_wb << 4 | 1);
>> +   OUT_RELOC(brw->batch.bo, I915_GEM_DOMAIN_SAMPLER, 0,
>> +             mocs_wb << 4 | 1);
>> +   OUT_BATCH(0);
>>      /* Dynamic state base address: */
>> -   OUT_RELOC64(brw->batch.bo,
>> -               I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
>> -               mocs_wb << 4 | 1);
>> +   OUT_RELOC(brw->batch.bo,
>> +             I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0,
>> +             mocs_wb << 4 | 1);
>> +   OUT_BATCH(0);
>
> Note this is in general dangerous since it lies to the kernel about the
> value written into the batch corresponding to the 64bit relocation.
> (Aliasing a high object here isn't much of an issue since the relocation
> will be forced by having to rebind the buffer low.)
>
> Personally I would have stuck with the OUT_RELOC64 so that any future
> cut'n'paste didn't have a subtle bug and that the wa was clearly
> indicated by clearing the execobject flag for brw->batch.bo and
> brw->cache.bo.
>
>>      /* Indirect object base address: MEDIA_OBJECT data */
>>      OUT_BATCH(mocs_wb << 4 | 1);
>>      OUT_BATCH(0);
>>      /* Instruction base address: shader kernels (incl. SIP) */
>> -   OUT_RELOC64(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
>> -               mocs_wb << 4 | 1);
>> -
>> +   OUT_RELOC(brw->cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
>> +             mocs_wb << 4 | 1);
>> +   OUT_BATCH(0);
>>      /* General state buffer size */
>>      OUT_BATCH(0xfffff001);
>>      /* Dynamic state buffer size */
>> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> index 54081a1..220a35b 100644
>> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> @@ -411,9 +411,9 @@ intel_batchbuffer_emit_reloc64(struct brw_context *brw,
>>                                  uint32_t read_domains, uint32_t write_domain,
>>   			       uint32_t delta)
>>   {
>> -   int ret = drm_intel_bo_emit_reloc(brw->batch.bo, 4*brw->batch.used,
>> -                                     buffer, delta,
>> -                                     read_domains, write_domain);
>> +   int ret = drm_intel_bo_emit_reloc_48bit(brw->batch.bo, 4*brw->batch.used,
>> +                                           buffer, delta,
>> +                                           read_domains, write_domain);
>
> I would have just exposed setting the flag on the execobject.  That way you
> still have existing userspace safe by default, can set a
> bufmgr-level flag to enable 48bit support by default and then
> individually turn off 48bit support for the couple of buffers that
> require it.
>
So, something more like v1? 
http://mid.gmane.org/1435062089-19877-2-git-send-email-michel.thierry@intel.com

(apart of the hacky macro name)

> Just my 2c because I have seen what trouble lying to the kernel about
> relocation values can cause and would rather not have that in the
> interface by design.
> -Chris
>
On Thu, Jul 02, 2015 at 02:53:45PM +0100, Michel Thierry wrote:
> >I would have just exposed setting the flag on the execobject.  That way you
> >still have existing userspace safe by default, can set a
> >bufmgr-level flag to enable 48bit support by default and then
> >individually turn off 48bit support for the couple of buffers that
> >require it.
> >
> So, something more like v1? http://mid.gmane.org/1435062089-19877-2-git-send-email-michel.thierry@intel.com
> 
> (apart of the hacky macro name)

Yes. I value having those 2-dwords marked as being the relocation
address even if the top dword must be zero.

You can adopt the libdrm interface Ben suggested underneath the
macros, but having it marked as being both a 64-bit relocation and a w/a
in the source is pricless imo.
-Chris