intel/tools: new intel_sanitize_gpu tool

Submitted by Rogovin, Kevin on Dec. 8, 2017, 10:54 a.m.

Details

Reviewer None
Submitted Dec. 8, 2017, 10:54 a.m.
Last Updated Feb. 14, 2018, 7:39 a.m.
Revision 14

Cover Letter(s)

Revision 1
      From: Kevin Rogovin <kevin.rogovin@intel.com>

This patch series adds a new debug option to pad each GEM BO
allocated by the brw_bufmgr with random noise values, which
are then checked after each batchbuffer dispatch to the kernel.
This can be quite valuable for finding Heisenbugs that are
difficult to track down.

A possible follow-up series would be to write to stderr (or
another logging mechanism) if the out-of-bounds (OOB) write is to
a GEM BO that backs a GL buffer object; that feature would be
quite useful for application developers.
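
As an illustration of the padding idea only, here is a minimal sketch
that uses ordinary heap memory in place of a GEM BO. The names
padded_buf, PAD_SIZE and the padded_buf_* helpers are hypothetical and
are not taken from the patches; the sketch keeps a copy of the noise
next to the buffer so that it can be compared later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAD_SIZE 64 /* hypothetical padding size in bytes */

/* Hypothetical stand-in for a buffer object; this is not the real brw_bo. */
struct padded_buf {
   uint8_t *data;            /* size + PAD_SIZE bytes of storage */
   size_t size;              /* number of bytes the caller asked for */
   uint8_t noise[PAD_SIZE];  /* copy of the noise written past the end */
};

/* Allocate size bytes plus padding and fill the padding with noise. */
static bool
padded_buf_init(struct padded_buf *buf, size_t size)
{
   assert(size > 0);
   buf->data = malloc(size + PAD_SIZE);
   if (!buf->data)
      return false;
   buf->size = size;
   for (size_t i = 0; i < PAD_SIZE; i++)
      buf->noise[i] = (uint8_t)(rand() & 0xff);
   memcpy(buf->data + size, buf->noise, PAD_SIZE);
   return true;
}

/* Return true if nothing has written past the end of the buffer. */
static bool
padded_buf_padding_is_good(const struct padded_buf *buf)
{
   return memcmp(buf->data + buf->size, buf->noise, PAD_SIZE) == 0;
}

static void
padded_buf_fini(struct padded_buf *buf)
{
   free(buf->data);
   buf->data = NULL;
}
```

In this sketch a caller would run padded_buf_padding_is_good() after
every unit of work that may write to the buffer. A write of even one
byte at offset size or beyond makes the comparison fail.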

Kevin Rogovin (3):
  intel/common:add debug flag for adding and checking padding on BO's
  i965: add noise padding to buffer object and function to check if
    noise is correct
  i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for
    each bo used in batchbuffer is correct

 src/intel/common/gen_debug.c                  |  1 +
 src/intel/common/gen_debug.h                  |  1 +
 src/mesa/drivers/dri/i965/brw_bufmgr.c        | 68 ++++++++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_bufmgr.h        | 12 +++++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 15 ++++++
 5 files changed, 96 insertions(+), 1 deletion(-)
    
Revision 2
      From: Kevin Rogovin <kevin.rogovin@intel.com>

This patch series adds a new debug option to pad each GEM BO
allocated by the brw_bufmgr with (weak) pseudo-random noise values,
which are then checked after each batchbuffer dispatch to the kernel.
This can be quite valuable for finding Heisenbugs that are difficult
to track down.

A possible follow-up series would be to write to stderr (or
another logging mechanism) if the out-of-bounds (OOB) write is to
a GEM BO that backs a GL buffer object; that feature would be
quite useful for application developers.

v2:
 Change from using rand() to using internal generating function
 (requested/suggested by Jason Ekstrand)

 Avoid having extra pointers in brw_bo struct via using the internal
 function and allocating buffer for pread at brw_bo_padding_is_good()
 (requested/suggested by Jason Ekstrand)

 Comments indicating that pread ioctl will do the required waiting
 for GPU commands to finish
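
The first two v2 items can be pictured with the sketch below: when the
noise comes from a deterministic generator, the expected bytes can be
regenerated at check time, so no copy of the noise (and no extra
pointer) has to be kept per buffer. This is only a sketch under
assumptions. The generator is a plain linear congruential generator
picked for illustration, not necessarily the one used in the patches,
and the function names and the seed parameter (for example a value
derived from the BO) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One step of a small linear congruential generator; weak, but
 * deterministic and cheap. The constants are an illustrative choice. */
static uint32_t
noise_next(uint32_t state)
{
   return state * 1664525u + 1013904223u;
}

/* Fill the padding with bytes derived only from the seed. */
static void
fill_noise(uint8_t *pad, size_t pad_size, uint32_t seed)
{
   assert(pad != NULL);
   uint32_t state = seed;
   for (size_t i = 0; i < pad_size; i++) {
      state = noise_next(state);
      pad[i] = (uint8_t)(state >> 24);
   }
}

/* Regenerate the same sequence from the seed and compare it with the
 * padding; nothing besides the seed needs to be stored. */
static bool
noise_is_intact(const uint8_t *pad, size_t pad_size, uint32_t seed)
{
   assert(pad != NULL);
   uint32_t state = seed;
   for (size_t i = 0; i < pad_size; i++) {
      state = noise_next(state);
      if (pad[i] != (uint8_t)(state >> 24))
         return false;
   }
   return true;
}
```

The trade-off in this sketch is that the check costs one pass of the
generator instead of a memcmp against a stored copy, in exchange for
not growing the per-buffer bookkeeping.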

Kevin Rogovin (3):
  intel/common:add debug flag for adding and checking padding on BO's
  i965: add noise padding to buffer object and function to check if
    noise is correct
  i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for
    each bo used in batchbuffer is correct

 src/intel/common/gen_debug.c                  |   1 +
 src/intel/common/gen_debug.h                  |   1 +
 src/mesa/drivers/dri/i965/brw_bufmgr.c        | 105 +++++++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_bufmgr.h        |   8 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  19 +++++
 5 files changed, 133 insertions(+), 1 deletion(-)
    
Revision 3
      From: Kevin Rogovin <kevin.rogovin@intel.com>

This patch series adds a new debug option to pad each GEM BO
allocated by the brw_bufmgr with (weak) pseudo-random noise values,
which are then checked after each batchbuffer dispatch to the kernel.
This can be quite valuable for finding Heisenbugs that are difficult
to track down.

A possible follow-up series would be to write to stderr (or
another logging mechanism) if the out-of-bounds (OOB) write is to
a GEM BO that backs a GL buffer object; that feature would be
quite useful for application developers.

(resending because I sent out v2 earlier today without tagging it as v2).

v2:
 Change from using rand() to using internal generating function
 (requested/suggested by Jason Ekstrand)

 Avoid having extra pointers in brw_bo struct via using the internal
 function and allocating buffer for pread at brw_bo_padding_is_good()
 (requested/suggested by Jason Ekstrand)

 Comments indicating that pread ioctl will do the required waiting
 for GPU commands to finish

Kevin Rogovin (3):
  intel/common:add debug flag for adding and checking padding on BO's
  i965: add noise padding to buffer object and function to check if
    noise is correct
  i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for
    each bo used in batchbuffer is correct

 src/intel/common/gen_debug.c                  |   1 +
 src/intel/common/gen_debug.h                  |   1 +
 src/mesa/drivers/dri/i965/brw_bufmgr.c        | 107 +++++++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_bufmgr.h        |   8 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  19 +++++
 5 files changed, 135 insertions(+), 1 deletion(-)
    
Revision 4
      From: Kevin Rogovin <kevin.rogovin@intel.com>

This patch series adds a new debug option to pad each GEM BO
allocated by the brw_bufmgr with (weak) pseudo-random noise values,
which are then checked after each batchbuffer dispatch to the kernel.
This can be quite valuable for finding Heisenbugs that are difficult
to track down.

A possible follow-up series would be to write to stderr (or
another logging mechanism) if the out-of-bounds (OOB) write is to
a GEM BO that backs a GL buffer object; that feature would be
quite useful for application developers.

v3:
 Change from using pread to mapping buffer padding on checking
 noise padding.
 (spawned from Chris Wilson feedback)

 Use gen_invalidate_range() to make sure values read are correct.
 (suggested/requested by Chris Wilson)

 Add comment to declaration of brw_bo_padding_is_good() indicating
 that one needs to wait for the GPU to finish the rendering of the
 contents of the batchbuffer that uses the GEM BO before calling
 brw_bo_padding_is_good().
 (spawned from Chris Wilson feedback)

 Call brw_bo_wait_rendering() when DEBUG_OUT_OF_BOUND_CHK bit
 is up before using brw_bo_padding_is_good().
 (spawned from Chris Wilson feedback)

v2:
 Change from using rand() to using internal generating function
 (requested/suggested by Jason Ekstrand)

 Avoid having extra pointers in brw_bo struct via using the internal
 function and allocating buffer for pread at brw_bo_padding_is_good()
 (requested/suggested by Jason Ekstrand)

 Comments indicating that pread ioctl will do the required waiting
 for GPU commands to finish

Kevin Rogovin (3):
  intel/common:add debug flag for adding and checking padding on BO's
  i965: add noise padding to buffer object and function to check if
    noise is correct
  i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for
    each bo used in batchbuffer is correct

 src/intel/common/gen_debug.c                  |   1 +
 src/intel/common/gen_debug.h                  |   1 +
 src/mesa/drivers/dri/i965/brw_bufmgr.c        | 115 +++++++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_bufmgr.h        |  13 +++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  22 ++++-
 5 files changed, 150 insertions(+), 2 deletions(-)
    
Revision 5
      From: Kevin Rogovin <kevin.rogovin@intel.com>

This patch series adds a new debug option to pad each GEM BO
allocated by the brw_bufmgr with (weak) pseudo-random noise values,
which are then checked after each batchbuffer dispatch to the kernel.
This can be quite valuable for finding Heisenbugs that are difficult
to track down.

A possible follow-up series would be to write to stderr (or
another logging mechanism) if the out-of-bounds (OOB) write is to
a GEM BO that backs a GL buffer object; that feature would be
quite useful for application developers.

I am resending this series because the Mesa-dev archives lost
the original posting. Patch 2/3 was reviewed by Chris Wilson,
with the caveat that he thought the stylistic nitpicks would be
better handled by Kenneth and/or Ian.

v3:
 Change from using pread to mapping buffer padding on checking
 noise padding.
 (spawned from Chris Wilson feedback)

 Use gen_invalidate_range() to make sure values read are correct.
 (suggested/requested by Chris Wilson)

 Add comment to declaration of brw_bo_padding_is_good() indicating
 that one needs to wait for the GPU to finish the rendering of the
 contents of the batchbuffer that uses the GEM BO before calling
 brw_bo_padding_is_good().
 (spawned from Chris Wilson feedback)

 Call brw_bo_wait_rendering() when DEBUG_OUT_OF_BOUND_CHK bit
 is up before using brw_bo_padding_is_good().
 (spawned from Chris Wilson feedback)

v2:
 Change from using rand() to using internal generating function
 (requested/suggested by Jason Ekstrand)

 Avoid having extra pointers in brw_bo struct via using the internal
 function and allocating buffer for pread at brw_bo_padding_is_good()
 (requested/suggested by Jason Ekstrand)

 Comments indicating that pread ioctl will do the required waiting
 for GPU commands to finish

Kevin Rogovin (3):
  intel/common:add debug flag for adding and checking padding on BO's
  i965: add noise padding to buffer object and function to check if
    noise is correct
  i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for
    each bo used in batchbuffer is correct

 src/intel/common/gen_debug.c                  |   1 +
 src/intel/common/gen_debug.h                  |   1 +
 src/mesa/drivers/dri/i965/brw_bufmgr.c        | 115 +++++++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_bufmgr.h        |  13 +++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  22 ++++-
 5 files changed, 150 insertions(+), 2 deletions(-)
    
Revision 8
      From: Kevin Rogovin <kevin.rogovin@intel.com>

This patch series adds a new debug option to pad each GEM BO allocated
by the brw_bufmgr with (weak) pseudo-random noise values, which are then
checked after each batchbuffer dispatch to the kernel. This can be quite
valuable for finding Heisenbugs that are difficult to track down.

A possible follow-up series would be to write to stderr (or another
logging mechanism) if the out-of-bounds (OOB) write is to a GEM BO that
backs a GL buffer object; that feature would be quite useful for
application developers.

v4:
 Rename the debug macro to DEBUG_CHECK_OOB.
 (suggested/requested by Jason Ekstrand)

 Also use a map when filling in the noise values, to make the code
 style more consistent.
 (suggested/requested by Jason Ekstrand)

v3:
 Change from using pread to mapping buffer padding on checking noise
 padding.
 (spawned from Chris Wilson feedback)

 Use gen_invalidate_range() to make sure values read are correct.
 (suggested/requested by Chris Wilson)

 Add comment to declaration of brw_bo_padding_is_good() indicating
 that one needs to wait for the GPU to finish the rendering of the
 contents of the batchbuffer that uses the GEM BO before calling
 brw_bo_padding_is_good().
 (spawned from Chris Wilson feedback)

 Call brw_bo_wait_rendering() when DEBUG_OUT_OF_BOUND_CHK bit
 is up before using brw_bo_padding_is_good().
 (spawned from Chris Wilson feedback)

v2:
 Change from using rand() to using internal generating function
 (requested/suggested by Jason Ekstrand)

 Avoid having extra pointers in brw_bo struct via using the internal
 function and allocating buffer for pread at brw_bo_padding_is_good()
 (requested/suggested by Jason Ekstrand)

 Comments indicating that pread ioctl will do the required waiting
 for GPU commands to finish

Kevin Rogovin (3):
  intel/common:add debug flag for adding and checking padding on BO's
  i965: add noise padding to buffer object and function to check if
    noise is correct
  i965: if DEBUG_CHECK_OOB is up, check that noise padding for each bo
    used in batchbuffer is correct

 src/intel/common/gen_debug.c                  |   1 +
 src/intel/common/gen_debug.h                  |   1 +
 src/mesa/drivers/dri/i965/brw_bufmgr.c        | 101 +++++++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_bufmgr.h        |  13 ++++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |  22 +++++-
 5 files changed, 136 insertions(+), 2 deletions(-)
    

Revisions