[1/2] tests/gem_eio: New ABI - no EIO even from wait_ioctl

Submitted by Daniel Vetter on Nov. 26, 2015, 11:34 a.m.

Details

Message ID 1448537675-481-1-git-send-email-daniel.vetter@ffwll.ch
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Daniel Vetter Nov. 26, 2015, 11:34 a.m.
So there's 3 competing proposals for what wait_ioctl should do wrt
-EIO:

- return -EIO when the gpu is wedged. Not terribly useful for
  userspace since it might race with a hang and then there's no
  guarantee that a subsequent execbuf won't end up in an -EIO.
  Terminally wedge really can only be reliably signalled at execbuf
  time, and userspace needs to cope with that (or decide not to
  bother).

- EIO for any obj that suffered from a reset. This means big internal
  reorginazation in the kernel since currently we track reset stats
  per-ctx and not on the obj. That's also what arb robustness wants.
  We could do this, but this feels like new ABI territory with the
  usual userspace requirements and high hurdles.

- No -EIO at all. Consistent with set_domain_ioctl and simplest to
  implement. Which is what this patch does.

We can always opt to change this later on if there's a real need.

To make the test really exercise this do a full wedged gpu hang, to
make sure -EIO doesn't leak out at all.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 tests/gem_eio.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index a24c8f1c53b5..8345d1a7a429 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -161,10 +161,14 @@  static void test_wait(int fd)
 {
 	igt_hang_ring_t hang;
 
+	igt_require(i915_reset_control(false));
+
 	hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
-	igt_assert_eq(__gem_wait(fd, hang.handle, -1), -EIO);
+	igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
 	igt_post_hang_ring(fd, hang);
 
+	igt_assert(i915_reset_control(true));
+
 	trigger_reset(fd);
 }
 

Comments

On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> So there's 3 competing proposals for what wait_ioctl should do wrt
> -EIO:
> 
> - return -EIO when the gpu is wedged. Not terribly useful for
>   userspace since it might race with a hang and then there's no
>   guarantee that a subsequent execbuf won't end up in an -EIO.
>   Terminally wedge really can only be reliably signalled at execbuf
>   time, and userspace needs to cope with that (or decide not to
>   bother).
> 
> - EIO for any obj that suffered from a reset. This means big internal
>   reorginazation in the kernel since currently we track reset stats
>   per-ctx and not on the obj. That's also what arb robustness wants.
>   We could do this, but this feels like new ABI territory with the
>   usual userspace requirements and high hurdles.
> 
> - No -EIO at all. Consistent with set_domain_ioctl and simplest to
>   implement. Which is what this patch does.

Since no one else is weighing into the ABI discussion, I'm happy with
losing EIO here. I thought it could be useful, but as no one is using or
seems likely to start using it, begone.

> We can always opt to change this later on if there's a real need.
> 
> To make the test really exercise this do a full wedged gpu hang, to
> make sure -EIO doesn't leak out at all.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  tests/gem_eio.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> index a24c8f1c53b5..8345d1a7a429 100644
> --- a/tests/gem_eio.c
> +++ b/tests/gem_eio.c
> @@ -161,10 +161,14 @@ static void test_wait(int fd)
>  {
>  	igt_hang_ring_t hang;
>  
> +	igt_require(i915_reset_control(false));

However, this is not required to test the ABI change above as the wait
itself will still hang, whether or not it wedges the GPU.
-Chris
On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> > So there's 3 competing proposals for what wait_ioctl should do wrt
> > -EIO:
> > 
> > - return -EIO when the gpu is wedged. Not terribly useful for
> >   userspace since it might race with a hang and then there's no
> >   guarantee that a subsequent execbuf won't end up in an -EIO.
> >   Terminally wedge really can only be reliably signalled at execbuf
> >   time, and userspace needs to cope with that (or decide not to
> >   bother).
> > 
> > - EIO for any obj that suffered from a reset. This means big internal
> >   reorginazation in the kernel since currently we track reset stats
> >   per-ctx and not on the obj. That's also what arb robustness wants.
> >   We could do this, but this feels like new ABI territory with the
> >   usual userspace requirements and high hurdles.
> > 
> > - No -EIO at all. Consistent with set_domain_ioctl and simplest to
> >   implement. Which is what this patch does.
> 
> Since no one else is weighing into the ABI discussion, I'm happy with
> losing EIO here. I thought it could be useful, but as no one is using or
> seems likely to start using it, begone.
> 
> > We can always opt to change this later on if there's a real need.
> > 
> > To make the test really exercise this do a full wedged gpu hang, to
> > make sure -EIO doesn't leak out at all.
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  tests/gem_eio.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > index a24c8f1c53b5..8345d1a7a429 100644
> > --- a/tests/gem_eio.c
> > +++ b/tests/gem_eio.c
> > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> >  {
> >  	igt_hang_ring_t hang;
> >  
> > +	igt_require(i915_reset_control(false));
> 
> However, this is not required to test the ABI change above as the wait
> itself will still hang, whether or not it wedges the GPU.

Yes it's not strictly required, but without it the testcase is fairly
boring. If we move the check_wedge out of wait_request then a normail gpu
reset would always return 0 (after retrying a few times perhaps), so I
figured testing the wedged case is the only one that's worth it.

Maybe we should dupe the subtests all and have wedged and non-wedged cases
for all of them? That would imo make more sense.
-Daniel
On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote:
> On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> > > So there's 3 competing proposals for what wait_ioctl should do wrt
> > > -EIO:
> > > 
> > > - return -EIO when the gpu is wedged. Not terribly useful for
> > >   userspace since it might race with a hang and then there's no
> > >   guarantee that a subsequent execbuf won't end up in an -EIO.
> > >   Terminally wedge really can only be reliably signalled at execbuf
> > >   time, and userspace needs to cope with that (or decide not to
> > >   bother).
> > > 
> > > - EIO for any obj that suffered from a reset. This means big internal
> > >   reorginazation in the kernel since currently we track reset stats
> > >   per-ctx and not on the obj. That's also what arb robustness wants.
> > >   We could do this, but this feels like new ABI territory with the
> > >   usual userspace requirements and high hurdles.
> > > 
> > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to
> > >   implement. Which is what this patch does.
> > 
> > Since no one else is weighing into the ABI discussion, I'm happy with
> > losing EIO here. I thought it could be useful, but as no one is using or
> > seems likely to start using it, begone.
> > 
> > > We can always opt to change this later on if there's a real need.
> > > 
> > > To make the test really exercise this do a full wedged gpu hang, to
> > > make sure -EIO doesn't leak out at all.
> > > 
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > ---
> > >  tests/gem_eio.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > > index a24c8f1c53b5..8345d1a7a429 100644
> > > --- a/tests/gem_eio.c
> > > +++ b/tests/gem_eio.c
> > > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> > >  {
> > >  	igt_hang_ring_t hang;
> > >  
> > > +	igt_require(i915_reset_control(false));
> > 
> > However, this is not required to test the ABI change above as the wait
> > itself will still hang, whether or not it wedges the GPU.
> 
> Yes it's not strictly required, but without it the testcase is fairly
> boring. If we move the check_wedge out of wait_request then a normail gpu
> reset would always return 0 (after retrying a few times perhaps), so I
> figured testing the wedged case is the only one that's worth it.

But wedging during the hang is also not interesting as we have no
opportunity to see the reset failure in the test case. Putting the GPU
into the wedged state before the wait, should be a trivial test that the
object is idle after the reset.
 
> Maybe we should dupe the subtests all and have wedged and non-wedged cases
> for all of them? That would imo make more sense.

The others, what matters is how we handle the GPU being wedged before we
queue an execbuf or throttling. In terms of testing no error is reported
for the hanging case, we should add tests for set-domain (so that it is
explicit and not reliant on implementation inside lib/), GTT faulting
(that would need both wedged and hanging cases, but we are not likely to
be able to cover all the cases like waiting on fence and other secondary
waits) and throttle.
-Chris
On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote:
> On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote:
> > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> > > > So there's 3 competing proposals for what wait_ioctl should do wrt
> > > > -EIO:
> > > > 
> > > > - return -EIO when the gpu is wedged. Not terribly useful for
> > > >   userspace since it might race with a hang and then there's no
> > > >   guarantee that a subsequent execbuf won't end up in an -EIO.
> > > >   Terminally wedge really can only be reliably signalled at execbuf
> > > >   time, and userspace needs to cope with that (or decide not to
> > > >   bother).
> > > > 
> > > > - EIO for any obj that suffered from a reset. This means big internal
> > > >   reorginazation in the kernel since currently we track reset stats
> > > >   per-ctx and not on the obj. That's also what arb robustness wants.
> > > >   We could do this, but this feels like new ABI territory with the
> > > >   usual userspace requirements and high hurdles.
> > > > 
> > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to
> > > >   implement. Which is what this patch does.
> > > 
> > > Since no one else is weighing into the ABI discussion, I'm happy with
> > > losing EIO here. I thought it could be useful, but as no one is using or
> > > seems likely to start using it, begone.
> > > 
> > > > We can always opt to change this later on if there's a real need.
> > > > 
> > > > To make the test really exercise this do a full wedged gpu hang, to
> > > > make sure -EIO doesn't leak out at all.
> > > > 
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > ---
> > > >  tests/gem_eio.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > > > index a24c8f1c53b5..8345d1a7a429 100644
> > > > --- a/tests/gem_eio.c
> > > > +++ b/tests/gem_eio.c
> > > > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> > > >  {
> > > >  	igt_hang_ring_t hang;
> > > >  
> > > > +	igt_require(i915_reset_control(false));
> > > 
> > > However, this is not required to test the ABI change above as the wait
> > > itself will still hang, whether or not it wedges the GPU.
> > 
> > Yes it's not strictly required, but without it the testcase is fairly
> > boring. If we move the check_wedge out of wait_request then a normail gpu
> > reset would always return 0 (after retrying a few times perhaps), so I
> > figured testing the wedged case is the only one that's worth it.
> 
> But wedging during the hang is also not interesting as we have no
> opportunity to see the reset failure in the test case. Putting the GPU
> into the wedged state before the wait, should be a trivial test that the
> object is idle after the reset.

Right now (with current kernels) we see an -EIO with this testcase instead
of 0 in the wait. Without disabling reset we see 0 both on fixed and
broken kernels. So I don't really see why not testing this case is a good
idea? It's the one we're currently failing at and leak -EIO to userspace.

Assuming ofc we still go with the "let's curb -EIO except for execbuf"
ABI.

> > Maybe we should dupe the subtests all and have wedged and non-wedged cases
> > for all of them? That would imo make more sense.
> 
> The others, what matters is how we handle the GPU being wedged before we
> queue an execbuf or throttling. In terms of testing no error is reported
> for the hanging case, we should add tests for set-domain (so that it is
> explicit and not reliant on implementation inside lib/), GTT faulting
> (that would need both wedged and hanging cases, but we are not likely to
> be able to cover all the cases like waiting on fence and other secondary
> waits) and throttle.

Yeah there's more gaps than just the one above.
-Daniel
On Tue, Dec 01, 2015 at 10:13:10AM +0100, Daniel Vetter wrote:
> On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote:
> > On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote:
> > > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> > > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> > > > > So there's 3 competing proposals for what wait_ioctl should do wrt
> > > > > -EIO:
> > > > > 
> > > > > - return -EIO when the gpu is wedged. Not terribly useful for
> > > > >   userspace since it might race with a hang and then there's no
> > > > >   guarantee that a subsequent execbuf won't end up in an -EIO.
> > > > >   Terminally wedge really can only be reliably signalled at execbuf
> > > > >   time, and userspace needs to cope with that (or decide not to
> > > > >   bother).
> > > > > 
> > > > > - EIO for any obj that suffered from a reset. This means big internal
> > > > >   reorginazation in the kernel since currently we track reset stats
> > > > >   per-ctx and not on the obj. That's also what arb robustness wants.
> > > > >   We could do this, but this feels like new ABI territory with the
> > > > >   usual userspace requirements and high hurdles.
> > > > > 
> > > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to
> > > > >   implement. Which is what this patch does.
> > > > 
> > > > Since no one else is weighing into the ABI discussion, I'm happy with
> > > > losing EIO here. I thought it could be useful, but as no one is using or
> > > > seems likely to start using it, begone.
> > > > 
> > > > > We can always opt to change this later on if there's a real need.
> > > > > 
> > > > > To make the test really exercise this do a full wedged gpu hang, to
> > > > > make sure -EIO doesn't leak out at all.
> > > > > 
> > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > > ---
> > > > >  tests/gem_eio.c | 6 +++++-
> > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > > > > index a24c8f1c53b5..8345d1a7a429 100644
> > > > > --- a/tests/gem_eio.c
> > > > > +++ b/tests/gem_eio.c
> > > > > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> > > > >  {
> > > > >  	igt_hang_ring_t hang;
> > > > >  
> > > > > +	igt_require(i915_reset_control(false));
> > > > 
> > > > However, this is not required to test the ABI change above as the wait
> > > > itself will still hang, whether or not it wedges the GPU.
> > > 
> > > Yes it's not strictly required, but without it the testcase is fairly
> > > boring. If we move the check_wedge out of wait_request then a normail gpu
> > > reset would always return 0 (after retrying a few times perhaps), so I
> > > figured testing the wedged case is the only one that's worth it.
> > 
> > But wedging during the hang is also not interesting as we have no
> > opportunity to see the reset failure in the test case. Putting the GPU
> > into the wedged state before the wait, should be a trivial test that the
> > object is idle after the reset.
> 
> Right now (with current kernels) we see an -EIO with this testcase instead
> of 0 in the wait. Without disabling reset we see 0 both on fixed and
> broken kernels. So I don't really see why not testing this case is a good
> idea? It's the one we're currently failing at and leak -EIO to userspace.

Because that wasn't the ABI I was trying to test here and expected to
maintain! I was just aiming for covering the ioctl calls with expected
EIO (to do the opposite would need something like trinity or a guided
fuzzer).
-Chris
On Tue, Dec 01, 2015 at 09:20:02AM +0000, Chris Wilson wrote:
> On Tue, Dec 01, 2015 at 10:13:10AM +0100, Daniel Vetter wrote:
> > On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote:
> > > On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote:
> > > > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> > > > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> > > > > > So there's 3 competing proposals for what wait_ioctl should do wrt
> > > > > > -EIO:
> > > > > > 
> > > > > > - return -EIO when the gpu is wedged. Not terribly useful for
> > > > > >   userspace since it might race with a hang and then there's no
> > > > > >   guarantee that a subsequent execbuf won't end up in an -EIO.
> > > > > >   Terminally wedge really can only be reliably signalled at execbuf
> > > > > >   time, and userspace needs to cope with that (or decide not to
> > > > > >   bother).
> > > > > > 
> > > > > > - EIO for any obj that suffered from a reset. This means big internal
> > > > > >   reorginazation in the kernel since currently we track reset stats
> > > > > >   per-ctx and not on the obj. That's also what arb robustness wants.
> > > > > >   We could do this, but this feels like new ABI territory with the
> > > > > >   usual userspace requirements and high hurdles.
> > > > > > 
> > > > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to
> > > > > >   implement. Which is what this patch does.
> > > > > 
> > > > > Since no one else is weighing into the ABI discussion, I'm happy with
> > > > > losing EIO here. I thought it could be useful, but as no one is using or
> > > > > seems likely to start using it, begone.
> > > > > 
> > > > > > We can always opt to change this later on if there's a real need.
> > > > > > 
> > > > > > To make the test really exercise this do a full wedged gpu hang, to
> > > > > > make sure -EIO doesn't leak out at all.
> > > > > > 
> > > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > > > ---
> > > > > >  tests/gem_eio.c | 6 +++++-
> > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > > > > > index a24c8f1c53b5..8345d1a7a429 100644
> > > > > > --- a/tests/gem_eio.c
> > > > > > +++ b/tests/gem_eio.c
> > > > > > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> > > > > >  {
> > > > > >  	igt_hang_ring_t hang;
> > > > > >  
> > > > > > +	igt_require(i915_reset_control(false));
> > > > > 
> > > > > However, this is not required to test the ABI change above as the wait
> > > > > itself will still hang, whether or not it wedges the GPU.
> > > > 
> > > > Yes it's not strictly required, but without it the testcase is fairly
> > > > boring. If we move the check_wedge out of wait_request then a normail gpu
> > > > reset would always return 0 (after retrying a few times perhaps), so I
> > > > figured testing the wedged case is the only one that's worth it.
> > > 
> > > But wedging during the hang is also not interesting as we have no
> > > opportunity to see the reset failure in the test case. Putting the GPU
> > > into the wedged state before the wait, should be a trivial test that the
> > > object is idle after the reset.
> > 
> > Right now (with current kernels) we see an -EIO with this testcase instead
> > of 0 in the wait. Without disabling reset we see 0 both on fixed and
> > broken kernels. So I don't really see why not testing this case is a good
> > idea? It's the one we're currently failing at and leak -EIO to userspace.
> 
> Because that wasn't the ABI I was trying to test here and expected to
> maintain! I was just aiming for covering the ioctl calls with expected
> EIO (to do the opposite would need something like trinity or a guided
> fuzzer).

Well yeah, but I thought we decided to change it? So Ack or not?
-Daniel
On Thu, Dec 03, 2015 at 09:50:25AM +0100, Daniel Vetter wrote:
> On Tue, Dec 01, 2015 at 09:20:02AM +0000, Chris Wilson wrote:
> > On Tue, Dec 01, 2015 at 10:13:10AM +0100, Daniel Vetter wrote:
> > > On Tue, Dec 01, 2015 at 09:04:23AM +0000, Chris Wilson wrote:
> > > > On Tue, Dec 01, 2015 at 09:28:08AM +0100, Daniel Vetter wrote:
> > > > > On Mon, Nov 30, 2015 at 10:11:12AM +0000, Chris Wilson wrote:
> > > > > > On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> > > > > > > So there's 3 competing proposals for what wait_ioctl should do wrt
> > > > > > > -EIO:
> > > > > > > 
> > > > > > > - return -EIO when the gpu is wedged. Not terribly useful for
> > > > > > >   userspace since it might race with a hang and then there's no
> > > > > > >   guarantee that a subsequent execbuf won't end up in an -EIO.
> > > > > > >   Terminally wedge really can only be reliably signalled at execbuf
> > > > > > >   time, and userspace needs to cope with that (or decide not to
> > > > > > >   bother).
> > > > > > > 
> > > > > > > - EIO for any obj that suffered from a reset. This means big internal
> > > > > > >   reorginazation in the kernel since currently we track reset stats
> > > > > > >   per-ctx and not on the obj. That's also what arb robustness wants.
> > > > > > >   We could do this, but this feels like new ABI territory with the
> > > > > > >   usual userspace requirements and high hurdles.
> > > > > > > 
> > > > > > > - No -EIO at all. Consistent with set_domain_ioctl and simplest to
> > > > > > >   implement. Which is what this patch does.
> > > > > > 
> > > > > > Since no one else is weighing into the ABI discussion, I'm happy with
> > > > > > losing EIO here. I thought it could be useful, but as no one is using or
> > > > > > seems likely to start using it, begone.
> > > > > > 
> > > > > > > We can always opt to change this later on if there's a real need.
> > > > > > > 
> > > > > > > To make the test really exercise this do a full wedged gpu hang, to
> > > > > > > make sure -EIO doesn't leak out at all.
> > > > > > > 
> > > > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > > > > ---
> > > > > > >  tests/gem_eio.c | 6 +++++-
> > > > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > > > > > > index a24c8f1c53b5..8345d1a7a429 100644
> > > > > > > --- a/tests/gem_eio.c
> > > > > > > +++ b/tests/gem_eio.c
> > > > > > > @@ -161,10 +161,14 @@ static void test_wait(int fd)
> > > > > > >  {
> > > > > > >  	igt_hang_ring_t hang;
> > > > > > >  
> > > > > > > +	igt_require(i915_reset_control(false));
> > > > > > 
> > > > > > However, this is not required to test the ABI change above as the wait
> > > > > > itself will still hang, whether or not it wedges the GPU.
> > > > > 
> > > > > Yes it's not strictly required, but without it the testcase is fairly
> > > > > boring. If we move the check_wedge out of wait_request then a normail gpu
> > > > > reset would always return 0 (after retrying a few times perhaps), so I
> > > > > figured testing the wedged case is the only one that's worth it.
> > > > 
> > > > But wedging during the hang is also not interesting as we have no
> > > > opportunity to see the reset failure in the test case. Putting the GPU
> > > > into the wedged state before the wait, should be a trivial test that the
> > > > object is idle after the reset.
> > > 
> > > Right now (with current kernels) we see an -EIO with this testcase instead
> > > of 0 in the wait. Without disabling reset we see 0 both on fixed and
> > > broken kernels. So I don't really see why not testing this case is a good
> > > idea? It's the one we're currently failing at and leak -EIO to userspace.
> > 
> > Because that wasn't the ABI I was trying to test here and expected to
> > maintain! I was just aiming for covering the ioctl calls with expected
> > EIO (to do the opposite would need something like trinity or a guided
> > fuzzer).
> 
> Well yeah, but I thought we decided to change it? So Ack or not?

If you change the subtest to do both, the more likely case where there
is a GPU hang whilst we wait for a request, and the second where the GPU
hang leads to a failed reset. There's actually a another pair where the
GPU hangs (and reset succeeds or not) before we call the wait_ioctl. And
then we can use the guts of that function to do the set-domain and
gtt-fault testing.

As far as wait-ioctl reporting 0 if the request is completed due to a
reset, that's an ACK.
-Chris
On Thu, Nov 26, 2015 at 12:34:34PM +0100, Daniel Vetter wrote:
> So there's 3 competing proposals for what wait_ioctl should do wrt
> -EIO:
> 
> - return -EIO when the gpu is wedged. Not terribly useful for
>   userspace since it might race with a hang and then there's no
>   guarantee that a subsequent execbuf won't end up in an -EIO.
>   Terminally wedge really can only be reliably signalled at execbuf
>   time, and userspace needs to cope with that (or decide not to
>   bother).
> 
> - EIO for any obj that suffered from a reset. This means big internal
>   reorginazation in the kernel since currently we track reset stats
>   per-ctx and not on the obj. That's also what arb robustness wants.
>   We could do this, but this feels like new ABI territory with the
>   usual userspace requirements and high hurdles.
> 
> - No -EIO at all. Consistent with set_domain_ioctl and simplest to
>   implement. Which is what this patch does.
> 
> We can always opt to change this later on if there's a real need.
> 
> To make the test really exercise this do a full wedged gpu hang, to
> make sure -EIO doesn't leak out at all.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

I changed the test to cover both i915.reset=true and i915.reset=false
possibilities and pushed.
-Chris