[i-g-t] i915/gem_exec_parallel: Add skip_on_simulation to whole binary

Submitted by Katarzyna Dec on Feb. 8, 2019, 1:10 p.m.

Details

Message ID 20190208131037.22811-1-katarzyna.dec@intel.com
State New
Series "i915/gem_exec_parallel: Add skip_on_simulation to whole binary"
Headers show

Commit Message

Katarzyna Dec Feb. 8, 2019, 1:10 p.m.
On simulation environment these tests were failing. During debug
we decided that whole binary should be skipped, because there is
no additional coverage in simulation.

Signed-off-by: Katarzyna Dec <katarzyna.dec@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 tests/i915/gem_exec_parallel.c | 2 ++
 1 file changed, 2 insertions(+)

Patch hide | download patch | download mbox

diff --git a/tests/i915/gem_exec_parallel.c b/tests/i915/gem_exec_parallel.c
index a6fa698e..c6a20c4f 100644
--- a/tests/i915/gem_exec_parallel.c
+++ b/tests/i915/gem_exec_parallel.c
@@ -238,6 +238,8 @@  igt_main
 	};
 	int fd;
 
+	igt_skip_on_simulation();
+
 	igt_fixture {
 		fd = drm_open_driver_master(DRIVER_INTEL);
 		igt_require_gem(fd);

Comments

Chris Wilson Feb. 8, 2019, 1:14 p.m.
Quoting Katarzyna Dec (2019-02-08 13:10:37)
> On simulation environment these tests were failing. During debug
> we decided that whole binary should be skipped, because there is
> no additional coverage in simulation.

How did they fail? There are some paths here that are rarely touched
elsewhere.
-Chris
Katarzyna Dec Feb. 8, 2019, 1:23 p.m.
On Fri, Feb 08, 2019 at 01:14:26PM +0000, Chris Wilson wrote:
> Quoting Katarzyna Dec (2019-02-08 13:10:37)
> > On simulation environment these tests were failing. During debug
> > we decided that whole binary should be skipped, because there is
> > no additional coverage in simulation.
> 
> How did they fail? There are some paths here that are rarely touched
> elsewhere.
> -Chris
I like you challenges Chris :)

gem_exec_parallel@*context* were failing with error:

(gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Test assertion failure
function gem_execbuf, file ../lib/ioctl_wrappers.c:605:
(gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Failed assertion:
__gem_execbuf(fd, execbuf) == 0
(gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: error: -28 != 0

and @*fds* with:

(gem_exec_parallel:1315) igt_debugfs-CRITICAL: Last errno: 24, Too many open
files

After discussion with Ursulin Tvrtko we decided that it is redundant to run it
on simulation.
Probably my commit msg was lacking some information.

Kasia :)
Chris Wilson Feb. 8, 2019, 1:27 p.m.
Quoting Katarzyna Dec (2019-02-08 13:23:59)
> On Fri, Feb 08, 2019 at 01:14:26PM +0000, Chris Wilson wrote:
> > Quoting Katarzyna Dec (2019-02-08 13:10:37)
> > > On simulation environment these tests were failing. During debug
> > > we decided that whole binary should be skipped, because there is
> > > no additional coverage in simulation.
> > 
> > How did they fail? There are some paths here that are rarely touched
> > elsewhere.
> > -Chris
> I like you challenges Chris :)
> 
> gem_exec_parallel@*context* were failing with error:
> 
> (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Test assertion failure
> function gem_execbuf, file ../lib/ioctl_wrappers.c:605:
> (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Failed assertion:
> __gem_execbuf(fd, execbuf) == 0
> (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: error: -28 != 0

Should not happen.
 
> and @*fds* with:
> 
> (gem_exec_parallel:1315) igt_debugfs-CRITICAL: Last errno: 24, Too many open
> files

That is boring, but more indicative of system setup failure. We should
be configuring the system to allow ourselves, as root, to use as many fd
as can fit in memory.

I'd recommend a pass through the basic stress test at least. It won't
take a lot to exercise some unusual conditions.
-Chris
Katarzyna Dec Feb. 8, 2019, 1:32 p.m.
On Fri, Feb 08, 2019 at 01:27:51PM +0000, Chris Wilson wrote:
> Quoting Katarzyna Dec (2019-02-08 13:23:59)
> > On Fri, Feb 08, 2019 at 01:14:26PM +0000, Chris Wilson wrote:
> > > Quoting Katarzyna Dec (2019-02-08 13:10:37)
> > > > On simulation environment these tests were failing. During debug
> > > > we decided that whole binary should be skipped, because there is
> > > > no additional coverage in simulation.
> > > 
> > > How did they fail? There are some paths here that are rarely touched
> > > elsewhere.
> > > -Chris
> > I like you challenges Chris :)
> > 
> > gem_exec_parallel@*context* were failing with error:
> > 
> > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Test assertion failure
> > function gem_execbuf, file ../lib/ioctl_wrappers.c:605:
> > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Failed assertion:
> > __gem_execbuf(fd, execbuf) == 0
> > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: error: -28 != 0
> 
> Should not happen.
But happends and is reproducible on some CI runs.
>  
> > and @*fds* with:
> > 
> > (gem_exec_parallel:1315) igt_debugfs-CRITICAL: Last errno: 24, Too many open
> > files
> 
> That is boring, but more indicative of system setup failure. We should
> be configuring the system to allow ourselves, as root, to use as many fd
> as can fit in memory.
> 
> I'd recommend a pass through the basic stress test at least. It won't
> take a lot to exercise some unusual conditions.
> -Chris

What if we skip all *context* and all *fds* and leave rest as it is?
Will it be resonable in this shape?

Kasia :)
Chris Wilson Feb. 8, 2019, 1:42 p.m.
Quoting Katarzyna Dec (2019-02-08 13:32:38)
> On Fri, Feb 08, 2019 at 01:27:51PM +0000, Chris Wilson wrote:
> > Quoting Katarzyna Dec (2019-02-08 13:23:59)
> > > On Fri, Feb 08, 2019 at 01:14:26PM +0000, Chris Wilson wrote:
> > > > Quoting Katarzyna Dec (2019-02-08 13:10:37)
> > > > > On simulation environment these tests were failing. During debug
> > > > > we decided that whole binary should be skipped, because there is
> > > > > no additional coverage in simulation.
> > > > 
> > > > How did they fail? There are some paths here that are rarely touched
> > > > elsewhere.
> > > > -Chris
> > > I like you challenges Chris :)
> > > 
> > > gem_exec_parallel@*context* were failing with error:
> > > 
> > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Test assertion failure
> > > function gem_execbuf, file ../lib/ioctl_wrappers.c:605:
> > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Failed assertion:
> > > __gem_execbuf(fd, execbuf) == 0
> > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: error: -28 != 0
> > 
> > Should not happen.
> But happends and is reproducible on some CI runs.

Well fix it. It is not a valid error in this case. ENOSPC is only valid
in the case the user submitted more than could be fitted into the GGTT
(or had an overlap in their softpin). In this instance, it means that
one _context_ impacted another client, and that is simply not allowed.

> > > and @*fds* with:
> > > 
> > > (gem_exec_parallel:1315) igt_debugfs-CRITICAL: Last errno: 24, Too many open
> > > files
> > 
> > That is boring, but more indicative of system setup failure. We should
> > be configuring the system to allow ourselves, as root, to use as many fd
> > as can fit in memory.
> > 
> > I'd recommend a pass through the basic stress test at least. It won't
> > take a lot to exercise some unusual conditions.
> > -Chris
> 
> What if we skip all *context* and all *fds* and leave rest as it is?
> Will it be resonable in this shape?

I would rewrite it such that the threads ran for a certain period of
time rather than iterations; set basic to something small (say 2s) and
left everything else to run for 150s (tends to be the common value I've
used for random stress tests). Then skip everything but the basic-all
test.
-Chris
Katarzyna Dec Feb. 8, 2019, 1:49 p.m.
On Fri, Feb 08, 2019 at 01:42:42PM +0000, Chris Wilson wrote:
> Quoting Katarzyna Dec (2019-02-08 13:32:38)
> > On Fri, Feb 08, 2019 at 01:27:51PM +0000, Chris Wilson wrote:
> > > Quoting Katarzyna Dec (2019-02-08 13:23:59)
> > > > On Fri, Feb 08, 2019 at 01:14:26PM +0000, Chris Wilson wrote:
> > > > > Quoting Katarzyna Dec (2019-02-08 13:10:37)
> > > > > > On simulation environment these tests were failing. During debug
> > > > > > we decided that whole binary should be skipped, because there is
> > > > > > no additional coverage in simulation.
> > > > > 
> > > > > How did they fail? There are some paths here that are rarely touched
> > > > > elsewhere.
> > > > > -Chris
> > > > I like you challenges Chris :)
> > > > 
> > > > gem_exec_parallel@*context* were failing with error:
> > > > 
> > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Test assertion failure
> > > > function gem_execbuf, file ../lib/ioctl_wrappers.c:605:
> > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Failed assertion:
> > > > __gem_execbuf(fd, execbuf) == 0
> > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: error: -28 != 0
> > > 
> > > Should not happen.
> > But happends and is reproducible on some CI runs.
> 
> Well fix it. It is not a valid error in this case. ENOSPC is only valid
> in the case the user submitted more than could be fitted into the GGTT
> (or had an overlap in their softpin). In this instance, it means that
> one _context_ impacted another client, and that is simply not allowed.
>
According to what you are saying we need to debug and fix the issue.
> > > > and @*fds* with:
> > > > 
> > > > (gem_exec_parallel:1315) igt_debugfs-CRITICAL: Last errno: 24, Too many open
> > > > files
> > > 
> > > That is boring, but more indicative of system setup failure. We should
> > > be configuring the system to allow ourselves, as root, to use as many fd
> > > as can fit in memory.
> > > 
> > > I'd recommend a pass through the basic stress test at least. It won't
> > > take a lot to exercise some unusual conditions.
> > > -Chris
> > 
> > What if we skip all *context* and all *fds* and leave rest as it is?
> > Will it be resonable in this shape?
> 
> I would rewrite it such that the threads ran for a certain period of
> time rather than iterations; set basic to something small (say 2s) and
> left everything else to run for 150s (tends to be the common value I've
Is this comment to *fds* testcases or general to whole binary?
Kasia :)
> used for random stress tests). Then skip everything but the basic-all
> test.
> -Chris
Chris Wilson Feb. 8, 2019, 1:57 p.m.
Quoting Katarzyna Dec (2019-02-08 13:49:28)
> On Fri, Feb 08, 2019 at 01:42:42PM +0000, Chris Wilson wrote:
> > Quoting Katarzyna Dec (2019-02-08 13:32:38)
> > > On Fri, Feb 08, 2019 at 01:27:51PM +0000, Chris Wilson wrote:
> > > > Quoting Katarzyna Dec (2019-02-08 13:23:59)
> > > > > On Fri, Feb 08, 2019 at 01:14:26PM +0000, Chris Wilson wrote:
> > > > > > Quoting Katarzyna Dec (2019-02-08 13:10:37)
> > > > > > > On simulation environment these tests were failing. During debug
> > > > > > > we decided that whole binary should be skipped, because there is
> > > > > > > no additional coverage in simulation.
> > > > > > 
> > > > > > How did they fail? There are some paths here that are rarely touched
> > > > > > elsewhere.
> > > > > > -Chris
> > > > > I like you challenges Chris :)
> > > > > 
> > > > > gem_exec_parallel@*context* were failing with error:
> > > > > 
> > > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Test assertion failure
> > > > > function gem_execbuf, file ../lib/ioctl_wrappers.c:605:
> > > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Failed assertion:
> > > > > __gem_execbuf(fd, execbuf) == 0
> > > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: error: -28 != 0
> > > > 
> > > > Should not happen.
> > > But happends and is reproducible on some CI runs.
> > 
> > Well fix it. It is not a valid error in this case. ENOSPC is only valid
> > in the case the user submitted more than could be fitted into the GGTT
> > (or had an overlap in their softpin). In this instance, it means that
> > one _context_ impacted another client, and that is simply not allowed.
> >
> According to what you are saying we need to debug and fix the issue.

Yes, it'll be either unable to pin the context image or allocate the id.
Both of which should invoke a GC to recover resources (no client should
have a permanent allocation, mere temporarily leases) rather than fail
with ENOSPC, 

> > > > > and @*fds* with:
> > > > > 
> > > > > (gem_exec_parallel:1315) igt_debugfs-CRITICAL: Last errno: 24, Too many open
> > > > > files
> > > > 
> > > > That is boring, but more indicative of system setup failure. We should
> > > > be configuring the system to allow ourselves, as root, to use as many fd
> > > > as can fit in memory.
> > > > 
> > > > I'd recommend a pass through the basic stress test at least. It won't
> > > > take a lot to exercise some unusual conditions.
> > > > -Chris
> > > 
> > > What if we skip all *context* and all *fds* and leave rest as it is?
> > > Will it be resonable in this shape?
> > 
> > I would rewrite it such that the threads ran for a certain period of
> > time rather than iterations; set basic to something small (say 2s) and
> > left everything else to run for 150s (tends to be the common value I've
> Is this comment to *fds* testcases or general to whole binary?

Whole binary; I think it's useful enough to run at least 2 passes with a
1024 or more competing threads (more!) because this test has proven
itself to be an early warning sign of problems that have escaped the
"purpose built" tests. It doesn't have to be in the first round of
testing, and it doesn't have to be extensive, just enough to load the
system up to as much as it can handle and check it doesn't fall over.

It would be nice to extend the normal test out to huge number of threads
(even say 16,000 but there are realistic limits to both the number of fd
and contexts that can fit in memory that need to be determined to find
the ideal number).
-Chris