[i-g-t] i915/gem_exec_parse: Switch to a fixed timeout for basic-allocations

Submitted by Chris Wilson on Feb. 11, 2019, 2:35 p.m.

Details

Message ID 20190211143544.16184-1-chris@chris-wilson.co.uk
State New
Series "i915/gem_exec_parse: Switch to a fixed timeout for basic-allocations"

Commit Message

Chris Wilson Feb. 11, 2019, 2:35 p.m.
basic-allocations was written to demonstrate a flaw in our continual
reallocation of cmdparser shadow bo, largely fixed by keeping a small
cache of bo of different lengths (to speed up the search for the correct
sized bo). We only care enough to exercise the slowdown by submitting
lots of execbufs, and can see the effect of bo caching on the rate, so
replace the fixed number of iterations with a timeout and count how many
batches we could submit instead.

Similarly, we now do not need to wait for all of our queue to complete
as we can tell the kernel to drop the queue instead.

References: https://bugs.freedesktop.org/show_bug.cgi?id=107936
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/i915/gem_exec_parse.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)


diff --git a/tests/i915/gem_exec_parse.c b/tests/i915/gem_exec_parse.c
index b653b1bdc..62e8d0a51 100644
--- a/tests/i915/gem_exec_parse.c
+++ b/tests/i915/gem_exec_parse.c
@@ -303,15 +303,15 @@  test_lri(int fd, uint32_t handle, struct test_lri *test)
 
 static void test_allocations(int fd)
 {
-	uint32_t bbe = MI_BATCH_BUFFER_END;
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
 	struct drm_i915_gem_execbuffer2 execbuf;
 	struct drm_i915_gem_exec_object2 obj[17];
-	int i, j;
+	unsigned long count;
 
 	intel_require_memory(2, 1ull<<(12 + ARRAY_SIZE(obj)), CHECK_RAM);
 
 	memset(obj, 0, sizeof(obj));
-	for (i = 0; i < ARRAY_SIZE(obj); i++) {
+	for (int i = 0; i < ARRAY_SIZE(obj); i++) {
 		uint64_t size = 1ull << (12 + i);
 
 		obj[i].handle = gem_create(fd, size);
@@ -322,17 +322,21 @@  static void test_allocations(int fd)
 
 	memset(&execbuf, 0, sizeof(execbuf));
 	execbuf.buffer_count = 1;
-	for (j = 0; j < 16384; j++) {
-		igt_progress("allocations ", j, 16384);
-		i = rand() % ARRAY_SIZE(obj);
+
+	count = 0;
+	igt_until_timeout(20) {
+		int i = rand() % ARRAY_SIZE(obj);
 		execbuf.buffers_ptr = to_user_pointer(&obj[i]);
 		execbuf.batch_start_offset = (rand() % (1ull<<i)) << 12;
 		execbuf.batch_start_offset += 64 * (rand() % 64);
 		execbuf.batch_len = (1ull<<(12+i)) - execbuf.batch_start_offset;
 		gem_execbuf(fd, &execbuf);
+		count++;
 	}
+	igt_info("Submitted %lu execbufs\n", count);
+	igt_drop_caches_set(fd, DROP_RESET_ACTIVE); /* Cancel the queued work */
 
-	for (i = 0; i < ARRAY_SIZE(obj); i++) {
+	for (int i = 0; i < ARRAY_SIZE(obj); i++) {
 		gem_sync(fd, obj[i].handle);
 		gem_close(fd, obj[i].handle);
 	}
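
The offset arithmetic in the submit loop can be checked in isolation. The sketch below is not part of the patch: struct batch_range, pick_range() and min_len_seen() are hypothetical names, and NUM_OBJ stands in for ARRAY_SIZE(obj). It only reproduces the three assignments to batch_start_offset and batch_len, to show that the start is a random 64-byte slot inside a random 4 KiB page and that the batch always runs to the end of the object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_OBJ 17 /* stand-in for ARRAY_SIZE(obj) in the patch */

/* Hypothetical helper type: the two values the test writes into execbuf. */
struct batch_range {
	uint64_t start_offset;
	uint64_t len;
};

/* Same arithmetic as the loop body, for an object of 1 << (12 + i) bytes. */
static struct batch_range pick_range(int i)
{
	struct batch_range r;

	r.start_offset = (rand() % (1ull << i)) << 12; /* random 4 KiB page */
	r.start_offset += 64 * (rand() % 64);          /* random 64-byte slot in it */
	r.len = (1ull << (12 + i)) - r.start_offset;   /* run to the end of the object */
	return r;
}

/* Smallest batch length seen over `rounds` picks for each object size. */
static uint64_t min_len_seen(int rounds)
{
	uint64_t min_len = UINT64_MAX;

	for (int i = 0; i < NUM_OBJ; i++) {
		const uint64_t size = 1ull << (12 + i);

		for (int n = 0; n < rounds; n++) {
			struct batch_range r = pick_range(i);

			/* offset + length always lands exactly on the object end */
			assert(r.start_offset + r.len == size);
			if (r.len < min_len)
				min_len = r.len;
		}
	}
	return min_len;
}
```

The largest possible offset is size - 4096 + 4032 = size - 64, so the length handed to the kernel is never below 64 bytes, whatever rand() returns.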

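The switch from a fixed iteration count to a fixed time budget can be illustrated outside IGT. The following is a plain C stand-in, not the igt_until_timeout() implementation: now_seconds(), count_until_timeout(), work_fn and count_calls() are hypothetical names, and the work callback merely takes the place of gem_execbuf(). It shows the shape of the change: run until the deadline and report how many iterations fit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() under strict -std= modes */
#endif
#include <time.h>

/* Monotonic clock reading, in seconds. */
static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for one submission: any unit of work to be counted. */
typedef void (*work_fn)(void *data);

/* Call `work` repeatedly until `timeout_s` has elapsed; return the number of calls. */
static unsigned long count_until_timeout(double timeout_s, work_fn work, void *data)
{
	const double deadline = now_seconds() + timeout_s;
	unsigned long count = 0;

	while (now_seconds() < deadline) {
		work(data);
		count++;
	}
	return count;
}

/* Example work item: bump a caller-owned counter. */
static void count_calls(void *data)
{
	(*(unsigned long *)data)++;
}
```

Calling count_until_timeout(20.0, count_calls, &hits) mirrors the 20 second budget in the patch. The returned count is then a rate measurement rather than a fixed amount of work, which is what makes the effect of bo caching visible in the test output.
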
Comments

Tvrtko Ursulin Feb. 11, 2019, 5:18 p.m.
On 11/02/2019 14:35, Chris Wilson wrote:
> basic-allocations was written to demonstrate a flaw in our continual
> reallocation of cmdparser shadow bo, largely fixed by keeping a small
> cache of bo of different lengths (to speed up the search for the correct
> sized bo). We only care enough to exercise the slowdown by submitting
> lots of execbufs, and can see the effect of bo caching on the rate, so
> replace the fixed number of iterations with a timeout and count how many
> batches we could submit instead.
> 
> Similarly, we now do not need to wait for all of our queue to complete
> as we can tell the kernel to drop the queue instead.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=107936
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   tests/i915/gem_exec_parse.c | 18 +++++++++++-------
>   1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/tests/i915/gem_exec_parse.c b/tests/i915/gem_exec_parse.c
> index b653b1bdc..62e8d0a51 100644
> --- a/tests/i915/gem_exec_parse.c
> +++ b/tests/i915/gem_exec_parse.c
> @@ -303,15 +303,15 @@ test_lri(int fd, uint32_t handle, struct test_lri *test)
>   
>   static void test_allocations(int fd)
>   {
> -	uint32_t bbe = MI_BATCH_BUFFER_END;
> +	const uint32_t bbe = MI_BATCH_BUFFER_END;
>   	struct drm_i915_gem_execbuffer2 execbuf;
>   	struct drm_i915_gem_exec_object2 obj[17];
> -	int i, j;
> +	unsigned long count;
>   
>   	intel_require_memory(2, 1ull<<(12 + ARRAY_SIZE(obj)), CHECK_RAM);
>   
>   	memset(obj, 0, sizeof(obj));
> -	for (i = 0; i < ARRAY_SIZE(obj); i++) {
> +	for (int i = 0; i < ARRAY_SIZE(obj); i++) {
>   		uint64_t size = 1ull << (12 + i);
>   
>   		obj[i].handle = gem_create(fd, size);
> @@ -322,17 +322,21 @@ static void test_allocations(int fd)
>   
>   	memset(&execbuf, 0, sizeof(execbuf));
>   	execbuf.buffer_count = 1;
> -	for (j = 0; j < 16384; j++) {
> -		igt_progress("allocations ", j, 16384);
> -		i = rand() % ARRAY_SIZE(obj);
> +
> +	count = 0;
> +	igt_until_timeout(20) {
> +		int i = rand() % ARRAY_SIZE(obj);
>   		execbuf.buffers_ptr = to_user_pointer(&obj[i]);
>   		execbuf.batch_start_offset = (rand() % (1ull<<i)) << 12;
>   		execbuf.batch_start_offset += 64 * (rand() % 64);
>   		execbuf.batch_len = (1ull<<(12+i)) - execbuf.batch_start_offset;
>   		gem_execbuf(fd, &execbuf);
> +		count++;
>   	}
> +	igt_info("Submitted %lu execbufs\n", count);
> +	igt_drop_caches_set(fd, DROP_RESET_ACTIVE); /* Cancel the queued work */

Downside here is that tests start to exercise a lot more driver paths. 
Or is that an upside? It's confusing these days.

I'd prefer if we just let it run and don't involve wedge/unwedge. Well 
actually... we could modify the submit loop to sync a bit rather than 
build a queue for 20 seconds? Would sync after each execbuf be 
detrimental to test goals? Alternatively submit maybe ARRAY_SIZE worth 
and then sync?

Regards,

Tvrtko

>   
> -	for (i = 0; i < ARRAY_SIZE(obj); i++) {
> +	for (int i = 0; i < ARRAY_SIZE(obj); i++) {
>   		gem_sync(fd, obj[i].handle);
>   		gem_close(fd, obj[i].handle);
>   	}
>
Chris Wilson Feb. 11, 2019, 5:23 p.m.
Quoting Tvrtko Ursulin (2019-02-11 17:18:02)
> 
> On 11/02/2019 14:35, Chris Wilson wrote:
> > basic-allocations was written to demonstrate a flaw in our continual
> > reallocation of cmdparser shadow bo, largely fixed by keeping a small
> > cache of bo of different lengths (to speed up the search for the correct
> > sized bo). We only care enough to exercise the slowdown by submitting
> > lots of execbufs, and can see the effect of bo caching on the rate, so
> > replace the fixed number of iterations with a timeout and count how many
> > batches we could submit instead.
> > 
> > Similarly, we now do not need to wait for all of our queue to complete
> > as we can tell the kernel to drop the queue instead.
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=107936
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   tests/i915/gem_exec_parse.c | 18 +++++++++++-------
> >   1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tests/i915/gem_exec_parse.c b/tests/i915/gem_exec_parse.c
> > index b653b1bdc..62e8d0a51 100644
> > --- a/tests/i915/gem_exec_parse.c
> > +++ b/tests/i915/gem_exec_parse.c
> > @@ -303,15 +303,15 @@ test_lri(int fd, uint32_t handle, struct test_lri *test)
> >   
> >   static void test_allocations(int fd)
> >   {
> > -     uint32_t bbe = MI_BATCH_BUFFER_END;
> > +     const uint32_t bbe = MI_BATCH_BUFFER_END;
> >       struct drm_i915_gem_execbuffer2 execbuf;
> >       struct drm_i915_gem_exec_object2 obj[17];
> > -     int i, j;
> > +     unsigned long count;
> >   
> >       intel_require_memory(2, 1ull<<(12 + ARRAY_SIZE(obj)), CHECK_RAM);
> >   
> >       memset(obj, 0, sizeof(obj));
> > -     for (i = 0; i < ARRAY_SIZE(obj); i++) {
> > +     for (int i = 0; i < ARRAY_SIZE(obj); i++) {
> >               uint64_t size = 1ull << (12 + i);
> >   
> >               obj[i].handle = gem_create(fd, size);
> > @@ -322,17 +322,21 @@ static void test_allocations(int fd)
> >   
> >       memset(&execbuf, 0, sizeof(execbuf));
> >       execbuf.buffer_count = 1;
> > -     for (j = 0; j < 16384; j++) {
> > -             igt_progress("allocations ", j, 16384);
> > -             i = rand() % ARRAY_SIZE(obj);
> > +
> > +     count = 0;
> > +     igt_until_timeout(20) {
> > +             int i = rand() % ARRAY_SIZE(obj);
> >               execbuf.buffers_ptr = to_user_pointer(&obj[i]);
> >               execbuf.batch_start_offset = (rand() % (1ull<<i)) << 12;
> >               execbuf.batch_start_offset += 64 * (rand() % 64);
> >               execbuf.batch_len = (1ull<<(12+i)) - execbuf.batch_start_offset;
> >               gem_execbuf(fd, &execbuf);
> > +             count++;
> >       }
> > +     igt_info("Submitted %lu execbufs\n", count);
> > +     igt_drop_caches_set(fd, DROP_RESET_ACTIVE); /* Cancel the queued work */
> 
> Downside here is that tests start to exercise a lot more driver paths. 
> Or is that an upside? It's confusing these days.
> 
> I'd prefer if we just let it run and don't involve wedge/unwedge. Well 
> actually... we could modify the submit loop to sync a bit rather than 
> build a queue for 20 seconds? Would sync after each execbuf be 
> detrimental to test goals? Alternatively submit maybe ARRAY_SIZE worth 
> and then sync?

Yes, syncing affects i915_gem_batch_pool.c. The length of the cache
lists is largely determined by the number of batches in flight.
-Chris
Chris Wilson Feb. 11, 2019, 8:37 p.m.
Quoting Chris Wilson (2019-02-11 17:23:57)
> Quoting Tvrtko Ursulin (2019-02-11 17:18:02)
> > I'd prefer if we just let it run and don't involve wedge/unwedge. Well 
> > actually... we could modify the submit loop to sync a bit rather than 
> > build a queue for 20 seconds? Would sync after each execbuf be 
> > detrimental to test goals? Alternatively submit maybe ARRAY_SIZE worth 
> > and then sync?
> 
> Yes, syncing affects i915_gem_batch_pool.c. The length of the cache
> lists is largely determined by the number of batches in flight.

Along those lines, it's probably worthwhile to stress fixed object sizes
as well, and make those batches long-lasting, yet still quick to parse.

It's worth bearing in mind that GPU relocs are another user of
i915_gem_batch_pool, so this issue isn't limited to gen7-cmdparser.
-Chris
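
The point made in the thread, that the length of the cache lists follows the number of batches in flight, can be illustrated with a toy model. This is not the i915_gem_batch_pool.c code: struct toy_pool, toy_submit(), toy_sync() and toy_run() are hypothetical, and the model assumes a single buffer size and that nothing retires until an explicit sync. Under those assumptions it shows why syncing inside the submit loop would change what the test exercises.

```c
/*
 * Toy model: every buffer ever allocated stays on the cache list and is
 * either busy (still in flight) or idle (free to be reused).
 */
struct toy_pool {
	unsigned long cached; /* buffers on the cache list */
	unsigned long busy;   /* of those, still in flight */
};

/* One submission: reuse an idle buffer if there is one, else allocate. */
static void toy_submit(struct toy_pool *p)
{
	if (p->busy == p->cached)
		p->cached++; /* no idle buffer: the list grows by one */
	p->busy++;
}

/* A sync retires everything in flight; all cached buffers become idle. */
static void toy_sync(struct toy_pool *p)
{
	p->busy = 0;
}

/*
 * Submit `submissions` batches, syncing after every `sync_every` of them
 * (0 means never sync). Returns the final length of the cache list.
 */
static unsigned long toy_run(unsigned long submissions, unsigned long sync_every)
{
	struct toy_pool p = { 0, 0 };

	for (unsigned long n = 1; n <= submissions; n++) {
		toy_submit(&p);
		if (sync_every && n % sync_every == 0)
			toy_sync(&p);
	}
	return p.cached;
}
```

In this model toy_run(1000, 0) ends with 1000 cached buffers, toy_run(1000, 17) with 17, and toy_run(1000, 1) with a single one: the list never grows beyond the deepest queue that was allowed to build up.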