[v2] drm/i915/selftests: Add a GuC doorbells selftest

Submitted by Michel Thierry on Oct. 26, 2017, 6:51 p.m.

Details

Message ID 20171026185142.33176-1-michel.thierry@intel.com
State New
Headers show
Series "drm/i915/selftests: Add a GuC doorbells selftest" ( rev: 2 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Michel Thierry Oct. 26, 2017, 6:51 p.m.
Try to create as many clients as it is currently possible (currently
limited to max number of doorbells) and exercise the doorbell
alloc/dealloc code.

Since our usage mode require very few clients/doorbells, this code has
been exercised very lightly and it's good to have a simple test for it.

As reference, this test already helped identify the bug fixed by
commit 7f1ea2ac3017 ("drm/i915/guc: Fix doorbell id selection").

v2: Extend number of clients; check for client allocation failure when
number of doorbells is exceeded; validate client properties; reuse
guc_init_doorbell_hw (Chris).

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_guc_submission.c         |  15 +-
 .../gpu/drm/i915/selftests/i915_live_selftests.h   |   1 +
 drivers/gpu/drm/i915/selftests/intel_guc.c         | 186 +++++++++++++++++++++
 3 files changed, 199 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_guc.c

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 13d383cdc4c9..e643f47c8697 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -725,10 +725,15 @@  static int __reset_doorbell(struct i915_guc_client* client, u16 db_id)
  * client. Also, tell GuC about all the doorbells in use by all clients.
  * We do this because the KMD, the GuC and the doorbell HW can easily go out of
  * sync (e.g. we can reset the GuC, but not the doorbel HW).
+ *
+ * The optional test_client is a hook to use this function with multiple
+ * clients in the guc selftest.
  */
-static int guc_init_doorbell_hw(struct intel_guc *guc)
+static int guc_init_doorbell_hw(struct intel_guc *guc,
+				struct i915_guc_client *test_client)
 {
-	struct i915_guc_client *client = guc->execbuf_client;
+	struct i915_guc_client *client = !test_client ? guc->execbuf_client :
+							test_client;
 	bool recreate_first_client = false;
 	u16 db_id;
 	int ret;
@@ -1244,7 +1249,7 @@  int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
 
 	guc_reset_wq(client);
 
-	err = guc_init_doorbell_hw(guc);
+	err = guc_init_doorbell_hw(guc, NULL);
 	if (err)
 		goto err_execbuf_client;
 
@@ -1280,3 +1285,7 @@  void i915_guc_submission_disable(struct drm_i915_private *dev_priv)
 	guc_client_free(guc->execbuf_client);
 	guc->execbuf_client = NULL;
 }
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/intel_guc.c"
+#endif
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index 54a73534b37e..1b750e92dd08 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -19,3 +19,4 @@  selftest(evict, i915_gem_evict_live_selftests)
 selftest(hugepages, i915_gem_huge_page_live_selftests)
 selftest(contexts, i915_gem_context_live_selftests)
 selftest(hangcheck, intel_hangcheck_live_selftests)
+selftest(guc, intel_guc_live_selftest)
diff --git a/drivers/gpu/drm/i915/selftests/intel_guc.c b/drivers/gpu/drm/i915/selftests/intel_guc.c
new file mode 100644
index 000000000000..86e0be919d7f
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/intel_guc.c
@@ -0,0 +1,186 @@ 
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "../i915_selftest.h"
+
+/* max doorbell number + negative test for each client type */
+#define ATTEMPTS (GUC_NUM_DOORBELLS + GUC_CLIENT_PRIORITY_NUM)
+
+struct i915_guc_client *clients[ATTEMPTS];
+
+static bool available_dbs(struct intel_guc *guc, u32 priority)
+{
+	unsigned long offset;
+	unsigned long end;
+	u16 id;
+
+	/* first hal is used for normal priority, second half for high */
+	offset = 0;
+	end = GUC_NUM_DOORBELLS/2;
+	if (priority <= GUC_CLIENT_PRIORITY_HIGH) {
+		offset = end;
+		end += offset;
+	}
+
+	id = find_next_zero_bit(guc->doorbell_bitmap, end, offset);
+	if (id < end)
+		return true;
+
+	return false;
+}
+
+static int check_all_doorbells(struct intel_guc *guc)
+{
+	u16 db_id;
+
+	pr_info_once("Max number of doorbells: %d", GUC_NUM_DOORBELLS);
+	for (db_id = 0; db_id < GUC_NUM_DOORBELLS; ++db_id) {
+		if (!doorbell_ok(guc, db_id)) {
+			pr_err("doorbell %d, not ok\n", db_id);
+			return -EIO;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Create as many clients as number of doorbells (note that there's already
+ * one client/db created during driver load).
+ */
+static int igt_guc_doorbells(void *arg)
+{
+	struct drm_i915_private *dev_priv = arg;
+	struct intel_guc *guc;
+	int i, err = 0;
+	u16 db_id;
+
+	DRM_INFO("GuC Doorbells selftest\n");
+	GEM_BUG_ON(!HAS_GUC(dev_priv));
+	mutex_lock(&dev_priv->drm.struct_mutex);
+
+	guc = &dev_priv->guc;
+	if (!guc) {
+		pr_err("No guc object!\n");
+		err = -EINVAL;
+		goto unlock;
+	}
+
+	err = check_all_doorbells(guc);
+	if (err)
+		goto unlock;
+
+	for (i = 0; i < ATTEMPTS; i++) {
+		clients[i] = guc_client_alloc(dev_priv,
+					      INTEL_INFO(dev_priv)->ring_mask,
+					      i % GUC_CLIENT_PRIORITY_NUM,
+					      dev_priv->kernel_context);
+
+		if (!clients[i]) {
+			pr_err("[%d] No guc client\n", i);
+			err = -EINVAL;
+			goto out;
+		}
+
+		if (IS_ERR(clients[i])) {
+			if (PTR_ERR(clients[i]) != -ENOSPC) {
+				pr_err("[%d] unexpected error\n", i);
+				err = PTR_ERR(clients[i]);
+				goto out;
+			}
+
+			if (available_dbs(guc, i % GUC_CLIENT_PRIORITY_NUM)) {
+				pr_err("unexpected client alloc failure\n");
+				err = -EINVAL;
+				goto out;
+			}
+
+			/* expected, ran out of dbs for this client type */
+			continue;
+		}
+
+		/*
+		 * The check below is only valid while we keep a doorbell
+		 * assigned during the whole life of the client.
+		 */
+		if (clients[i]->stage_id >= GUC_NUM_DOORBELLS) {
+			pr_err("more clients than doorbells (%d >= %d)\n",
+			       clients[i]->id, GUC_NUM_DOORBELLS);
+			err = -EINVAL;
+			goto out;
+		}
+
+		/* basic client sanity check */
+		if (clients[i]->owner != dev_priv->kernel_context ||
+		    clients[i]->engines != INTEL_INFO(dev_priv)->ring_mask ||
+		    clients[i]->priority != (i % GUC_CLIENT_PRIORITY_NUM) ||
+		    clients[i]->doorbell_id == GUC_DOORBELL_INVALID) {
+			pr_err("[%d] client_alloc sanity check failed!\n", i);
+			err = -EINVAL;
+			goto out;
+		}
+
+		db_id = clients[i]->doorbell_id;
+
+		/* recreate client's doorbell using existing code */
+		err = guc_init_doorbell_hw(guc, clients[i]);
+		if (err) {
+			pr_err("Failed to create a doorbell\n");
+			goto out;
+		}
+
+		/* doorbell id shouldn't change, we are holding the mutex */
+		if (db_id != clients[i]->doorbell_id) {
+			pr_err("doorbell id changed (%d - %d)\n",
+			       db_id, clients[i]->doorbell_id);
+			err = -EINVAL;
+			goto out;
+		}
+
+		/* guc_init_doorbell_hw already did this, but don't trust it */
+		err = check_all_doorbells(guc);
+		if (err)
+			goto out;
+	}
+
+out:
+	for (i = 0; i < ATTEMPTS; i++)
+		if (!IS_ERR_OR_NULL(clients[i]))
+			guc_client_free(clients[i]);
+unlock:
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+	return err;
+}
+
+int intel_guc_live_selftest(struct drm_i915_private *dev_priv)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(igt_guc_doorbells),
+	};
+
+	if (!i915_modparams.enable_guc_submission)
+		return 0;
+
+	return i915_subtests(tests, dev_priv);
+}

Comments

On 26/10/17 11:51, Michel Thierry wrote:
> Try to create as many clients as it is currently possible (currently
> limited to max number of doorbells) and exercise the doorbell
> alloc/dealloc code.
> 
> Since our usage mode require very few clients/doorbells, this code has
> been exercised very lightly and it's good to have a simple test for it.
> 
> As reference, this test already helped identify the bug fixed by
> commit 7f1ea2ac3017 ("drm/i915/guc: Fix doorbell id selection").
> 
> v2: Extend number of clients; check for client allocation failure when
> number of doorbells is exceeded; validate client properties; reuse
> guc_init_doorbell_hw (Chris).

This will conflict with Michal's patch ("Add a second client, to be used 
for preemption") but I want to get feedback about reusing 
guc_init_doorbell_hw in the selftest.

Thanks,

> 
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Winiarski <michal.winiarski@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_guc_submission.c         |  15 +-
>   .../gpu/drm/i915/selftests/i915_live_selftests.h   |   1 +
>   drivers/gpu/drm/i915/selftests/intel_guc.c         | 186 +++++++++++++++++++++
>   3 files changed, 199 insertions(+), 3 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/selftests/intel_guc.c
> 
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 13d383cdc4c9..e643f47c8697 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -725,10 +725,15 @@ static int __reset_doorbell(struct i915_guc_client* client, u16 db_id)
>    * client. Also, tell GuC about all the doorbells in use by all clients.
>    * We do this because the KMD, the GuC and the doorbell HW can easily go out of
>    * sync (e.g. we can reset the GuC, but not the doorbel HW).
> + *
> + * The optional test_client is a hook to use this function with multiple
> + * clients in the guc selftest.
>    */
> -static int guc_init_doorbell_hw(struct intel_guc *guc)
> +static int guc_init_doorbell_hw(struct intel_guc *guc,
> +                               struct i915_guc_client *test_client)
>   {
> -       struct i915_guc_client *client = guc->execbuf_client;
> +       struct i915_guc_client *client = !test_client ? guc->execbuf_client :
> +                                                       test_client;
>          bool recreate_first_client = false;
>          u16 db_id;
>          int ret;
> @@ -1244,7 +1249,7 @@ int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
> 
>          guc_reset_wq(client);
> 
> -       err = guc_init_doorbell_hw(guc);
> +       err = guc_init_doorbell_hw(guc, NULL);
>          if (err)
>                  goto err_execbuf_client;
> 
> @@ -1280,3 +1285,7 @@ void i915_guc_submission_disable(struct drm_i915_private *dev_priv)
>          guc_client_free(guc->execbuf_client);
>          guc->execbuf_client = NULL;
>   }
> +
> +#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> +#include "selftests/intel_guc.c"
> +#endif
> diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
> index 54a73534b37e..1b750e92dd08 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
> +++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
> @@ -19,3 +19,4 @@ selftest(evict, i915_gem_evict_live_selftests)
>   selftest(hugepages, i915_gem_huge_page_live_selftests)
>   selftest(contexts, i915_gem_context_live_selftests)
>   selftest(hangcheck, intel_hangcheck_live_selftests)
> +selftest(guc, intel_guc_live_selftest)
> diff --git a/drivers/gpu/drm/i915/selftests/intel_guc.c b/drivers/gpu/drm/i915/selftests/intel_guc.c
> new file mode 100644
> index 000000000000..86e0be919d7f
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/selftests/intel_guc.c
> @@ -0,0 +1,186 @@
> +/*
> + * Copyright © 2017 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "../i915_selftest.h"
> +
> +/* max doorbell number + negative test for each client type */
> +#define ATTEMPTS (GUC_NUM_DOORBELLS + GUC_CLIENT_PRIORITY_NUM)
> +
> +struct i915_guc_client *clients[ATTEMPTS];
> +
> +static bool available_dbs(struct intel_guc *guc, u32 priority)
> +{
> +       unsigned long offset;
> +       unsigned long end;
> +       u16 id;
> +
> +       /* first hal is used for normal priority, second half for high */
> +       offset = 0;
> +       end = GUC_NUM_DOORBELLS/2;
> +       if (priority <= GUC_CLIENT_PRIORITY_HIGH) {
> +               offset = end;
> +               end += offset;
> +       }
> +
> +       id = find_next_zero_bit(guc->doorbell_bitmap, end, offset);
> +       if (id < end)
> +               return true;
> +
> +       return false;
> +}
> +
> +static int check_all_doorbells(struct intel_guc *guc)
> +{
> +       u16 db_id;
> +
> +       pr_info_once("Max number of doorbells: %d", GUC_NUM_DOORBELLS);
> +       for (db_id = 0; db_id < GUC_NUM_DOORBELLS; ++db_id) {
> +               if (!doorbell_ok(guc, db_id)) {
> +                       pr_err("doorbell %d, not ok\n", db_id);
> +                       return -EIO;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +/*
> + * Create as many clients as number of doorbells (note that there's already
> + * one client/db created during driver load).
> + */
> +static int igt_guc_doorbells(void *arg)
> +{
> +       struct drm_i915_private *dev_priv = arg;
> +       struct intel_guc *guc;
> +       int i, err = 0;
> +       u16 db_id;
> +
> +       DRM_INFO("GuC Doorbells selftest\n");
> +       GEM_BUG_ON(!HAS_GUC(dev_priv));
> +       mutex_lock(&dev_priv->drm.struct_mutex);
> +
> +       guc = &dev_priv->guc;
> +       if (!guc) {
> +               pr_err("No guc object!\n");
> +               err = -EINVAL;
> +               goto unlock;
> +       }
> +
> +       err = check_all_doorbells(guc);
> +       if (err)
> +               goto unlock;
> +
> +       for (i = 0; i < ATTEMPTS; i++) {
> +               clients[i] = guc_client_alloc(dev_priv,
> +                                             INTEL_INFO(dev_priv)->ring_mask,
> +                                             i % GUC_CLIENT_PRIORITY_NUM,
> +                                             dev_priv->kernel_context);
> +
> +               if (!clients[i]) {
> +                       pr_err("[%d] No guc client\n", i);
> +                       err = -EINVAL;
> +                       goto out;
> +               }
> +
> +               if (IS_ERR(clients[i])) {
> +                       if (PTR_ERR(clients[i]) != -ENOSPC) {
> +                               pr_err("[%d] unexpected error\n", i);
> +                               err = PTR_ERR(clients[i]);
> +                               goto out;
> +                       }
> +
> +                       if (available_dbs(guc, i % GUC_CLIENT_PRIORITY_NUM)) {
> +                               pr_err("unexpected client alloc failure\n");
> +                               err = -EINVAL;
> +                               goto out;
> +                       }
> +
> +                       /* expected, ran out of dbs for this client type */
> +                       continue;
> +               }
> +
> +               /*
> +                * The check below is only valid while we keep a doorbell
> +                * assigned during the whole life of the client.
> +                */
> +               if (clients[i]->stage_id >= GUC_NUM_DOORBELLS) {
> +                       pr_err("more clients than doorbells (%d >= %d)\n",
> +                              clients[i]->id, GUC_NUM_DOORBELLS);
> +                       err = -EINVAL;
> +                       goto out;
> +               }
> +
> +               /* basic client sanity check */
> +               if (clients[i]->owner != dev_priv->kernel_context ||
> +                   clients[i]->engines != INTEL_INFO(dev_priv)->ring_mask ||
> +                   clients[i]->priority != (i % GUC_CLIENT_PRIORITY_NUM) ||
> +                   clients[i]->doorbell_id == GUC_DOORBELL_INVALID) {
> +                       pr_err("[%d] client_alloc sanity check failed!\n", i);
> +                       err = -EINVAL;
> +                       goto out;
> +               }
> +
> +               db_id = clients[i]->doorbell_id;
> +
> +               /* recreate client's doorbell using existing code */
> +               err = guc_init_doorbell_hw(guc, clients[i]);
> +               if (err) {
> +                       pr_err("Failed to create a doorbell\n");
> +                       goto out;
> +               }
> +
> +               /* doorbell id shouldn't change, we are holding the mutex */
> +               if (db_id != clients[i]->doorbell_id) {
> +                       pr_err("doorbell id changed (%d - %d)\n",
> +                              db_id, clients[i]->doorbell_id);
> +                       err = -EINVAL;
> +                       goto out;
> +               }
> +
> +               /* guc_init_doorbell_hw already did this, but don't trust it */
> +               err = check_all_doorbells(guc);
> +               if (err)
> +                       goto out;
> +       }
> +
> +out:
> +       for (i = 0; i < ATTEMPTS; i++)
> +               if (!IS_ERR_OR_NULL(clients[i]))
> +                       guc_client_free(clients[i]);
> +unlock:
> +       mutex_unlock(&dev_priv->drm.struct_mutex);
> +       return err;
> +}
> +
> +int intel_guc_live_selftest(struct drm_i915_private *dev_priv)
> +{
> +       static const struct i915_subtest tests[] = {
> +               SUBTEST(igt_guc_doorbells),
> +       };
> +
> +       if (!i915_modparams.enable_guc_submission)
> +               return 0;
> +
> +       return i915_subtests(tests, dev_priv);
> +}
> --
> 2.14.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
Quoting Michel Thierry (2017-10-26 19:54:06)
> On 26/10/17 11:51, Michel Thierry wrote:
> > Try to create as many clients as it is currently possible (currently
> > limited to max number of doorbells) and exercise the doorbell
> > alloc/dealloc code.
> > 
> > Since our usage mode require very few clients/doorbells, this code has
> > been exercised very lightly and it's good to have a simple test for it.
> > 
> > As reference, this test already helped identify the bug fixed by
> > commit 7f1ea2ac3017 ("drm/i915/guc: Fix doorbell id selection").
> > 
> > v2: Extend number of clients; check for client allocation failure when
> > number of doorbells is exceeded; validate client properties; reuse
> > guc_init_doorbell_hw (Chris).
> 
> This will conflict with Michal's patch ("Add a second client, to be used 
> for preemption") but I want to get feedback about reusing 
> guc_init_doorbell_hw in the selftest.

It's more work, but I think you basically want both. You want unittests
for the individual lowlevel functions, plus you want to unittest init_hw
as it will typically be called after reset so a failure there may be
hard to disentangle from the cause of the reset.

In essence, we want to treat the doorbell data struct as a library api
and so exercise its contract (if it was easy enough, I would suggest
mocking the hw interactions). And then similar at the next level, we
want to look at the interface guc exposes to the driver and check that
behaves as expected (that is more akin to live testing with hw).

As such, they are probably two different subtests. :)
-Chris