[48/55] drm/i915: Add *_ring_begin() to request allocation

Submitted by John.C.Harrison@Intel.com on May 29, 2015, 4:44 p.m.

Details

Message ID 1432917856-12261-49-git-send-email-John.C.Harrison@Intel.com
State New
Headers show

Not browsing as part of any series.

Commit Message

John.C.Harrison@Intel.com May 29, 2015, 4:44 p.m.
From: John Harrison <John.C.Harrison@Intel.com>

Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         |   25 +++++++++++++------------
 drivers/gpu/drm/i915/intel_lrc.c        |   15 +++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   29 ++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
 5 files changed, 46 insertions(+), 25 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9f3e0717..1261792 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2680,19 +2680,20 @@  int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	 * i915_add_request() call can't fail. Note that the reserve may need
 	 * to be redone if the request is not actually submitted straight
 	 * away, e.g. because a GPU scheduler has deferred it.
-	 *
-	 * Note further that this call merely notes the reserve request. A
-	 * subsequent call to *_ring_begin() is required to actually ensure
-	 * that the reservation is available. Without the begin, if the
-	 * request creator immediately submitted the request without adding
-	 * any commands to it then there might not actually be sufficient
-	 * room for the submission commands. Unfortunately, the current
-	 * *_ring_begin() implementations potentially call back here to
-	 * i915_gem_request_alloc(). Thus calling _begin() here would lead to
-	 * infinite recursion! Until that back call path is removed, it is
-	 * necessary to do a manual _begin() outside.
 	 */
-	intel_ring_reserved_space_reserve(req->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
+	if (i915.enable_execlists)
+		ret = intel_logical_ring_reserve_space(req);
+	else
+		ret = intel_ring_reserve_space(req);
+	if (ret) {
+		/*
+		 * At this point, the request is fully allocated even if not
+		 * fully prepared. Thus it can be cleaned up using the proper
+		 * free code.
+		 */
+		i915_gem_request_cancel(req);
+		return ret;
+	}
 
 	*req_out = ring->outstanding_lazy_request = req;
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 548c53d..e164ac0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -823,6 +823,21 @@  static int intel_logical_ring_begin(struct drm_i915_gem_request *req,
 	return 0;
 }
 
+int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request)
+{
+	/*
+	 * The first call merely notes the reserve request and is common for
+	 * all back ends. The subsequent localised _begin() call actually
+	 * ensures that the reservation is available. Without the begin, if
+	 * the request creator immediately submitted the request without
+	 * adding any commands to it then there might not actually be
+	 * sufficient room for the submission commands.
+	 */
+	intel_ring_reserved_space_reserve(request->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
+
+	return intel_logical_ring_begin(request, 0);
+}
+
 /**
  * execlists_submission() - submit a batchbuffer for execution, Execlists style
  * @dev: DRM device.
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 044c0e5..f59940a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -37,6 +37,7 @@ 
 
 /* Logical Rings */
 int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request);
+int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request);
 void intel_logical_ring_stop(struct intel_engine_cs *ring);
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
 int intel_logical_rings_init(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index bb10fc2..0ba5787 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2192,24 +2192,27 @@  int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
 	return 0;
 }
 
+int intel_ring_reserve_space(struct drm_i915_gem_request *request)
+{
+	/*
+	 * The first call merely notes the reserve request and is common for
+	 * all back ends. The subsequent localised _begin() call actually
+	 * ensures that the reservation is available. Without the begin, if
+	 * the request creator immediately submitted the request without
+	 * adding any commands to it then there might not actually be
+	 * sufficient room for the submission commands.
+	 */
+	intel_ring_reserved_space_reserve(request->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
+
+	return intel_ring_begin(request, 0);
+}
+
 void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
 {
-	/* NB: Until request management is fully tidied up and the OLR is
-	 * removed, there are too many ways for get false hits on this
-	 * anti-recursion check! */
-	/*WARN_ON(ringbuf->reserved_size);*/
+	WARN_ON(ringbuf->reserved_size);
 	WARN_ON(ringbuf->reserved_in_use);
 
 	ringbuf->reserved_size = size;
-
-	/*
-	 * Really need to call _begin() here but that currently leads to
-	 * recursion problems! This will be fixed later but for now just
-	 * return and hope for the best. Note that there is only a real
-	 * problem if the create of the request never actually calls _begin()
-	 * but if they are not submitting any work then why did they create
-	 * the request in the first place?
-	 */
 }
 
 void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 16fd9ba..f4633ca 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -450,6 +450,7 @@  intel_ring_get_request(struct intel_engine_cs *ring)
 
 #define MIN_SPACE_FOR_ADD_REQUEST	128
 
+int intel_ring_reserve_space(struct drm_i915_gem_request *request);
 void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size);
 void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf);
 void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);

Comments

On Fri, May 29, 2015 at 05:44:09PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Now that the *_ring_begin() functions no longer call the request allocation
> code, it is finally safe for the request allocation code to call *_ring_begin().
> This is important to guarantee that the space reserved for the subsequent
> i915_add_request() call does actually get reserved.
> 
> v2: Renamed functions according to review feedback (Tomas Elf).
> 
> For: VIZ-5115
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Still has my question open from the previos round:

http://mid.gmane.org/20150323091030.GL1349@phenom.ffwll.local

Note that this isn't all that unlikely with GuC mode since there the
ringbuffer is substantially smaller (due to firmware limitations) than
what we allocate ourselves right now.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c         |   25 +++++++++++++------------
>  drivers/gpu/drm/i915/intel_lrc.c        |   15 +++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h        |    1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   29 ++++++++++++++++-------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    1 +
>  5 files changed, 46 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9f3e0717..1261792 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2680,19 +2680,20 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  	 * i915_add_request() call can't fail. Note that the reserve may need
>  	 * to be redone if the request is not actually submitted straight
>  	 * away, e.g. because a GPU scheduler has deferred it.
> -	 *
> -	 * Note further that this call merely notes the reserve request. A
> -	 * subsequent call to *_ring_begin() is required to actually ensure
> -	 * that the reservation is available. Without the begin, if the
> -	 * request creator immediately submitted the request without adding
> -	 * any commands to it then there might not actually be sufficient
> -	 * room for the submission commands. Unfortunately, the current
> -	 * *_ring_begin() implementations potentially call back here to
> -	 * i915_gem_request_alloc(). Thus calling _begin() here would lead to
> -	 * infinite recursion! Until that back call path is removed, it is
> -	 * necessary to do a manual _begin() outside.
>  	 */
> -	intel_ring_reserved_space_reserve(req->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
> +	if (i915.enable_execlists)
> +		ret = intel_logical_ring_reserve_space(req);
> +	else
> +		ret = intel_ring_reserve_space(req);
> +	if (ret) {
> +		/*
> +		 * At this point, the request is fully allocated even if not
> +		 * fully prepared. Thus it can be cleaned up using the proper
> +		 * free code.
> +		 */
> +		i915_gem_request_cancel(req);
> +		return ret;
> +	}
>  
>  	*req_out = ring->outstanding_lazy_request = req;
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 548c53d..e164ac0 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -823,6 +823,21 @@ static int intel_logical_ring_begin(struct drm_i915_gem_request *req,
>  	return 0;
>  }
>  
> +int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request)
> +{
> +	/*
> +	 * The first call merely notes the reserve request and is common for
> +	 * all back ends. The subsequent localised _begin() call actually
> +	 * ensures that the reservation is available. Without the begin, if
> +	 * the request creator immediately submitted the request without
> +	 * adding any commands to it then there might not actually be
> +	 * sufficient room for the submission commands.
> +	 */
> +	intel_ring_reserved_space_reserve(request->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
> +
> +	return intel_logical_ring_begin(request, 0);
> +}
> +
>  /**
>   * execlists_submission() - submit a batchbuffer for execution, Execlists style
>   * @dev: DRM device.
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 044c0e5..f59940a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -37,6 +37,7 @@
>  
>  /* Logical Rings */
>  int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request);
> +int intel_logical_ring_reserve_space(struct drm_i915_gem_request *request);
>  void intel_logical_ring_stop(struct intel_engine_cs *ring);
>  void intel_logical_ring_cleanup(struct intel_engine_cs *ring);
>  int intel_logical_rings_init(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index bb10fc2..0ba5787 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2192,24 +2192,27 @@ int intel_ring_alloc_request_extras(struct drm_i915_gem_request *request)
>  	return 0;
>  }
>  
> +int intel_ring_reserve_space(struct drm_i915_gem_request *request)
> +{
> +	/*
> +	 * The first call merely notes the reserve request and is common for
> +	 * all back ends. The subsequent localised _begin() call actually
> +	 * ensures that the reservation is available. Without the begin, if
> +	 * the request creator immediately submitted the request without
> +	 * adding any commands to it then there might not actually be
> +	 * sufficient room for the submission commands.
> +	 */
> +	intel_ring_reserved_space_reserve(request->ringbuf, MIN_SPACE_FOR_ADD_REQUEST);
> +
> +	return intel_ring_begin(request, 0);
> +}
> +
>  void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size)
>  {
> -	/* NB: Until request management is fully tidied up and the OLR is
> -	 * removed, there are too many ways for get false hits on this
> -	 * anti-recursion check! */
> -	/*WARN_ON(ringbuf->reserved_size);*/
> +	WARN_ON(ringbuf->reserved_size);
>  	WARN_ON(ringbuf->reserved_in_use);
>  
>  	ringbuf->reserved_size = size;
> -
> -	/*
> -	 * Really need to call _begin() here but that currently leads to
> -	 * recursion problems! This will be fixed later but for now just
> -	 * return and hope for the best. Note that there is only a real
> -	 * problem if the create of the request never actually calls _begin()
> -	 * but if they are not submitting any work then why did they create
> -	 * the request in the first place?
> -	 */
>  }
>  
>  void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 16fd9ba..f4633ca 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -450,6 +450,7 @@ intel_ring_get_request(struct intel_engine_cs *ring)
>  
>  #define MIN_SPACE_FOR_ADD_REQUEST	128
>  
> +int intel_ring_reserve_space(struct drm_i915_gem_request *request);
>  void intel_ring_reserved_space_reserve(struct intel_ringbuffer *ringbuf, int size);
>  void intel_ring_reserved_space_cancel(struct intel_ringbuffer *ringbuf);
>  void intel_ring_reserved_space_use(struct intel_ringbuffer *ringbuf);
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Jun 17, 2015 at 03:31:59PM +0200, Daniel Vetter wrote:
> On Fri, May 29, 2015 at 05:44:09PM +0100, John.C.Harrison@Intel.com wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > Now that the *_ring_begin() functions no longer call the request allocation
> > code, it is finally safe for the request allocation code to call *_ring_begin().
> > This is important to guarantee that the space reserved for the subsequent
> > i915_add_request() call does actually get reserved.
> > 
> > v2: Renamed functions according to review feedback (Tomas Elf).
> > 
> > For: VIZ-5115
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> 
> Still has my question open from the previos round:
> 
> http://mid.gmane.org/20150323091030.GL1349@phenom.ffwll.local
> 
> Note that this isn't all that unlikely with GuC mode since there the
> ringbuffer is substantially smaller (due to firmware limitations) than
> what we allocate ourselves right now.

Looking at this patch, I am still fundamentally opposed to reserving
space for the request. Detecting a request that wraps and cancelling
that request (after the appropriate WARN for the overlow) is trivial and
such a rare case (as it is a programming error) that it should only be
handled in the slow path.
-Chris
On Wed, Jun 17, 2015 at 03:27:08PM +0100, Chris Wilson wrote:
> On Wed, Jun 17, 2015 at 03:31:59PM +0200, Daniel Vetter wrote:
> > On Fri, May 29, 2015 at 05:44:09PM +0100, John.C.Harrison@Intel.com wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > Now that the *_ring_begin() functions no longer call the request allocation
> > > code, it is finally safe for the request allocation code to call *_ring_begin().
> > > This is important to guarantee that the space reserved for the subsequent
> > > i915_add_request() call does actually get reserved.
> > > 
> > > v2: Renamed functions according to review feedback (Tomas Elf).
> > > 
> > > For: VIZ-5115
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > 
> > Still has my question open from the previos round:
> > 
> > http://mid.gmane.org/20150323091030.GL1349@phenom.ffwll.local
> > 
> > Note that this isn't all that unlikely with GuC mode since there the
> > ringbuffer is substantially smaller (due to firmware limitations) than
> > what we allocate ourselves right now.
> 
> Looking at this patch, I am still fundamentally opposed to reserving
> space for the request. Detecting a request that wraps and cancelling
> that request (after the appropriate WARN for the overlow) is trivial and
> such a rare case (as it is a programming error) that it should only be
> handled in the slow path.

I thought the entire point here that we don't have request half-committed
because the final request ringcmds didn't fit in. And that does require
that we reserve a bit of space for that postamble.

I guess if it's too much (atm it's super-pessimistic due to ilk) we can
make per-platform reservation limits to be really minimal.

Maybe we could go towards a rollback model longterm of rewingind the
ringbuffer. But if there's no clear need I'd like to avoid that
complexity.
-Daniel
On Wed, Jun 17, 2015 at 04:54:42PM +0200, Daniel Vetter wrote:
> On Wed, Jun 17, 2015 at 03:27:08PM +0100, Chris Wilson wrote:
> > On Wed, Jun 17, 2015 at 03:31:59PM +0200, Daniel Vetter wrote:
> > > On Fri, May 29, 2015 at 05:44:09PM +0100, John.C.Harrison@Intel.com wrote:
> > > > From: John Harrison <John.C.Harrison@Intel.com>
> > > > 
> > > > Now that the *_ring_begin() functions no longer call the request allocation
> > > > code, it is finally safe for the request allocation code to call *_ring_begin().
> > > > This is important to guarantee that the space reserved for the subsequent
> > > > i915_add_request() call does actually get reserved.
> > > > 
> > > > v2: Renamed functions according to review feedback (Tomas Elf).
> > > > 
> > > > For: VIZ-5115
> > > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > Still has my question open from the previos round:
> > > 
> > > http://mid.gmane.org/20150323091030.GL1349@phenom.ffwll.local
> > > 
> > > Note that this isn't all that unlikely with GuC mode since there the
> > > ringbuffer is substantially smaller (due to firmware limitations) than
> > > what we allocate ourselves right now.
> > 
> > Looking at this patch, I am still fundamentally opposed to reserving
> > space for the request. Detecting a request that wraps and cancelling
> > that request (after the appropriate WARN for the overlow) is trivial and
> > such a rare case (as it is a programming error) that it should only be
> > handled in the slow path.
> 
> I thought the entire point here that we don't have request half-committed
> because the final request ringcmds didn't fit in. And that does require
> that we reserve a bit of space for that postamble.
> 
> I guess if it's too much (atm it's super-pessimistic due to ilk) we can
> make per-platform reservation limits to be really minimal.
> 
> Maybe we could go towards a rollback model longterm of rewingind the
> ringbuffer. But if there's no clear need I'd like to avoid that
> complexity.

Even if you didn't like the rollback model which helps handling the
partial state from context switches and what not, if you run out of
ringspace you can set the GPU as wedged. Issuing a request that fills
the entire ringbuffer is a programming bug that needs to be caught very
early in development.
-Chris
On 17/06/2015 16:52, Chris Wilson wrote:
> On Wed, Jun 17, 2015 at 04:54:42PM +0200, Daniel Vetter wrote:
>> On Wed, Jun 17, 2015 at 03:27:08PM +0100, Chris Wilson wrote:
>>> On Wed, Jun 17, 2015 at 03:31:59PM +0200, Daniel Vetter wrote:
>>>> On Fri, May 29, 2015 at 05:44:09PM +0100, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> Now that the *_ring_begin() functions no longer call the request allocation
>>>>> code, it is finally safe for the request allocation code to call *_ring_begin().
>>>>> This is important to guarantee that the space reserved for the subsequent
>>>>> i915_add_request() call does actually get reserved.
>>>>>
>>>>> v2: Renamed functions according to review feedback (Tomas Elf).
>>>>>
>>>>> For: VIZ-5115
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> Still has my question open from the previos round:
>>>>
>>>> http://mid.gmane.org/20150323091030.GL1349@phenom.ffwll.local
>>>>
>>>> Note that this isn't all that unlikely with GuC mode since there the
>>>> ringbuffer is substantially smaller (due to firmware limitations) than
>>>> what we allocate ourselves right now.
>>> Looking at this patch, I am still fundamentally opposed to reserving
>>> space for the request. Detecting a request that wraps and cancelling
>>> that request (after the appropriate WARN for the overlow) is trivial and
>>> such a rare case (as it is a programming error) that it should only be
>>> handled in the slow path.
>> I thought the entire point here that we don't have request half-committed
>> because the final request ringcmds didn't fit in. And that does require
>> that we reserve a bit of space for that postamble.
>>
>> I guess if it's too much (atm it's super-pessimistic due to ilk) we can
>> make per-platform reservation limits to be really minimal.
>>
>> Maybe we could go towards a rollback model longterm of rewingind the
>> ringbuffer. But if there's no clear need I'd like to avoid that
>> complexity.
> Even if you didn't like the rollback model which helps handling the
> partial state from context switches and what not, if you run out of
> ringspace you can set the GPU as wedged. Issuing a request that fills
> the entire ringbuffer is a programming bug that needs to be caught very
> early in development.
> -Chris
>

I'm still confused by what you are saying in the above referenced email. 
Part of it is about the sanity checks failing to handle the wrapping 
case correctly which has been fixed in the base reserve space patch 
(patch 2 in the series). The rest is either saying that you think we are 
potentially wrappping too early and wasting a few bytes of the ring 
buffer or that something is actually broken?

Point 2: 100 bytes of reserve, 160 bytes of execbuf and 200 bytes 
remaining. You seem to think this will fail somehow? Why? The 
wait_for_space(160) in the execbuf code will cause a wrap because the 
the 100 bytes for the add_request reservation is added on and the wait 
is actually being done for 260 bytes. So yes, we wrap earlier than would 
otherwise have been necessary but that is the only way to absolutely 
guarantee that the add_request() call cannot fail when trying to do the 
wrap itself.

As Chris says, if the driver is attempting to create a single request 
that fills the entire ringbuffer then that is a bug that should be 
caught as soon as possible. Even with a Guc, the ring buffer is not 
small compared to the size of requests the driver currently produces. 
Part of the scheduler work is to limit the number of batch buffers that 
a given application/context can have outstanding in the ring buffer at 
any given time in order to prevent starvation of the rest of the system 
by one badly behaved app. Thus completely filling a large ring buffer 
becomes impossible anyway - the application will be blocked before it 
gets that far.

Note that with the removal of the OLR, all requests now have a definite 
start and a definite end. Thus the scheme could be extended to provide 
rollback of the ring buffer. Each new request takes a note of the ring 
pointers at creation time. If the request is cancelled it can reset the 
pointers to where they were before. Thus all half submitted work is 
discarded. That is a much bigger semantic change however, so I would 
really like to get the bare minimum anti-OLR patch set in first before 
trying to do fancy extra features.
On Thu, Jun 18, 2015 at 1:21 PM, John Harrison
<John.C.Harrison@intel.com> wrote:
> I'm still confused by what you are saying in the above referenced email.
> Part of it is about the sanity checks failing to handle the wrapping case
> correctly which has been fixed in the base reserve space patch (patch 2 in
> the series). The rest is either saying that you think we are potentially
> wrappping too early and wasting a few bytes of the ring buffer or that
> something is actually broken?

Yeah I didn't realize that this change was meant to fix the
ring->reserved_tail check since I didn't make that connection. It is
correct with that change, but the problem I see is that the
correctness of that debug aid isn't assured locally: No we both need
that check _and_ the correct handling of the reservation tracking at
wrap-around. If the check just handles wrapping it'll robustly stay in
working shape even when the wrapping behaviour changes.

> Point 2: 100 bytes of reserve, 160 bytes of execbuf and 200 bytes remaining.
> You seem to think this will fail somehow? Why? The wait_for_space(160) in
> the execbuf code will cause a wrap because the the 100 bytes for the
> add_request reservation is added on and the wait is actually being done for
> 260 bytes. So yes, we wrap earlier than would otherwise have been necessary
> but that is the only way to absolutely guarantee that the add_request() call
> cannot fail when trying to do the wrap itself.

There's no problem except that it's wasteful. And I tried to explain
that no unconditionally force-wrapping for the entire reservation is
actually not needed, since the additional space needed to account for
the eventual wrapping is bounded by a factor of 2. It's much less in
practice since we split up the final request bits into multiple
smaller intel_ring_begin. And if feels a bit wasteful to throw that
space away (and make the gpu eat through MI_NOP) just because it makes
caring for the worst-case harder. And with GuC the 160 dwords is
actually a fairly substantial part of the ring.

Even more so when we completely switch to a transaction model for
request, where we only need to wrap for individual commands and hence
could place intel_ring_being per-cmd (which is mostly what we do
already anyway).

> As Chris says, if the driver is attempting to create a single request that
> fills the entire ringbuffer then that is a bug that should be caught as soon
> as possible. Even with a Guc, the ring buffer is not small compared to the
> size of requests the driver currently produces. Part of the scheduler work
> is to limit the number of batch buffers that a given application/context can
> have outstanding in the ring buffer at any given time in order to prevent
> starvation of the rest of the system by one badly behaved app. Thus
> completely filling a large ring buffer becomes impossible anyway - the
> application will be blocked before it gets that far.

My proposal for this reservation wrapping business would have been:
- Increase the reservation by 31 dwords (to account for the worst-case
wrap in pc_render_add_request).
- Rework the reservation overflow WARN_ON in reserve_space_end to work
correctly even when wrapping while the reservation has been in use.
- Move the addition of reserved_space below the point where we wrap
the ring and only check against total free space, neglecting wrapping.
- Remove all other complications you've added.

Result is no forced wrapping for reservation and a debug check which
should even survive random changes by monkeys since the logic for that
check is fully contained within reserve_space_end. And for the check
we should be able to reuse __intel_free_space.

If I'm reading things correctly this shouldn't have any effect outside
of patch 2 and shouldn't cause any conflicts.
-Daniel
On 18/06/2015 14:29, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 1:21 PM, John Harrison
> <John.C.Harrison@intel.com> wrote:
>> I'm still confused by what you are saying in the above referenced email.
>> Part of it is about the sanity checks failing to handle the wrapping case
>> correctly which has been fixed in the base reserve space patch (patch 2 in
>> the series). The rest is either saying that you think we are potentially
>> wrappping too early and wasting a few bytes of the ring buffer or that
>> something is actually broken?
> Yeah I didn't realize that this change was meant to fix the
> ring->reserved_tail check since I didn't make that connection. It is
> correct with that change, but the problem I see is that the
> correctness of that debug aid isn't assured locally: No we both need
> that check _and_ the correct handling of the reservation tracking at
> wrap-around. If the check just handles wrapping it'll robustly stay in
> working shape even when the wrapping behaviour changes.
>
>> Point 2: 100 bytes of reserve, 160 bytes of execbuf and 200 bytes remaining.
>> You seem to think this will fail somehow? Why? The wait_for_space(160) in
>> the execbuf code will cause a wrap because the the 100 bytes for the
>> add_request reservation is added on and the wait is actually being done for
>> 260 bytes. So yes, we wrap earlier than would otherwise have been necessary
>> but that is the only way to absolutely guarantee that the add_request() call
>> cannot fail when trying to do the wrap itself.
> There's no problem except that it's wasteful. And I tried to explain
> that no unconditionally force-wrapping for the entire reservation is
> actually not needed, since the additional space needed to account for
> the eventual wrapping is bounded by a factor of 2. It's much less in
> practice since we split up the final request bits into multiple
> smaller intel_ring_begin. And if feels a bit wasteful to throw that
> space away (and make the gpu eat through MI_NOP) just because it makes
> caring for the worst-case harder. And with GuC the 160 dwords is
> actually a fairly substantial part of the ring.
>
> Even more so when we completely switch to a transaction model for
> request, where we only need to wrap for individual commands and hence
> could place intel_ring_being per-cmd (which is mostly what we do
> already anyway).
>
>> As Chris says, if the driver is attempting to create a single request that
>> fills the entire ringbuffer then that is a bug that should be caught as soon
>> as possible. Even with a Guc, the ring buffer is not small compared to the
>> size of requests the driver currently produces. Part of the scheduler work
>> is to limit the number of batch buffers that a given application/context can
>> have outstanding in the ring buffer at any given time in order to prevent
>> starvation of the rest of the system by one badly behaved app. Thus
>> completely filling a large ring buffer becomes impossible anyway - the
>> application will be blocked before it gets that far.
> My proposal for this reservation wrapping business would have been:
> - Increase the reservation by 31 dwords (to account for the worst-case
> wrap in pc_render_add_request).
> - Rework the reservation overflow WARN_ON in reserve_space_end to work
> correctly even when wrapping while the reservation has been in use.
> - Move the addition of reserved_space below the point where we wrap
> the ring and only check against total free space, neglecting wrapping.
> - Remove all other complications you've added.
>
> Result is no forced wrapping for reservation and a debug check which
> should even survive random changes by monkeys since the logic for that
> check is fully contained within reserve_space_end. And for the check
> we should be able to reuse __intel_free_space.
>
> If I'm reading things correctly this shouldn't have any effect outside
> of patch 2 and shouldn't cause any conflicts.
> -Daniel

See new patch #2...