drm: dp_mst_topology: Ensure the proposed pbn does not exceed available bandwidth

Submitted by Sean Paul on Aug. 27, 2019, 8:35 p.m.

Details

Message ID 20190827203517.228139-1-sean@poorly.run
State New
Headers show
Series "drm: dp_mst_topology: Ensure the proposed pbn does not exceed available bandwidth" ( rev: 1 ) in DRI devel

Not browsing as part of any series.

Commit Message

Sean Paul Aug. 27, 2019, 8:35 p.m.
From: Sean Paul <seanpaul@chromium.org>

The PBN is calculated by the DP helpers during atomic check using the
adjusted mode. In some cases, the EDID may contain modes which are not
possible given the available PBN. One such example of this is if you
downgrade the DP version of a 4k display from 1.2 to 1.1. The EDID would
still contain the 4k mode, but it is not possible without HBR2. Another
case might be if the branch device and sink have to downgrade their link
speed during training.

This patch checks that the proposed pbn does not exceed a port's
available pbn before allocating vcpi slots.

Signed-off-by: Sean Paul <seanpaul@chromium.org>
---

This is my first dip into MST, so it's possible (probable?) that I'm
doing something wrong. However this seems to fix the issue I'm
experiencing, so at least we have a base to work with.

I'm more than a bit confused when available_pbn is 0, and I've tried
retrying enum_path_resources and even doing a phy powerup before epr,
but it still insists available_pbn=0. I've been looking at the DP 1.3
spec and it isn't too clear on why this might be. If there are better
resources, I'm interested :)

 drivers/gpu/drm/drm_dp_mst_topology.c | 31 ++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 82add736e17d..16a88230091a 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2182,7 +2182,26 @@  static int drm_dp_send_enum_path_resources(struct drm_dp_mst_topology_mgr *mgr,
 				DRM_ERROR("got incorrect port in response\n");
 			DRM_DEBUG_KMS("enum path resources %d: %d %d\n", txmsg->reply.u.path_resources.port_number, txmsg->reply.u.path_resources.full_payload_bw_number,
 			       txmsg->reply.u.path_resources.avail_payload_bw_number);
-			port->available_pbn = txmsg->reply.u.path_resources.avail_payload_bw_number;
+
+			/*
+			 * Some monitors (Samsung U28D590 at least) return 0 for
+			 * available pbn while in low power mode.
+			 *
+			 * If we were to trust this value, we'd end up failing
+			 * all vcpi allocations, since the requested bandwidth
+			 * will be compared to 0. So use the full pbn as
+			 * available. Doing this will cap the vcpi allocations
+			 * at the trained link rate and will at least prevent
+			 * us from trying to allocate payloads larger than our
+			 * full pbn.
+			 *
+			 * If there is actually no bandwidth left, we'll fail
+			 * on ALLOCATE_PAYLOAD anyways.
+			 */
+			if (txmsg->reply.u.path_resources.avail_payload_bw_number)
+				port->available_pbn = txmsg->reply.u.path_resources.avail_payload_bw_number;
+			else
+				port->available_pbn = txmsg->reply.u.path_resources.full_payload_bw_number;
 		}
 	}
 
@@ -3239,6 +3258,16 @@  int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
 	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
 	int prev_slots, req_slots, ret;
 
+	/*
+	 * Sanity check that the pbn proposed doesn't exceed the maximum
+	 * available pbn for the port. This could happen if the EDID is
+	 * advertising a mode which needs a faster link rate than has been
+	 * trained between the sink and the branch device (easy to repro by
+	 * using "compatibility" mode on a 4k display and downgrading to DP 1.1)
+	 */
+	if (pbn > port->available_pbn)
+		return -EINVAL;
+
 	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
 	if (IS_ERR(topology_state))
 		return PTR_ERR(topology_state);

Comments

Hi! So, I don't have access to the DP 1.3 spec or anything later than 1.3.
However, I'm fairly sure this is very much againt spec since there's no way
for us to determine the available PBN otherwise...
Honestly though, being against spec might not surprise me here.

I kinda want to look into this more before giving an r-b on this, although
I've got some comments down below on the patch itself. Feel free to poke me
tommorrow so we can take a closer look and maybe figure out more about what's
going on here, I'll try to remember to poke you as well.

On Tue, 2019-08-27 at 16:35 -0400, Sean Paul wrote:
> From: Sean Paul <seanpaul@chromium.org>
> 
> The PBN is calculated by the DP helpers during atomic check using the
> adjusted mode. In some cases, the EDID may contain modes which are not
> possible given the available PBN. One such example of this is if you
> downgrade the DP version of a 4k display from 1.2 to 1.1. The EDID would
> still contain the 4k mode, but it is not possible without HBR2. Another
> case might be if the branch device and sink have to downgrade their link
> speed during training.
> 
> This patch checks that the proposed pbn does not exceed a port's
> available pbn before allocating vcpi slots.
> 
> Signed-off-by: Sean Paul <seanpaul@chromium.org>
> ---
> 
> This is my first dip into MST, so it's possible (probable?) that I'm
> doing something wrong. However this seems to fix the issue I'm
> experiencing, so at least we have a base to work with.
> 
> I'm more than a bit confused when available_pbn is 0, and I've tried
> retrying enum_path_resources and even doing a phy powerup before epr,
> but it still insists available_pbn=0. I've been looking at the DP 1.3
> spec and it isn't too clear on why this might be. If there are better
> resources, I'm interested :)
> 
>  drivers/gpu/drm/drm_dp_mst_topology.c | 31 ++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 82add736e17d..16a88230091a 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2182,7 +2182,26 @@ static int drm_dp_send_enum_path_resources(struct
> drm_dp_mst_topology_mgr *mgr,
>  				DRM_ERROR("got incorrect port in response\n");
>  			DRM_DEBUG_KMS("enum path resources %d: %d %d\n",
> txmsg->reply.u.path_resources.port_number, txmsg-
> >reply.u.path_resources.full_payload_bw_number,
>  			       txmsg-
> >reply.u.path_resources.avail_payload_bw_number);
> -			port->available_pbn = txmsg-
> >reply.u.path_resources.avail_payload_bw_number;
> +
> +			/*
> +			 * Some monitors (Samsung U28D590 at least) return 0
> for
> +			 * available pbn while in low power mode.
> +			 *
> +			 * If we were to trust this value, we'd end up failing
> +			 * all vcpi allocations, since the requested bandwidth
> +			 * will be compared to 0. So use the full pbn as
> +			 * available. Doing this will cap the vcpi allocations
> +			 * at the trained link rate and will at least prevent
> +			 * us from trying to allocate payloads larger than our
> +			 * full pbn.
> +			 *
> +			 * If there is actually no bandwidth left, we'll fail
> +			 * on ALLOCATE_PAYLOAD anyways.

I would hope this would be the case, but I've seen a lot of situations where
MST hubs will just stop responding in situations like this instead of
providing an actual error. So it's probably safer to validate this as much as
possible beforehand without relying on the sink to tell us when we've done
something wrong.

> +			 */
> +			if (txmsg-
> >reply.u.path_resources.avail_payload_bw_number)
> +				port->available_pbn = txmsg-
> >reply.u.path_resources.avail_payload_bw_number;
> +			else
> +				port->available_pbn = txmsg-
> >reply.u.path_resources.full_payload_bw_number;

I think we should use a DP quirk for this so that we only follow this behavior
for this monitor, and not any others. It's possible that other things can
cause bandwidth restrictions, and while I haven't had a chance to look further
into it I've noticed that sometimes sinks even end up allocating more
handwidth then we actually asked for.

>  		}
>  	}
>  
> @@ -3239,6 +3258,16 @@ int drm_dp_atomic_find_vcpi_slots(struct
> drm_atomic_state *state,
>  	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
>  	int prev_slots, req_slots, ret;
>  
> +	/*
> +	 * Sanity check that the pbn proposed doesn't exceed the maximum
> +	 * available pbn for the port. This could happen if the EDID is
> +	 * advertising a mode which needs a faster link rate than has been
> +	 * trained between the sink and the branch device (easy to repro by
> +	 * using "compatibility" mode on a 4k display and downgrading to DP
> 1.1)
> +	 */
> +	if (pbn > port->available_pbn)
> +		return -EINVAL;
> +

port->available_pbn isn't really protected by any actual locking yet
unfortunately :(. See

https://patchwork.freedesktop.org/patch/318683/?series=63847&rev=1

so I'm not sure we should be validating the PBN during the atomic check until
that's landed (we already don't do this, so dropping this wouldn't make things
any worse then they are right now). Additionally, we also don't have a handler
for DP_RESOURCE_STATUS_NOTIFY UP messages yet either.

>  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
>  	if (IS_ERR(topology_state))
>  		return PTR_ERR(topology_state);
On Tue, Aug 27, 2019 at 08:31:29PM -0400, Lyude Paul wrote:
> Hi! So, I don't have access to the DP 1.3 spec or anything later than 1.3.

1.3 was just what I was looking at, I checked 1.2 and it looks the same (aside
from a typo fix).

> However, I'm fairly sure this is very much againt spec since there's no way
> for us to determine the available PBN otherwise...
> Honestly though, being against spec might not surprise me here.

Yeah, I was pretty surprised by this behavior too. Everything in the spec states
that Available_Payload_Bandwidth_Number is what we should be using to determine
maximum PBN.

> 
> I kinda want to look into this more before giving an r-b on this, although
> I've got some comments down below on the patch itself. Feel free to poke me
> tommorrow so we can take a closer look and maybe figure out more about what's
> going on here, I'll try to remember to poke you as well.

Help would be very much appreciated, thanks!

> 
> On Tue, 2019-08-27 at 16:35 -0400, Sean Paul wrote:
> > From: Sean Paul <seanpaul@chromium.org>
> > 
> > The PBN is calculated by the DP helpers during atomic check using the
> > adjusted mode. In some cases, the EDID may contain modes which are not
> > possible given the available PBN. One such example of this is if you
> > downgrade the DP version of a 4k display from 1.2 to 1.1. The EDID would
> > still contain the 4k mode, but it is not possible without HBR2. Another
> > case might be if the branch device and sink have to downgrade their link
> > speed during training.
> > 
> > This patch checks that the proposed pbn does not exceed a port's
> > available pbn before allocating vcpi slots.
> > 
> > Signed-off-by: Sean Paul <seanpaul@chromium.org>
> > ---
> > 
> > This is my first dip into MST, so it's possible (probable?) that I'm
> > doing something wrong. However this seems to fix the issue I'm
> > experiencing, so at least we have a base to work with.
> > 
> > I'm more than a bit confused when available_pbn is 0, and I've tried
> > retrying enum_path_resources and even doing a phy powerup before epr,
> > but it still insists available_pbn=0. I've been looking at the DP 1.3
> > spec and it isn't too clear on why this might be. If there are better
> > resources, I'm interested :)
> > 
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 31 ++++++++++++++++++++++++++-
> >  1 file changed, 30 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 82add736e17d..16a88230091a 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -2182,7 +2182,26 @@ static int drm_dp_send_enum_path_resources(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  				DRM_ERROR("got incorrect port in response\n");
> >  			DRM_DEBUG_KMS("enum path resources %d: %d %d\n",
> > txmsg->reply.u.path_resources.port_number, txmsg-
> > >reply.u.path_resources.full_payload_bw_number,
> >  			       txmsg-
> > >reply.u.path_resources.avail_payload_bw_number);
> > -			port->available_pbn = txmsg-
> > >reply.u.path_resources.avail_payload_bw_number;
> > +
> > +			/*
> > +			 * Some monitors (Samsung U28D590 at least) return 0
> > for
> > +			 * available pbn while in low power mode.
> > +			 *
> > +			 * If we were to trust this value, we'd end up failing
> > +			 * all vcpi allocations, since the requested bandwidth
> > +			 * will be compared to 0. So use the full pbn as
> > +			 * available. Doing this will cap the vcpi allocations
> > +			 * at the trained link rate and will at least prevent
> > +			 * us from trying to allocate payloads larger than our
> > +			 * full pbn.
> > +			 *
> > +			 * If there is actually no bandwidth left, we'll fail
> > +			 * on ALLOCATE_PAYLOAD anyways.
> 
> I would hope this would be the case, but I've seen a lot of situations where
> MST hubs will just stop responding in situations like this instead of
> providing an actual error. So it's probably safer to validate this as much as
> possible beforehand without relying on the sink to tell us when we've done
> something wrong.
> 

thismakesmesad.gif

> > +			 */
> > +			if (txmsg-
> > >reply.u.path_resources.avail_payload_bw_number)
> > +				port->available_pbn = txmsg-
> > >reply.u.path_resources.avail_payload_bw_number;
> > +			else
> > +				port->available_pbn = txmsg-
> > >reply.u.path_resources.full_payload_bw_number;
> 
> I think we should use a DP quirk for this so that we only follow this behavior
> for this monitor, and not any others. It's possible that other things can
> cause bandwidth restrictions, and while I haven't had a chance to look further
> into it I've noticed that sometimes sinks even end up allocating more
> handwidth then we actually asked for.
> 

After reading your reply, I tested a few other monitors I had laying around and
all are behaving the same way -- even without the "compatibility" mode enabled.
The common theme seems to be that when I reboot my source the available_pbn on
boot is 0. If I hotplug after that, available_pbn is correct.

I'm wondering whether the branch device (CableMatters USB-C 2x DP hub) is
holding onto that allocation across reboot. That said, the payload allocation
I'm making doesn't use the full available PBN, so I would kind of expect the
available pbn to be non-zero if this were the case, but ¯\_(ツ)_/¯

> >  		}
> >  	}
> >  
> > @@ -3239,6 +3258,16 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > drm_atomic_state *state,
> >  	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
> >  	int prev_slots, req_slots, ret;
> >  
> > +	/*
> > +	 * Sanity check that the pbn proposed doesn't exceed the maximum
> > +	 * available pbn for the port. This could happen if the EDID is
> > +	 * advertising a mode which needs a faster link rate than has been
> > +	 * trained between the sink and the branch device (easy to repro by
> > +	 * using "compatibility" mode on a 4k display and downgrading to DP
> > 1.1)
> > +	 */
> > +	if (pbn > port->available_pbn)
> > +		return -EINVAL;
> > +
> 
> port->available_pbn isn't really protected by any actual locking yet
> unfortunately :(. See
> 
> https://patchwork.freedesktop.org/patch/318683/?series=63847&rev=1
> 
> so I'm not sure we should be validating the PBN during the atomic check until
> that's landed (we already don't do this, so dropping this wouldn't make things
> any worse then they are right now). Additionally, we also don't have a handler
> for DP_RESOURCE_STATUS_NOTIFY UP messages yet either.

Yep, that's fine with me.

Sean

> 
> >  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
> >  	if (IS_ERR(topology_state))
> >  		return PTR_ERR(topology_state);
> -- 
> Cheers,
> 	Lyude Paul
>
On Wed, Aug 28, 2019 at 10:29:32AM -0400, Sean Paul wrote:
> On Tue, Aug 27, 2019 at 08:31:29PM -0400, Lyude Paul wrote:
> > Hi! So, I don't have access to the DP 1.3 spec or anything later than 1.3.
> 
> 1.3 was just what I was looking at, I checked 1.2 and it looks the same (aside
> from a typo fix).
> 
> > However, I'm fairly sure this is very much againt spec since there's no way
> > for us to determine the available PBN otherwise...
> > Honestly though, being against spec might not surprise me here.
> 
> Yeah, I was pretty surprised by this behavior too. Everything in the spec states
> that Available_Payload_Bandwidth_Number is what we should be using to determine
> maximum PBN.
> 
> > 
> > I kinda want to look into this more before giving an r-b on this, although
> > I've got some comments down below on the patch itself. Feel free to poke me
> > tommorrow so we can take a closer look and maybe figure out more about what's
> > going on here, I'll try to remember to poke you as well.
> 
> Help would be very much appreciated, thanks!
> 
> > 
> > On Tue, 2019-08-27 at 16:35 -0400, Sean Paul wrote:
> > > From: Sean Paul <seanpaul@chromium.org>
> > > 
> > > The PBN is calculated by the DP helpers during atomic check using the
> > > adjusted mode. In some cases, the EDID may contain modes which are not
> > > possible given the available PBN. One such example of this is if you
> > > downgrade the DP version of a 4k display from 1.2 to 1.1. The EDID would
> > > still contain the 4k mode, but it is not possible without HBR2. Another
> > > case might be if the branch device and sink have to downgrade their link
> > > speed during training.
> > > 
> > > This patch checks that the proposed pbn does not exceed a port's
> > > available pbn before allocating vcpi slots.
> > > 
> > > Signed-off-by: Sean Paul <seanpaul@chromium.org>
> > > ---
> > > 
> > > This is my first dip into MST, so it's possible (probable?) that I'm
> > > doing something wrong. However this seems to fix the issue I'm
> > > experiencing, so at least we have a base to work with.
> > > 
> > > I'm more than a bit confused when available_pbn is 0, and I've tried
> > > retrying enum_path_resources and even doing a phy powerup before epr,
> > > but it still insists available_pbn=0. I've been looking at the DP 1.3
> > > spec and it isn't too clear on why this might be. If there are better
> > > resources, I'm interested :)
> > > 
> > >  drivers/gpu/drm/drm_dp_mst_topology.c | 31 ++++++++++++++++++++++++++-
> > >  1 file changed, 30 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 82add736e17d..16a88230091a 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -2182,7 +2182,26 @@ static int drm_dp_send_enum_path_resources(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >  				DRM_ERROR("got incorrect port in response\n");
> > >  			DRM_DEBUG_KMS("enum path resources %d: %d %d\n",
> > > txmsg->reply.u.path_resources.port_number, txmsg-
> > > >reply.u.path_resources.full_payload_bw_number,
> > >  			       txmsg-
> > > >reply.u.path_resources.avail_payload_bw_number);
> > > -			port->available_pbn = txmsg-
> > > >reply.u.path_resources.avail_payload_bw_number;
> > > +
> > > +			/*
> > > +			 * Some monitors (Samsung U28D590 at least) return 0
> > > for
> > > +			 * available pbn while in low power mode.
> > > +			 *
> > > +			 * If we were to trust this value, we'd end up failing
> > > +			 * all vcpi allocations, since the requested bandwidth
> > > +			 * will be compared to 0. So use the full pbn as
> > > +			 * available. Doing this will cap the vcpi allocations
> > > +			 * at the trained link rate and will at least prevent
> > > +			 * us from trying to allocate payloads larger than our
> > > +			 * full pbn.
> > > +			 *
> > > +			 * If there is actually no bandwidth left, we'll fail
> > > +			 * on ALLOCATE_PAYLOAD anyways.
> > 
> > I would hope this would be the case, but I've seen a lot of situations where
> > MST hubs will just stop responding in situations like this instead of
> > providing an actual error. So it's probably safer to validate this as much as
> > possible beforehand without relying on the sink to tell us when we've done
> > something wrong.
> > 
> 
> thismakesmesad.gif
> 
> > > +			 */
> > > +			if (txmsg-
> > > >reply.u.path_resources.avail_payload_bw_number)
> > > +				port->available_pbn = txmsg-
> > > >reply.u.path_resources.avail_payload_bw_number;
> > > +			else
> > > +				port->available_pbn = txmsg-
> > > >reply.u.path_resources.full_payload_bw_number;
> > 
> > I think we should use a DP quirk for this so that we only follow this behavior
> > for this monitor, and not any others. It's possible that other things can
> > cause bandwidth restrictions, and while I haven't had a chance to look further
> > into it I've noticed that sometimes sinks even end up allocating more
> > handwidth then we actually asked for.
> > 
> 
> After reading your reply, I tested a few other monitors I had laying around and
> all are behaving the same way -- even without the "compatibility" mode enabled.
> The common theme seems to be that when I reboot my source the available_pbn on
> boot is 0. If I hotplug after that, available_pbn is correct.
> 
> I'm wondering whether the branch device (CableMatters USB-C 2x DP hub) is
> holding onto that allocation across reboot. That said, the payload allocation
> I'm making doesn't use the full available PBN, so I would kind of expect the
> available pbn to be non-zero if this were the case, but ¯\_(ツ)_/¯
> 

After playing around with this theory, it might hold some water. I added a
CLEAR_PAYLOAD_ID_TABLE broadcast before ENUM_PATH_RESOURCES when a port is
created. Now I'm getting the expected value in available_pbn reliably.

I'm not really sure why this works, since I'd expect the DPCD writes in
drm_dp_mst_topology_mgr_set_mst() to the PAYLOAD_ALLOCATE_* registers to be 
sufficient.

thoughts?

Sean

> > >  		}
> > >  	}
> > >  
> > > @@ -3239,6 +3258,16 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > > drm_atomic_state *state,
> > >  	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
> > >  	int prev_slots, req_slots, ret;
> > >  
> > > +	/*
> > > +	 * Sanity check that the pbn proposed doesn't exceed the maximum
> > > +	 * available pbn for the port. This could happen if the EDID is
> > > +	 * advertising a mode which needs a faster link rate than has been
> > > +	 * trained between the sink and the branch device (easy to repro by
> > > +	 * using "compatibility" mode on a 4k display and downgrading to DP
> > > 1.1)
> > > +	 */
> > > +	if (pbn > port->available_pbn)
> > > +		return -EINVAL;
> > > +
> > 
> > port->available_pbn isn't really protected by any actual locking yet
> > unfortunately :(. See
> > 
> > https://patchwork.freedesktop.org/patch/318683/?series=63847&rev=1
> > 
> > so I'm not sure we should be validating the PBN during the atomic check until
> > that's landed (we already don't do this, so dropping this wouldn't make things
> > any worse then they are right now). Additionally, we also don't have a handler
> > for DP_RESOURCE_STATUS_NOTIFY UP messages yet either.
> 
> Yep, that's fine with me.
> 
> Sean
> 
> > 
> > >  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
> > >  	if (IS_ERR(topology_state))
> > >  		return PTR_ERR(topology_state);
> > -- 
> > Cheers,
> > 	Lyude Paul
> > 
> 
> -- 
> Sean Paul, Software Engineer, Google / Chromium OS