[1/2] drm/amdgpu: fix power state when port pm is unavailable

Submitted by Peter Wu on Nov. 26, 2016, 11:38 a.m.

Details

Message ID 20161126113853.GA25175@al
State New
Headers show
Series "Series without cover letter" ( rev: 2 ) in AMD X.Org drivers

Not browsing as part of any series.

Commit Message

Peter Wu Nov. 26, 2016, 11:38 a.m.
Hi Mike,

To grab console output, you could also try to blacklist amdgpu at
startup and load it later.

Can you provide your acpidump (and lspci -nn, especially if you do not
have a dmesg).

This is the call chain:

+ amdgpu_atpx_detect
  + amdgpu_atpx_pci_probe_handle
    + return if ATPX ACPI handle cannot be found
    + set amdgpu_atpx_priv.bridge_pm_usable
  + amdgpu_atpx_init
    + amdgpu_atpx_validate
      + use amdgpu_atpx_priv.bridge_pm_usable

I wonder whether newer devices remove ATPX completely. If that is the
case (please provide acpidump!), can you try this hack:


This is not intended for merging, but just testing a hypothesis.

Kind regards,
Peter

On Sat, Nov 26, 2016 at 11:20:28AM +0000, Mike Lothian wrote:
> He's some screen shots of the booting screens, apologies they're not that
> clear
> 
> https://imgur.com/a/V7mRA
> 
> On Sat, 26 Nov 2016 at 11:00 Mike Lothian <mike@fireburn.co.uk> wrote:
> 
> > Hi
> >
> > The previous working kernel, was the patch before this - reverting fixes it
> >
> > My bios date is quite new 08/05/2016
> >
> > Most of the messages fly straight off the screen, the journal isn't kept
> > and the wifi driver also complains, will see if I can get a slow motion
> > capture of the errors and screen shot them
> >
> > Regards
> >
> > Mike
> >
> > On Sat, 26 Nov 2016 at 10:20 Peter Wu <peter@lekensteyn.nl> wrote:
> >
> > Hi Mike,
> >
> > On Sat, Nov 26, 2016 at 12:36:27AM +0000, Mike Lothian wrote:
> > > This patch is preventing my laptop from booting, I'm getting D3 error
> > > messages and atom bios stuck messages
> >
> > Can you post a dmesg and acpidump? What was the previous working kernel?
> > I suppose your BIOS date (/sys/class/dmi/id/bios_date) is pre-2015?
> > Does booting with runpm=0 work?
> >
> > > I mistakenly saw the v2 patch and didn't realise it was for radeon not
> > > amdgpu - this branch and Linus's tree are currently not booting for me
> >
> > What do you mean by this?
> >
> > Kind regards,
> > Peter
> >
> > > On Wed, 23 Nov 2016 at 17:16 Deucher, Alexander <
> > Alexander.Deucher@amd.com>
> > > wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Peter Wu [mailto:peter@lekensteyn.nl]
> > > > > Sent: Wednesday, November 23, 2016 12:13 PM
> > > > > To: Alex Deucher
> > > > > Cc: amd-gfx list; Deucher, Alexander; Nayan Deshmukh
> > > > > Subject: Re: [PATCH 1/2] drm/amdgpu: fix power state when port pm is
> > > > > unavailable
> > > > >
> > > > > On Wed, Nov 23, 2016 at 12:01:55PM -0500, Alex Deucher wrote:
> > > > > >  On Tue, Nov 22, 2016 at 8:22 PM, Peter Wu <peter@lekensteyn.nl>
> > > > wrote:
> > > > > > > When PCIe port PM is not enabled (system BIOS is pre-2015 or the
> > > > > > > pcie_port_pm=off parameter is set), legacy ATPX PM should still
> > be
> > > > > > > marked as supported. Otherwise the GPU can fail to power on after
> > > > > > > runtime suspend. This affected a Dell Inspiron 5548.
> > > > > > >
> > > > > > > Ideally the BIOS date in the PCI core is lowered to 2013 (the
> > first
> > > > year
> > > > > > > where hybrid graphics platforms using power resources was
> > > > introduced),
> > > > > > > but that seems more risky at this point and would not solve the
> > > > > > > pcie_port_pm=off issue.
> > > > > > >
> > > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98505
> > > > > > > Reported-and-tested-by: Nayan Deshmukh
> > > > > <nayan26deshmukh@gmail.com>
> > > > > > > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > > > > > > ---
> > > > > > > Hi,
> > > > > > >
> > > > > > > This patch is already three weeks old. One alternative idea was
> > > > lowering
> > > > > BIOS
> > > > > > > date in PCI core, but as pcie_port_pm=force did not have the
> > desired
> > > > > effect, I
> > > > > > > do not think that this would help though.
> > > > > >
> > > > > > Thanks for doing this.
> > > > > >
> > > > > > >
> > > > > > > I have also not contacted linux-pci or Mika about lowering the
> > year
> > > > due to
> > > > > the
> > > > > > > lack of a good reason.  Might do it later though once ACPICA bug
> > > > 1333 is
> > > > > sorted
> > > > > > > out such that lowering the year actually has benefits for a
> > Nvidia
> > > > laptop
> > > > > (or if
> > > > > > > some amdgpu problem can be solved by this).
> > > > > > >
> > > > > > > Both patches should probably be Cc stable (4.8+), fixing
> > > > 2f5af82eeab2 and
> > > > > > > b8c9fd5ad4b4 ("track whether if this is a hybrid graphics
> > > > platform"). There
> > > > > have
> > > > > > > been some ifdef 0's and reverts in between, so I was not sure if
> > > > adding
> > > > > the
> > > > > > > Fixes tag is appropriate.
> > > > > >
> > > > > > I don't think we need to cc stable.  In kernel 4.8 we don't
> > attempt to
> > > > > > do d3cold at all.  4.8 and older kernels have:
> > > > > > c63695cc5e5f685e924e25a8f9555f6e846f1fc6 (drm/amdgpu: work around
> > > > > lack
> > > > > > of upstream ACPI support for D3cold)
> > > > > > which was reverted for 4.9.
> > > > >
> > > > > That patch was already reverted in 4.8 as far as I can see:
> > > > >
> > > > > $ git tag --contains c39b487f195b93235ee76384427467786f7bf29f | grep
> > v4.8
> > > > > v4.8
> > > > >
> > > > > Can you double-check? v4.8 was the first kernel with D3cold support
> > in
> > > > > PCI core.
> > > >
> > > > You are right.  I was thinking d3cold went upstream in 4.9, so yes, we
> > > > should apply this to 4.8.
> > > >
> > > > Alex
> > > >
> > > > >
> > > > > > Series is:
> > > > > > Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Kind regards,
> > > > > Peter
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Kind regards,
> > > > > > > Peter
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c | 9 ++++++++-
> > > > > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > > > index 10b5ddf..951addf 100644
> > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > > > @@ -33,6 +33,7 @@ struct amdgpu_atpx {
> > > > > > >
> > > > > > >  static struct amdgpu_atpx_priv {
> > > > > > >         bool atpx_detected;
> > > > > > > +       bool bridge_pm_usable;
> > > > > > >         /* handle for device - and atpx */
> > > > > > >         acpi_handle dhandle;
> > > > > > >         acpi_handle other_handle;
> > > > > > > @@ -200,7 +201,11 @@ static int amdgpu_atpx_validate(struct
> > > > > amdgpu_atpx *atpx)
> > > > > > >         atpx->is_hybrid = false;
> > > > > > >         if (valid_bits & ATPX_MS_HYBRID_GFX_SUPPORTED) {
> > > > > > >                 printk("ATPX Hybrid Graphics\n");
> > > > > > > -               atpx->functions.power_cntl = false;
> > > > > > > +               /*
> > > > > > > +                * Disable legacy PM methods only when pcie port
> > PM
> > > > is usable,
> > > > > > > +                * otherwise the device might fail to power off
> > or
> > > > power on.
> > > > > > > +                */
> > > > > > > +               atpx->functions.power_cntl =
> > > > > !amdgpu_atpx_priv.bridge_pm_usable;
> > > > > > >                 atpx->is_hybrid = true;
> > > > > > >         }
> > > > > > >
> > > > > > > @@ -471,6 +476,7 @@ static int amdgpu_atpx_power_state(enum
> > > > > vga_switcheroo_client_id id,
> > > > > > >   */
> > > > > > >  static bool amdgpu_atpx_pci_probe_handle(struct pci_dev *pdev)
> > > > > > >  {
> > > > > > > +       struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
> > > > > > >         acpi_handle dhandle, atpx_handle;
> > > > > > >         acpi_status status;
> > > > > > >
> > > > > > > @@ -485,6 +491,7 @@ static bool
> > > > > amdgpu_atpx_pci_probe_handle(struct pci_dev *pdev)
> > > > > > >         }
> > > > > > >         amdgpu_atpx_priv.dhandle = dhandle;
> > > > > > >         amdgpu_atpx_priv.atpx.handle = atpx_handle;
> > > > > > > +       amdgpu_atpx_priv.bridge_pm_usable = parent_pdev &&
> > > > > parent_pdev->bridge_d3;
> > > > > > >         return true;
> > > > > > >  }
> > > > > > >
> > > > > > > --
> > > > > > > 2.10.2

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
index 951addf..5c23382 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
@@ -487,6 +487,7 @@  static bool amdgpu_atpx_pci_probe_handle(struct pci_dev *pdev)
 	status = acpi_get_handle(dhandle, "ATPX", &atpx_handle);
 	if (ACPI_FAILURE(status)) {
 		amdgpu_atpx_priv.other_handle = dhandle;
+		amdgpu_atpx_priv.bridge_pm_usable = true;
 		return false;
 	}
 	amdgpu_atpx_priv.dhandle = dhandle;

Comments

I can confirm the patch allows my machine to boot again

On Sat, 26 Nov 2016 at 11:39 Peter Wu <peter@lekensteyn.nl> wrote:

> Hi Mike,
>
> To grab console output, you could also try to blacklist amdgpu at
> startup and load it later.
>
> Can you provide your acpidump (and lspci -nn, especially if you do not
> have a dmesg).
>
> This is the call chain:
>
> + amdgpu_atpx_detect
>   + amdgpu_atpx_pci_probe_handle
>     + return if ATPX ACPI handle cannot be found
>     + set amdgpu_atpx_priv.bridge_pm_usable
>   + amdgpu_atpx_init
>     + amdgpu_atpx_validate
>       + use amdgpu_atpx_priv.bridge_pm_usable
>
> I wonder whether newer devices remove ATPX completely. If that is the
> case (please provide acpidump!), can you try this hack:
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> index 951addf..5c23382 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> @@ -487,6 +487,7 @@ static bool amdgpu_atpx_pci_probe_handle(struct
> pci_dev *pdev)
>         status = acpi_get_handle(dhandle, "ATPX", &atpx_handle);
>         if (ACPI_FAILURE(status)) {
>                 amdgpu_atpx_priv.other_handle = dhandle;
> +               amdgpu_atpx_priv.bridge_pm_usable = true;
>                 return false;
>         }
>         amdgpu_atpx_priv.dhandle = dhandle;
>
> This is not intended for merging, but just testing a hypothesis.
>
> Kind regards,
> Peter
>
> On Sat, Nov 26, 2016 at 11:20:28AM +0000, Mike Lothian wrote:
> > He's some screen shots of the booting screens, apologies they're not that
> > clear
> >
> > https://imgur.com/a/V7mRA
> >
> > On Sat, 26 Nov 2016 at 11:00 Mike Lothian <mike@fireburn.co.uk> wrote:
> >
> > > Hi
> > >
> > > The previous working kernel, was the patch before this - reverting
> fixes it
> > >
> > > My bios date is quite new 08/05/2016
> > >
> > > Most of the messages fly straight off the screen, the journal isn't
> kept
> > > and the wifi driver also complains, will see if I can get a slow motion
> > > capture of the errors and screen shot them
> > >
> > > Regards
> > >
> > > Mike
> > >
> > > On Sat, 26 Nov 2016 at 10:20 Peter Wu <peter@lekensteyn.nl> wrote:
> > >
> > > Hi Mike,
> > >
> > > On Sat, Nov 26, 2016 at 12:36:27AM +0000, Mike Lothian wrote:
> > > > This patch is preventing my laptop from booting, I'm getting D3 error
> > > > messages and atom bios stuck messages
> > >
> > > Can you post a dmesg and acpidump? What was the previous working
> kernel?
> > > I suppose your BIOS date (/sys/class/dmi/id/bios_date) is pre-2015?
> > > Does booting with runpm=0 work?
> > >
> > > > I mistakenly saw the v2 patch and didn't realise it was for radeon
> not
> > > > amdgpu - this branch and Linus's tree are currently not booting for
> me
> > >
> > > What do you mean by this?
> > >
> > > Kind regards,
> > > Peter
> > >
> > > > On Wed, 23 Nov 2016 at 17:16 Deucher, Alexander <
> > > Alexander.Deucher@amd.com>
> > > > wrote:
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Peter Wu [mailto:peter@lekensteyn.nl]
> > > > > > Sent: Wednesday, November 23, 2016 12:13 PM
> > > > > > To: Alex Deucher
> > > > > > Cc: amd-gfx list; Deucher, Alexander; Nayan Deshmukh
> > > > > > Subject: Re: [PATCH 1/2] drm/amdgpu: fix power state when port
> pm is
> > > > > > unavailable
> > > > > >
> > > > > > On Wed, Nov 23, 2016 at 12:01:55PM -0500, Alex Deucher wrote:
> > > > > > >  On Tue, Nov 22, 2016 at 8:22 PM, Peter Wu <
> peter@lekensteyn.nl>
> > > > > wrote:
> > > > > > > > When PCIe port PM is not enabled (system BIOS is pre-2015 or
> the
> > > > > > > > pcie_port_pm=off parameter is set), legacy ATPX PM should
> still
> > > be
> > > > > > > > marked as supported. Otherwise the GPU can fail to power on
> after
> > > > > > > > runtime suspend. This affected a Dell Inspiron 5548.
> > > > > > > >
> > > > > > > > Ideally the BIOS date in the PCI core is lowered to 2013 (the
> > > first
> > > > > year
> > > > > > > > where hybrid graphics platforms using power resources was
> > > > > introduced),
> > > > > > > > but that seems more risky at this point and would not solve
> the
> > > > > > > > pcie_port_pm=off issue.
> > > > > > > >
> > > > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98505
> > > > > > > > Reported-and-tested-by: Nayan Deshmukh
> > > > > > <nayan26deshmukh@gmail.com>
> > > > > > > > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > > > > > > > ---
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > This patch is already three weeks old. One alternative idea
> was
> > > > > lowering
> > > > > > BIOS
> > > > > > > > date in PCI core, but as pcie_port_pm=force did not have the
> > > desired
> > > > > > effect, I
> > > > > > > > do not think that this would help though.
> > > > > > >
> > > > > > > Thanks for doing this.
> > > > > > >
> > > > > > > >
> > > > > > > > I have also not contacted linux-pci or Mika about lowering
> the
> > > year
> > > > > due to
> > > > > > the
> > > > > > > > lack of a good reason.  Might do it later though once ACPICA
> bug
> > > > > 1333 is
> > > > > > sorted
> > > > > > > > out such that lowering the year actually has benefits for a
> > > Nvidia
> > > > > laptop
> > > > > > (or if
> > > > > > > > some amdgpu problem can be solved by this).
> > > > > > > >
> > > > > > > > Both patches should probably be Cc stable (4.8+), fixing
> > > > > 2f5af82eeab2 and
> > > > > > > > b8c9fd5ad4b4 ("track whether if this is a hybrid graphics
> > > > > platform"). There
> > > > > > have
> > > > > > > > been some ifdef 0's and reverts in between, so I was not
> sure if
> > > > > adding
> > > > > > the
> > > > > > > > Fixes tag is appropriate.
> > > > > > >
> > > > > > > I don't think we need to cc stable.  In kernel 4.8 we don't
> > > attempt to
> > > > > > > do d3cold at all.  4.8 and older kernels have:
> > > > > > > c63695cc5e5f685e924e25a8f9555f6e846f1fc6 (drm/amdgpu: work
> around
> > > > > > lack
> > > > > > > of upstream ACPI support for D3cold)
> > > > > > > which was reverted for 4.9.
> > > > > >
> > > > > > That patch was already reverted in 4.8 as far as I can see:
> > > > > >
> > > > > > $ git tag --contains c39b487f195b93235ee76384427467786f7bf29f |
> grep
> > > v4.8
> > > > > > v4.8
> > > > > >
> > > > > > Can you double-check? v4.8 was the first kernel with D3cold
> support
> > > in
> > > > > > PCI core.
> > > > >
> > > > > You are right.  I was thinking d3cold went upstream in 4.9, so
> yes, we
> > > > > should apply this to 4.8.
> > > > >
> > > > > Alex
> > > > >
> > > > > >
> > > > > > > Series is:
> > > > > > > Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Kind regards,
> > > > > > Peter
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Kind regards,
> > > > > > > > Peter
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c | 9
> ++++++++-
> > > > > > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > > > > index 10b5ddf..951addf 100644
> > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c
> > > > > > > > @@ -33,6 +33,7 @@ struct amdgpu_atpx {
> > > > > > > >
> > > > > > > >  static struct amdgpu_atpx_priv {
> > > > > > > >         bool atpx_detected;
> > > > > > > > +       bool bridge_pm_usable;
> > > > > > > >         /* handle for device - and atpx */
> > > > > > > >         acpi_handle dhandle;
> > > > > > > >         acpi_handle other_handle;
> > > > > > > > @@ -200,7 +201,11 @@ static int amdgpu_atpx_validate(struct
> > > > > > amdgpu_atpx *atpx)
> > > > > > > >         atpx->is_hybrid = false;
> > > > > > > >         if (valid_bits & ATPX_MS_HYBRID_GFX_SUPPORTED) {
> > > > > > > >                 printk("ATPX Hybrid Graphics\n");
> > > > > > > > -               atpx->functions.power_cntl = false;
> > > > > > > > +               /*
> > > > > > > > +                * Disable legacy PM methods only when pcie
> port
> > > PM
> > > > > is usable,
> > > > > > > > +                * otherwise the device might fail to power
> off
> > > or
> > > > > power on.
> > > > > > > > +                */
> > > > > > > > +               atpx->functions.power_cntl =
> > > > > > !amdgpu_atpx_priv.bridge_pm_usable;
> > > > > > > >                 atpx->is_hybrid = true;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > @@ -471,6 +476,7 @@ static int amdgpu_atpx_power_state(enum
> > > > > > vga_switcheroo_client_id id,
> > > > > > > >   */
> > > > > > > >  static bool amdgpu_atpx_pci_probe_handle(struct pci_dev
> *pdev)
> > > > > > > >  {
> > > > > > > > +       struct pci_dev *parent_pdev =
> pci_upstream_bridge(pdev);
> > > > > > > >         acpi_handle dhandle, atpx_handle;
> > > > > > > >         acpi_status status;
> > > > > > > >
> > > > > > > > @@ -485,6 +491,7 @@ static bool
> > > > > > amdgpu_atpx_pci_probe_handle(struct pci_dev *pdev)
> > > > > > > >         }
> > > > > > > >         amdgpu_atpx_priv.dhandle = dhandle;
> > > > > > > >         amdgpu_atpx_priv.atpx.handle = atpx_handle;
> > > > > > > > +       amdgpu_atpx_priv.bridge_pm_usable = parent_pdev &&
> > > > > > parent_pdev->bridge_d3;
> > > > > > > >         return true;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.10.2
>