amdgpu with 8+ cards for GPU mining?

Submitted by Koenig, Christian on Feb. 19, 2018, 1:21 p.m.

Details

Message ID 064a018e-9b61-4559-e4af-929b0bccd841@amd.com
State New
Series "amdgpu with 8+ cards for GPU mining?" ( rev: 1 ) in AMD X.Org drivers


Commit Message

Hi Joseph,

As a band-aid, you can try the attached patch. It should at least fix the
crash at hand and allow amdgpu to continue with the boot process.

Regards,
Christian.

Am 19.02.2018 um 14:13 schrieb Christian König:
> Hi Joseph,
>
> and here is the root cause of the problem:
>> 0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. 
>> [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] (rev ef) (prog-if 00 
>> [VGA controller])
>>     Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b31
>>     Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- 
>> ParErr- Stepping- SERR- FastB2B- DisINTx-
>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>     Interrupt: pin A routed to IRQ 11
>>     Region 0: Memory at <ignored> (64-bit, prefetchable) [disabled]
>>     Region 2: Memory at b0000000 (64-bit, prefetchable) [disabled] 
>> [size=2M]
>
> The BIOS is not able to assign resources to one of the VGA adapters 
> when there are more than eight installed.
>
> You could try with pci=realloc, but I doubt that there is much we can 
> do in the operating system when the BIOS messed things up like that.
>
> What we should do is prevent amdgpu from crashing so badly, e.g.
> allow it to continue cleanly with the working hardware even when one
> of the devices doesn't work.
>
>> when I load in amdgpu, everything froze, so I don't have the log.
> You can work around that using netconsole, see 
> Documentation/networking/netconsole.txt.
>
> Going to try to fix that by just using the screenshot you sent
> earlier, but it would be better if I could get a full log.
>
> Regards,
> Christian.
>
> Am 19.02.2018 um 12:55 schrieb Joseph Wang:
>> Here is the lspci without amdgpu loaded.  when I load in amdgpu, 
>> everything froze, so I don't have the log.
>


From 9a90837362d8620d247d943b0e8ed93250a3ad3c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
Date: Mon, 19 Feb 2018 14:17:18 +0100
Subject: [PATCH] PCI: stop crashing in pci_release_resource
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It is entirely possible that the BIOS wasn't able to assign resources to
a device. In this case, don't crash in pci_release_resource() when we try
to resize the resource.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/pci/setup-res.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index e815111f3f81..fd72c87a9b72 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -401,6 +401,9 @@  void pci_release_resource(struct pci_dev *dev, int resno)
 {
 	struct resource *res = dev->resource + resno;
 
+	if (!res->parent)
+		return;
+
 	dev_info(&dev->dev, "BAR %d: releasing %pR\n", resno, res);
 	release_resource(res);
 	res->end = resource_size(res) - 1;
-- 
2.14.1
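
The effect of the added check can be illustrated outside the kernel with a small user-space model. This is only a sketch under stated assumptions: `struct res_model`, `model_attach()`, `model_unlink()` and `model_release_bar()` are hypothetical names invented for illustration, not kernel APIs. The model assumes that unlinking a resource walks the parent's child list, so a resource that was never assigned (parent is NULL) cannot be unlinked safely. The early return mirrors the `if (!res->parent) return;` guard from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a resource tree node. */
struct res_model {
	unsigned long start, end;
	struct res_model *parent, *sibling, *child;
};

/* Link child under parent, as a successful BAR assignment would. */
static void model_attach(struct res_model *parent, struct res_model *child)
{
	child->parent = parent;
	child->sibling = parent->child;
	parent->child = child;
}

/*
 * Unlink res from its parent's child list.
 * This dereferences res->parent, so it must only be called for
 * resources that are actually linked into the tree.
 */
static void model_unlink(struct res_model *res)
{
	struct res_model **p = &res->parent->child;

	while (*p) {
		if (*p == res) {
			*p = res->sibling;
			res->parent = NULL;
			res->sibling = NULL;
			return;
		}
		p = &(*p)->sibling;
	}
}

/*
 * Guarded release, modeled on the patched function.
 * Returns false and leaves res untouched when it was never assigned.
 */
static bool model_release_bar(struct res_model *res)
{
	if (!res->parent)
		return false;	/* nothing to release, avoid the NULL walk */

	model_unlink(res);

	/* Keep the size but rebase to 0 (end = size - 1); this is an
	 * assumption of the model based on the diff context. */
	res->end = res->end - res->start;
	res->start = 0;
	return true;
}
```

In this model, calling `model_release_bar()` on a zero-initialized `struct res_model` returns `false` without touching any pointers, while calling it on a resource previously linked with `model_attach()` unlinks it and keeps its size.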