[12/12] drm/amdgpu: enable foreign DMA-buf objects

Submitted by Kuehling, Felix on July 4, 2017, 3:56 p.m.

Details

Message ID 21d55b2f-d713-a67a-cbf6-e9590cee308a@amd.com
State New
Series "Patches from amd-kfd-staging" ( rev: 4 ) in AMD X.Org drivers


Commit Message

Kuehling, Felix July 4, 2017, 3:56 p.m.
On 17-07-04 03:32 AM, Christian König wrote:
> Am 03.07.2017 um 23:11 schrieb Felix Kuehling:
>> +
>> +    list_add(&gobj->list, &bo->gem_objects);
>> +    gobj->bo = amdgpu_bo_ref(bo);
>> +    bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>
> It's a bit trickier than that. IIRC my original patch limited the
> BO to GTT space as well.

I dug out your original patch for reference (attached); it was first
applied on a 4.1-based KFD staging branch. Your original
patch series also included a change for VRAM VM mappings with the system
bit (also attached). So your original intention was clearly to also
allow VRAM P2P access. The VM patch didn't apply after some VM changes
(probably related to the vram manager) and was later replaced by Amber's
patch.

>
> VRAM peer to peer access doesn't work with most PCIe chipsets.
>
> At bare minimum we need to put this behind a config option or add a
> white list for the chipset or only enable it if
> "pci=pcie_bus_peer2peer" is set or something like this.

Well, we're using it without any special pci= flags.
pci=pcie_bus_peer2peer can reduce performance, so we should not require
it if it's not needed on all systems.

There are other issues that can prevent P2P access between some pairs of
devices. For example on Intel dual-socket boards the QPI link between
the sockets doesn't work for P2P traffic. So P2P only works between
devices connected to the same socket.

I think it's impractical to check all those chipset-specific limitations
at this level. Importing and mapping a foreign BO should be no problem
either way. If remote access is limited, that's something the
application can figure out on its own. In the case of KFD, this is done
based on IO-link topology information.
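
As a rough illustration (not part of this series, and the sysfs paths
and property names below are from memory rather than a verified ABI),
a userspace check of that topology could look something like this:

#include <stdio.h>
#include <string.h>
#include <glob.h>

int main(void)
{
	glob_t g;
	size_t i;

	/* KFD topology as exposed through sysfs (path is an assumption). */
	if (glob("/sys/class/kfd/kfd/topology/nodes/*/io_links/*/properties",
		 0, NULL, &g))
		return 1;

	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");
		char key[64];
		unsigned long long val;
		long from = -1, to = -1, weight = -1;

		if (!f)
			continue;
		/* Each properties file is a list of "name value" pairs. */
		while (fscanf(f, "%63s %llu", key, &val) == 2) {
			if (!strcmp(key, "node_from"))
				from = val;
			else if (!strcmp(key, "node_to"))
				to = val;
			else if (!strcmp(key, "weight"))
				weight = val;
		}
		fclose(f);
		printf("io link: node %ld -> node %ld (weight %ld)\n",
		       from, to, weight);
	}
	globfree(&g);
	return 0;
}

Roughly speaking, each io_link entry describes a directed link between
two topology nodes; a pair of GPUs without a usable link is a pair the
application shouldn't expect P2P to work for.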

Regards,
  Felix


>
> BTW: If you modify a patch as severely as that, please add your
> Signed-off-by line as well.
>
> Regards,
> Christian.
>
>> +
>> +    ww_mutex_unlock(&bo->tbo.resv->lock);
>> +
>> +    return &gobj->base;
>> +}
>> +
>> +struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
>> +                           struct dma_buf *dma_buf)
>> +{
>> +    struct amdgpu_device *adev = dev->dev_private;
>> +
>> +    if (dma_buf->ops == &drm_gem_prime_dmabuf_ops) {
>> +        struct drm_gem_object *obj = dma_buf->priv;
>> +
>> +        if (obj->dev != dev && obj->dev->driver == dev->driver) {
>> +            /* It's a amdgpu_bo from a different driver instance */
>> +            struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>> +
>> +            return amdgpu_gem_prime_foreign_bo(adev, bo);
>> +        }
>> +    }
>> +
>> +    return drm_gem_prime_import(dev, dma_buf);
>> +}
>
>


From fc703df5a630852a70656dbd4842bcacb0b4cf69 Mon Sep 17 00:00:00 2001
From: Christian König <christian.koenig@amd.com>
Date: Mon, 30 Nov 2015 14:30:08 +0100
Subject: [PATCH] drm/amdgpu: handle foreign BOs in the VM mapping code v2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

That should allow us to use foreign BOs in VRAM.

v2: use the correct device for the PTE creation.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 47e9f41..a02889c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -843,12 +843,15 @@  int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 			struct ttm_mem_reg *mem)
 {
 	struct amdgpu_vm *vm = bo_va->vm;
+	struct amdgpu_bo *bo = bo_va->bo;
 	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_gart *gtt = NULL;
 	uint32_t flags;
 	uint64_t addr;
 	int r;
 
+	flags = amdgpu_ttm_tt_pte_flags(adev, bo->tbo.ttm, mem);
+
 	if (mem) {
 		addr = (u64)mem->start << PAGE_SHIFT;
 		switch (mem->mem_type) {
@@ -857,7 +860,11 @@  int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 			break;
 
 		case TTM_PL_VRAM:
-			addr += adev->vm_manager.vram_base_offset;
+			if (bo->adev != adev) {
+				addr += bo->adev->mc.aper_base;
+				flags |= AMDGPU_PTE_SYSTEM;
+			} else
+				addr += adev->vm_manager.vram_base_offset;
 			break;
 
 		default:
@@ -867,8 +874,6 @@  int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 		addr = 0;
 	}
 
-	flags = amdgpu_ttm_tt_pte_flags(adev, bo_va->bo->tbo.ttm, mem);
-
 	spin_lock(&vm->status_lock);
 	if (!list_empty(&bo_va->vm_status))
 		list_splice_init(&bo_va->valids, &bo_va->invalids);
-- 
1.9.1


Comments

Am 04.07.2017 um 17:56 schrieb Felix Kuehling:
> On 17-07-04 03:32 AM, Christian König wrote:
>> Am 03.07.2017 um 23:11 schrieb Felix Kuehling:
>>> +
>>> +    list_add(&gobj->list, &bo->gem_objects);
>>> +    gobj->bo = amdgpu_bo_ref(bo);
>>> +    bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>> It's a bit trickier than that. IIRC my original patch limited the
>> BO to GTT space as well.
> I dug out your original patch for reference (attached), which was
> originally applied on a 4.1-based KFD staging branch. Your original
> patch series also included a change for VRAM VM mappings with the system
> bit (also attached). So your original intention was clearly to also
> allow VRAM P2P access. The VM patch didn't apply after some VM changes
> (probably related to the vram manager) and was later replaced by Amber's
> patch.

OK, in this case my memory fooled me.

>> VRAM peer to peer access doesn't work with most PCIe chipsets.
>>
>> At bare minimum we need to put this behind a config option or add a
>> white list for the chipset or only enable it if
>> "pci=pcie_bus_peer2peer" is set or something like this.
> Well, we're using it without any special pci= flags.
> pci=pcie_bus_peer2peer can reduce performance, so we should not require
> it if it's not needed on all systems.
>
> There are other issues that can prevent P2P access between some pairs of
> devices. For example on Intel dual-socket boards the QPI link between
> the sockets doesn't work for P2P traffic. So P2P only works between
> devices connected to the same socket.
>
> I think it's impractical to check all those chipset-specific limitations
> at this level.

Seriously? Sorry, but that approach is a clear no-go.

Long story short, all of those cases must be handled cleanly before this 
patch series can land upstream (or even be in any hybrid release).

> Importing and mapping a foreign BO should be no problem
> either way. If remote access is limited, that's something the
> application can figure out on its own. In case of KFD, this is done
> based on IO-link topology information.

I'm pretty sure that the patch as is would break A+A laptops, so pushing 
it to any branch outside some KFD testing environment is a clear NAK 
from my side.

I would also strongly discourage using it in a production system until 
those issues are sorted out. This patch series was a technical proof of 
concept, not something we can upstream as-is.

What needs to be done is to work on a whitelist of chipsets and/or 
interconnects between PCIe bus systems.
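
To sketch one possible shape of such a whitelist (illustrative only;
the table is an empty placeholder and the helper names are made up for
this example, not something that exists in the driver):

#include <linux/pci.h>

/*
 * Illustrative sketch only, not part of this series: one possible shape
 * of a chipset whitelist.  The table is intentionally empty; validated
 * root port IDs would go here.
 */
static const struct pci_device_id amdgpu_p2p_root_whitelist[] = {
	/* { PCI_DEVICE(vendor, device) }, */
	{ }
};

/* Walk up the PCI hierarchy to the root port of a device. */
static struct pci_dev *amdgpu_p2p_root_port(struct pci_dev *pdev)
{
	struct pci_dev *up;

	while ((up = pci_upstream_bridge(pdev)))
		pdev = up;
	return pdev;
}

static bool amdgpu_p2p_allowed(struct pci_dev *a, struct pci_dev *b)
{
	struct pci_dev *ra = amdgpu_p2p_root_port(a);
	struct pci_dev *rb = amdgpu_p2p_root_port(b);

	/*
	 * Conservative: both devices must sit in the same PCI domain and
	 * both root ports must be on the whitelist.  A real implementation
	 * would need finer-grained knowledge of the root complex / socket
	 * topology than this.
	 */
	if (pci_domain_nr(ra->bus) != pci_domain_nr(rb->bus))
		return false;

	return pci_match_id(amdgpu_p2p_root_whitelist, ra) &&
	       pci_match_id(amdgpu_p2p_root_whitelist, rb);
}

PCI IDs alone don't capture the interconnect between root complexes
(the dual-socket QPI case discussed above), so a real check would need
more than this table.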

Regards,
Christian.

>
> Regards,
>    Felix
>
>
>> BTW: If you modify a patch as severely as that, please add your
>> Signed-off-by line as well.
>>
>> Regards,
>> Christian.
>>
>>> +
>>> +    ww_mutex_unlock(&bo->tbo.resv->lock);
>>> +
>>> +    return &gobj->base;
>>> +}
>>> +
>>> +struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
>>> +                           struct dma_buf *dma_buf)
>>> +{
>>> +    struct amdgpu_device *adev = dev->dev_private;
>>> +
>>> +    if (dma_buf->ops == &drm_gem_prime_dmabuf_ops) {
>>> +        struct drm_gem_object *obj = dma_buf->priv;
>>> +
>>> +        if (obj->dev != dev && obj->dev->driver == dev->driver) {
>>> +            /* It's a amdgpu_bo from a different driver instance */
>>> +            struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
>>> +
>>> +            return amdgpu_gem_prime_foreign_bo(adev, bo);
>>> +        }
>>> +    }
>>> +
>>> +    return drm_gem_prime_import(dev, dma_buf);
>>> +}
>>
On 17-07-04 12:39 PM, Christian König wrote:
> Long story short all those cases must cleanly be handled before this
> patch series can land upstream (or even be in any hybrid release). 
FWIW, it's in 17.20.

> I'm pretty sure that the patch as is would break A+A laptops, so
> pushing it to any branch outside some KFD testing environment is a
> clear NAK from my side.

On A+A laptops, one side of the P2P is system memory, so it shouldn't be
a problem.

>
> I would also strongly discourage using it in a production system
> until those issues are sorted out. This patch series was a technical
> proof of concept, not something we can upstream as-is.
>
> What needs to be done is to work on a whitelist of chipsets and/or
> interconnects between PCIe bus systems.
If I understand your concerns correctly, it's that buffer sharing across
GPUs can already be used in amdgpu graphics use cases today, but forces
the memory to be migrated to system memory and shared through
scatter-gather lists. I wasn't aware of such use cases. A+A laptops
shouldn't be a problem because, as I pointed out above, that isn't
really P2P.

This patch series improves buffer sharing between GPUs, but it would
break such use cases on chipsets that don't support P2P, if they exist
today. I'm still sceptical about that assumption.

So for ROCm setups that need P2P, we'd need some condition to check
whether P2P is supported by the chipset (a whitelist, or maybe a config
option or module parameter), and a working fallback path (the old SG
way) for systems where P2P doesn't work.
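
To sketch what that could look like in the import path quoted earlier
(the p2p_sharing module parameter and the amdgpu_p2p_allowed() helper
are hypothetical placeholders, not something that exists in this
series):

#include <linux/module.h>

/*
 * Hypothetical sketch, not part of this series: gate the foreign-BO
 * path behind a module parameter plus a chipset check, and keep the
 * existing SG import as the fallback.
 */
static int amdgpu_p2p_sharing; /* 0 = off, always use the SG path */
module_param_named(p2p_sharing, amdgpu_p2p_sharing, int, 0444);
MODULE_PARM_DESC(p2p_sharing, "Enable VRAM P2P buffer sharing (0 = off)");

struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
					       struct dma_buf *dma_buf)
{
	struct amdgpu_device *adev = dev->dev_private;

	if (dma_buf->ops == &drm_gem_prime_dmabuf_ops) {
		struct drm_gem_object *obj = dma_buf->priv;

		if (obj->dev != dev && obj->dev->driver == dev->driver) {
			/* amdgpu_bo from a different device instance */
			struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);

			/*
			 * Only take the foreign-BO path when P2P sharing
			 * is explicitly enabled and believed to work for
			 * this pair of devices.
			 */
			if (amdgpu_p2p_sharing &&
			    amdgpu_p2p_allowed(adev->pdev, bo->adev->pdev))
				return amdgpu_gem_prime_foreign_bo(adev, bo);
		}
	}

	/* Fallback: the existing scatter-gather import via system memory. */
	return drm_gem_prime_import(dev, dma_buf);
}

With something along those lines, systems without working P2P simply
keep taking the existing drm_gem_prime_import() path through system
memory.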

Regards,
  Felix