[4/5] drm/panfrost: Add support for GPU heap allocations

Submitted by Rob Herring on July 17, 2019, 6:33 p.m.

Details

Message ID 20190717183352.22519-4-robh@kernel.org
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in DRI devel

Not browsing as part of any series.

Commit Message

Rob Herring July 17, 2019, 6:33 p.m.
The midgard/bifrost GPUs need to allocate GPU heap memory which is
allocated on GPU page faults and not pinned in memory. The vendor driver
calls this functionality GROW_ON_GPF.

This implementation assumes that BOs allocated with the
PANFROST_BO_NOEXEC flag are never mmapped or exported. Both of those may
actually work, but I'm unsure if there's some interaction there. It
would cause the whole object to be pinned in memory which would defeat
the point of this.

On faults, we map in 2MB at a time in order to utilize huge pages (if
enabled). Currently, once we've mapped pages in, they are only unmapped
if the BO is freed. Once we add shrinker support, we can unmap pages
with the shrinker.

Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Rob Herring <robh@kernel.org>
---
 drivers/gpu/drm/panfrost/TODO           |   2 -
 drivers/gpu/drm/panfrost/panfrost_drv.c |   2 +-
 drivers/gpu/drm/panfrost/panfrost_gem.c |  14 ++-
 drivers/gpu/drm/panfrost/panfrost_gem.h |   8 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 114 +++++++++++++++++++++---
 include/uapi/drm/panfrost_drm.h         |   1 +
 6 files changed, 125 insertions(+), 16 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/panfrost/TODO b/drivers/gpu/drm/panfrost/TODO
index c2e44add37d8..64129bf73933 100644
--- a/drivers/gpu/drm/panfrost/TODO
+++ b/drivers/gpu/drm/panfrost/TODO
@@ -14,8 +14,6 @@ 
   The hard part is handling when more address spaces are needed than what
   the h/w provides.
 
-- Support pinning pages on demand (GPU page faults).
-
 - Support userspace controlled GPU virtual addresses. Needed for Vulkan. (Tomeu)
 
 - Support for madvise and a shrinker.
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index b91e991bc6a3..9e87d0060202 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -50,7 +50,7 @@  static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
 	struct drm_panfrost_create_bo *args = data;
 
 	if (!args->size || args->pad ||
-	    (args->flags & ~PANFROST_BO_NOEXEC))
+	    (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
 		return -EINVAL;
 
 	bo = panfrost_gem_create_with_handle(file, dev, args->size, args->flags,
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 37ffec8391da..528396000038 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -87,7 +87,10 @@  static int panfrost_gem_map(struct panfrost_device *pfdev, struct panfrost_gem_o
 	if (ret)
 		return ret;
 
-	return panfrost_mmu_map(bo);
+	if (!bo->is_heap)
+		ret = panfrost_mmu_map(bo);
+
+	return ret;
 }
 
 struct panfrost_gem_object *
@@ -101,7 +104,11 @@  panfrost_gem_create_with_handle(struct drm_file *file_priv,
 	struct drm_gem_shmem_object *shmem;
 	struct panfrost_gem_object *bo;
 
-	size = roundup(size, PAGE_SIZE);
+	/* Round up heap allocations to 2MB to keep fault handling simple */
+	if (flags & PANFROST_BO_HEAP)
+		size = roundup(size, SZ_2M);
+	else
+		size = roundup(size, PAGE_SIZE);
 
 	shmem = drm_gem_shmem_create_with_handle(file_priv, dev, size, handle);
 	if (IS_ERR(shmem))
@@ -109,6 +116,9 @@  panfrost_gem_create_with_handle(struct drm_file *file_priv,
 
 	bo = to_panfrost_bo(&shmem->base);
 	bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
+	bo->is_heap = !!(flags & PANFROST_BO_HEAP);
+	if (bo->is_heap)
+		bo->noexec = true;
 
 	ret = panfrost_gem_map(pfdev, bo);
 	if (ret)
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
index 132f02399b7b..c500ca6b9072 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -13,6 +13,7 @@  struct panfrost_gem_object {
 	struct drm_mm_node node;
 	bool is_mapped		:1;
 	bool noexec		:1;
+	bool is_heap		:1;
 };
 
 static inline
@@ -21,6 +22,13 @@  struct  panfrost_gem_object *to_panfrost_bo(struct drm_gem_object *obj)
 	return container_of(to_drm_gem_shmem_obj(obj), struct panfrost_gem_object, base);
 }
 
+static inline
+struct  panfrost_gem_object *drm_mm_node_to_panfrost_bo(struct drm_mm_node *node)
+{
+	return container_of(node, struct panfrost_gem_object, node);
+}
+
+
 struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size);
 
 struct panfrost_gem_object *
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index d18484a07bfa..3b95c7027188 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -3,6 +3,7 @@ 
 /* Copyright (C) 2019 Arm Ltd. */
 #include <linux/bitfield.h>
 #include <linux/delay.h>
+#include <linux/dma-mapping.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iopoll.h>
@@ -10,6 +11,7 @@ 
 #include <linux/iommu.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/shmem_fs.h>
 #include <linux/sizes.h>
 
 #include "panfrost_device.h"
@@ -257,12 +259,12 @@  void panfrost_mmu_unmap(struct panfrost_gem_object *bo)
 		size_t unmapped_page;
 		size_t pgsize = get_pgsize(iova, len - unmapped_len);
 
-		unmapped_page = ops->unmap(ops, iova, pgsize);
-		if (!unmapped_page)
-			break;
-
-		iova += unmapped_page;
-		unmapped_len += unmapped_page;
+		if (ops->iova_to_phys(ops, iova)) {
+			unmapped_page = ops->unmap(ops, iova, pgsize);
+			WARN_ON(unmapped_page != pgsize);
+		}
+		iova += pgsize;
+		unmapped_len += pgsize;
 	}
 
 	mmu_hw_do_operation(pfdev, 0, bo->node.start << PAGE_SHIFT,
@@ -298,6 +300,86 @@  static const struct iommu_gather_ops mmu_tlb_ops = {
 	.tlb_sync	= mmu_tlb_sync_context,
 };
 
+static struct drm_mm_node *addr_to_drm_mm_node(struct panfrost_device *pfdev, int as, u64 addr)
+{
+	struct drm_mm_node *node;
+	u64 offset = addr >> PAGE_SHIFT;
+
+	drm_mm_for_each_node(node, &pfdev->mm) {
+		if (offset >= node->start && offset < (node->start + node->size))
+			return node;
+	}
+	return NULL;
+}
+
+#define NUM_FAULT_PAGES (SZ_2M / PAGE_SIZE)
+
+int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, u64 addr)
+{
+	int ret, i;
+	struct drm_mm_node *node;
+	struct panfrost_gem_object *bo;
+	struct address_space *mapping;
+	pgoff_t page_offset;
+	struct sg_table sgt = {};
+	struct page **pages;
+
+	node = addr_to_drm_mm_node(pfdev, as, addr);
+	if (!node)
+		return -ENOENT;
+
+	bo = drm_mm_node_to_panfrost_bo(node);
+	if (!bo->is_heap) {
+		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
+			 node->start << PAGE_SHIFT);
+		return -EINVAL;
+	}
+	/* Assume 2MB alignment and size multiple */
+	addr &= ~((u64)SZ_2M - 1);
+	page_offset = addr >> PAGE_SHIFT;
+	page_offset -= node->start;
+
+	pages = kvmalloc_array(NUM_FAULT_PAGES, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return -ENOMEM;
+
+	mapping = bo->base.base.filp->f_mapping;
+	mapping_set_unevictable(mapping);
+
+	for (i = 0; i < NUM_FAULT_PAGES; i++) {
+		pages[i] = shmem_read_mapping_page(mapping, page_offset + i);
+		if (IS_ERR(pages[i])) {
+			ret = PTR_ERR(pages[i]);
+			goto err_pages;
+		}
+	}
+
+	ret = sg_alloc_table_from_pages(&sgt, pages, NUM_FAULT_PAGES, 0,
+					SZ_2M, GFP_KERNEL);
+	if (ret)
+		goto err_pages;
+
+	if (dma_map_sg(pfdev->dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL) == 0) {
+		ret = -EINVAL;
+		goto err_map;
+	}
+
+	mmu_map_sg(pfdev, addr, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, &sgt);
+
+	mmu_write(pfdev, MMU_INT_CLEAR, BIT(as));
+	bo->is_mapped = true;
+
+	dev_dbg(pfdev->dev, "mapped page fault @ %llx", addr);
+
+	return 0;
+
+err_map:
+	sg_free_table(&sgt);
+err_pages:
+	kvfree(pages);
+	return ret;
+}
+
 static const char *access_type_name(struct panfrost_device *pfdev,
 		u32 fault_status)
 {
@@ -323,13 +405,11 @@  static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
 {
 	struct panfrost_device *pfdev = data;
 	u32 status = mmu_read(pfdev, MMU_INT_STAT);
-	int i;
+	int i, ret;
 
 	if (!status)
 		return IRQ_NONE;
 
-	dev_err(pfdev->dev, "mmu irq status=%x\n", status);
-
 	for (i = 0; status; i++) {
 		u32 mask = BIT(i) | BIT(i + 16);
 		u64 addr;
@@ -350,6 +430,17 @@  static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
 		access_type = (fault_status >> 8) & 0x3;
 		source_id = (fault_status >> 16);
 
+		/* Page fault only */
+		if ((status & mask) == BIT(i)) {
+			WARN_ON(exception_type < 0xC1 || exception_type > 0xC4);
+
+			ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
+			if (!ret) {
+				status &= ~mask;
+				continue;
+			}
+		}
+
 		/* terminal fault, print info about the fault */
 		dev_err(pfdev->dev,
 			"Unhandled Page fault in AS%d at VA 0x%016llX\n"
@@ -391,8 +482,9 @@  int panfrost_mmu_init(struct panfrost_device *pfdev)
 	if (irq <= 0)
 		return -ENODEV;
 
-	err = devm_request_irq(pfdev->dev, irq, panfrost_mmu_irq_handler,
-			       IRQF_SHARED, "mmu", pfdev);
+	err = devm_request_threaded_irq(pfdev->dev, irq, NULL,
+					panfrost_mmu_irq_handler,
+					IRQF_ONESHOT, "mmu", pfdev);
 
 	if (err) {
 		dev_err(pfdev->dev, "failed to request mmu irq");
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
index 17fb5d200f7a..9150dd75aad8 100644
--- a/include/uapi/drm/panfrost_drm.h
+++ b/include/uapi/drm/panfrost_drm.h
@@ -83,6 +83,7 @@  struct drm_panfrost_wait_bo {
 };
 
 #define PANFROST_BO_NOEXEC	1
+#define PANFROST_BO_HEAP	2
 
 /**
  * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.

Comments

On 17/07/2019 19:33, Rob Herring wrote:
> The midgard/bifrost GPUs need to allocate GPU heap memory which is
> allocated on GPU page faults and not pinned in memory. The vendor driver
> calls this functionality GROW_ON_GPF.
> 
> This implementation assumes that BOs allocated with the
> PANFROST_BO_NOEXEC flag are never mmapped or exported. Both of those may
> actually work, but I'm unsure if there's some interaction there. It
> would cause the whole object to be pinned in memory which would defeat
> the point of this.
> 
> On faults, we map in 2MB at a time in order to utilize huge pages (if
> enabled). Currently, once we've mapped pages in, they are only unmapped
> if the BO is freed. Once we add shrinker support, we can unmap pages
> with the shrinker.
> 
> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
>  drivers/gpu/drm/panfrost/TODO           |   2 -
>  drivers/gpu/drm/panfrost/panfrost_drv.c |   2 +-
>  drivers/gpu/drm/panfrost/panfrost_gem.c |  14 ++-
>  drivers/gpu/drm/panfrost/panfrost_gem.h |   8 ++
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 114 +++++++++++++++++++++---
>  include/uapi/drm/panfrost_drm.h         |   1 +
>  6 files changed, 125 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/TODO b/drivers/gpu/drm/panfrost/TODO
> index c2e44add37d8..64129bf73933 100644
> --- a/drivers/gpu/drm/panfrost/TODO
> +++ b/drivers/gpu/drm/panfrost/TODO
> @@ -14,8 +14,6 @@
>    The hard part is handling when more address spaces are needed than what
>    the h/w provides.
>  
> -- Support pinning pages on demand (GPU page faults).
> -
>  - Support userspace controlled GPU virtual addresses. Needed for Vulkan. (Tomeu)
>  
>  - Support for madvise and a shrinker.
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index b91e991bc6a3..9e87d0060202 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -50,7 +50,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
>  	struct drm_panfrost_create_bo *args = data;
>  
>  	if (!args->size || args->pad ||
> -	    (args->flags & ~PANFROST_BO_NOEXEC))
> +	    (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
>  		return -EINVAL;
>  
>  	bo = panfrost_gem_create_with_handle(file, dev, args->size, args->flags,
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index 37ffec8391da..528396000038 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -87,7 +87,10 @@ static int panfrost_gem_map(struct panfrost_device *pfdev, struct panfrost_gem_o
>  	if (ret)
>  		return ret;
>  
> -	return panfrost_mmu_map(bo);
> +	if (!bo->is_heap)
> +		ret = panfrost_mmu_map(bo);
> +
> +	return ret;
>  }
>  
>  struct panfrost_gem_object *
> @@ -101,7 +104,11 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv,
>  	struct drm_gem_shmem_object *shmem;
>  	struct panfrost_gem_object *bo;
>  
> -	size = roundup(size, PAGE_SIZE);
> +	/* Round up heap allocations to 2MB to keep fault handling simple */
> +	if (flags & PANFROST_BO_HEAP)
> +		size = roundup(size, SZ_2M);
> +	else
> +		size = roundup(size, PAGE_SIZE);
>  
>  	shmem = drm_gem_shmem_create_with_handle(file_priv, dev, size, handle);
>  	if (IS_ERR(shmem))
> @@ -109,6 +116,9 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv,
>  
>  	bo = to_panfrost_bo(&shmem->base);
>  	bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
> +	bo->is_heap = !!(flags & PANFROST_BO_HEAP);
> +	if (bo->is_heap)
> +		bo->noexec = true;

While I agree an executable heap is pretty weird, I'd prefer making this
explicit - i.e. failing the allocation if the flags don't make sense.

>  
>  	ret = panfrost_gem_map(pfdev, bo);
>  	if (ret)
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
> index 132f02399b7b..c500ca6b9072 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
> @@ -13,6 +13,7 @@ struct panfrost_gem_object {
>  	struct drm_mm_node node;
>  	bool is_mapped		:1;
>  	bool noexec		:1;
> +	bool is_heap		:1;
>  };
>  
>  static inline
> @@ -21,6 +22,13 @@ struct  panfrost_gem_object *to_panfrost_bo(struct drm_gem_object *obj)
>  	return container_of(to_drm_gem_shmem_obj(obj), struct panfrost_gem_object, base);
>  }
>  
> +static inline
> +struct  panfrost_gem_object *drm_mm_node_to_panfrost_bo(struct drm_mm_node *node)
> +{
> +	return container_of(node, struct panfrost_gem_object, node);
> +}
> +
> +

NIT: Extra blank line

>  struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size);
>  
>  struct panfrost_gem_object *
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index d18484a07bfa..3b95c7027188 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -3,6 +3,7 @@
>  /* Copyright (C) 2019 Arm Ltd. */
>  #include <linux/bitfield.h>
>  #include <linux/delay.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
>  #include <linux/iopoll.h>
> @@ -10,6 +11,7 @@
>  #include <linux/iommu.h>
>  #include <linux/platform_device.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/shmem_fs.h>
>  #include <linux/sizes.h>
>  
>  #include "panfrost_device.h"
> @@ -257,12 +259,12 @@ void panfrost_mmu_unmap(struct panfrost_gem_object *bo)
>  		size_t unmapped_page;
>  		size_t pgsize = get_pgsize(iova, len - unmapped_len);
>  
> -		unmapped_page = ops->unmap(ops, iova, pgsize);
> -		if (!unmapped_page)
> -			break;
> -
> -		iova += unmapped_page;
> -		unmapped_len += unmapped_page;
> +		if (ops->iova_to_phys(ops, iova)) {
> +			unmapped_page = ops->unmap(ops, iova, pgsize);
> +			WARN_ON(unmapped_page != pgsize);
> +		}
> +		iova += pgsize;
> +		unmapped_len += pgsize;
>  	}
>  
>  	mmu_hw_do_operation(pfdev, 0, bo->node.start << PAGE_SHIFT,
> @@ -298,6 +300,86 @@ static const struct iommu_gather_ops mmu_tlb_ops = {
>  	.tlb_sync	= mmu_tlb_sync_context,
>  };
>  
> +static struct drm_mm_node *addr_to_drm_mm_node(struct panfrost_device *pfdev, int as, u64 addr)
> +{
> +	struct drm_mm_node *node;
> +	u64 offset = addr >> PAGE_SHIFT;
> +
> +	drm_mm_for_each_node(node, &pfdev->mm) {
> +		if (offset >= node->start && offset < (node->start + node->size))
> +			return node;
> +	}
> +	return NULL;
> +}
> +
> +#define NUM_FAULT_PAGES (SZ_2M / PAGE_SIZE)
> +
> +int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, u64 addr)
> +{
> +	int ret, i;
> +	struct drm_mm_node *node;
> +	struct panfrost_gem_object *bo;
> +	struct address_space *mapping;
> +	pgoff_t page_offset;
> +	struct sg_table sgt = {};
> +	struct page **pages;
> +
> +	node = addr_to_drm_mm_node(pfdev, as, addr);
> +	if (!node)
> +		return -ENOENT;
> +
> +	bo = drm_mm_node_to_panfrost_bo(node);
> +	if (!bo->is_heap) {
> +		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
> +			 node->start << PAGE_SHIFT);
> +		return -EINVAL;
> +	}
> +	/* Assume 2MB alignment and size multiple */
> +	addr &= ~((u64)SZ_2M - 1);
> +	page_offset = addr >> PAGE_SHIFT;
> +	page_offset -= node->start;
> +
> +	pages = kvmalloc_array(NUM_FAULT_PAGES, sizeof(struct page *), GFP_KERNEL);
> +	if (!pages)
> +		return -ENOMEM;
> +
> +	mapping = bo->base.base.filp->f_mapping;
> +	mapping_set_unevictable(mapping);
> +
> +	for (i = 0; i < NUM_FAULT_PAGES; i++) {
> +		pages[i] = shmem_read_mapping_page(mapping, page_offset + i);
> +		if (IS_ERR(pages[i])) {
> +			ret = PTR_ERR(pages[i]);
> +			goto err_pages;
> +		}
> +	}
> +
> +	ret = sg_alloc_table_from_pages(&sgt, pages, NUM_FAULT_PAGES, 0,
> +					SZ_2M, GFP_KERNEL);
> +	if (ret)
> +		goto err_pages;
> +
> +	if (dma_map_sg(pfdev->dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL) == 0) {
> +		ret = -EINVAL;
> +		goto err_map;
> +	}
> +
> +	mmu_map_sg(pfdev, addr, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, &sgt);
> +
> +	mmu_write(pfdev, MMU_INT_CLEAR, BIT(as));
> +	bo->is_mapped = true;
> +
> +	dev_dbg(pfdev->dev, "mapped page fault @ %llx", addr);
> +
> +	return 0;

You still need to free sgt and pages - so this should be:

ret = 0;

to fall through to the clean up below:

> +
> +err_map:
> +	sg_free_table(&sgt);
> +err_pages:
> +	kvfree(pages);
> +	return ret;
> +}
> +

But actually, you need to store the pages allocated in the buffer object
so that they can be freed later. At the moment you have a big memory leak.

>  static const char *access_type_name(struct panfrost_device *pfdev,
>  		u32 fault_status)
>  {
> @@ -323,13 +405,11 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
>  {
>  	struct panfrost_device *pfdev = data;
>  	u32 status = mmu_read(pfdev, MMU_INT_STAT);
> -	int i;
> +	int i, ret;
>  
>  	if (!status)
>  		return IRQ_NONE;
>  
> -	dev_err(pfdev->dev, "mmu irq status=%x\n", status);
> -
>  	for (i = 0; status; i++) {
>  		u32 mask = BIT(i) | BIT(i + 16);
>  		u64 addr;
> @@ -350,6 +430,17 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
>  		access_type = (fault_status >> 8) & 0x3;
>  		source_id = (fault_status >> 16);
>  
> +		/* Page fault only */
> +		if ((status & mask) == BIT(i)) {
> +			WARN_ON(exception_type < 0xC1 || exception_type > 0xC4);
> +
> +			ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
> +			if (!ret) {
> +				status &= ~mask;
> +				continue;

In this case the IRQ isn't handled and will remain asserted, which
probably isn't going to end particularly well.

Ideally you would switch the address space to UNMAPPED to kill off the
job, but at the very least we should acknowledge the interrupt and let
the GPU timeout reset the GPU to recover (which is equivalent while we
still only use the one AS on the GPU).

Steve

> +			}
> +		}
> +
>  		/* terminal fault, print info about the fault */
>  		dev_err(pfdev->dev,
>  			"Unhandled Page fault in AS%d at VA 0x%016llX\n"
> @@ -391,8 +482,9 @@ int panfrost_mmu_init(struct panfrost_device *pfdev)
>  	if (irq <= 0)
>  		return -ENODEV;
>  
> -	err = devm_request_irq(pfdev->dev, irq, panfrost_mmu_irq_handler,
> -			       IRQF_SHARED, "mmu", pfdev);
> +	err = devm_request_threaded_irq(pfdev->dev, irq, NULL,
> +					panfrost_mmu_irq_handler,
> +					IRQF_ONESHOT, "mmu", pfdev);
>  
>  	if (err) {
>  		dev_err(pfdev->dev, "failed to request mmu irq");
> diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> index 17fb5d200f7a..9150dd75aad8 100644
> --- a/include/uapi/drm/panfrost_drm.h
> +++ b/include/uapi/drm/panfrost_drm.h
> @@ -83,6 +83,7 @@ struct drm_panfrost_wait_bo {
>  };
>  
>  #define PANFROST_BO_NOEXEC	1
> +#define PANFROST_BO_HEAP	2
>  
>  /**
>   * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
>
On Thu, Jul 18, 2019 at 9:03 AM Steven Price <steven.price@arm.com> wrote:
>
> On 17/07/2019 19:33, Rob Herring wrote:
> > The midgard/bifrost GPUs need to allocate GPU heap memory which is
> > allocated on GPU page faults and not pinned in memory. The vendor driver
> > calls this functionality GROW_ON_GPF.
> >
> > This implementation assumes that BOs allocated with the
> > PANFROST_BO_NOEXEC flag are never mmapped or exported. Both of those may
> > actually work, but I'm unsure if there's some interaction there. It
> > would cause the whole object to be pinned in memory which would defeat
> > the point of this.
> >
> > On faults, we map in 2MB at a time in order to utilize huge pages (if
> > enabled). Currently, once we've mapped pages in, they are only unmapped
> > if the BO is freed. Once we add shrinker support, we can unmap pages
> > with the shrinker.
> >
> > Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > Cc: Boris Brezillon <boris.brezillon@collabora.com>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> > Signed-off-by: Rob Herring <robh@kernel.org>
> > ---
> >  drivers/gpu/drm/panfrost/TODO           |   2 -
> >  drivers/gpu/drm/panfrost/panfrost_drv.c |   2 +-
> >  drivers/gpu/drm/panfrost/panfrost_gem.c |  14 ++-
> >  drivers/gpu/drm/panfrost/panfrost_gem.h |   8 ++
> >  drivers/gpu/drm/panfrost/panfrost_mmu.c | 114 +++++++++++++++++++++---
> >  include/uapi/drm/panfrost_drm.h         |   1 +
> >  6 files changed, 125 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panfrost/TODO b/drivers/gpu/drm/panfrost/TODO
> > index c2e44add37d8..64129bf73933 100644
> > --- a/drivers/gpu/drm/panfrost/TODO
> > +++ b/drivers/gpu/drm/panfrost/TODO
> > @@ -14,8 +14,6 @@
> >    The hard part is handling when more address spaces are needed than what
> >    the h/w provides.
> >
> > -- Support pinning pages on demand (GPU page faults).
> > -
> >  - Support userspace controlled GPU virtual addresses. Needed for Vulkan. (Tomeu)
> >
> >  - Support for madvise and a shrinker.
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > index b91e991bc6a3..9e87d0060202 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > @@ -50,7 +50,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
> >       struct drm_panfrost_create_bo *args = data;
> >
> >       if (!args->size || args->pad ||
> > -         (args->flags & ~PANFROST_BO_NOEXEC))
> > +         (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
> >               return -EINVAL;
> >
> >       bo = panfrost_gem_create_with_handle(file, dev, args->size, args->flags,
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> > index 37ffec8391da..528396000038 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> > @@ -87,7 +87,10 @@ static int panfrost_gem_map(struct panfrost_device *pfdev, struct panfrost_gem_o
> >       if (ret)
> >               return ret;
> >
> > -     return panfrost_mmu_map(bo);
> > +     if (!bo->is_heap)
> > +             ret = panfrost_mmu_map(bo);
> > +
> > +     return ret;
> >  }
> >
> >  struct panfrost_gem_object *
> > @@ -101,7 +104,11 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv,
> >       struct drm_gem_shmem_object *shmem;
> >       struct panfrost_gem_object *bo;
> >
> > -     size = roundup(size, PAGE_SIZE);
> > +     /* Round up heap allocations to 2MB to keep fault handling simple */
> > +     if (flags & PANFROST_BO_HEAP)
> > +             size = roundup(size, SZ_2M);
> > +     else
> > +             size = roundup(size, PAGE_SIZE);
> >
> >       shmem = drm_gem_shmem_create_with_handle(file_priv, dev, size, handle);
> >       if (IS_ERR(shmem))
> > @@ -109,6 +116,9 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv,
> >
> >       bo = to_panfrost_bo(&shmem->base);
> >       bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
> > +     bo->is_heap = !!(flags & PANFROST_BO_HEAP);
> > +     if (bo->is_heap)
> > +             bo->noexec = true;
>
> While I agree an executable heap is pretty weird, I'd prefer making this
> explicit - i.e. failing the allocation if the flags don't make sense.

Seems a bit strange too to always have to set NOEXEC when HEAP is set.
There's not really any reason to reject setting just HEAP.

[...]

> > +int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, u64 addr)
> > +{
> > +     int ret, i;
> > +     struct drm_mm_node *node;
> > +     struct panfrost_gem_object *bo;
> > +     struct address_space *mapping;
> > +     pgoff_t page_offset;
> > +     struct sg_table sgt = {};
> > +     struct page **pages;
> > +
> > +     node = addr_to_drm_mm_node(pfdev, as, addr);
> > +     if (!node)
> > +             return -ENOENT;
> > +
> > +     bo = drm_mm_node_to_panfrost_bo(node);
> > +     if (!bo->is_heap) {
> > +             dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
> > +                      node->start << PAGE_SHIFT);
> > +             return -EINVAL;
> > +     }
> > +     /* Assume 2MB alignment and size multiple */
> > +     addr &= ~((u64)SZ_2M - 1);
> > +     page_offset = addr >> PAGE_SHIFT;
> > +     page_offset -= node->start;
> > +
> > +     pages = kvmalloc_array(NUM_FAULT_PAGES, sizeof(struct page *), GFP_KERNEL);
> > +     if (!pages)
> > +             return -ENOMEM;
> > +
> > +     mapping = bo->base.base.filp->f_mapping;
> > +     mapping_set_unevictable(mapping);
> > +
> > +     for (i = 0; i < NUM_FAULT_PAGES; i++) {
> > +             pages[i] = shmem_read_mapping_page(mapping, page_offset + i);
> > +             if (IS_ERR(pages[i])) {
> > +                     ret = PTR_ERR(pages[i]);
> > +                     goto err_pages;
> > +             }
> > +     }
> > +
> > +     ret = sg_alloc_table_from_pages(&sgt, pages, NUM_FAULT_PAGES, 0,
> > +                                     SZ_2M, GFP_KERNEL);
> > +     if (ret)
> > +             goto err_pages;
> > +
> > +     if (dma_map_sg(pfdev->dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL) == 0) {
> > +             ret = -EINVAL;
> > +             goto err_map;
> > +     }
> > +
> > +     mmu_map_sg(pfdev, addr, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, &sgt);
> > +
> > +     mmu_write(pfdev, MMU_INT_CLEAR, BIT(as));
> > +     bo->is_mapped = true;
> > +
> > +     dev_dbg(pfdev->dev, "mapped page fault @ %llx", addr);
> > +
> > +     return 0;
>
> You still need to free sgt and pages - so this should be:
>
> ret = 0;
>
> to fall through to the clean up below:

I think I had that then thought I forgot the return...

>
> > +
> > +err_map:
> > +     sg_free_table(&sgt);
> > +err_pages:
> > +     kvfree(pages);
> > +     return ret;
> > +}
> > +
>
> But actually, you need to store the pages allocated in the buffer object
> so that they can be freed later. At the moment you have a big memory leak.

Ah yes, now I see the memory ends up in 'Unevictable' bucket. I'd been
looking at the shmem counts and it seemed fine. I have it working now,
but still need to figure out how to do a dma_unmap.

> >  static const char *access_type_name(struct panfrost_device *pfdev,
> >               u32 fault_status)
> >  {
> > @@ -323,13 +405,11 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
> >  {
> >       struct panfrost_device *pfdev = data;
> >       u32 status = mmu_read(pfdev, MMU_INT_STAT);
> > -     int i;
> > +     int i, ret;
> >
> >       if (!status)
> >               return IRQ_NONE;
> >
> > -     dev_err(pfdev->dev, "mmu irq status=%x\n", status);
> > -
> >       for (i = 0; status; i++) {
> >               u32 mask = BIT(i) | BIT(i + 16);
> >               u64 addr;
> > @@ -350,6 +430,17 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
> >               access_type = (fault_status >> 8) & 0x3;
> >               source_id = (fault_status >> 16);
> >
> > +             /* Page fault only */
> > +             if ((status & mask) == BIT(i)) {
> > +                     WARN_ON(exception_type < 0xC1 || exception_type > 0xC4);
> > +
> > +                     ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
> > +                     if (!ret) {
> > +                             status &= ~mask;
> > +                             continue;
>
> In this case the IRQ isn't handled and will remain asserted, which
> probably isn't going to end particularly well.

This is the success condition. We've already cleared the interrupt in
panfrost_mmu_map_fault_addr. On failure, we fall thru printing out the
next message and clear the interrupt. Maybe it would be better to
clear the interrupt in 1 place...

>
> Ideally you would switch the address space to UNMAPPED to kill off the
> job, but at the very least we should acknowledge the interrupt and let
> the GPU timeout reset the GPU to recover (which is equivalent while we
> still only use the one AS on the GPU).

I'll hopefully remember this detail when doing multiple AS.

Thanks for the review.

Rob
On 19/07/2019 15:27, Rob Herring wrote:
> On Thu, Jul 18, 2019 at 9:03 AM Steven Price <steven.price@arm.com> wrote:
>>
>> On 17/07/2019 19:33, Rob Herring wrote:
>>> The midgard/bifrost GPUs need to allocate GPU heap memory which is
>>> allocated on GPU page faults and not pinned in memory. The vendor driver
>>> calls this functionality GROW_ON_GPF.
>>>
>>> This implementation assumes that BOs allocated with the
>>> PANFROST_BO_NOEXEC flag are never mmapped or exported. Both of those may
>>> actually work, but I'm unsure if there's some interaction there. It
>>> would cause the whole object to be pinned in memory which would defeat
>>> the point of this.
>>>
>>> On faults, we map in 2MB at a time in order to utilize huge pages (if
>>> enabled). Currently, once we've mapped pages in, they are only unmapped
>>> if the BO is freed. Once we add shrinker support, we can unmap pages
>>> with the shrinker.
>>>
>>> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>> Cc: Robin Murphy <robin.murphy@arm.com>
>>> Cc: Steven Price <steven.price@arm.com>
>>> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
>>> Signed-off-by: Rob Herring <robh@kernel.org>
>>> ---
>>>  drivers/gpu/drm/panfrost/TODO           |   2 -
>>>  drivers/gpu/drm/panfrost/panfrost_drv.c |   2 +-
>>>  drivers/gpu/drm/panfrost/panfrost_gem.c |  14 ++-
>>>  drivers/gpu/drm/panfrost/panfrost_gem.h |   8 ++
>>>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 114 +++++++++++++++++++++---
>>>  include/uapi/drm/panfrost_drm.h         |   1 +
>>>  6 files changed, 125 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/TODO b/drivers/gpu/drm/panfrost/TODO
>>> index c2e44add37d8..64129bf73933 100644
>>> --- a/drivers/gpu/drm/panfrost/TODO
>>> +++ b/drivers/gpu/drm/panfrost/TODO
>>> @@ -14,8 +14,6 @@
>>>    The hard part is handling when more address spaces are needed than what
>>>    the h/w provides.
>>>
>>> -- Support pinning pages on demand (GPU page faults).
>>> -
>>>  - Support userspace controlled GPU virtual addresses. Needed for Vulkan. (Tomeu)
>>>
>>>  - Support for madvise and a shrinker.
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> index b91e991bc6a3..9e87d0060202 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
>>> @@ -50,7 +50,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
>>>       struct drm_panfrost_create_bo *args = data;
>>>
>>>       if (!args->size || args->pad ||
>>> -         (args->flags & ~PANFROST_BO_NOEXEC))
>>> +         (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP)))
>>>               return -EINVAL;
>>>
>>>       bo = panfrost_gem_create_with_handle(file, dev, args->size, args->flags,
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
>>> index 37ffec8391da..528396000038 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
>>> @@ -87,7 +87,10 @@ static int panfrost_gem_map(struct panfrost_device *pfdev, struct panfrost_gem_o
>>>       if (ret)
>>>               return ret;
>>>
>>> -     return panfrost_mmu_map(bo);
>>> +     if (!bo->is_heap)
>>> +             ret = panfrost_mmu_map(bo);
>>> +
>>> +     return ret;
>>>  }
>>>
>>>  struct panfrost_gem_object *
>>> @@ -101,7 +104,11 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv,
>>>       struct drm_gem_shmem_object *shmem;
>>>       struct panfrost_gem_object *bo;
>>>
>>> -     size = roundup(size, PAGE_SIZE);
>>> +     /* Round up heap allocations to 2MB to keep fault handling simple */
>>> +     if (flags & PANFROST_BO_HEAP)
>>> +             size = roundup(size, SZ_2M);
>>> +     else
>>> +             size = roundup(size, PAGE_SIZE);
>>>
>>>       shmem = drm_gem_shmem_create_with_handle(file_priv, dev, size, handle);
>>>       if (IS_ERR(shmem))
>>> @@ -109,6 +116,9 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv,
>>>
>>>       bo = to_panfrost_bo(&shmem->base);
>>>       bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
>>> +     bo->is_heap = !!(flags & PANFROST_BO_HEAP);
>>> +     if (bo->is_heap)
>>> +             bo->noexec = true;
>>
>> While I agree an executable heap is pretty weird, I'd prefer making this
>> explicit - i.e. failing the allocation if the flags don't make sense.
> 
> Seems a bit strange too to always have to set NOEXEC when HEAP is set.
> There's not really any reason to reject setting just HEAP.

Personally I prefer an explicit API, so you get exactly what you
requested (no extra properties added on). But I'm not too bothered
because I can't actually see why HEAP would ever be used without NOEXEC.

> [...]
> 
>>> +int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, u64 addr)
>>> +{
>>> +     int ret, i;
>>> +     struct drm_mm_node *node;
>>> +     struct panfrost_gem_object *bo;
>>> +     struct address_space *mapping;
>>> +     pgoff_t page_offset;
>>> +     struct sg_table sgt = {};
>>> +     struct page **pages;
>>> +
>>> +     node = addr_to_drm_mm_node(pfdev, as, addr);
>>> +     if (!node)
>>> +             return -ENOENT;
>>> +
>>> +     bo = drm_mm_node_to_panfrost_bo(node);
>>> +     if (!bo->is_heap) {
>>> +             dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
>>> +                      node->start << PAGE_SHIFT);
>>> +             return -EINVAL;
>>> +     }
>>> +     /* Assume 2MB alignment and size multiple */
>>> +     addr &= ~((u64)SZ_2M - 1);
>>> +     page_offset = addr >> PAGE_SHIFT;
>>> +     page_offset -= node->start;
>>> +
>>> +     pages = kvmalloc_array(NUM_FAULT_PAGES, sizeof(struct page *), GFP_KERNEL);
>>> +     if (!pages)
>>> +             return -ENOMEM;
>>> +
>>> +     mapping = bo->base.base.filp->f_mapping;
>>> +     mapping_set_unevictable(mapping);
>>> +
>>> +     for (i = 0; i < NUM_FAULT_PAGES; i++) {
>>> +             pages[i] = shmem_read_mapping_page(mapping, page_offset + i);
>>> +             if (IS_ERR(pages[i])) {
>>> +                     ret = PTR_ERR(pages[i]);
>>> +                     goto err_pages;
>>> +             }
>>> +     }
>>> +
>>> +     ret = sg_alloc_table_from_pages(&sgt, pages, NUM_FAULT_PAGES, 0,
>>> +                                     SZ_2M, GFP_KERNEL);
>>> +     if (ret)
>>> +             goto err_pages;
>>> +
>>> +     if (dma_map_sg(pfdev->dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL) == 0) {
>>> +             ret = -EINVAL;
>>> +             goto err_map;
>>> +     }
>>> +
>>> +     mmu_map_sg(pfdev, addr, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, &sgt);
>>> +
>>> +     mmu_write(pfdev, MMU_INT_CLEAR, BIT(as));
>>> +     bo->is_mapped = true;
>>> +
>>> +     dev_dbg(pfdev->dev, "mapped page fault @ %llx", addr);
>>> +
>>> +     return 0;
>>
>> You still need to free sgt and pages - so this should be:
>>
>> ret = 0;
>>
>> to fall through to the clean up below:
> 
> I think I had that then thought I forgot the return...

Might be an idea to rename the labels to "out_xxx" to make it look less
like the return is missing? :)

>>
>>> +
>>> +err_map:
>>> +     sg_free_table(&sgt);
>>> +err_pages:
>>> +     kvfree(pages);
>>> +     return ret;
>>> +}
>>> +
>>
>> But actually, you need to store the pages allocated in the buffer object
>> so that they can be freed later. At the moment you have a big memory leak.
> 
> Ah yes, now I see the memory ends up in 'Unevictable' bucket. I'd been
> looking at the shmem counts and it seemed fine. I have it working now,
> but still need to figure out how to do a dma_unmap.

I was purely looking at the memory usage in 'top'. I was hoping it would
be lower using HEAP, but instead I saw the leak. I'm afraid I gave up
trying to figure out the best way of implementing the dma_unmap - so
good luck! :)

>>>  static const char *access_type_name(struct panfrost_device *pfdev,
>>>               u32 fault_status)
>>>  {
>>> @@ -323,13 +405,11 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
>>>  {
>>>       struct panfrost_device *pfdev = data;
>>>       u32 status = mmu_read(pfdev, MMU_INT_STAT);
>>> -     int i;
>>> +     int i, ret;
>>>
>>>       if (!status)
>>>               return IRQ_NONE;
>>>
>>> -     dev_err(pfdev->dev, "mmu irq status=%x\n", status);
>>> -
>>>       for (i = 0; status; i++) {
>>>               u32 mask = BIT(i) | BIT(i + 16);
>>>               u64 addr;
>>> @@ -350,6 +430,17 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
>>>               access_type = (fault_status >> 8) & 0x3;
>>>               source_id = (fault_status >> 16);
>>>
>>> +             /* Page fault only */
>>> +             if ((status & mask) == BIT(i)) {
>>> +                     WARN_ON(exception_type < 0xC1 || exception_type > 0xC4);
>>> +
>>> +                     ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
>>> +                     if (!ret) {
>>> +                             status &= ~mask;
>>> +                             continue;
>>
>> In this case the IRQ isn't handled and will remain asserted, which
>> probably isn't going to end particularly well.
> 
> This is the success condition. We've already cleared the interrupt in
> panfrost_mmu_map_fault_addr. On failure, we fall thru printing out the
> next message and clear the interrupt. Maybe it would be better to
> clear the interrupt in 1 place...

You're absolutely right - sorry for the noise. I think I must have
misread and assumed this was the failure condition. Like you say though
it might be cleaner to only have the one place that the interrupt is
cleared.

>>
>> Ideally you would switch the address space to UNMAPPED to kill off the
>> job, but at the very least we should acknowledge the interrupt and let
>> the GPU timeout reset the GPU to recover (which is equivalent while we
>> still only use the one AS on the GPU).
> 
> I'll hopefully remember this detail when doing multiple AS.

A fair bit of the complexity of kbase comes from trying to avoid the
possibility of one process DoSing another by submitting malicious jobs.
The MMU in particular will basically just pause until it is kicked when
a page fault happens, so UNMAPPED is needed to trigger a failure back
into the execution units to allow the threads to progress and report
failure - which is important if there's another job competing for the
same execution units.

Steve
> While I agree an executable heap is pretty weird, I'd prefer making this
> explicit - i.e. failing the allocation if the flags don't make sense.

The only use case for an executable heap I can think of is an attacker
trying to exploit a GPU-side heap overflow, and that's seriously
stretching it ;)

Making executable? mutually exclusive with growable? is quite sane to
me.

> 
> >  
> >  	ret = panfrost_gem_map(pfdev, bo);
> >  	if (ret)
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
> > index 132f02399b7b..c500ca6b9072 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_gem.h
> > +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
> > @@ -13,6 +13,7 @@ struct panfrost_gem_object {
> >  	struct drm_mm_node node;
> >  	bool is_mapped		:1;
> >  	bool noexec		:1;
> > +	bool is_heap		:1;
> >  };
> >  
> >  static inline
> > @@ -21,6 +22,13 @@ struct  panfrost_gem_object *to_panfrost_bo(struct drm_gem_object *obj)
> >  	return container_of(to_drm_gem_shmem_obj(obj), struct panfrost_gem_object, base);
> >  }
> >  
> > +static inline
> > +struct  panfrost_gem_object *drm_mm_node_to_panfrost_bo(struct drm_mm_node *node)
> > +{
> > +	return container_of(node, struct panfrost_gem_object, node);
> > +}
> > +
> > +
> 
> NIT: Extra blank line
> 
> >  struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size);
> >  
> >  struct panfrost_gem_object *
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > index d18484a07bfa..3b95c7027188 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > @@ -3,6 +3,7 @@
> >  /* Copyright (C) 2019 Arm Ltd. */
> >  #include <linux/bitfield.h>
> >  #include <linux/delay.h>
> > +#include <linux/dma-mapping.h>
> >  #include <linux/interrupt.h>
> >  #include <linux/io.h>
> >  #include <linux/iopoll.h>
> > @@ -10,6 +11,7 @@
> >  #include <linux/iommu.h>
> >  #include <linux/platform_device.h>
> >  #include <linux/pm_runtime.h>
> > +#include <linux/shmem_fs.h>
> >  #include <linux/sizes.h>
> >  
> >  #include "panfrost_device.h"
> > @@ -257,12 +259,12 @@ void panfrost_mmu_unmap(struct panfrost_gem_object *bo)
> >  		size_t unmapped_page;
> >  		size_t pgsize = get_pgsize(iova, len - unmapped_len);
> >  
> > -		unmapped_page = ops->unmap(ops, iova, pgsize);
> > -		if (!unmapped_page)
> > -			break;
> > -
> > -		iova += unmapped_page;
> > -		unmapped_len += unmapped_page;
> > +		if (ops->iova_to_phys(ops, iova)) {
> > +			unmapped_page = ops->unmap(ops, iova, pgsize);
> > +			WARN_ON(unmapped_page != pgsize);
> > +		}
> > +		iova += pgsize;
> > +		unmapped_len += pgsize;
> >  	}
> >  
> >  	mmu_hw_do_operation(pfdev, 0, bo->node.start << PAGE_SHIFT,
> > @@ -298,6 +300,86 @@ static const struct iommu_gather_ops mmu_tlb_ops = {
> >  	.tlb_sync	= mmu_tlb_sync_context,
> >  };
> >  
> > +static struct drm_mm_node *addr_to_drm_mm_node(struct panfrost_device *pfdev, int as, u64 addr)
> > +{
> > +	struct drm_mm_node *node;
> > +	u64 offset = addr >> PAGE_SHIFT;
> > +
> > +	drm_mm_for_each_node(node, &pfdev->mm) {
> > +		if (offset >= node->start && offset < (node->start + node->size))
> > +			return node;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +#define NUM_FAULT_PAGES (SZ_2M / PAGE_SIZE)
> > +
> > +int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, u64 addr)
> > +{
> > +	int ret, i;
> > +	struct drm_mm_node *node;
> > +	struct panfrost_gem_object *bo;
> > +	struct address_space *mapping;
> > +	pgoff_t page_offset;
> > +	struct sg_table sgt = {};
> > +	struct page **pages;
> > +
> > +	node = addr_to_drm_mm_node(pfdev, as, addr);
> > +	if (!node)
> > +		return -ENOENT;
> > +
> > +	bo = drm_mm_node_to_panfrost_bo(node);
> > +	if (!bo->is_heap) {
> > +		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
> > +			 node->start << PAGE_SHIFT);
> > +		return -EINVAL;
> > +	}
> > +	/* Assume 2MB alignment and size multiple */
> > +	addr &= ~((u64)SZ_2M - 1);
> > +	page_offset = addr >> PAGE_SHIFT;
> > +	page_offset -= node->start;
> > +
> > +	pages = kvmalloc_array(NUM_FAULT_PAGES, sizeof(struct page *), GFP_KERNEL);
> > +	if (!pages)
> > +		return -ENOMEM;
> > +
> > +	mapping = bo->base.base.filp->f_mapping;
> > +	mapping_set_unevictable(mapping);
> > +
> > +	for (i = 0; i < NUM_FAULT_PAGES; i++) {
> > +		pages[i] = shmem_read_mapping_page(mapping, page_offset + i);
> > +		if (IS_ERR(pages[i])) {
> > +			ret = PTR_ERR(pages[i]);
> > +			goto err_pages;
> > +		}
> > +	}
> > +
> > +	ret = sg_alloc_table_from_pages(&sgt, pages, NUM_FAULT_PAGES, 0,
> > +					SZ_2M, GFP_KERNEL);
> > +	if (ret)
> > +		goto err_pages;
> > +
> > +	if (dma_map_sg(pfdev->dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL) == 0) {
> > +		ret = -EINVAL;
> > +		goto err_map;
> > +	}
> > +
> > +	mmu_map_sg(pfdev, addr, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, &sgt);
> > +
> > +	mmu_write(pfdev, MMU_INT_CLEAR, BIT(as));
> > +	bo->is_mapped = true;
> > +
> > +	dev_dbg(pfdev->dev, "mapped page fault @ %llx", addr);
> > +
> > +	return 0;
> 
> You still need to free sgt and pages - so this should be:
> 
> ret = 0;
> 
> to fall through to the clean up below:
> 
> > +
> > +err_map:
> > +	sg_free_table(&sgt);
> > +err_pages:
> > +	kvfree(pages);
> > +	return ret;
> > +}
> > +
> 
> But actually, you need to store the pages allocated in the buffer object
> so that they can be freed later. At the moment you have a big memory leak.
> 
> >  static const char *access_type_name(struct panfrost_device *pfdev,
> >  		u32 fault_status)
> >  {
> > @@ -323,13 +405,11 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
> >  {
> >  	struct panfrost_device *pfdev = data;
> >  	u32 status = mmu_read(pfdev, MMU_INT_STAT);
> > -	int i;
> > +	int i, ret;
> >  
> >  	if (!status)
> >  		return IRQ_NONE;
> >  
> > -	dev_err(pfdev->dev, "mmu irq status=%x\n", status);
> > -
> >  	for (i = 0; status; i++) {
> >  		u32 mask = BIT(i) | BIT(i + 16);
> >  		u64 addr;
> > @@ -350,6 +430,17 @@ static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
> >  		access_type = (fault_status >> 8) & 0x3;
> >  		source_id = (fault_status >> 16);
> >  
> > +		/* Page fault only */
> > +		if ((status & mask) == BIT(i)) {
> > +			WARN_ON(exception_type < 0xC1 || exception_type > 0xC4);
> > +
> > +			ret = panfrost_mmu_map_fault_addr(pfdev, i, addr);
> > +			if (!ret) {
> > +				status &= ~mask;
> > +				continue;
> 
> In this case the IRQ isn't handled and will remain asserted, which
> probably isn't going to end particularly well.
> 
> Ideally you would switch the address space to UNMAPPED to kill off the
> job, but at the very least we should acknowledge the interrupt and let
> the GPU timeout reset the GPU to recover (which is equivalent while we
> still only use the one AS on the GPU).
> 
> Steve
> 
> > +			}
> > +		}
> > +
> >  		/* terminal fault, print info about the fault */
> >  		dev_err(pfdev->dev,
> >  			"Unhandled Page fault in AS%d at VA 0x%016llX\n"
> > @@ -391,8 +482,9 @@ int panfrost_mmu_init(struct panfrost_device *pfdev)
> >  	if (irq <= 0)
> >  		return -ENODEV;
> >  
> > -	err = devm_request_irq(pfdev->dev, irq, panfrost_mmu_irq_handler,
> > -			       IRQF_SHARED, "mmu", pfdev);
> > +	err = devm_request_threaded_irq(pfdev->dev, irq, NULL,
> > +					panfrost_mmu_irq_handler,
> > +					IRQF_ONESHOT, "mmu", pfdev);
> >  
> >  	if (err) {
> >  		dev_err(pfdev->dev, "failed to request mmu irq");
> > diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
> > index 17fb5d200f7a..9150dd75aad8 100644
> > --- a/include/uapi/drm/panfrost_drm.h
> > +++ b/include/uapi/drm/panfrost_drm.h
> > @@ -83,6 +83,7 @@ struct drm_panfrost_wait_bo {
> >  };
> >  
> >  #define PANFROST_BO_NOEXEC	1
> > +#define PANFROST_BO_HEAP	2
> >  
> >  /**
> >   * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
> >
> A fair bit of the complexity of kbase comes from trying to avoid the
> possibility of one process DoSing another by submitting malicious jobs.

...and yet it was still doable so easily (by accident, with buggy jobs
instead of malicious jobs).... sigh...

Still is on the mainline kernel (e.g. running dEQP in a window in
Weston, some faults triggered in dEQP end up messing up Weston's
rendering outside the deqp window). What's our threat model here?

Is "banning WebGL on Panfrost" allowed? :)
>  #define PANFROST_BO_NOEXEC	1
> +#define PANFROST_BO_HEAP	2

Bikeshedding, but I don't like this name. There are, I think, multiple
GPU-mapped buffers (at least in Panfrost -- I don't know how the blob
manages memory) that can be considered heaps of sorts. Some of those are
just regular old BOs.

What makes these special is that they're growable. Calling it "heap" is
okay inside the kernel, but for the UABI, I'd prefer an explicit
"PANFROST_BO_GROWABLE(_HEAP)" to make it obvious what's going on.
On Mon, Jul 22, 2019 at 8:15 AM Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
>
> >  #define PANFROST_BO_NOEXEC   1
> > +#define PANFROST_BO_HEAP     2
>
> Bikeshedding, but I don't like this name. There are, I think, multiple
> GPU-mapped buffers (at least in Panfrost -- I don't know how the blob
> manages memory) that can be considered heaps of sorts. Some of those are
> just regular old BOs.

Well, I had 'nomap' which reflected exactly what the kernel would do
to the BO and some folks didn't like that either. Granted, exactly
what is not mapped wasn't that clear.

It's really a question of give userspace exactly what it asks for or
tell the kernel how the BO is going to be used and it will decide the
details (which could change). It's similar to asking for a linear
buffer vs. scanout buffer.

> What makes these special is that they're growable. Calling it "heap" is
> okay inside the kernel, but for the UABI, I'd prefer an explicit
> "PANFROST_BO_GROWABLE(_HEAP)" to make it obvious what's going on.

IMO, 'growable' means the BO is one size and then grows to a new size.
But we're not resizing the BO, but rather just delaying the GPU
mapping and sparsely mapping it.

Rob
On Tue, 23 Jul 2019 at 09:14, Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
>
> > A fair bit of the complexity of kbase comes from trying to avoid the
> > possibility of one process DoSing another by submitting malicious jobs.
>
> ...and yet it was still doable so easily (by accident, with buggy jobs
> instead of malicious jobs).... sigh...
>
> Still is on the mainline kernel (e.g. running dEQP in a window in
> Weston, some faults triggered in dEQP end up messing up Weston's
> rendering outside the deqp window). What's our threat model here?
>
> Is "banning WebGL on Panfrost" allowed? :)

I think what Panfrost needs to do is to give browsers as much support
as possible to make WebGL secure. This includes making best use of the
HW to that end, and implementing any robustness-related extensions
that may be available.

Cheers,

Tomeu