drm/msm: Per-instance pagetable support

Submitted by Jordan Crouse on March 1, 2019, 7:38 p.m.

Details

Reviewer None
Submitted March 1, 2019, 7:38 p.m.
Last Updated May 29, 2019, 8:57 p.m.
Revision 3

Cover Letter(s)

Revision 1
      This is the latest incarnation of per-instance pagetable support for the MSM GPU
driver. Some of these have been seen before, most recently [1].

Per-instance pagetables allow the target GPU driver to create and manage
an individual pagetable for each file descriptor instance and switch
between them asynchronously using the GPU to reprogram the pagetable
registers on the fly.

This is accomplished in this series by taking advantage of the multiple
IOMMU domain API from Lu Baolu [2] and all these patches are based on that
patch. This series is split into three parts:

Part one adds support for split pagetables. These are the same patches from the
previous attempts [1]. Split pagetables allow the hardware to switch out the
lower pagetable (TTBR0) without affecting the global allocations in the upper
one (TTBR1).

Part 2 adds aux domain support for arm-smmu-v2. New aux domains create a new
pagetable but do not touch the underlying hardware.  The target driver uses the
new aux domain to map and unmap memory through the usual mechanisms.

The final part is the support in the GPU driver to enable 64 bit addressing for
a5xx and a6xx, set up the support for split pagetables, create new per-instance
pagetables for a new instance and submit the GPU command to switch the pagetable
at the appropriate time.

This is compile tested but I haven't done much target testing as of yet. I
wanted to get this out in the world for debate while we work on fixing up the
minor issues. In particular, I want to make sure that this fits with the
current thinking about how aux domains should look and feel.

[1] https://patchwork.freedesktop.org/series/43447/
[2] https://patchwork.kernel.org/patch/10825061/


Jordan Crouse (15):
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add split pagetable support for arm-smmu-v2
  iommu/io-pgtable: Allow TLB operations to be optional
  iommu: Add DOMAIN_ATTR_PTBASE
  iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
  drm/msm/adreno: Enable 64 bit mode by default on a5xx and a6xx targets
  drm/msm: Print all 64 bits of the faulting IOMMU address
  drm/msm: Pass the MMU domain index in struct msm_file_private
  drm/msm/gpu: Move address space setup to the GPU targets
  drm/msm: Add support for IOMMU auxiliary domains
  drm/msm: Add a helper function for a per-instance address space
  drm/msm: Add support to create target specific address spaces
  drm/msm/gpu: Add ttbr0 to the memptrs
  drm/msm/a6xx: Support per-instance pagetables
  drm/msm/a5xx: Support per-instance pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c     |  37 ++--
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c     |  50 ++++--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c     |  51 ++++--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 163 +++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     |  19 ++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  70 ++++++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 167 +++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h     |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |   7 -
 drivers/gpu/drm/msm/msm_drv.c             |  25 ++-
 drivers/gpu/drm/msm/msm_drv.h             |   5 +
 drivers/gpu/drm/msm/msm_gem.h             |   2 +
 drivers/gpu/drm/msm/msm_gem_submit.c      |  13 +-
 drivers/gpu/drm/msm/msm_gem_vma.c         |  53 +++---
 drivers/gpu/drm/msm/msm_gpu.c             |  59 +------
 drivers/gpu/drm/msm/msm_gpu.h             |   3 +
 drivers/gpu/drm/msm/msm_iommu.c           |  99 ++++++++++-
 drivers/gpu/drm/msm/msm_mmu.h             |   4 +
 drivers/gpu/drm/msm/msm_ringbuffer.h      |   1 +
 drivers/iommu/arm-smmu-regs.h             |  18 ++
 drivers/iommu/arm-smmu.c                  | 278 ++++++++++++++++++++++++++----
 drivers/iommu/io-pgtable-arm.c            |   3 +-
 drivers/iommu/io-pgtable.h                |  10 +-
 include/linux/iommu.h                     |   2 +
 24 files changed, 952 insertions(+), 188 deletions(-)
    
Revision 2
      This is a refresh of the per-instance pagetable support for arm-smmu-v2 and the
MSM GPU driver. I think this is pretty mature at this point, so I've dropped the
RFC designation and ask for consideration for 5.3.

Per-instance pagetables allow the target GPU driver to create and manage
an individual pagetable for each file descriptor instance and switch
between them asynchronously using the GPU to reprogram the pagetable
registers on the fly.

Most of the heavy lifting for this is done in the arm-smmu-v2 driver by
taking advantage of the newly added multiple domain API. The first patch in the
series allows opted-in clients to create a default identity domain when the
IOMMU group for the SMMU device is created. This bypasses the DMA domain
creation in the IOMMU core which serves several purposes for the GPU by skipping
the otherwise  unused DMA domain and also keeping context bank 0 unused on the
hardware (for better or worse, the GPU is hardcoded to only use context bank 0
for switching).

The next two patches enable split pagetable support. This is used to map
global buffers for the GPU so we can safely switch the TTBR0 pagetable for the
instance.

The last two arm-smmu-v2 patches enable auxillary domain support. Again the
SMMU client can opt-in to allow auxiliary domains, and if enabled will create
a pagetable but not otherwise touch the hardware. The client can get the address
of the pagetable through an attribute to perform its own switching.

After the arm-smmu-v2 patches are more than several msm/gpu patches to allow
for target specific address spaces, enable 64 bit virtual addressing and
implement the mechanics of pagetable switching.

For the purposes of merging all the patches between

drm/msm/adreno: Enable 64 bit mode by default on a5xx and a6xx targets

and

drm/msm: Add support to create target specific address spaces

can be merged to the msm-next tree without dependencies on the IOMMU changes.
Only the last three patches will require coordination between the two areas.

Jordan Crouse (15):
  iommu/arm-smmu: Allow IOMMU enabled devices to skip DMA domains
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add split pagetable support for arm-smmu-v2
  iommu: Add DOMAIN_ATTR_PTBASE
  iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
  drm/msm/adreno: Enable 64 bit mode by default on a5xx and a6xx targets
  drm/msm: Print all 64 bits of the faulting IOMMU address
  drm/msm: Pass the MMU domain index in struct msm_file_private
  drm/msm/gpu: Move address space setup to the GPU targets
  drm/msm: Add a helper function for a per-instance address space
  drm/msm/gpu: Add ttbr0 to the memptrs
  drm/msm: Add support to create target specific address spaces
  drm/msm: Add support for IOMMU auxiliary domains
  drm/msm/a6xx: Support per-instance pagetables
  drm/msm/a5xx: Support per-instance pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c     |  37 +++-
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c     |  50 +++--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c     |  51 +++--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 163 +++++++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     |  19 ++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  70 ++++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 166 +++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h     |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |   7 -
 drivers/gpu/drm/msm/msm_drv.c             |  25 ++-
 drivers/gpu/drm/msm/msm_drv.h             |   5 +
 drivers/gpu/drm/msm/msm_gem.h             |   2 +
 drivers/gpu/drm/msm/msm_gem_submit.c      |  13 +-
 drivers/gpu/drm/msm/msm_gem_vma.c         |  53 +++--
 drivers/gpu/drm/msm/msm_gpu.c             |  59 +----
 drivers/gpu/drm/msm/msm_gpu.h             |   3 +
 drivers/gpu/drm/msm/msm_iommu.c           |  99 ++++++++-
 drivers/gpu/drm/msm/msm_mmu.h             |   4 +
 drivers/gpu/drm/msm/msm_ringbuffer.h      |   1 +
 drivers/iommu/arm-smmu-regs.h             |  19 ++
 drivers/iommu/arm-smmu.c                  | 352 +++++++++++++++++++++++++++---
 drivers/iommu/io-pgtable-arm.c            |   3 +-
 drivers/iommu/iommu.c                     |  29 ++-
 include/linux/iommu.h                     |   5 +
 24 files changed, 1052 insertions(+), 184 deletions(-)
    
Revision 3
      This is v3 of the per-instance pagetable support. Biggest change in this
revision is moving nearly all of the split pagetable support into
io-pgtable-arm and setting up specific ops to handle the unique behavior
of the split pagetables. Now that I've spent some time with it, I like how
it turned out.

For background:

Per-instance pagetables allow the target GPU driver to create and manage
an individual pagetable for each file descriptor instance and switch
between them asynchronously using the GPU to reprogram the pagetable
registers on the fly.

Most of the heavy lifting for this is done in the arm-smmu-v2 driver by
taking advantage of the newly added multiple domain API. The first patch in the
series allows opted-in clients to direct map a device with
iommu_request_dm_for_dev(). This bypasses the DMA domain creation in the IOMMU
core which serves several purposes for the GPU by skipping the otherwise unused
DMA domain and also keeping context bank 0 unused on the hardware (for better or
worse, the GPU is hardcoded to only use context bank 0 for switching).

The next several patches enable split pagetable support. This is used to map
global buffers for the GPU so we can safely switch the TTBR0 pagetable for the
instance.

The last two arm-smmu-v2 patches enable auxillary domain support. Again the
SMMU client can opt-in to allow auxiliary domains, and if enabled will create
a pagetable but not otherwise touch the hardware. The client can get the address
of the pagetable through an attribute to perform its own switching.

After the arm-smmu-v2 patches are more than several msm/gpu patches to allow
for target specific address spaces, enable 64 bit virtual addressing and
implement the mechanics of pagetable switching.

For the purposes of merging all the patches between

drm/msm/adreno: Enable 64 bit mode by default on a5xx and a6xx targets

and

drm/msm: Add support to create target specific address spaces

can be merged to the msm-next tree without dependencies on the IOMMU changes.
Only the last three patches will require coordination between the two areas.

Jordan Crouse (16):
  iommu/arm-smmu: Allow client devices to select direct mapping
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/io-pgtable-arm: Add support for AARCH64 split pagetables
  iommu/arm-smmu: Add support for DOMAIN_ATTR_SPLIT_TABLES
  iommu: Add DOMAIN_ATTR_PTBASE
  iommu/arm-smmu: Add auxiliary domain support for arm-smmuv2
  drm/msm/adreno: Enable 64 bit mode by default on a5xx and a6xx targets
  drm/msm: Print all 64 bits of the faulting IOMMU address
  drm/msm: Pass the MMU domain index in struct msm_file_private
  drm/msm/gpu: Move address space setup to the GPU targets
  drm/msm: Add support for IOMMU auxiliary domains
  drm/msm: Add a helper function for a per-instance address space
  drm/msm: Add support to create target specific address spaces
  drm/msm/gpu: Add ttbr0 to the memptrs
  drm/msm/a6xx: Support per-instance pagetables
  drm/msm/a5xx: Support per-instance pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c     |  37 +++--
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c     |  50 ++++--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c     |  51 ++++--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 163 ++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     |  19 +++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  70 ++++++--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c     | 166 ++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h     |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |   7 -
 drivers/gpu/drm/msm/msm_drv.c             |  25 ++-
 drivers/gpu/drm/msm/msm_drv.h             |   5 +
 drivers/gpu/drm/msm/msm_gem.h             |   2 +
 drivers/gpu/drm/msm/msm_gem_submit.c      |  13 +-
 drivers/gpu/drm/msm/msm_gem_vma.c         |  53 +++---
 drivers/gpu/drm/msm/msm_gpu.c             |  59 +------
 drivers/gpu/drm/msm/msm_gpu.h             |   3 +
 drivers/gpu/drm/msm/msm_iommu.c           |  99 +++++++++++-
 drivers/gpu/drm/msm/msm_mmu.h             |   4 +
 drivers/gpu/drm/msm/msm_ringbuffer.h      |   1 +
 drivers/iommu/arm-smmu.c                  | 176 ++++++++++++++++++--
 drivers/iommu/io-pgtable-arm.c            | 261 +++++++++++++++++++++++++++---
 drivers/iommu/io-pgtable.c                |   1 +
 include/linux/io-pgtable.h                |   2 +
 include/linux/iommu.h                     |   2 +
 24 files changed, 1082 insertions(+), 188 deletions(-)
    

Revisions