[v5,06/20] drm/dp_mst: Introduce new refcounting scheme for mstbs and ports

Submitted by Lyude Paul on Jan. 9, 2019, 12:35 a.m.

Details

Message ID 20190109003516.14752-7-lyude@redhat.com
State New
Series "MST refcounting/atomic helpers cleanup"
Headers show

Commit Message

Lyude Paul Jan. 9, 2019, 12:35 a.m.
The current way of handling refcounting in the DP MST helpers is really
confusing and probably just plain wrong because it's been hacked up many
times over the years without anyone actually going over the code and
seeing if things could be simplified.

To the best of my understanding, the current scheme works like this:
drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
this refcount hits 0 for either of the two, they're removed from the
topology state, but not immediately freed. Both ports and branch devices
will reinitialize their kref once it's hit 0 before actually destroying
themselves. The intended purpose behind this is so that we can avoid
problems like not being able to free a remote payload that might still
be active, due to us having removed all of the port/branch device
structures in memory, as per:

commit 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")

Which may have worked, but then it caused use-after-free errors. Being
new to MST at the time, I tried fixing it;

commit 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")

But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
are validated in almost every DP MST helper function. Simply put, this
means we go through the topology and try to see if the given
drm_dp_mst_branch or drm_dp_mst_port is still attached to something
before trying to use it in order to avoid dereferencing freed memory
(something that has happened a LOT in the past with this library).
Because of this it doesn't actually matter whether or not we keep keep
the ports and branches around in memory as that's not enough, because
any function that validates the branches and ports passed to it will
still reject them anyway since they're no longer in the topology
structure. So, use-after-free errors were fixed but payload deallocation
was completely broken.

Two years later, AMD informed me about this issue and I attempted to
come up with a temporary fix, pending a long-overdue cleanup of this
library:

commit c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")

But then that introduced use-after-free errors, so I quickly reverted
it:

commit 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during destruction, just ref"")

And in the process, learned that there is just no simple fix for this:
the design is just broken. Unfortunately, the usage of these helpers are
quite broken as well. Some drivers like i915 have been smart enough to
avoid accessing any kind of information from MST port structures, but
others like nouveau have assumed, understandably so, that
drm_dp_mst_port structures are normal and can just be accessed at any
time without worrying about use-after-free errors.

After a lot of discussion, me and Daniel Vetter came up with a better
idea to replace all of this.

To summarize, since this is documented far more indepth in the
documentation this patch introduces, we make it so that drm_dp_mst_port
and drm_dp_mst_branch structures have two different classes of
refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
given topology. Once it hits zero, any associated connectors are removed
and the branch or port can no longer be validated. malloc_kref
corresponds to the lifetime of the memory allocation for the actual
structure, and will always be non-zero so long as the topology_kref is
non-zero. This gives us a way to allow callers to hold onto port and
branch device structures past their topology lifetime, and dramatically
simplifies the lifetimes of both structures. This also finally fixes the
port deallocation problem, properly.

Additionally: since this now means that we can keep ports and branch
devices allocated in memory for however long we need, we no longer need
a significant amount of the port validation that we currently do.

Additionally, there is one last scenario that this fixes, which couldn't
have been fixed properly beforehand:

- CPU1 unrefs port from topology (refcount 1->0)
- CPU2 refs port in topology(refcount 0->1)

Since we now can guarantee memory safety for ports and branches
as-needed, we also can make our main reference counting functions fix
this problem by using kref_get_unless_zero() internally so that topology
refcounts can only ever reach 0 once.

Changes since v3:
* Remove rebase detritus - danvet
* Split out purely style changes into separate patches - hwentlan

Changes since v2:
* Fix commit message - checkpatch
* s/)-1/) - 1/g - checkpatch

Changes since v1:
* Remove forward declarations - danvet
* Move "Branch device and port refcounting" section from documentation
  into kernel-doc comments - danvet
* Export internal topology lifetime functions into their own section in
  the kernel-docs - danvet
* s/@/&/g for struct references in kernel-docs - danvet
* Drop the "when they are no longer being used" bits from the kernel
  docs - danvet
* Modify diagrams to show how the DRM driver interacts with the topology
  and payloads - danvet
* Make suggested documentation changes for
  drm_dp_mst_topology_get_mstb() and drm_dp_mst_topology_get_port() -
  danvet
* Better explain the relationship between malloc refs and topology krefs
  in the documentation for drm_dp_mst_topology_get_port() and
  drm_dp_mst_topology_get_mstb() - danvet
* Fix "See also" in drm_dp_mst_topology_get_mstb() - danvet
* Rename drm_dp_mst_topology_get_(port|mstb)() ->
  drm_dp_mst_topology_try_get_(port|mstb)() and
  drm_dp_mst_topology_ref_(port|mstb)() ->
  drm_dp_mst_topology_get_(port|mstb)() - danvet
* s/should/must in docs - danvet
* WARN_ON(refcount == 0) in topology_get_(mstb|port) - danvet
* Move kdocs for mstb/port structs inline - danvet
* Split drm_dp_get_last_connected_port_and_mstb() changes into their own
  commit - danvet

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@redhat.com>
Cc: Jerry Zuo <Jerry.Zuo@amd.com>
Cc: Juston Li <juston.li@intel.com>
---
 .../gpu/dp-mst/topology-figure-1.dot          |  52 ++
 .../gpu/dp-mst/topology-figure-2.dot          |  56 +++
 .../gpu/dp-mst/topology-figure-3.dot          |  59 +++
 Documentation/gpu/drm-kms-helpers.rst         |  26 +-
 drivers/gpu/drm/drm_dp_mst_topology.c         | 457 +++++++++++++++---
 include/drm/drm_dp_mst_helper.h               |  32 +-
 6 files changed, 613 insertions(+), 69 deletions(-)
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
 create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot

Patch hide | download patch | download mbox

diff --git a/Documentation/gpu/dp-mst/topology-figure-1.dot b/Documentation/gpu/dp-mst/topology-figure-1.dot
new file mode 100644
index 000000000000..157e17c7e0b0
--- /dev/null
+++ b/Documentation/gpu/dp-mst/topology-figure-1.dot
@@ -0,0 +1,52 @@ 
+digraph T {
+    /* Make sure our payloads are always drawn below the driver node */
+    subgraph cluster_driver {
+        fillcolor = grey;
+        style = filled;
+        driver -> {payload1, payload2} [dir=none];
+    }
+
+    /* Driver malloc references */
+    edge [style=dashed];
+    driver -> port1;
+    driver -> port2;
+    driver -> port3:e;
+    driver -> port4;
+
+    payload1:s -> port1:e;
+    payload2:s -> port3:e;
+    edge [style=""];
+
+    subgraph cluster_topology {
+        label="Topology Manager";
+        labelloc=bottom;
+
+        /* Topology references */
+        mstb1 -> {port1, port2};
+        port1 -> mstb2;
+        port2 -> mstb3 -> {port3, port4};
+        port3 -> mstb4;
+
+        /* Malloc references */
+        edge [style=dashed;dir=back];
+        mstb1 -> {port1, port2};
+        port1 -> mstb2;
+        port2 -> mstb3 -> {port3, port4};
+        port3 -> mstb4;
+    }
+
+    driver [label="DRM driver";style=filled;shape=box;fillcolor=lightblue];
+
+    payload1 [label="Payload #1";style=filled;shape=box;fillcolor=lightblue];
+    payload2 [label="Payload #2";style=filled;shape=box;fillcolor=lightblue];
+
+    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen;shape=oval];
+    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen;shape=oval];
+    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen;shape=oval];
+    mstb4 [label="MSTB #4";style=filled;fillcolor=palegreen;shape=oval];
+
+    port1 [label="Port #1";shape=oval];
+    port2 [label="Port #2";shape=oval];
+    port3 [label="Port #3";shape=oval];
+    port4 [label="Port #4";shape=oval];
+}
diff --git a/Documentation/gpu/dp-mst/topology-figure-2.dot b/Documentation/gpu/dp-mst/topology-figure-2.dot
new file mode 100644
index 000000000000..4243dd1737cb
--- /dev/null
+++ b/Documentation/gpu/dp-mst/topology-figure-2.dot
@@ -0,0 +1,56 @@ 
+digraph T {
+    /* Make sure our payloads are always drawn below the driver node */
+    subgraph cluster_driver {
+        fillcolor = grey;
+        style = filled;
+        driver -> {payload1, payload2} [dir=none];
+    }
+
+    /* Driver malloc references */
+    edge [style=dashed];
+    driver -> port1;
+    driver -> port2;
+    driver -> port3:e;
+    driver -> port4 [color=red];
+
+    payload1:s -> port1:e;
+    payload2:s -> port3:e;
+    edge [style=""];
+
+    subgraph cluster_topology {
+        label="Topology Manager";
+        labelloc=bottom;
+
+        /* Topology references */
+        mstb1 -> {port1, port2};
+        port1 -> mstb2;
+        edge [color=red];
+        port2 -> mstb3 -> {port3, port4};
+        port3 -> mstb4;
+        edge [color=""];
+
+        /* Malloc references */
+        edge [style=dashed;dir=back];
+        mstb1 -> {port1, port2};
+        port1 -> mstb2;
+        port2 -> mstb3 -> port3;
+        edge [color=red];
+        mstb3 -> port4;
+        port3 -> mstb4;
+    }
+
+    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
+    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
+    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
+    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
+
+    port1 [label="Port #1"];
+    port2 [label="Port #2"];
+    port3 [label="Port #3"];
+    port4 [label="Port #4";style=filled;fillcolor=grey];
+
+    driver [label="DRM driver";style=filled;shape=box;fillcolor=lightblue];
+
+    payload1 [label="Payload #1";style=filled;shape=box;fillcolor=lightblue];
+    payload2 [label="Payload #2";style=filled;shape=box;fillcolor=lightblue];
+}
diff --git a/Documentation/gpu/dp-mst/topology-figure-3.dot b/Documentation/gpu/dp-mst/topology-figure-3.dot
new file mode 100644
index 000000000000..6cd78d06778b
--- /dev/null
+++ b/Documentation/gpu/dp-mst/topology-figure-3.dot
@@ -0,0 +1,59 @@ 
+digraph T {
+    /* Make sure our payloads are always drawn below the driver node */
+    subgraph cluster_driver {
+        fillcolor = grey;
+        style = filled;
+        edge [dir=none];
+        driver -> payload1;
+        driver -> payload2 [penwidth=3];
+        edge [dir=""];
+    }
+
+    /* Driver malloc references */
+    edge [style=dashed];
+    driver -> port1;
+    driver -> port2;
+    driver -> port3:e;
+    driver -> port4 [color=grey];
+    payload1:s -> port1:e;
+    payload2:s -> port3:e [penwidth=3];
+    edge [style=""];
+
+    subgraph cluster_topology {
+        label="Topology Manager";
+        labelloc=bottom;
+
+        /* Topology references */
+        mstb1 -> {port1, port2};
+        port1 -> mstb2;
+        edge [color=grey];
+        port2 -> mstb3 -> {port3, port4};
+        port3 -> mstb4;
+        edge [color=""];
+
+        /* Malloc references */
+        edge [style=dashed;dir=back];
+        mstb1 -> {port1, port2};
+        port1 -> mstb2;
+        port2 -> mstb3 [penwidth=3];
+        mstb3 -> port3 [penwidth=3];
+        edge [color=grey];
+        mstb3 -> port4;
+        port3 -> mstb4;
+    }
+
+    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
+    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
+    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen;penwidth=3];
+    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
+
+    port1 [label="Port #1"];
+    port2 [label="Port #2";penwidth=5];
+    port3 [label="Port #3";penwidth=3];
+    port4 [label="Port #4";style=filled;fillcolor=grey];
+
+    driver [label="DRM driver";style=filled;shape=box;fillcolor=lightblue];
+
+    payload1 [label="Payload #1";style=filled;shape=box;fillcolor=lightblue];
+    payload2 [label="Payload #2";style=filled;shape=box;fillcolor=lightblue;penwidth=3];
+}
diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
index b422eb8edf16..7a3fc569bc68 100644
--- a/Documentation/gpu/drm-kms-helpers.rst
+++ b/Documentation/gpu/drm-kms-helpers.rst
@@ -208,18 +208,40 @@  Display Port Dual Mode Adaptor Helper Functions Reference
 .. kernel-doc:: drivers/gpu/drm/drm_dp_dual_mode_helper.c
    :export:
 
-Display Port MST Helper Functions Reference
-===========================================
+Display Port MST Helpers
+========================
+
+Overview
+--------
 
 .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
    :doc: dp mst helper
 
+.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
+   :doc: Branch device and port refcounting
+
+Functions Reference
+-------------------
+
 .. kernel-doc:: include/drm/drm_dp_mst_helper.h
    :internal:
 
 .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
    :export:
 
+Topology Lifetime Internals
+---------------------------
+
+These functions aren't exported to drivers, but are documented here to help make
+the MST topology helpers easier to understand
+
+.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
+   :functions: drm_dp_mst_topology_try_get_mstb drm_dp_mst_topology_get_mstb
+               drm_dp_mst_topology_put_mstb
+               drm_dp_mst_topology_try_get_port drm_dp_mst_topology_get_port
+               drm_dp_mst_topology_put_port
+               drm_dp_mst_get_mstb_malloc drm_dp_mst_put_mstb_malloc
+
 MIPI DSI Helper Functions Reference
 ===================================
 
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 074e985093ca..c53cf7eb1dbc 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -850,46 +850,211 @@  static struct drm_dp_mst_branch *drm_dp_add_mst_branch_device(u8 lct, u8 *rad)
 	if (lct > 1)
 		memcpy(mstb->rad, rad, lct / 2);
 	INIT_LIST_HEAD(&mstb->ports);
-	kref_init(&mstb->kref);
+	kref_init(&mstb->topology_kref);
+	kref_init(&mstb->malloc_kref);
 	return mstb;
 }
 
-static void drm_dp_free_mst_port(struct kref *kref);
-
 static void drm_dp_free_mst_branch_device(struct kref *kref)
 {
-	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
-	if (mstb->port_parent) {
-		if (list_empty(&mstb->port_parent->next))
-			kref_put(&mstb->port_parent->kref, drm_dp_free_mst_port);
-	}
+	struct drm_dp_mst_branch *mstb =
+		container_of(kref, struct drm_dp_mst_branch, malloc_kref);
+
+	if (mstb->port_parent)
+		drm_dp_mst_put_port_malloc(mstb->port_parent);
+
 	kfree(mstb);
 }
 
+/**
+ * DOC: Branch device and port refcounting
+ *
+ * Topology refcount overview
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * The refcounting schemes for &struct drm_dp_mst_branch and &struct
+ * drm_dp_mst_port are somewhat unusual. Both ports and branch devices have
+ * two different kinds of refcounts: topology refcounts, and malloc refcounts.
+ *
+ * Topology refcounts are not exposed to drivers, and are handled internally
+ * by the DP MST helpers. The helpers use them in order to prevent the
+ * in-memory topology state from being changed in the middle of critical
+ * operations like changing the internal state of payload allocations. This
+ * means each branch and port will be considered to be connected to the rest
+ * of the topology until it's topology refcount reaches zero. Additionally,
+ * for ports this means that their associated &struct drm_connector will stay
+ * registered with userspace until the port's refcount reaches 0.
+ *
+ * Malloc refcount overview
+ * ~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * Malloc references are used to keep a &struct drm_dp_mst_port or &struct
+ * drm_dp_mst_branch allocated even after all of its topology references have
+ * been dropped, so that the driver or MST helpers can safely access each
+ * branch's last known state before it was disconnected from the topology.
+ * When the malloc refcount of a port or branch reaches 0, the memory
+ * allocation containing the &struct drm_dp_mst_branch or &struct
+ * drm_dp_mst_port respectively will be freed.
+ *
+ * For &struct drm_dp_mst_branch, malloc refcounts are not currently exposed
+ * to drivers. As of writing this documentation, there are no drivers that
+ * have a usecase for accessing &struct drm_dp_mst_branch outside of the MST
+ * helpers. Exposing this API to drivers in a race-free manner would take more
+ * tweaking of the refcounting scheme, however patches are welcome provided
+ * there is a legitimate driver usecase for this.
+ *
+ * Refcount relationships in a topology
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * Let's take a look at why the relationship between topology and malloc
+ * refcounts is designed the way it is.
+ *
+ * .. kernel-figure:: dp-mst/topology-figure-1.dot
+ *
+ *    An example of topology and malloc refs in a DP MST topology with two
+ *    active payloads. Topology refcount increments are indicated by solid
+ *    lines, and malloc refcount increments are indicated by dashed lines.
+ *    Each starts from the branch which incremented the refcount, and ends at
+ *    the branch to which the refcount belongs to.
+ *
+ * As you can see in figure 1, every branch increments the topology refcount
+ * of it's children, and increments the malloc refcount of it's parent.
+ * Additionally, every payload increments the malloc refcount of it's assigned
+ * port by 1.
+ *
+ * So, what would happen if MSTB #3 from the above figure was unplugged from
+ * the system, but the driver hadn't yet removed payload #2 from port #3? The
+ * topology would start to look like figure 2.
+ *
+ * .. kernel-figure:: dp-mst/topology-figure-2.dot
+ *
+ *    Ports and branch devices which have been released from memory are
+ *    colored grey, and references which have been removed are colored red.
+ *
+ * Whenever a port or branch device's topology refcount reaches zero, it will
+ * decrement the topology refcounts of all its children, the malloc refcount
+ * of its parent, and finally its own malloc refcount. For MSTB #4 and port
+ * #4, this means they both have been disconnected from the topology and freed
+ * from memory. But, because payload #2 is still holding a reference to port
+ * #3, port #3 is removed from the topology but it's &struct drm_dp_mst_port
+ * is still accessible from memory. This also means port #3 has not yet
+ * decremented the malloc refcount of MSTB #3, so it's &struct
+ * drm_dp_mst_branch will also stay allocated in memory until port #3's
+ * malloc refcount reaches 0.
+ *
+ * This relationship is necessary because in order to release payload #2, we
+ * need to be able to figure out the last relative of port #3 that's still
+ * connected to the topology. In this case, we would travel up the topology as
+ * shown in figure 3.
+ *
+ * .. kernel-figure:: dp-mst/topology-figure-3.dot
+ *
+ * And finally, remove payload #2 by communicating with port #2 through
+ * sideband transactions.
+ */
+
+/**
+ * drm_dp_mst_get_mstb_malloc() - Increment the malloc refcount of a branch
+ * device
+ * @mstb: The &struct drm_dp_mst_branch to increment the malloc refcount of
+ *
+ * Increments &drm_dp_mst_branch.malloc_kref. When
+ * &drm_dp_mst_branch.malloc_kref reaches 0, the memory allocation for @mstb
+ * will be released and @mstb may no longer be used.
+ *
+ * See also: drm_dp_mst_put_mstb_malloc()
+ */
+static void
+drm_dp_mst_get_mstb_malloc(struct drm_dp_mst_branch *mstb)
+{
+	kref_get(&mstb->malloc_kref);
+	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref));
+}
+
+/**
+ * drm_dp_mst_put_mstb_malloc() - Decrement the malloc refcount of a branch
+ * device
+ * @mstb: The &struct drm_dp_mst_branch to decrement the malloc refcount of
+ *
+ * Decrements &drm_dp_mst_branch.malloc_kref. When
+ * &drm_dp_mst_branch.malloc_kref reaches 0, the memory allocation for @mstb
+ * will be released and @mstb may no longer be used.
+ *
+ * See also: drm_dp_mst_get_mstb_malloc()
+ */
+static void
+drm_dp_mst_put_mstb_malloc(struct drm_dp_mst_branch *mstb)
+{
+	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref) - 1);
+	kref_put(&mstb->malloc_kref, drm_dp_free_mst_branch_device);
+}
+
+static void drm_dp_free_mst_port(struct kref *kref)
+{
+	struct drm_dp_mst_port *port =
+		container_of(kref, struct drm_dp_mst_port, malloc_kref);
+
+	drm_dp_mst_put_mstb_malloc(port->parent);
+	kfree(port);
+}
+
+/**
+ * drm_dp_mst_get_port_malloc() - Increment the malloc refcount of an MST port
+ * @port: The &struct drm_dp_mst_port to increment the malloc refcount of
+ *
+ * Increments &drm_dp_mst_port.malloc_kref. When &drm_dp_mst_port.malloc_kref
+ * reaches 0, the memory allocation for @port will be released and @port may
+ * no longer be used.
+ *
+ * Because @port could potentially be freed at any time by the DP MST helpers
+ * if &drm_dp_mst_port.malloc_kref reaches 0, including during a call to this
+ * function, drivers that which to make use of &struct drm_dp_mst_port should
+ * ensure that they grab at least one main malloc reference to their MST ports
+ * in &drm_dp_mst_topology_cbs.add_connector. This callback is called before
+ * there is any chance for &drm_dp_mst_port.malloc_kref to reach 0.
+ *
+ * See also: drm_dp_mst_put_port_malloc()
+ */
+void
+drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port)
+{
+	kref_get(&port->malloc_kref);
+	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref));
+}
+EXPORT_SYMBOL(drm_dp_mst_get_port_malloc);
+
+/**
+ * drm_dp_mst_put_port_malloc() - Decrement the malloc refcount of an MST port
+ * @port: The &struct drm_dp_mst_port to decrement the malloc refcount of
+ *
+ * Decrements &drm_dp_mst_port.malloc_kref. When &drm_dp_mst_port.malloc_kref
+ * reaches 0, the memory allocation for @port will be released and @port may
+ * no longer be used.
+ *
+ * See also: drm_dp_mst_get_port_malloc()
+ */
+void
+drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port)
+{
+	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref) - 1);
+	kref_put(&port->malloc_kref, drm_dp_free_mst_port);
+}
+EXPORT_SYMBOL(drm_dp_mst_put_port_malloc);
+
 static void drm_dp_destroy_mst_branch_device(struct kref *kref)
 {
-	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
+	struct drm_dp_mst_branch *mstb =
+		container_of(kref, struct drm_dp_mst_branch, topology_kref);
+	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
 	struct drm_dp_mst_port *port, *tmp;
 	bool wake_tx = false;
 
-	/*
-	 * init kref again to be used by ports to remove mst branch when it is
-	 * not needed anymore
-	 */
-	kref_init(kref);
-
-	if (mstb->port_parent && list_empty(&mstb->port_parent->next))
-		kref_get(&mstb->port_parent->kref);
-
-	/*
-	 * destroy all ports - don't need lock
-	 * as there are no more references to the mst branch
-	 * device at this point.
-	 */
+	mutex_lock(&mgr->lock);
 	list_for_each_entry_safe(port, tmp, &mstb->ports, next) {
 		list_del(&port->next);
 		drm_dp_mst_topology_put_port(port);
 	}
+	mutex_unlock(&mgr->lock);
 
 	/* drop any tx slots msg */
 	mutex_lock(&mstb->mgr->qlock);
@@ -908,14 +1073,83 @@  static void drm_dp_destroy_mst_branch_device(struct kref *kref)
 	if (wake_tx)
 		wake_up_all(&mstb->mgr->tx_waitq);
 
-	kref_put(kref, drm_dp_free_mst_branch_device);
+	drm_dp_mst_put_mstb_malloc(mstb);
 }
 
-static void drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
+/**
+ * drm_dp_mst_topology_try_get_mstb() - Increment the topology refcount of a
+ * branch device unless its zero
+ * @mstb: &struct drm_dp_mst_branch to increment the topology refcount of
+ *
+ * Attempts to grab a topology reference to @mstb, if it hasn't yet been
+ * removed from the topology (e.g. &drm_dp_mst_branch.topology_kref has
+ * reached 0). Holding a topology reference implies that a malloc reference
+ * will be held to @mstb as long as the user holds the topology reference.
+ *
+ * Care should be taken to ensure that the user has at least one malloc
+ * reference to @mstb. If you already have a topology reference to @mstb, you
+ * should use drm_dp_mst_topology_get_mstb() instead.
+ *
+ * See also:
+ * drm_dp_mst_topology_get_mstb()
+ * drm_dp_mst_topology_put_mstb()
+ *
+ * Returns:
+ * * 1: A topology reference was grabbed successfully
+ * * 0: @port is no longer in the topology, no reference was grabbed
+ */
+static int __must_check
+drm_dp_mst_topology_try_get_mstb(struct drm_dp_mst_branch *mstb)
 {
-	kref_put(&mstb->kref, drm_dp_destroy_mst_branch_device);
+	int ret = kref_get_unless_zero(&mstb->topology_kref);
+
+	if (ret)
+		DRM_DEBUG("mstb %p (%d)\n", mstb,
+			  kref_read(&mstb->topology_kref));
+
+	return ret;
 }
 
+/**
+ * drm_dp_mst_topology_get_mstb() - Increment the topology refcount of a
+ * branch device
+ * @mstb: The &struct drm_dp_mst_branch to increment the topology refcount of
+ *
+ * Increments &drm_dp_mst_branch.topology_refcount without checking whether or
+ * not it's already reached 0. This is only valid to use in scenarios where
+ * you are already guaranteed to have at least one active topology reference
+ * to @mstb. Otherwise, drm_dp_mst_topology_try_get_mstb() must be used.
+ *
+ * See also:
+ * drm_dp_mst_topology_try_get_mstb()
+ * drm_dp_mst_topology_put_mstb()
+ */
+static void drm_dp_mst_topology_get_mstb(struct drm_dp_mst_branch *mstb)
+{
+	WARN_ON(kref_read(&mstb->topology_kref) == 0);
+	kref_get(&mstb->topology_kref);
+	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref));
+}
+
+/**
+ * drm_dp_mst_topology_put_mstb() - release a topology reference to a branch
+ * device
+ * @mstb: The &struct drm_dp_mst_branch to release the topology reference from
+ *
+ * Releases a topology reference from @mstb by decrementing
+ * &drm_dp_mst_branch.topology_kref.
+ *
+ * See also:
+ * drm_dp_mst_topology_try_get_mstb()
+ * drm_dp_mst_topology_get_mstb()
+ */
+static void
+drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
+{
+	DRM_DEBUG("mstb %p (%d)\n",
+		  mstb, kref_read(&mstb->topology_kref) - 1);
+	kref_put(&mstb->topology_kref, drm_dp_destroy_mst_branch_device);
+}
 
 static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
 {
@@ -937,7 +1171,8 @@  static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
 
 static void drm_dp_destroy_port(struct kref *kref)
 {
-	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
+	struct drm_dp_mst_port *port =
+		container_of(kref, struct drm_dp_mst_port, topology_kref);
 	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
 
 	if (!port->input) {
@@ -956,7 +1191,6 @@  static void drm_dp_destroy_port(struct kref *kref)
 			 * from an EDID retrieval */
 
 			mutex_lock(&mgr->destroy_connector_lock);
-			kref_get(&port->parent->kref);
 			list_add(&port->next, &mgr->destroy_connector_list);
 			mutex_unlock(&mgr->destroy_connector_lock);
 			schedule_work(&mgr->destroy_connector_work);
@@ -967,12 +1201,79 @@  static void drm_dp_destroy_port(struct kref *kref)
 		drm_dp_port_teardown_pdt(port, port->pdt);
 		port->pdt = DP_PEER_DEVICE_NONE;
 	}
-	kfree(port);
+	drm_dp_mst_put_port_malloc(port);
+}
+
+/**
+ * drm_dp_mst_topology_try_get_port() - Increment the topology refcount of a
+ * port unless its zero
+ * @port: &struct drm_dp_mst_port to increment the topology refcount of
+ *
+ * Attempts to grab a topology reference to @port, if it hasn't yet been
+ * removed from the topology (e.g. &drm_dp_mst_port.topology_kref has reached
+ * 0). Holding a topology reference implies that a malloc reference will be
+ * held to @port as long as the user holds the topology reference.
+ *
+ * Care should be taken to ensure that the user has at least one malloc
+ * reference to @port. If you already have a topology reference to @port, you
+ * should use drm_dp_mst_topology_get_port() instead.
+ *
+ * See also:
+ * drm_dp_mst_topology_get_port()
+ * drm_dp_mst_topology_put_port()
+ *
+ * Returns:
+ * * 1: A topology reference was grabbed successfully
+ * * 0: @port is no longer in the topology, no reference was grabbed
+ */
+static int __must_check
+drm_dp_mst_topology_try_get_port(struct drm_dp_mst_port *port)
+{
+	int ret = kref_get_unless_zero(&port->topology_kref);
+
+	if (ret)
+		DRM_DEBUG("port %p (%d)\n", port,
+			  kref_read(&port->topology_kref));
+
+	return ret;
 }
 
+/**
+ * drm_dp_mst_topology_get_port() - Increment the topology refcount of a port
+ * @port: The &struct drm_dp_mst_port to increment the topology refcount of
+ *
+ * Increments &drm_dp_mst_port.topology_refcount without checking whether or
+ * not it's already reached 0. This is only valid to use in scenarios where
+ * you are already guaranteed to have at least one active topology reference
+ * to @port. Otherwise, drm_dp_mst_topology_try_get_port() must be used.
+ *
+ * See also:
+ * drm_dp_mst_topology_try_get_port()
+ * drm_dp_mst_topology_put_port()
+ */
+static void drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
+{
+	WARN_ON(kref_read(&port->topology_kref) == 0);
+	kref_get(&port->topology_kref);
+	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref));
+}
+
+/**
+ * drm_dp_mst_topology_put_port() - release a topology reference to a port
+ * @port: The &struct drm_dp_mst_port to release the topology reference from
+ *
+ * Releases a topology reference from @port by decrementing
+ * &drm_dp_mst_port.topology_kref.
+ *
+ * See also:
+ * drm_dp_mst_topology_try_get_port()
+ * drm_dp_mst_topology_get_port()
+ */
 static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
 {
-	kref_put(&port->kref, drm_dp_destroy_port);
+	DRM_DEBUG("port %p (%d)\n",
+		  port, kref_read(&port->topology_kref) - 1);
+	kref_put(&port->topology_kref, drm_dp_destroy_port);
 }
 
 static struct drm_dp_mst_branch *
@@ -981,10 +1282,10 @@  drm_dp_mst_topology_get_mstb_validated_locked(struct drm_dp_mst_branch *mstb,
 {
 	struct drm_dp_mst_port *port;
 	struct drm_dp_mst_branch *rmstb;
-	if (to_find == mstb) {
-		kref_get(&mstb->kref);
+
+	if (to_find == mstb)
 		return mstb;
-	}
+
 	list_for_each_entry(port, &mstb->ports, next) {
 		if (port->mstb) {
 			rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
@@ -1001,25 +1302,32 @@  drm_dp_mst_topology_get_mstb_validated(struct drm_dp_mst_topology_mgr *mgr,
 				       struct drm_dp_mst_branch *mstb)
 {
 	struct drm_dp_mst_branch *rmstb = NULL;
+
 	mutex_lock(&mgr->lock);
-	if (mgr->mst_primary)
+	if (mgr->mst_primary) {
 		rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
 		    mgr->mst_primary, mstb);
+
+		if (rmstb && !drm_dp_mst_topology_try_get_mstb(rmstb))
+			rmstb = NULL;
+	}
 	mutex_unlock(&mgr->lock);
 	return rmstb;
 }
 
-static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_branch *mstb, struct drm_dp_mst_port *to_find)
+static struct drm_dp_mst_port *
+drm_dp_mst_topology_get_port_validated_locked(struct drm_dp_mst_branch *mstb,
+					      struct drm_dp_mst_port *to_find)
 {
 	struct drm_dp_mst_port *port, *mport;
 
 	list_for_each_entry(port, &mstb->ports, next) {
-		if (port == to_find) {
-			kref_get(&port->kref);
+		if (port == to_find)
 			return port;
-		}
+
 		if (port->mstb) {
-			mport = drm_dp_mst_get_port_ref_locked(port->mstb, to_find);
+			mport = drm_dp_mst_topology_get_port_validated_locked(
+			    port->mstb, to_find);
 			if (mport)
 				return mport;
 		}
@@ -1032,9 +1340,15 @@  drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr *mgr,
 				       struct drm_dp_mst_port *port)
 {
 	struct drm_dp_mst_port *rport = NULL;
+
 	mutex_lock(&mgr->lock);
-	if (mgr->mst_primary)
-		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
+	if (mgr->mst_primary) {
+		rport = drm_dp_mst_topology_get_port_validated_locked(
+		    mgr->mst_primary, port);
+
+		if (rport && !drm_dp_mst_topology_try_get_port(rport))
+			rport = NULL;
+	}
 	mutex_unlock(&mgr->lock);
 	return rport;
 }
@@ -1042,11 +1356,12 @@  drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr *mgr,
 static struct drm_dp_mst_port *drm_dp_get_port(struct drm_dp_mst_branch *mstb, u8 port_num)
 {
 	struct drm_dp_mst_port *port;
+	int ret;
 
 	list_for_each_entry(port, &mstb->ports, next) {
 		if (port->port_num == port_num) {
-			kref_get(&port->kref);
-			return port;
+			ret = drm_dp_mst_topology_try_get_port(port);
+			return ret ? port : NULL;
 		}
 	}
 
@@ -1095,6 +1410,11 @@  static bool drm_dp_port_setup_pdt(struct drm_dp_mst_port *port)
 		if (port->mstb) {
 			port->mstb->mgr = port->mgr;
 			port->mstb->port_parent = port;
+			/*
+			 * Make sure this port's memory allocation stays
+			 * around until it's child MSTB releases it
+			 */
+			drm_dp_mst_get_port_malloc(port);
 
 			send_link = true;
 		}
@@ -1155,17 +1475,26 @@  static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
 	bool created = false;
 	int old_pdt = 0;
 	int old_ddps = 0;
+
 	port = drm_dp_get_port(mstb, port_msg->port_number);
 	if (!port) {
 		port = kzalloc(sizeof(*port), GFP_KERNEL);
 		if (!port)
 			return;
-		kref_init(&port->kref);
+		kref_init(&port->topology_kref);
+		kref_init(&port->malloc_kref);
 		port->parent = mstb;
 		port->port_num = port_msg->port_number;
 		port->mgr = mstb->mgr;
 		port->aux.name = "DPMST";
 		port->aux.dev = dev->dev;
+
+		/*
+		 * Make sure the memory allocation for our parent branch stays
+		 * around until our own memory allocation is released
+		 */
+		drm_dp_mst_get_mstb_malloc(mstb);
+
 		created = true;
 	} else {
 		old_pdt = port->pdt;
@@ -1185,7 +1514,7 @@  static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
 	   for this list */
 	if (created) {
 		mutex_lock(&mstb->mgr->lock);
-		kref_get(&port->kref);
+		drm_dp_mst_topology_get_port(port);
 		list_add(&port->next, &mstb->ports);
 		mutex_unlock(&mstb->mgr->lock);
 	}
@@ -1284,7 +1613,7 @@  static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
 {
 	struct drm_dp_mst_branch *mstb;
 	struct drm_dp_mst_port *port;
-	int i;
+	int i, ret;
 	/* find the port by iterating down */
 
 	mutex_lock(&mgr->lock);
@@ -1309,7 +1638,9 @@  static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
 			}
 		}
 	}
-	kref_get(&mstb->kref);
+	ret = drm_dp_mst_topology_try_get_mstb(mstb);
+	if (!ret)
+		mstb = NULL;
 out:
 	mutex_unlock(&mgr->lock);
 	return mstb;
@@ -1344,14 +1675,17 @@  drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr *mgr,
 				     uint8_t *guid)
 {
 	struct drm_dp_mst_branch *mstb;
+	int ret;
 
 	/* find the port by iterating down */
 	mutex_lock(&mgr->lock);
 
 	mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
-
-	if (mstb)
-		kref_get(&mstb->kref);
+	if (mstb) {
+		ret = drm_dp_mst_topology_try_get_mstb(mstb);
+		if (!ret)
+			mstb = NULL;
+	}
 
 	mutex_unlock(&mgr->lock);
 	return mstb;
@@ -1390,11 +1724,14 @@  static void drm_dp_mst_link_probe_work(struct work_struct *work)
 {
 	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, work);
 	struct drm_dp_mst_branch *mstb;
+	int ret;
 
 	mutex_lock(&mgr->lock);
 	mstb = mgr->mst_primary;
 	if (mstb) {
-		kref_get(&mstb->kref);
+		ret = drm_dp_mst_topology_try_get_mstb(mstb);
+		if (!ret)
+			mstb = NULL;
 	}
 	mutex_unlock(&mgr->lock);
 	if (mstb) {
@@ -1722,8 +2059,10 @@  static struct drm_dp_mst_branch *drm_dp_get_last_connected_port_and_mstb(struct
 
 		if (found_port) {
 			rmstb = found_port->parent;
-			kref_get(&rmstb->kref);
-			*port_num = found_port->port_num;
+			if (drm_dp_mst_topology_try_get_mstb(rmstb))
+				*port_num = found_port->port_num;
+			else
+				rmstb = NULL;
 		}
 	}
 	mutex_unlock(&mgr->lock);
@@ -2176,7 +2515,7 @@  int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
 
 		/* give this the main reference */
 		mgr->mst_primary = mstb;
-		kref_get(&mgr->mst_primary->kref);
+		drm_dp_mst_topology_get_mstb(mgr->mst_primary);
 
 		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
 							 DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
@@ -3096,13 +3435,6 @@  static void drm_dp_tx_work(struct work_struct *work)
 	mutex_unlock(&mgr->qlock);
 }
 
-static void drm_dp_free_mst_port(struct kref *kref)
-{
-	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
-	kref_put(&port->parent->kref, drm_dp_free_mst_branch_device);
-	kfree(port);
-}
-
 static void drm_dp_destroy_connector_work(struct work_struct *work)
 {
 	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, destroy_connector_work);
@@ -3123,7 +3455,6 @@  static void drm_dp_destroy_connector_work(struct work_struct *work)
 		list_del(&port->next);
 		mutex_unlock(&mgr->destroy_connector_lock);
 
-		kref_init(&port->kref);
 		INIT_LIST_HEAD(&port->next);
 
 		mgr->cbs->destroy_connector(mgr, port->connector);
@@ -3137,7 +3468,7 @@  static void drm_dp_destroy_connector_work(struct work_struct *work)
 			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
 		}
 
-		kref_put(&port->kref, drm_dp_free_mst_port);
+		drm_dp_mst_put_port_malloc(port);
 		send_hotplug = true;
 	}
 	if (send_hotplug)
diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
index 371cc2816477..8eca5f29242c 100644
--- a/include/drm/drm_dp_mst_helper.h
+++ b/include/drm/drm_dp_mst_helper.h
@@ -44,7 +44,6 @@  struct drm_dp_vcpi {
 
 /**
  * struct drm_dp_mst_port - MST port
- * @kref: reference count for this port.
  * @port_num: port number
  * @input: if this port is an input port.
  * @mcs: message capability status - DP 1.2 spec.
@@ -67,7 +66,18 @@  struct drm_dp_vcpi {
  * in the MST topology.
  */
 struct drm_dp_mst_port {
-	struct kref kref;
+	/**
+	 * @topology_kref: refcount for this port's lifetime in the topology,
+	 * only the DP MST helpers should need to touch this
+	 */
+	struct kref topology_kref;
+
+	/**
+	 * @malloc_kref: refcount for the memory allocation containing this
+	 * structure. See drm_dp_mst_get_port_malloc() and
+	 * drm_dp_mst_put_port_malloc().
+	 */
+	struct kref malloc_kref;
 
 	u8 port_num;
 	bool input;
@@ -102,7 +112,6 @@  struct drm_dp_mst_port {
 
 /**
  * struct drm_dp_mst_branch - MST branch device.
- * @kref: reference count for this port.
  * @rad: Relative Address to talk to this branch device.
  * @lct: Link count total to talk to this branch device.
  * @num_ports: number of ports on the branch.
@@ -121,7 +130,19 @@  struct drm_dp_mst_port {
  * to downstream port of parent branches.
  */
 struct drm_dp_mst_branch {
-	struct kref kref;
+	/**
+	 * @topology_kref: refcount for this branch device's lifetime in the
+	 * topology, only the DP MST helpers should need to touch this
+	 */
+	struct kref topology_kref;
+
+	/**
+	 * @malloc_kref: refcount for the memory allocation containing this
+	 * structure. See drm_dp_mst_get_mstb_malloc() and
+	 * drm_dp_mst_put_mstb_malloc().
+	 */
+	struct kref malloc_kref;
+
 	u8 rad[8];
 	u8 lct;
 	int num_ports;
@@ -626,4 +647,7 @@  int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
 int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
 				 struct drm_dp_mst_port *port, bool power_up);
 
+void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
+void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
+
 #endif

Comments

Daniel Vetter Jan. 10, 2019, 9:12 a.m.
On Tue, Jan 08, 2019 at 07:35:03PM -0500, Lyude Paul wrote:
> The current way of handling refcounting in the DP MST helpers is really
> confusing and probably just plain wrong because it's been hacked up many
> times over the years without anyone actually going over the code and
> seeing if things could be simplified.
> 
> To the best of my understanding, the current scheme works like this:
> drm_dp_mst_port and drm_dp_mst_branch both have a single refcount. When
> this refcount hits 0 for either of the two, they're removed from the
> topology state, but not immediately freed. Both ports and branch devices
> will reinitialize their kref once it's hit 0 before actually destroying
> themselves. The intended purpose behind this is so that we can avoid
> problems like not being able to free a remote payload that might still
> be active, due to us having removed all of the port/branch device
> structures in memory, as per:
> 
> commit 91a25e463130 ("drm/dp/mst: deallocate payload on port destruction")
> 
> Which may have worked, but then it caused use-after-free errors. Being
> new to MST at the time, I tried fixing it;
> 
> commit 263efde31f97 ("drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()")
> 
> But, that was broken: both drm_dp_mst_port and drm_dp_mst_branch structs
> are validated in almost every DP MST helper function. Simply put, this
> means we go through the topology and try to see if the given
> drm_dp_mst_branch or drm_dp_mst_port is still attached to something
> before trying to use it in order to avoid dereferencing freed memory
> (something that has happened a LOT in the past with this library).
> Because of this it doesn't actually matter whether or not we keep keep
> the ports and branches around in memory as that's not enough, because
> any function that validates the branches and ports passed to it will
> still reject them anyway since they're no longer in the topology
> structure. So, use-after-free errors were fixed but payload deallocation
> was completely broken.
> 
> Two years later, AMD informed me about this issue and I attempted to
> come up with a temporary fix, pending a long-overdue cleanup of this
> library:
> 
> commit c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> 
> But then that introduced use-after-free errors, so I quickly reverted
> it:
> 
> commit 9765635b3075 ("Revert "drm/dp_mst: Skip validating ports during destruction, just ref"")
> 
> And in the process, learned that there is just no simple fix for this:
> the design is just broken. Unfortunately, the usage of these helpers are
> quite broken as well. Some drivers like i915 have been smart enough to
> avoid accessing any kind of information from MST port structures, but
> others like nouveau have assumed, understandably so, that
> drm_dp_mst_port structures are normal and can just be accessed at any
> time without worrying about use-after-free errors.
> 
> After a lot of discussion, me and Daniel Vetter came up with a better
> idea to replace all of this.
> 
> To summarize, since this is documented far more indepth in the
> documentation this patch introduces, we make it so that drm_dp_mst_port
> and drm_dp_mst_branch structures have two different classes of
> refcounts: topology_kref, and malloc_kref. topology_kref corresponds to
> the lifetime of the given drm_dp_mst_port or drm_dp_mst_branch in it's
> given topology. Once it hits zero, any associated connectors are removed
> and the branch or port can no longer be validated. malloc_kref
> corresponds to the lifetime of the memory allocation for the actual
> structure, and will always be non-zero so long as the topology_kref is
> non-zero. This gives us a way to allow callers to hold onto port and
> branch device structures past their topology lifetime, and dramatically
> simplifies the lifetimes of both structures. This also finally fixes the
> port deallocation problem, properly.
> 
> Additionally: since this now means that we can keep ports and branch
> devices allocated in memory for however long we need, we no longer need
> a significant amount of the port validation that we currently do.
> 
> Additionally, there is one last scenario that this fixes, which couldn't
> have been fixed properly beforehand:
> 
> - CPU1 unrefs port from topology (refcount 1->0)
> - CPU2 refs port in topology(refcount 0->1)
> 
> Since we now can guarantee memory safety for ports and branches
> as-needed, we also can make our main reference counting functions fix
> this problem by using kref_get_unless_zero() internally so that topology
> refcounts can only ever reach 0 once.
> 
> Changes since v3:
> * Remove rebase detritus - danvet
> * Split out purely style changes into separate patches - hwentlan
> 
> Changes since v2:
> * Fix commit message - checkpatch
> * s/)-1/) - 1/g - checkpatch
> 
> Changes since v1:
> * Remove forward declarations - danvet
> * Move "Branch device and port refcounting" section from documentation
>   into kernel-doc comments - danvet
> * Export internal topology lifetime functions into their own section in
>   the kernel-docs - danvet
> * s/@/&/g for struct references in kernel-docs - danvet
> * Drop the "when they are no longer being used" bits from the kernel
>   docs - danvet
> * Modify diagrams to show how the DRM driver interacts with the topology
>   and payloads - danvet
> * Make suggested documentation changes for
>   drm_dp_mst_topology_get_mstb() and drm_dp_mst_topology_get_port() -
>   danvet
> * Better explain the relationship between malloc refs and topology krefs
>   in the documentation for drm_dp_mst_topology_get_port() and
>   drm_dp_mst_topology_get_mstb() - danvet
> * Fix "See also" in drm_dp_mst_topology_get_mstb() - danvet
> * Rename drm_dp_mst_topology_get_(port|mstb)() ->
>   drm_dp_mst_topology_try_get_(port|mstb)() and
>   drm_dp_mst_topology_ref_(port|mstb)() ->
>   drm_dp_mst_topology_get_(port|mstb)() - danvet
> * s/should/must in docs - danvet
> * WARN_ON(refcount == 0) in topology_get_(mstb|port) - danvet
> * Move kdocs for mstb/port structs inline - danvet
> * Split drm_dp_get_last_connected_port_and_mstb() changes into their own
>   commit - danvet
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Reviewed-by: Harry Wentland <harry.wentland@amd.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: David Airlie <airlied@redhat.com>
> Cc: Jerry Zuo <Jerry.Zuo@amd.com>
> Cc: Juston Li <juston.li@intel.com>
> ---
>  .../gpu/dp-mst/topology-figure-1.dot          |  52 ++
>  .../gpu/dp-mst/topology-figure-2.dot          |  56 +++
>  .../gpu/dp-mst/topology-figure-3.dot          |  59 +++
>  Documentation/gpu/drm-kms-helpers.rst         |  26 +-
>  drivers/gpu/drm/drm_dp_mst_topology.c         | 457 +++++++++++++++---
>  include/drm/drm_dp_mst_helper.h               |  32 +-
>  6 files changed, 613 insertions(+), 69 deletions(-)
>  create mode 100644 Documentation/gpu/dp-mst/topology-figure-1.dot
>  create mode 100644 Documentation/gpu/dp-mst/topology-figure-2.dot
>  create mode 100644 Documentation/gpu/dp-mst/topology-figure-3.dot
> 
> diff --git a/Documentation/gpu/dp-mst/topology-figure-1.dot b/Documentation/gpu/dp-mst/topology-figure-1.dot
> new file mode 100644
> index 000000000000..157e17c7e0b0
> --- /dev/null
> +++ b/Documentation/gpu/dp-mst/topology-figure-1.dot
> @@ -0,0 +1,52 @@
> +digraph T {
> +    /* Make sure our payloads are always drawn below the driver node */
> +    subgraph cluster_driver {
> +        fillcolor = grey;
> +        style = filled;
> +        driver -> {payload1, payload2} [dir=none];
> +    }
> +
> +    /* Driver malloc references */
> +    edge [style=dashed];
> +    driver -> port1;
> +    driver -> port2;
> +    driver -> port3:e;
> +    driver -> port4;
> +
> +    payload1:s -> port1:e;
> +    payload2:s -> port3:e;
> +    edge [style=""];
> +
> +    subgraph cluster_topology {
> +        label="Topology Manager";
> +        labelloc=bottom;
> +
> +        /* Topology references */
> +        mstb1 -> {port1, port2};
> +        port1 -> mstb2;
> +        port2 -> mstb3 -> {port3, port4};
> +        port3 -> mstb4;
> +
> +        /* Malloc references */
> +        edge [style=dashed;dir=back];
> +        mstb1 -> {port1, port2};
> +        port1 -> mstb2;
> +        port2 -> mstb3 -> {port3, port4};
> +        port3 -> mstb4;
> +    }
> +
> +    driver [label="DRM driver";style=filled;shape=box;fillcolor=lightblue];
> +
> +    payload1 [label="Payload #1";style=filled;shape=box;fillcolor=lightblue];
> +    payload2 [label="Payload #2";style=filled;shape=box;fillcolor=lightblue];
> +
> +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen;shape=oval];
> +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen;shape=oval];
> +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen;shape=oval];
> +    mstb4 [label="MSTB #4";style=filled;fillcolor=palegreen;shape=oval];
> +
> +    port1 [label="Port #1";shape=oval];
> +    port2 [label="Port #2";shape=oval];
> +    port3 [label="Port #3";shape=oval];
> +    port4 [label="Port #4";shape=oval];
> +}
> diff --git a/Documentation/gpu/dp-mst/topology-figure-2.dot b/Documentation/gpu/dp-mst/topology-figure-2.dot
> new file mode 100644
> index 000000000000..4243dd1737cb
> --- /dev/null
> +++ b/Documentation/gpu/dp-mst/topology-figure-2.dot
> @@ -0,0 +1,56 @@
> +digraph T {
> +    /* Make sure our payloads are always drawn below the driver node */
> +    subgraph cluster_driver {
> +        fillcolor = grey;
> +        style = filled;
> +        driver -> {payload1, payload2} [dir=none];
> +    }
> +
> +    /* Driver malloc references */
> +    edge [style=dashed];
> +    driver -> port1;
> +    driver -> port2;
> +    driver -> port3:e;
> +    driver -> port4 [color=red];
> +
> +    payload1:s -> port1:e;
> +    payload2:s -> port3:e;
> +    edge [style=""];
> +
> +    subgraph cluster_topology {
> +        label="Topology Manager";
> +        labelloc=bottom;
> +
> +        /* Topology references */
> +        mstb1 -> {port1, port2};
> +        port1 -> mstb2;
> +        edge [color=red];
> +        port2 -> mstb3 -> {port3, port4};
> +        port3 -> mstb4;
> +        edge [color=""];
> +
> +        /* Malloc references */
> +        edge [style=dashed;dir=back];
> +        mstb1 -> {port1, port2};
> +        port1 -> mstb2;
> +        port2 -> mstb3 -> port3;
> +        edge [color=red];
> +        mstb3 -> port4;
> +        port3 -> mstb4;
> +    }
> +
> +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen];
> +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> +
> +    port1 [label="Port #1"];
> +    port2 [label="Port #2"];
> +    port3 [label="Port #3"];
> +    port4 [label="Port #4";style=filled;fillcolor=grey];
> +
> +    driver [label="DRM driver";style=filled;shape=box;fillcolor=lightblue];
> +
> +    payload1 [label="Payload #1";style=filled;shape=box;fillcolor=lightblue];
> +    payload2 [label="Payload #2";style=filled;shape=box;fillcolor=lightblue];
> +}
> diff --git a/Documentation/gpu/dp-mst/topology-figure-3.dot b/Documentation/gpu/dp-mst/topology-figure-3.dot
> new file mode 100644
> index 000000000000..6cd78d06778b
> --- /dev/null
> +++ b/Documentation/gpu/dp-mst/topology-figure-3.dot
> @@ -0,0 +1,59 @@
> +digraph T {
> +    /* Make sure our payloads are always drawn below the driver node */
> +    subgraph cluster_driver {
> +        fillcolor = grey;
> +        style = filled;
> +        edge [dir=none];
> +        driver -> payload1;
> +        driver -> payload2 [penwidth=3];
> +        edge [dir=""];
> +    }
> +
> +    /* Driver malloc references */
> +    edge [style=dashed];
> +    driver -> port1;
> +    driver -> port2;
> +    driver -> port3:e;
> +    driver -> port4 [color=grey];
> +    payload1:s -> port1:e;
> +    payload2:s -> port3:e [penwidth=3];
> +    edge [style=""];
> +
> +    subgraph cluster_topology {
> +        label="Topology Manager";
> +        labelloc=bottom;
> +
> +        /* Topology references */
> +        mstb1 -> {port1, port2};
> +        port1 -> mstb2;
> +        edge [color=grey];
> +        port2 -> mstb3 -> {port3, port4};
> +        port3 -> mstb4;
> +        edge [color=""];
> +
> +        /* Malloc references */
> +        edge [style=dashed;dir=back];
> +        mstb1 -> {port1, port2};
> +        port1 -> mstb2;
> +        port2 -> mstb3 [penwidth=3];
> +        mstb3 -> port3 [penwidth=3];
> +        edge [color=grey];
> +        mstb3 -> port4;
> +        port3 -> mstb4;
> +    }
> +
> +    mstb1 [label="MSTB #1";style=filled;fillcolor=palegreen];
> +    mstb2 [label="MSTB #2";style=filled;fillcolor=palegreen];
> +    mstb3 [label="MSTB #3";style=filled;fillcolor=palegreen;penwidth=3];
> +    mstb4 [label="MSTB #4";style=filled;fillcolor=grey];
> +
> +    port1 [label="Port #1"];
> +    port2 [label="Port #2";penwidth=5];
> +    port3 [label="Port #3";penwidth=3];
> +    port4 [label="Port #4";style=filled;fillcolor=grey];
> +
> +    driver [label="DRM driver";style=filled;shape=box;fillcolor=lightblue];
> +
> +    payload1 [label="Payload #1";style=filled;shape=box;fillcolor=lightblue];
> +    payload2 [label="Payload #2";style=filled;shape=box;fillcolor=lightblue;penwidth=3];
> +}
> diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
> index b422eb8edf16..7a3fc569bc68 100644
> --- a/Documentation/gpu/drm-kms-helpers.rst
> +++ b/Documentation/gpu/drm-kms-helpers.rst
> @@ -208,18 +208,40 @@ Display Port Dual Mode Adaptor Helper Functions Reference
>  .. kernel-doc:: drivers/gpu/drm/drm_dp_dual_mode_helper.c
>     :export:
>  
> -Display Port MST Helper Functions Reference
> -===========================================
> +Display Port MST Helpers
> +========================
> +
> +Overview
> +--------
>  
>  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
>     :doc: dp mst helper
>  
> +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> +   :doc: Branch device and port refcounting
> +
> +Functions Reference
> +-------------------
> +
>  .. kernel-doc:: include/drm/drm_dp_mst_helper.h
>     :internal:
>  
>  .. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
>     :export:
>  
> +Topology Lifetime Internals
> +---------------------------
> +
> +These functions aren't exported to drivers, but are documented here to help make
> +the MST topology helpers easier to understand
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_dp_mst_topology.c
> +   :functions: drm_dp_mst_topology_try_get_mstb drm_dp_mst_topology_get_mstb
> +               drm_dp_mst_topology_put_mstb
> +               drm_dp_mst_topology_try_get_port drm_dp_mst_topology_get_port
> +               drm_dp_mst_topology_put_port
> +               drm_dp_mst_get_mstb_malloc drm_dp_mst_put_mstb_malloc
> +
>  MIPI DSI Helper Functions Reference
>  ===================================
>  
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 074e985093ca..c53cf7eb1dbc 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -850,46 +850,211 @@ static struct drm_dp_mst_branch *drm_dp_add_mst_branch_device(u8 lct, u8 *rad)
>  	if (lct > 1)
>  		memcpy(mstb->rad, rad, lct / 2);
>  	INIT_LIST_HEAD(&mstb->ports);
> -	kref_init(&mstb->kref);
> +	kref_init(&mstb->topology_kref);
> +	kref_init(&mstb->malloc_kref);
>  	return mstb;
>  }
>  
> -static void drm_dp_free_mst_port(struct kref *kref);
> -
>  static void drm_dp_free_mst_branch_device(struct kref *kref)
>  {
> -	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
> -	if (mstb->port_parent) {
> -		if (list_empty(&mstb->port_parent->next))
> -			kref_put(&mstb->port_parent->kref, drm_dp_free_mst_port);
> -	}
> +	struct drm_dp_mst_branch *mstb =
> +		container_of(kref, struct drm_dp_mst_branch, malloc_kref);
> +
> +	if (mstb->port_parent)
> +		drm_dp_mst_put_port_malloc(mstb->port_parent);
> +
>  	kfree(mstb);
>  }
>  
> +/**
> + * DOC: Branch device and port refcounting
> + *
> + * Topology refcount overview
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * The refcounting schemes for &struct drm_dp_mst_branch and &struct
> + * drm_dp_mst_port are somewhat unusual. Both ports and branch devices have
> + * two different kinds of refcounts: topology refcounts, and malloc refcounts.
> + *
> + * Topology refcounts are not exposed to drivers, and are handled internally
> + * by the DP MST helpers. The helpers use them in order to prevent the
> + * in-memory topology state from being changed in the middle of critical
> + * operations like changing the internal state of payload allocations. This
> + * means each branch and port will be considered to be connected to the rest
> + * of the topology until it's topology refcount reaches zero. Additionally,
> + * for ports this means that their associated &struct drm_connector will stay
> + * registered with userspace until the port's refcount reaches 0.
> + *
> + * Malloc refcount overview
> + * ~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * Malloc references are used to keep a &struct drm_dp_mst_port or &struct
> + * drm_dp_mst_branch allocated even after all of its topology references have
> + * been dropped, so that the driver or MST helpers can safely access each
> + * branch's last known state before it was disconnected from the topology.
> + * When the malloc refcount of a port or branch reaches 0, the memory
> + * allocation containing the &struct drm_dp_mst_branch or &struct
> + * drm_dp_mst_port respectively will be freed.
> + *
> + * For &struct drm_dp_mst_branch, malloc refcounts are not currently exposed
> + * to drivers. As of writing this documentation, there are no drivers that
> + * have a usecase for accessing &struct drm_dp_mst_branch outside of the MST
> + * helpers. Exposing this API to drivers in a race-free manner would take more
> + * tweaking of the refcounting scheme, however patches are welcome provided
> + * there is a legitimate driver usecase for this.
> + *
> + * Refcount relationships in a topology
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + *
> + * Let's take a look at why the relationship between topology and malloc
> + * refcounts is designed the way it is.
> + *
> + * .. kernel-figure:: dp-mst/topology-figure-1.dot
> + *
> + *    An example of topology and malloc refs in a DP MST topology with two
> + *    active payloads. Topology refcount increments are indicated by solid
> + *    lines, and malloc refcount increments are indicated by dashed lines.
> + *    Each starts from the branch which incremented the refcount, and ends at
> + *    the branch to which the refcount belongs to.

I had to take a double take on this one here.  Maybe add ", i.e. the
arrows point the same way as the C pointers used to reference a
structure."

> + *
> + * As you can see in figure 1, every branch increments the topology refcount

Maybe s/figure 1/above figure/, the output doesn't have numbered figures
(and if it did, we'd need an rst reference).

> + * of it's children, and increments the malloc refcount of it's parent.
> + * Additionally, every payload increments the malloc refcount of it's assigned
> + * port by 1.
> + *
> + * So, what would happen if MSTB #3 from the above figure was unplugged from
> + * the system, but the driver hadn't yet removed payload #2 from port #3? The
> + * topology would start to look like figure 2.
> + *
> + * .. kernel-figure:: dp-mst/topology-figure-2.dot
> + *
> + *    Ports and branch devices which have been released from memory are
> + *    colored grey, and references which have been removed are colored red.
> + *
> + * Whenever a port or branch device's topology refcount reaches zero, it will
> + * decrement the topology refcounts of all its children, the malloc refcount
> + * of its parent, and finally its own malloc refcount. For MSTB #4 and port
> + * #4, this means they both have been disconnected from the topology and freed
> + * from memory. But, because payload #2 is still holding a reference to port
> + * #3, port #3 is removed from the topology but it's &struct drm_dp_mst_port
> + * is still accessible from memory. This also means port #3 has not yet
> + * decremented the malloc refcount of MSTB #3, so it's &struct
> + * drm_dp_mst_branch will also stay allocated in memory until port #3's
> + * malloc refcount reaches 0.
> + *
> + * This relationship is necessary because in order to release payload #2, we
> + * need to be able to figure out the last relative of port #3 that's still
> + * connected to the topology. In this case, we would travel up the topology as
> + * shown in figure 3.
> + *
> + * .. kernel-figure:: dp-mst/topology-figure-3.dot
> + *
> + * And finally, remove payload #2 by communicating with port #2 through
> + * sideband transactions.
> + */
> +
> +/**
> + * drm_dp_mst_get_mstb_malloc() - Increment the malloc refcount of a branch
> + * device
> + * @mstb: The &struct drm_dp_mst_branch to increment the malloc refcount of
> + *
> + * Increments &drm_dp_mst_branch.malloc_kref. When
> + * &drm_dp_mst_branch.malloc_kref reaches 0, the memory allocation for @mstb
> + * will be released and @mstb may no longer be used.
> + *
> + * See also: drm_dp_mst_put_mstb_malloc()
> + */
> +static void
> +drm_dp_mst_get_mstb_malloc(struct drm_dp_mst_branch *mstb)
> +{
> +	kref_get(&mstb->malloc_kref);
> +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref));
> +}
> +
> +/**
> + * drm_dp_mst_put_mstb_malloc() - Decrement the malloc refcount of a branch
> + * device
> + * @mstb: The &struct drm_dp_mst_branch to decrement the malloc refcount of
> + *
> + * Decrements &drm_dp_mst_branch.malloc_kref. When
> + * &drm_dp_mst_branch.malloc_kref reaches 0, the memory allocation for @mstb
> + * will be released and @mstb may no longer be used.
> + *
> + * See also: drm_dp_mst_get_mstb_malloc()
> + */
> +static void
> +drm_dp_mst_put_mstb_malloc(struct drm_dp_mst_branch *mstb)
> +{
> +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->malloc_kref) - 1);
> +	kref_put(&mstb->malloc_kref, drm_dp_free_mst_branch_device);
> +}
> +
> +static void drm_dp_free_mst_port(struct kref *kref)
> +{
> +	struct drm_dp_mst_port *port =
> +		container_of(kref, struct drm_dp_mst_port, malloc_kref);
> +
> +	drm_dp_mst_put_mstb_malloc(port->parent);
> +	kfree(port);
> +}
> +
> +/**
> + * drm_dp_mst_get_port_malloc() - Increment the malloc refcount of an MST port
> + * @port: The &struct drm_dp_mst_port to increment the malloc refcount of
> + *
> + * Increments &drm_dp_mst_port.malloc_kref. When &drm_dp_mst_port.malloc_kref
> + * reaches 0, the memory allocation for @port will be released and @port may
> + * no longer be used.
> + *
> + * Because @port could potentially be freed at any time by the DP MST helpers
> + * if &drm_dp_mst_port.malloc_kref reaches 0, including during a call to this
> + * function, drivers that which to make use of &struct drm_dp_mst_port should
> + * ensure that they grab at least one main malloc reference to their MST ports
> + * in &drm_dp_mst_topology_cbs.add_connector. This callback is called before
> + * there is any chance for &drm_dp_mst_port.malloc_kref to reach 0.
> + *
> + * See also: drm_dp_mst_put_port_malloc()
> + */
> +void
> +drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port)
> +{
> +	kref_get(&port->malloc_kref);
> +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref));
> +}
> +EXPORT_SYMBOL(drm_dp_mst_get_port_malloc);
> +
> +/**
> + * drm_dp_mst_put_port_malloc() - Decrement the malloc refcount of an MST port
> + * @port: The &struct drm_dp_mst_port to decrement the malloc refcount of
> + *
> + * Decrements &drm_dp_mst_port.malloc_kref. When &drm_dp_mst_port.malloc_kref
> + * reaches 0, the memory allocation for @port will be released and @port may
> + * no longer be used.
> + *
> + * See also: drm_dp_mst_get_port_malloc()
> + */
> +void
> +drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port)
> +{
> +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->malloc_kref) - 1);
> +	kref_put(&port->malloc_kref, drm_dp_free_mst_port);
> +}
> +EXPORT_SYMBOL(drm_dp_mst_put_port_malloc);
> +
>  static void drm_dp_destroy_mst_branch_device(struct kref *kref)
>  {
> -	struct drm_dp_mst_branch *mstb = container_of(kref, struct drm_dp_mst_branch, kref);
> +	struct drm_dp_mst_branch *mstb =
> +		container_of(kref, struct drm_dp_mst_branch, topology_kref);
> +	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
>  	struct drm_dp_mst_port *port, *tmp;
>  	bool wake_tx = false;
>  
> -	/*
> -	 * init kref again to be used by ports to remove mst branch when it is
> -	 * not needed anymore
> -	 */
> -	kref_init(kref);
> -
> -	if (mstb->port_parent && list_empty(&mstb->port_parent->next))
> -		kref_get(&mstb->port_parent->kref);
> -
> -	/*
> -	 * destroy all ports - don't need lock
> -	 * as there are no more references to the mst branch
> -	 * device at this point.
> -	 */
> +	mutex_lock(&mgr->lock);
>  	list_for_each_entry_safe(port, tmp, &mstb->ports, next) {
>  		list_del(&port->next);
>  		drm_dp_mst_topology_put_port(port);
>  	}
> +	mutex_unlock(&mgr->lock);
>  
>  	/* drop any tx slots msg */
>  	mutex_lock(&mstb->mgr->qlock);
> @@ -908,14 +1073,83 @@ static void drm_dp_destroy_mst_branch_device(struct kref *kref)
>  	if (wake_tx)
>  		wake_up_all(&mstb->mgr->tx_waitq);
>  
> -	kref_put(kref, drm_dp_free_mst_branch_device);
> +	drm_dp_mst_put_mstb_malloc(mstb);
>  }
>  
> -static void drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
> +/**
> + * drm_dp_mst_topology_try_get_mstb() - Increment the topology refcount of a
> + * branch device unless its zero
> + * @mstb: &struct drm_dp_mst_branch to increment the topology refcount of
> + *
> + * Attempts to grab a topology reference to @mstb, if it hasn't yet been
> + * removed from the topology (e.g. &drm_dp_mst_branch.topology_kref has
> + * reached 0). Holding a topology reference implies that a malloc reference
> + * will be held to @mstb as long as the user holds the topology reference.
> + *
> + * Care should be taken to ensure that the user has at least one malloc
> + * reference to @mstb. If you already have a topology reference to @mstb, you
> + * should use drm_dp_mst_topology_get_mstb() instead.
> + *
> + * See also:
> + * drm_dp_mst_topology_get_mstb()
> + * drm_dp_mst_topology_put_mstb()
> + *
> + * Returns:
> + * * 1: A topology reference was grabbed successfully
> + * * 0: @port is no longer in the topology, no reference was grabbed
> + */
> +static int __must_check
> +drm_dp_mst_topology_try_get_mstb(struct drm_dp_mst_branch *mstb)
>  {
> -	kref_put(&mstb->kref, drm_dp_destroy_mst_branch_device);
> +	int ret = kref_get_unless_zero(&mstb->topology_kref);
> +
> +	if (ret)
> +		DRM_DEBUG("mstb %p (%d)\n", mstb,
> +			  kref_read(&mstb->topology_kref));
> +
> +	return ret;
>  }
>  
> +/**
> + * drm_dp_mst_topology_get_mstb() - Increment the topology refcount of a
> + * branch device
> + * @mstb: The &struct drm_dp_mst_branch to increment the topology refcount of
> + *
> + * Increments &drm_dp_mst_branch.topology_refcount without checking whether or
> + * not it's already reached 0. This is only valid to use in scenarios where
> + * you are already guaranteed to have at least one active topology reference
> + * to @mstb. Otherwise, drm_dp_mst_topology_try_get_mstb() must be used.
> + *
> + * See also:
> + * drm_dp_mst_topology_try_get_mstb()
> + * drm_dp_mst_topology_put_mstb()
> + */
> +static void drm_dp_mst_topology_get_mstb(struct drm_dp_mst_branch *mstb)
> +{
> +	WARN_ON(kref_read(&mstb->topology_kref) == 0);
> +	kref_get(&mstb->topology_kref);
> +	DRM_DEBUG("mstb %p (%d)\n", mstb, kref_read(&mstb->topology_kref));
> +}
> +
> +/**
> + * drm_dp_mst_topology_put_mstb() - release a topology reference to a branch
> + * device
> + * @mstb: The &struct drm_dp_mst_branch to release the topology reference from
> + *
> + * Releases a topology reference from @mstb by decrementing
> + * &drm_dp_mst_branch.topology_kref.
> + *
> + * See also:
> + * drm_dp_mst_topology_try_get_mstb()
> + * drm_dp_mst_topology_get_mstb()
> + */
> +static void
> +drm_dp_mst_topology_put_mstb(struct drm_dp_mst_branch *mstb)
> +{
> +	DRM_DEBUG("mstb %p (%d)\n",
> +		  mstb, kref_read(&mstb->topology_kref) - 1);
> +	kref_put(&mstb->topology_kref, drm_dp_destroy_mst_branch_device);
> +}
>  
>  static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
>  {
> @@ -937,7 +1171,8 @@ static void drm_dp_port_teardown_pdt(struct drm_dp_mst_port *port, int old_pdt)
>  
>  static void drm_dp_destroy_port(struct kref *kref)
>  {
> -	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
> +	struct drm_dp_mst_port *port =
> +		container_of(kref, struct drm_dp_mst_port, topology_kref);
>  	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
>  
>  	if (!port->input) {
> @@ -956,7 +1191,6 @@ static void drm_dp_destroy_port(struct kref *kref)
>  			 * from an EDID retrieval */
>  
>  			mutex_lock(&mgr->destroy_connector_lock);
> -			kref_get(&port->parent->kref);
>  			list_add(&port->next, &mgr->destroy_connector_list);
>  			mutex_unlock(&mgr->destroy_connector_lock);
>  			schedule_work(&mgr->destroy_connector_work);
> @@ -967,12 +1201,79 @@ static void drm_dp_destroy_port(struct kref *kref)
>  		drm_dp_port_teardown_pdt(port, port->pdt);
>  		port->pdt = DP_PEER_DEVICE_NONE;
>  	}
> -	kfree(port);
> +	drm_dp_mst_put_port_malloc(port);
> +}
> +
> +/**
> + * drm_dp_mst_topology_try_get_port() - Increment the topology refcount of a
> + * port unless its zero
> + * @port: &struct drm_dp_mst_port to increment the topology refcount of
> + *
> + * Attempts to grab a topology reference to @port, if it hasn't yet been
> + * removed from the topology (e.g. &drm_dp_mst_port.topology_kref has reached
> + * 0). Holding a topology reference implies that a malloc reference will be
> + * held to @port as long as the user holds the topology reference.
> + *
> + * Care should be taken to ensure that the user has at least one malloc
> + * reference to @port. If you already have a topology reference to @port, you
> + * should use drm_dp_mst_topology_get_port() instead.
> + *
> + * See also:
> + * drm_dp_mst_topology_get_port()
> + * drm_dp_mst_topology_put_port()
> + *
> + * Returns:
> + * * 1: A topology reference was grabbed successfully
> + * * 0: @port is no longer in the topology, no reference was grabbed
> + */
> +static int __must_check
> +drm_dp_mst_topology_try_get_port(struct drm_dp_mst_port *port)
> +{
> +	int ret = kref_get_unless_zero(&port->topology_kref);
> +
> +	if (ret)
> +		DRM_DEBUG("port %p (%d)\n", port,
> +			  kref_read(&port->topology_kref));
> +
> +	return ret;
>  }
>  
> +/**
> + * drm_dp_mst_topology_get_port() - Increment the topology refcount of a port
> + * @port: The &struct drm_dp_mst_port to increment the topology refcount of
> + *
> + * Increments &drm_dp_mst_port.topology_refcount without checking whether or
> + * not it's already reached 0. This is only valid to use in scenarios where
> + * you are already guaranteed to have at least one active topology reference
> + * to @port. Otherwise, drm_dp_mst_topology_try_get_port() must be used.
> + *
> + * See also:
> + * drm_dp_mst_topology_try_get_port()
> + * drm_dp_mst_topology_put_port()
> + */
> +static void drm_dp_mst_topology_get_port(struct drm_dp_mst_port *port)
> +{
> +	WARN_ON(kref_read(&port->topology_kref) == 0);
> +	kref_get(&port->topology_kref);
> +	DRM_DEBUG("port %p (%d)\n", port, kref_read(&port->topology_kref));
> +}
> +
> +/**
> + * drm_dp_mst_topology_put_port() - release a topology reference to a port
> + * @port: The &struct drm_dp_mst_port to release the topology reference from
> + *
> + * Releases a topology reference from @port by decrementing
> + * &drm_dp_mst_port.topology_kref.
> + *
> + * See also:
> + * drm_dp_mst_topology_try_get_port()
> + * drm_dp_mst_topology_get_port()
> + */
>  static void drm_dp_mst_topology_put_port(struct drm_dp_mst_port *port)
>  {
> -	kref_put(&port->kref, drm_dp_destroy_port);
> +	DRM_DEBUG("port %p (%d)\n",
> +		  port, kref_read(&port->topology_kref) - 1);
> +	kref_put(&port->topology_kref, drm_dp_destroy_port);
>  }
>  
>  static struct drm_dp_mst_branch *
> @@ -981,10 +1282,10 @@ drm_dp_mst_topology_get_mstb_validated_locked(struct drm_dp_mst_branch *mstb,
>  {
>  	struct drm_dp_mst_port *port;
>  	struct drm_dp_mst_branch *rmstb;
> -	if (to_find == mstb) {
> -		kref_get(&mstb->kref);
> +
> +	if (to_find == mstb)
>  		return mstb;
> -	}
> +
>  	list_for_each_entry(port, &mstb->ports, next) {
>  		if (port->mstb) {
>  			rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
> @@ -1001,25 +1302,32 @@ drm_dp_mst_topology_get_mstb_validated(struct drm_dp_mst_topology_mgr *mgr,
>  				       struct drm_dp_mst_branch *mstb)
>  {
>  	struct drm_dp_mst_branch *rmstb = NULL;
> +
>  	mutex_lock(&mgr->lock);
> -	if (mgr->mst_primary)
> +	if (mgr->mst_primary) {
>  		rmstb = drm_dp_mst_topology_get_mstb_validated_locked(
>  		    mgr->mst_primary, mstb);
> +
> +		if (rmstb && !drm_dp_mst_topology_try_get_mstb(rmstb))
> +			rmstb = NULL;
> +	}
>  	mutex_unlock(&mgr->lock);
>  	return rmstb;
>  }
>  
> -static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_branch *mstb, struct drm_dp_mst_port *to_find)
> +static struct drm_dp_mst_port *
> +drm_dp_mst_topology_get_port_validated_locked(struct drm_dp_mst_branch *mstb,
> +					      struct drm_dp_mst_port *to_find)
>  {
>  	struct drm_dp_mst_port *port, *mport;
>  
>  	list_for_each_entry(port, &mstb->ports, next) {
> -		if (port == to_find) {
> -			kref_get(&port->kref);
> +		if (port == to_find)
>  			return port;
> -		}
> +
>  		if (port->mstb) {
> -			mport = drm_dp_mst_get_port_ref_locked(port->mstb, to_find);
> +			mport = drm_dp_mst_topology_get_port_validated_locked(
> +			    port->mstb, to_find);
>  			if (mport)
>  				return mport;
>  		}
> @@ -1032,9 +1340,15 @@ drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr *mgr,
>  				       struct drm_dp_mst_port *port)
>  {
>  	struct drm_dp_mst_port *rport = NULL;
> +
>  	mutex_lock(&mgr->lock);
> -	if (mgr->mst_primary)
> -		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
> +	if (mgr->mst_primary) {
> +		rport = drm_dp_mst_topology_get_port_validated_locked(
> +		    mgr->mst_primary, port);
> +
> +		if (rport && !drm_dp_mst_topology_try_get_port(rport))
> +			rport = NULL;
> +	}
>  	mutex_unlock(&mgr->lock);
>  	return rport;
>  }
> @@ -1042,11 +1356,12 @@ drm_dp_mst_topology_get_port_validated(struct drm_dp_mst_topology_mgr *mgr,
>  static struct drm_dp_mst_port *drm_dp_get_port(struct drm_dp_mst_branch *mstb, u8 port_num)
>  {
>  	struct drm_dp_mst_port *port;
> +	int ret;
>  
>  	list_for_each_entry(port, &mstb->ports, next) {
>  		if (port->port_num == port_num) {
> -			kref_get(&port->kref);
> -			return port;
> +			ret = drm_dp_mst_topology_try_get_port(port);
> +			return ret ? port : NULL;
>  		}
>  	}
>  
> @@ -1095,6 +1410,11 @@ static bool drm_dp_port_setup_pdt(struct drm_dp_mst_port *port)
>  		if (port->mstb) {
>  			port->mstb->mgr = port->mgr;
>  			port->mstb->port_parent = port;
> +			/*
> +			 * Make sure this port's memory allocation stays
> +			 * around until it's child MSTB releases it
> +			 */
> +			drm_dp_mst_get_port_malloc(port);
>  
>  			send_link = true;
>  		}
> @@ -1155,17 +1475,26 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
>  	bool created = false;
>  	int old_pdt = 0;
>  	int old_ddps = 0;
> +
>  	port = drm_dp_get_port(mstb, port_msg->port_number);
>  	if (!port) {
>  		port = kzalloc(sizeof(*port), GFP_KERNEL);
>  		if (!port)
>  			return;
> -		kref_init(&port->kref);
> +		kref_init(&port->topology_kref);
> +		kref_init(&port->malloc_kref);
>  		port->parent = mstb;
>  		port->port_num = port_msg->port_number;
>  		port->mgr = mstb->mgr;
>  		port->aux.name = "DPMST";
>  		port->aux.dev = dev->dev;
> +
> +		/*
> +		 * Make sure the memory allocation for our parent branch stays
> +		 * around until our own memory allocation is released
> +		 */
> +		drm_dp_mst_get_mstb_malloc(mstb);
> +
>  		created = true;
>  	} else {
>  		old_pdt = port->pdt;
> @@ -1185,7 +1514,7 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
>  	   for this list */
>  	if (created) {
>  		mutex_lock(&mstb->mgr->lock);
> -		kref_get(&port->kref);
> +		drm_dp_mst_topology_get_port(port);
>  		list_add(&port->next, &mstb->ports);
>  		mutex_unlock(&mstb->mgr->lock);
>  	}
> @@ -1284,7 +1613,7 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
>  {
>  	struct drm_dp_mst_branch *mstb;
>  	struct drm_dp_mst_port *port;
> -	int i;
> +	int i, ret;
>  	/* find the port by iterating down */
>  
>  	mutex_lock(&mgr->lock);
> @@ -1309,7 +1638,9 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
>  			}
>  		}
>  	}
> -	kref_get(&mstb->kref);
> +	ret = drm_dp_mst_topology_try_get_mstb(mstb);
> +	if (!ret)
> +		mstb = NULL;
>  out:
>  	mutex_unlock(&mgr->lock);
>  	return mstb;
> @@ -1344,14 +1675,17 @@ drm_dp_get_mst_branch_device_by_guid(struct drm_dp_mst_topology_mgr *mgr,
>  				     uint8_t *guid)
>  {
>  	struct drm_dp_mst_branch *mstb;
> +	int ret;
>  
>  	/* find the port by iterating down */
>  	mutex_lock(&mgr->lock);
>  
>  	mstb = get_mst_branch_device_by_guid_helper(mgr->mst_primary, guid);
> -
> -	if (mstb)
> -		kref_get(&mstb->kref);
> +	if (mstb) {
> +		ret = drm_dp_mst_topology_try_get_mstb(mstb);
> +		if (!ret)
> +			mstb = NULL;
> +	}
>  
>  	mutex_unlock(&mgr->lock);
>  	return mstb;
> @@ -1390,11 +1724,14 @@ static void drm_dp_mst_link_probe_work(struct work_struct *work)
>  {
>  	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, work);
>  	struct drm_dp_mst_branch *mstb;
> +	int ret;
>  
>  	mutex_lock(&mgr->lock);
>  	mstb = mgr->mst_primary;
>  	if (mstb) {
> -		kref_get(&mstb->kref);
> +		ret = drm_dp_mst_topology_try_get_mstb(mstb);
> +		if (!ret)
> +			mstb = NULL;
>  	}
>  	mutex_unlock(&mgr->lock);
>  	if (mstb) {
> @@ -1722,8 +2059,10 @@ static struct drm_dp_mst_branch *drm_dp_get_last_connected_port_and_mstb(struct
>  
>  		if (found_port) {
>  			rmstb = found_port->parent;
> -			kref_get(&rmstb->kref);
> -			*port_num = found_port->port_num;
> +			if (drm_dp_mst_topology_try_get_mstb(rmstb))
> +				*port_num = found_port->port_num;
> +			else
> +				rmstb = NULL;
>  		}
>  	}
>  	mutex_unlock(&mgr->lock);
> @@ -2176,7 +2515,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
>  
>  		/* give this the main reference */
>  		mgr->mst_primary = mstb;
> -		kref_get(&mgr->mst_primary->kref);
> +		drm_dp_mst_topology_get_mstb(mgr->mst_primary);
>  
>  		ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
>  							 DP_MST_EN | DP_UP_REQ_EN | DP_UPSTREAM_IS_SRC);
> @@ -3096,13 +3435,6 @@ static void drm_dp_tx_work(struct work_struct *work)
>  	mutex_unlock(&mgr->qlock);
>  }
>  
> -static void drm_dp_free_mst_port(struct kref *kref)
> -{
> -	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
> -	kref_put(&port->parent->kref, drm_dp_free_mst_branch_device);
> -	kfree(port);
> -}
> -
>  static void drm_dp_destroy_connector_work(struct work_struct *work)
>  {
>  	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, destroy_connector_work);
> @@ -3123,7 +3455,6 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
>  		list_del(&port->next);
>  		mutex_unlock(&mgr->destroy_connector_lock);
>  
> -		kref_init(&port->kref);
>  		INIT_LIST_HEAD(&port->next);
>  
>  		mgr->cbs->destroy_connector(mgr, port->connector);
> @@ -3137,7 +3468,7 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
>  			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
>  		}
>  
> -		kref_put(&port->kref, drm_dp_free_mst_port);
> +		drm_dp_mst_put_port_malloc(port);
>  		send_hotplug = true;
>  	}
>  	if (send_hotplug)
> diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
> index 371cc2816477..8eca5f29242c 100644
> --- a/include/drm/drm_dp_mst_helper.h
> +++ b/include/drm/drm_dp_mst_helper.h
> @@ -44,7 +44,6 @@ struct drm_dp_vcpi {
>  
>  /**
>   * struct drm_dp_mst_port - MST port
> - * @kref: reference count for this port.
>   * @port_num: port number
>   * @input: if this port is an input port.
>   * @mcs: message capability status - DP 1.2 spec.
> @@ -67,7 +66,18 @@ struct drm_dp_vcpi {
>   * in the MST topology.
>   */
>  struct drm_dp_mst_port {
> -	struct kref kref;
> +	/**
> +	 * @topology_kref: refcount for this port's lifetime in the topology,
> +	 * only the DP MST helpers should need to touch this
> +	 */
> +	struct kref topology_kref;
> +
> +	/**
> +	 * @malloc_kref: refcount for the memory allocation containing this
> +	 * structure. See drm_dp_mst_get_port_malloc() and
> +	 * drm_dp_mst_put_port_malloc().
> +	 */
> +	struct kref malloc_kref;
>  
>  	u8 port_num;
>  	bool input;
> @@ -102,7 +112,6 @@ struct drm_dp_mst_port {
>  
>  /**
>   * struct drm_dp_mst_branch - MST branch device.
> - * @kref: reference count for this port.
>   * @rad: Relative Address to talk to this branch device.
>   * @lct: Link count total to talk to this branch device.
>   * @num_ports: number of ports on the branch.
> @@ -121,7 +130,19 @@ struct drm_dp_mst_port {
>   * to downstream port of parent branches.
>   */
>  struct drm_dp_mst_branch {
> -	struct kref kref;
> +	/**
> +	 * @topology_kref: refcount for this branch device's lifetime in the
> +	 * topology, only the DP MST helpers should need to touch this
> +	 */
> +	struct kref topology_kref;
> +
> +	/**
> +	 * @malloc_kref: refcount for the memory allocation containing this
> +	 * structure. See drm_dp_mst_get_mstb_malloc() and
> +	 * drm_dp_mst_put_mstb_malloc().
> +	 */
> +	struct kref malloc_kref;
> +
>  	u8 rad[8];
>  	u8 lct;
>  	int num_ports;
> @@ -626,4 +647,7 @@ int drm_dp_atomic_release_vcpi_slots(struct drm_atomic_state *state,
>  int drm_dp_send_power_updown_phy(struct drm_dp_mst_topology_mgr *mgr,
>  				 struct drm_dp_mst_port *port, bool power_up);
>  
> +void drm_dp_mst_get_port_malloc(struct drm_dp_mst_port *port);
> +void drm_dp_mst_put_port_malloc(struct drm_dp_mst_port *port);
> +


_Really_ nice documentation, and the graphs look pretty!

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Aside: We're lacking kerneldoc for drm_dp_mst_topology_cbs. A follow-up to
document a bit what drivers are expected to do in each callback would be
neat.
-Daniel


>  #endif
> -- 
> 2.20.1
>