Small vc4 kms fixes and some questions.

Submitted by Mario Kleiner on May 20, 2016, 1:40 a.m.

Details

Message ID 573E6B05.5010507@gmail.com
State New
Headers show

Not browsing as part of any series.

Patch hide | download patch | download mbox

From 3c563068a891271fc544a5c515995971e3f02955 Mon Sep 17 00:00:00 2001
From: Mario Kleiner <mario.kleiner.de@gmail.com>
Date: Wed, 18 May 2016 05:55:11 +0000
Subject: [PATCH] drm/vc4: Implement precise vblank timestamping.

Precise vblank timestamping is implemented via the
usual scanout position based method. However, on
VC4 the Pixelvalves PV do not have a scanout position
register. Only the Hardware video scaler HVS has a
similar register which describes which scanline for
the output is currently composited and stored in the
HVS fifo for later consumption by the PV.

This causes a problem in that the HVS runs at a much
faster clock (system clock / audio gate) than the PV
which runs at video mode dot clock, so the unless the
fifo between HVS and PV is full, the HVS will progress
faster in its observable read line position than video
scan rate, so the HVS position reading can't be directly
translated into a scanout position for timestamp correction.

Additionally when the PV is in vblank, it doesn't consume
from the fifo, so the fifo gets full very quickly and then
the HVS stops compositing until the PV enters active scanout
and starts consuming scanlines from the fifo again, making
new space for the HVS to composite.

Therefore a simple translation of HVS read position into
elapsed time since (or to) start of active scanout does
not work, but for the most interesting cases we can still
get useful and accurate results:

1. The PV enters active scanout of a new frame with the
   fifo of the HVS completely full, and the HVS can refill
   any fifo line which gets consumed and thereby freed up by
   the PV during active scanout very quickly. Therefore the
   PV and HVS work effectively in lock-step during active
   scanout with the fifo never having more than 1 scanline
   freed up by the PV before it gets refilled. The PV's
   real scanout position is therefore trailing the HVS
   compositing position by scanoutpos = hvspos - fifosize
   and we can get the true scanoutpos as HVS readpos minus
   fifo size, so precise timestamping works while in active
   scanout, except for the last few scanlines of the image,
   when the HVS reaches end of frame, stops compositing and
   the PV catches up and drains the fifo.

2. During early vblank, while the HVS is still refilling the
   empty fifo, we can try to translate fifo fill level into
   PV position in vblank, taking the ratio betwenn HVS clock
   and PV clock into account. From that we estimate how many
   scanlines for the PV to go before start of new active
   scanout.

3. Once the HVS fifo is full in vblank, we can only guess
   something half-way reasonable. If called from vblank irq,
   we assume the irq is usually dispatched with minimum
   delay, so we can take a timestamp taken at entry into vblank
   irq handler as a baseline and then add a full vblank duration
   until the guessed start of active scanout. As irq dispatch
   is usually pretty low latency this works with relatively
   low jitter and good results.

   If we aren't called from vblank then we could be anywhere
   within the vblank interval, so we return a neutral result,
   simply the current system timestamp, and hope for the best.

In practice we are usually in case 3 + vblank irq, or case 2,
and the timestamps are rather precise, and at least never off
more than 1 vblank duration. During vblank on/off, or if the
timestamping gets strongly delayed into active scanout due to
things ike irqs being off for a long time, lock contention etc.,
we end in case 1 and also get very precise timestamps.

TODO:

- Case 2 does not improve accuracy, probably due to wrong
  assumptions about HVS clock frequency, or about refill
  behavior, or because it only covers about 1-3 scanlines
  at most and suffers roundoff errors. Drop case 2? Or a
  way to refine it?

- Replace heuristically found numbers for fifo linebuffer
  capacity by some actual info from Eric.

- A way to improve case 3 in non-irq case?

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
---
 drivers/gpu/drm/vc4/vc4_crtc.c | 155 +++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/vc4/vc4_drv.c  |   2 +
 drivers/gpu/drm/vc4/vc4_drv.h  |   7 ++
 drivers/gpu/drm/vc4/vc4_kms.c  |  19 +++++
 4 files changed, 183 insertions(+)

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index 1981ca3..2de8ba9 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -46,6 +46,9 @@  struct vc4_crtc {
 	const struct vc4_crtc_data *data;
 	void __iomem *regs;
 
+	/* Timestamp at start of vblank irq - unaffected by lock delays. */
+	ktime_t t_vblank;
+
 	/* Which HVS channel we're using for our CRTC. */
 	int channel;
 
@@ -146,6 +149,156 @@  int vc4_crtc_debugfs_regs(struct seq_file *m, void *unused)
 }
 #endif
 
+int vc4_crtc_get_scanoutpos(struct drm_device *dev, unsigned int crtc_id,
+			    unsigned int flags, int *vpos, int *hpos,
+			    ktime_t *stime, ktime_t *etime,
+			    const struct drm_display_mode *mode)
+{
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	struct vc4_crtc *vc4_crtc = vc4->crtc[crtc_id];
+	u32 val;
+	int fifo_lines, vblank_lines;
+	int ret = 0;
+
+	/* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */
+
+	/* Get optional system timestamp before query. */
+	if (stime)
+		*stime = ktime_get();
+
+	/*
+	 * Read vertical scanline which is currently composed for our
+	 * pixelvalve by the HVS, and also scaler status.
+	 */
+	val = HVS_READ(SCALER_DISPSTATX(vc4_crtc->channel));
+
+	/* Get optional system timestamp after query. */
+	if (etime)
+		*etime = ktime_get();
+
+	/* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */
+
+	/* vpos of composed (not scanned out) scanline is in 12 lsb's. */
+	*vpos = val & VC4_MASK(11, 0);
+
+	/* No hpos info available. */
+	if (hpos)
+		*hpos = 0;
+
+	/* XXX Proper estimate? Works for different modes, but guesswork...
+	 * Estimated the linebuffer fifo to hold 7 lines at 2048 horizontal
+	 * resolution, then guess a reasonable size according to HACTIVE.
+	 */
+	fifo_lines = (2048 * 7 / mode->crtc_hdisplay) - 1;
+
+	DRM_DEBUG_KMS("crtc %d : vblirq %d : fifo-lines %d : fifo-full %d : usec-irqdelay: %d : scanline %d\n",
+		      crtc_id, (flags & DRM_CALLED_FROM_VBLIRQ) ? 1 : 0, fifo_lines,
+		      ((val & SCALER_DISPSTATX_FULL) == SCALER_DISPSTATX_FULL) ? 1 : 0,
+		      (int) (ktime_to_ns(*stime) - ktime_to_ns(vc4_crtc->t_vblank)) / 1000,
+		      *vpos);
+
+	/* HVS more than fifo_lines into frame for compositing? */
+	if (*vpos > fifo_lines) {
+		/*
+		 * We are in active scanout and can get some meaningful results
+		 * from HVS. The actual PV scanout can not trail behind more
+		 * than fifo_lines as that is the fifo's capacity. Assume that
+		 * in active scanout the HVS and PV work in lockstep wrt. HVS
+		 * refilling the fifo and PV consuming from the fifo, ie.
+		 * whenever the PV consumes and frees up a scanline in the
+		 * fifo, the HVS will immediately refill it, therefore
+		 * incrementing vpos. Therefore we choose HVS refill position -
+		 * fifo size in scanlines as a good guess wrt. the real scanout
+		 * position of the PV.
+		 */
+		*vpos -= fifo_lines + 1;
+		ret |= DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE;
+		return ret;
+	}
+
+	/*
+	 * Less: This happens when we are in vblank and the HVS, after getting
+	 * the VSTART restart signal from the PV, just started refilling its
+	 * fifo with new lines from the top-most lines of the new framebuffers.
+	 * The PV does not scan out in vblank, so does not remove lines from
+	 * the fifo, so the fifo will be full quickly. At that point we can not
+	 * get meaningful readings wrt. scanline position of the PV and need
+	 * to make things up in a approximative and somewhat consistent way.
+	 */
+	ret |= DRM_SCANOUTPOS_IN_VBLANK;
+	vblank_lines = mode->crtc_vtotal - mode->crtc_vdisplay;
+
+	/* XXX Is there really a point in this special case - worth it?
+	 * If the HVS fifo is not yet full, we can try to estimate how many
+	 * scanlines we are into VBLANK by how many fifo lines HVS could
+	 * refill since start of VBLANK, therefore how many scanlines to go
+	 * until start of active scanout.
+	 */
+	if ((*vpos < fifo_lines) &&
+	    ((val & SCALER_DISPSTATX_FULL) != SCALER_DISPSTATX_FULL)) {
+		/*
+		 * Fifo not yet fully refilled, so presumably at beginning of
+		 * vblank. Calculate scanlines to go until end of vblank. We
+		 * assume the HVS fills its fifo at the system clock rate of
+		 * 250 Mhz, but the PV moves through the vblank at video dot
+		 * clock rate, so must not reduce lines to go by *vpos but only
+		 * by the fraction of dot_clock / system_clock.
+		 * XXX Is this right? Is 250 Mhz right?
+		 */
+		*vpos = -vblank_lines + (*vpos * mode->crtc_clock / 250000);
+		ret |= DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE;
+		return ret;
+	}
+
+	/* Fifo full. The HVS readings can't tell us where we are in VBLANK,
+	 * because the HVS is not compositing and updating its source read
+	 * position and the PV will not start consuming scanlines before the
+	 * start of active scanout, therefore the HVS can not update anything
+	 * while we are in VBLANK. All we can do is fake something reasonable.
+	 */
+	if (flags & DRM_CALLED_FROM_VBLIRQ) {
+		/*
+		 * If we are called from vblank irq then we assume the irq
+		 * handler got called within the first line of vblank,
+		 * so assume PV had about a full vblank scanlines to go.
+		 * As a base timestamp use the one taken at entry into vblank
+		 * irq handler, so it was not affected by delays due to lock
+		 * contention on event_lock or vblank_time lock in the core.
+		 */
+		*vpos = -vblank_lines;
+		if (stime)
+			*stime = vc4_crtc->t_vblank;
+		if (etime)
+			*etime = vc4_crtc->t_vblank;
+
+		ret |= DRM_SCANOUTPOS_VALID;
+	} else {
+		/*
+		 * No clue where we are. Return a vpos of zero, which will
+		 * cause calling code to just return the system timestamp.
+		 * At least this is no worse than the standard fallback.
+		 */
+		*vpos = 0;
+		ret |= DRM_SCANOUTPOS_VALID;
+	}
+
+	return ret;
+}
+
+int vc4_crtc_get_vblank_timestamp(struct drm_device *dev, unsigned int crtc_id,
+				  int *max_error, struct timeval *vblank_time,
+				  unsigned flags)
+{
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	struct vc4_crtc *vc4_crtc = vc4->crtc[crtc_id];
+	struct drm_crtc *crtc = &vc4_crtc->base;
+
+	/* Helper routine in DRM core does all the work: */
+	return drm_calc_vbltimestamp_from_scanoutpos(dev, crtc_id, max_error,
+						     vblank_time, flags,
+						     &crtc->hwmode);
+}
+
 static void vc4_crtc_destroy(struct drm_crtc *crtc)
 {
 	drm_crtc_cleanup(crtc);
@@ -526,7 +679,9 @@  static irqreturn_t vc4_crtc_irq_handler(int irq, void *data)
 	irqreturn_t ret = IRQ_NONE;
 
 	if (stat & PV_INT_VFP_START) {
+		vc4_crtc->t_vblank = ktime_get();
 		CRTC_WRITE(PV_INTSTAT, PV_INT_VFP_START);
+		/* udelay(1500); */
 		drm_crtc_handle_vblank(&vc4_crtc->base);
 		vc4_crtc_handle_page_flip(vc4_crtc);
 		ret = IRQ_HANDLED;
diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
index 7ff70ce7..bf9054c 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.c
+++ b/drivers/gpu/drm/vc4/vc4_drv.c
@@ -92,6 +92,8 @@  static struct drm_driver vc4_drm_driver = {
 	.enable_vblank = vc4_enable_vblank,
 	.disable_vblank = vc4_disable_vblank,
 	.get_vblank_counter = drm_vblank_no_hw_counter,
+	.get_scanout_position = vc4_crtc_get_scanoutpos,
+	.get_vblank_timestamp = vc4_crtc_get_vblank_timestamp,
 
 #if defined(CONFIG_DEBUG_FS)
 	.debugfs_init = vc4_debugfs_init,
diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index 37cac59..1b5dc60 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -415,6 +415,13 @@  extern struct platform_driver vc4_crtc_driver;
 int vc4_enable_vblank(struct drm_device *dev, unsigned int crtc_id);
 void vc4_disable_vblank(struct drm_device *dev, unsigned int crtc_id);
 int vc4_crtc_debugfs_regs(struct seq_file *m, void *arg);
+int vc4_crtc_get_scanoutpos(struct drm_device *dev, unsigned int crtc_id,
+			    unsigned int flags, int *vpos, int *hpos,
+			    ktime_t *stime, ktime_t *etime,
+			    const struct drm_display_mode *mode);
+int vc4_crtc_get_vblank_timestamp(struct drm_device *dev, unsigned int crtc_id,
+				  int *max_error, struct timeval *vblank_time,
+				  unsigned flags);
 
 /* vc4_debugfs.c */
 int vc4_debugfs_init(struct drm_minor *minor);
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index d423ba1..6092c55 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -89,6 +89,22 @@  static struct vc4_commit *commit_init(struct drm_atomic_state *state)
 	return c;
 }
 
+static void vc4_update_crtc_state(struct drm_atomic_state *state)
+{
+	struct drm_crtc *crtc;
+	struct drm_crtc_state *crtc_state;
+	int i;
+
+	/* Double check state. */
+	for_each_crtc_in_state(state, crtc, crtc_state, i) {
+		/* Update hwmode for vblank functions */
+		if (crtc->state->active)
+			crtc->hwmode = crtc->state->adjusted_mode;
+		else
+			crtc->hwmode.crtc_clock = 0;
+	}
+}
+
 /**
  * vc4_atomic_commit - commit validated state object
  * @dev: DRM device
@@ -154,6 +170,9 @@  static int vc4_atomic_commit(struct drm_device *dev,
 
 	drm_atomic_helper_swap_state(dev, state);
 
+	/* Update crtc state */
+	vc4_update_crtc_state(state);
+
 	/*
 	 * Everything below can be run asynchronously without the need to grab
 	 * any modeset locks at all under one condition: It must be guaranteed
-- 
2.7.0