RFC: console: hack up console_trylock more

Submitted by Daniel Vetter on May 2, 2019, 2:16 p.m.

Details

Message ID 20190502141643.21080-1-daniel.vetter@ffwll.ch
State New
Series "RFC: console: hack up console_trylock more" (rev 1) in DRI devel


Commit Message

console_trylock, called from within printk, can be called from pretty
much anywhere, including try_to_wake_up. Note that this isn't common;
usually the box is in pretty bad shape at that point already. But it
really doesn't help when lockdep then jumps in and spams the logs,
potentially obscuring the backtrace we're actually interested in.
One case I've seen (slightly simplified backtrace):

 Call Trace:
  <IRQ>
  console_trylock+0xe/0x60
  vprintk_emit+0xf1/0x320
  printk+0x4d/0x69
  __warn_printk+0x46/0x90
  native_smp_send_reschedule+0x2f/0x40
  check_preempt_curr+0x81/0xa0
  ttwu_do_wakeup+0x14/0x220
  try_to_wake_up+0x218/0x5f0
  pollwake+0x6f/0x90
  credit_entropy_bits+0x204/0x310
  add_interrupt_randomness+0x18f/0x210
  handle_irq+0x67/0x160
  do_IRQ+0x5e/0x130
  common_interrupt+0xf/0xf
  </IRQ>

This alone isn't a problem, but the spinlock in the semaphore is also
still held while waking up waiters (the up() -> __up() -> try_to_wake_up()
call chain), which closes the runqueue vs. semaphore.lock loop and upsets
lockdep, which then issues a circular locking splat to dmesg. Worse, it
upsets developers, since we don't want to spam dmesg with clutter when
the machine is dying already.

Fix this by creating a __down_trylock() which only trylocks the
semaphore.lock. This isn't correct in full generality, but it's good
enough for console_lock:

- there's only ever one console_lock holder, so we won't fail spuriously
  because someone is doing a down() or up() while there's still room
  (unlike other semaphores with count > 1).

- console_unlock() has one massive retry loop, which will catch anyone
  who races the trylock against the up(). This makes sure that no
  printk lines will get lost. Making the trylock more racy therefore
  has no further impact.

Also cc'ing John Ogness since perhaps his printk rework fixes this all
properly.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: linux-kernel@vger.kernel.org
---
 include/linux/semaphore.h  |  1 +
 kernel/locking/semaphore.c | 33 +++++++++++++++++++++++++++++++++
 kernel/printk/printk.c     |  2 +-
 3 files changed, 35 insertions(+), 1 deletion(-)


diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
index 11c86fbfeb98..82f6db6121c3 100644
--- a/include/linux/semaphore.h
+++ b/include/linux/semaphore.h
@@ -40,6 +40,7 @@  extern void down(struct semaphore *sem);
 extern int __must_check down_interruptible(struct semaphore *sem);
 extern int __must_check down_killable(struct semaphore *sem);
 extern int __must_check down_trylock(struct semaphore *sem);
+extern int __must_check __down_trylock(struct semaphore *sem);
 extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
 extern void up(struct semaphore *sem);
 
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 561acdd39960..345123336019 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -143,6 +143,39 @@  int down_trylock(struct semaphore *sem)
 }
 EXPORT_SYMBOL(down_trylock);
 
+/**
+ * __down_trylock - try to acquire the semaphore, without any locking
+ * @sem: the semaphore to be acquired
+ *
+ * Try to acquire the semaphore atomically.  Returns 0 if the semaphore has
+ * been acquired successfully or 1 if it cannot be acquired.
+ *
+ * NOTE: This return value is inverted from both spin_trylock and
+ * mutex_trylock!  Be careful about this when converting code.
+ *
+ * Unlike mutex_trylock, this function can be used from interrupt context,
+ * and the semaphore can be released by any task or interrupt.
+ *
+ * WARNING: Unlike down_trylock, this function doesn't guarantee that the
+ * semaphore will be acquired even when it could be; it is best effort only.
+ * Use it for down_trylock_console_sem only.
+ */
+int __down_trylock(struct semaphore *sem)
+{
+	unsigned long flags;
+	int count;
+
+	if (!raw_spin_trylock_irqsave(&sem->lock, flags))
+		return 1;
+	count = sem->count - 1;
+	if (likely(count >= 0))
+		sem->count = count;
+	raw_spin_unlock_irqrestore(&sem->lock, flags);
+
+	return (count < 0);
+}
+EXPORT_SYMBOL(__down_trylock);
+
 /**
  * down_timeout - acquire the semaphore within a specified time
  * @sem: the semaphore to be acquired
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 02ca827b8fac..5293803eec1f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -217,7 +217,7 @@  static int __down_trylock_console_sem(unsigned long ip)
 	 * deadlock in printk()->down_trylock_console_sem() otherwise.
 	 */
 	printk_safe_enter_irqsave(flags);
-	lock_failed = down_trylock(&console_sem);
+	lock_failed = __down_trylock(&console_sem);
 	printk_safe_exit_irqrestore(flags);
 
 	if (lock_failed)

Comments

On Thu 2019-05-02 16:16:43, Daniel Vetter wrote:
> [... full commit message quoted above, trimmed ...]

To be honest, I do not see how this could solve the problem.

The circular dependency is still there. If the new __down_trylock()
succeeds then console_unlock() will get called in the same context
and it will still need to call up() -> try_to_wake_up().

Note that there are many other console_lock() callers that might
happen in parallel and might appear in the wait queue.

Best Regards,
Petr
On Fri, May 3, 2019 at 5:14 PM Petr Mladek <pmladek@suse.com> wrote:
> On Thu 2019-05-02 16:16:43, Daniel Vetter wrote:
> > [... full commit message quoted above, trimmed ...]
>
> To be honest, I do not see how this could solve the problem.
>
> The circular dependency is still there. If the new __down_trylock()
> succeeds then console_unlock() will get called in the same context
> and it will still need to call up() -> try_to_wake_up().
>
> Note that there are many other console_lock() callers that might
> happen in parallel and might appear in the wait queue.

Hm right. It's very rare we hit this in our CI and I don't know how to
repro otherwise, so just threw this out at the wall to see if it
sticks. I'll try and come up with a new trick then.

Thanks, Daniel
On Mon 2019-05-06 09:11:37, Daniel Vetter wrote:
> On Fri, May 3, 2019 at 5:14 PM Petr Mladek <pmladek@suse.com> wrote:
> > [... full commit message quoted above, trimmed ...]
> >
> > To be honest, I do not see how this could solve the problem.
> >
> > The circular dependency is still there. If the new __down_trylock()
> > succeeds then console_unlock() will get called in the same context
> > and it will still need to call up() -> try_to_wake_up().
> >
> > Note that there are many other console_lock() callers that might
> > happen in parallel and might appear in the wait queue.
> 
> Hm right. It's very rare we hit this in our CI and I don't know how to
> repro otherwise, so just threw this out at the wall to see if it
> sticks. I'll try and come up with a new trick then.

Single messages are printed from the scheduler via printk_deferred().
WARN() might be handled by introducing a printk deferred context;
see the per-cpu variable printk_context.

Best Regards,
Petr
On Mon, May 6, 2019 at 9:48 AM Petr Mladek <pmladek@suse.com> wrote:
> On Mon 2019-05-06 09:11:37, Daniel Vetter wrote:
> > [... earlier quoting trimmed ...]
> >
> > Hm right. It's very rare we hit this in our CI and I don't know how to
> > repro otherwise, so just threw this out at the wall to see if it
> > sticks. I'll try and come up with a new trick then.
>
> Single messages are printed from scheduler via printk_deferred().
> WARN() might be solved by introducing printk deferred context,
> see the per-cpu variable printk_context.

I convinced myself that I can take the wake_up_process out from under
the spinlock, for the limited case of the console lock. I think that's
a cleaner and more robust fix than leaking printk_context trickery
into the console_unlock code.
-Daniel