RFC: console: hack up console_lock more v3

Submitted by Petr Mladek on May 10, 2019, 9:15 a.m.

Details

Message ID 20190510091537.44e3aeb7gcrcob76@pathway.suse.cz
State New
Series "RFC: console: hack up console_lock more v3" ( rev: 4 ) in DRI devel


Commit Message

Petr Mladek May 10, 2019, 9:15 a.m.
On Thu 2019-05-09 18:43:12, Daniel Vetter wrote:
> One thing to keep in mind is that the kernel is already dying, and
> things will come crashing down later on

This is important information. I haven't seen it mentioned earlier.

> (I've seen this only in dmesg
> tails capture in pstore in our CI, i.e. the box died for good). I just
> want to make sure that the useful information isn't overwritten by
> more dmesg splats that happen as a consequence of us somehow trying to
> run things on an offline cpu. Once console_unlock has completed in
> your above backtrace and the important stuff has gone out I'm totally
> fine with the kernel just dying. Pulling the wake_up_process out from
> under the semaphore.lock is enough to prevent lockdep from dumping
> more stuff while we're trying to print the important things,

By "more stuff", do you mean the lockdep splat? If yes, it might
make sense to call debug_locks_off() earlier in panic().


> and I think the untangling of the locking hiararchy is useful irrespective
> of this lockdep splat. Plus Peter doesn't sound like he likes to roll
> out more printk_deferred kind of things.
>
> But if you think I should do the printk_deferred thing too I can look
> into that. Just not quite sure what that's supposed to look like now.

Your patch might remove this particular lockdep splat. It might be
worth it (Peter also mentioned an optimization effect). Anyway,
it will not prevent the deadlock.

The only way to avoid the deadlock is to use printk_deferred()
with the current printk() code.


Finally, I have recently worked on a similar problem with a dying
system. I sent the following patch for testing. I wonder if it might
be acceptable upstream:


From: Petr Mladek <pmladek@suse.com>
Subject: sched/x86: Do not warn about offline CPUs when all are being stopped
Patch-mainline: No, just for testing
References: bsc#1104406

The warning about rescheduling offline CPUs causes a deadlock when
the CPUs need to be stopped using NMI. It might happen with
logbuf_lock, or with locks used by console drivers, especially tty.
But it might also be caused by a registered kernel message
dumper, for example, pstore.

The warning is pretty common when there is a high load and
CPUs are being stopped by native_stop_other_cpus(). But it
is not really useful in this context, and it scrolls
the really important messages off the screen.

We need to fix printk() in the long term. But disabling
the message looks reasonable at least in the meantime.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 arch/x86/kernel/smp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -124,7 +124,8 @@  static bool smp_no_nmi_ipi = false;
  */
 static void native_smp_send_reschedule(int cpu)
 {
-	if (unlikely(cpu_is_offline(cpu))) {
+	if (unlikely(cpu_is_offline(cpu) &&
+		     atomic_read(&stopping_cpu) < 0)) {
 		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
 		return;
 	}

Comments

On (05/10/19 11:15), Petr Mladek wrote:
[..]
>  arch/x86/kernel/smp.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -124,7 +124,8 @@ static bool smp_no_nmi_ipi = false;
>   */
>  static void native_smp_send_reschedule(int cpu)
>  {
> -	if (unlikely(cpu_is_offline(cpu))) {
> +	if (unlikely(cpu_is_offline(cpu) &&
> +		     atomic_read(&stopping_cpu) < 0)) {
>  		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
>  		return;
>  	}

I think we need to remove the CPU that we send IPI_STOP to from
idle_cpus_mask, so that the scheduler won't pick up stopped
CPUs (via cpumask_first(nohz.idle_cpus_mask)) and attempt to
reschedule to them. It seems that currently
native_stop_other_cpus() doesn't do that.

	-ss