How do I fix kernel NMI watchdog bug soft lockup?

To resolve this behavior, perform the following steps as the root user:

  1. Edit the file /etc/sysctl.conf and add the following line at the end: kernel.watchdog_thresh=30
  2. Save the file and exit the editor.
  3. Reboot the machine.
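
The edit in step 1 can be sketched in Python as follows. This is only an illustration: the helper name set_sysctl_line is hypothetical, and it works on the text of the file rather than touching /etc/sysctl.conf itself, so it can be tried without root.

```python
def set_sysctl_line(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key=value` set exactly once.

    An existing entry for `key` is replaced in place; otherwise the
    new line is appended at the end, as in step 1 above.
    """
    new_line = f"{key}={value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        is_setting = (
            stripped
            and not stripped.startswith(("#", ";"))
            and "=" in stripped
        )
        if is_setting and stripped.split("=", 1)[0].strip() == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop any duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Example: bump an existing value of 10 to 30.
print(set_sysctl_line("kernel.watchdog_thresh = 10\n", "kernel.watchdog_thresh", "30"))
```

Writing the result back to /etc/sysctl.conf still requires root. Running sysctl -p afterwards reloads that file, which may make the reboot in step 3 unnecessary.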

What causes CPU lockup?

A CPU lockup is broadly defined as the symptom of a function or task using the CPU and not releasing it for a period of time. Lockups are most often caused by an application use case and occur during firmware code development, engineering evaluation, or production programming.

What is CPU hard lockup?

A ‘hard lockup’ is defined as a bug that causes the CPU to loop in kernel mode for more than 10 seconds […], without letting other interrupts have a chance to run. In other words, during a hard lockup the CPU is never released, not even to service interrupts, much as in the good old DOS days. So “something or other” is left “hanging”.

What is Watchdog_thresh?

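watchdog_thresh is the kernel tunable (sysctl kernel.watchdog_thresh) that sets the lockup detection threshold in seconds. The default is 10 seconds for the hard lockup detector, and the soft lockup threshold is 2*watchdog_thresh. The watchdog task is a high priority kernel thread that updates a timestamp every time it is scheduled. The period of the hrtimer is 2*watchdog_thresh/5, which means it has two or three chances to generate an interrupt before the hard lockup detector kicks in.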
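
The arithmetic behind these numbers can be checked with a short Python sketch. The function name watchdog_timings and the dictionary keys are made up for this illustration. The hrtimer formula comes from the text above. The soft lockup threshold of 2*watchdog_thresh is taken from the kernel's sysctl documentation rather than from this page.

```python
def watchdog_timings(watchdog_thresh: int = 10) -> dict:
    """Derive the lockup detector timings (in seconds) from watchdog_thresh."""
    hrtimer_period = 2 * watchdog_thresh / 5
    return {
        "hard_lockup_threshold_s": watchdog_thresh,
        "soft_lockup_threshold_s": 2 * watchdog_thresh,
        "hrtimer_period_s": hrtimer_period,
        # How many hrtimer periods fit in the hard lockup window;
        # 2.5 means "two or three chances" to fire.
        "hrtimer_chances": watchdog_thresh / hrtimer_period,
    }


for thresh in (10, 30):
    print(thresh, watchdog_timings(thresh))
```

With the default of 10 the hrtimer fires every 4 seconds and a soft lockup is reported after 20 seconds. With the value 30 those become 12 and 60 seconds.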

What is NMI watchdog bug soft lockup?

A ‘soft lockup’ is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run. The watchdog daemon will send a non-maskable interrupt (NMI) to all CPUs in the system, which in turn print the stack traces of their currently running tasks.
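
When a soft lockup is detected, the kernel logs a line of the form “BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]”. The following Python sketch pulls the fields out of such a line. The helper name parse_soft_lockup is hypothetical, and the sample line uses made-up values rather than output from a real system.

```python
import re

# Matches the soft lockup report, whatever prefix precedes "BUG:".
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str):
    """Return the CPU, stuck time, task name and PID, or None if no match."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "seconds": int(m.group("secs")),
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
    }


# Illustrative sample line (values are invented for this example).
sample = "NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:1234]"
print(parse_soft_lockup(sample))
```

The task name is matched greedily because kworker names themselves contain a colon; only the final colon separates the name from the PID.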

What is Ksoftirqd?

ksoftirqd is a per-CPU kernel thread that runs to handle software interrupts that have not yet been serviced. In the output of top, these threads appear as ksoftirqd/n entries, where n is the number of the CPU that the thread runs on.
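
As an alternative to reading top, the per-CPU threads can be listed from /proc, where each process directory has a comm file holding the thread name. The Python sketch below assumes a Linux system. The function name find_threads and its proc_root parameter are hypothetical and exist only for this illustration.

```python
import os


def find_threads(prefix: str, proc_root: str = "/proc"):
    """Return sorted (pid, name) pairs for tasks whose name starts with prefix."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if name.startswith(prefix):
            found.append((int(entry), name))
    return sorted(found)


# On a Linux machine:
#   find_threads("ksoftirqd/")   # one entry per CPU
#   find_threads("kworker")      # kernel worker threads
```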

Why are there so many Kworker processes?

“kworker” is a placeholder process for kernel worker threads, which perform most of the actual processing for the kernel, especially in cases where there are interrupts, timers, I/O, etc. These threads typically account for the vast majority of the “system” time attributed to running processes.

What is SysRq trigger?

The magic SysRq key is a key combination understood by the Linux kernel, which allows the user to perform various low-level commands regardless of the system’s state. It is often used to recover from freezes, or to reboot a computer without corrupting the filesystem.
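
The same commands can be issued without a keyboard by writing a single command character to /proc/sysrq-trigger as root. The Python sketch below illustrates this. The function name sysrq and its trigger_path parameter are hypothetical, and the command letters mentioned in the comments come from the kernel's SysRq documentation, not from this page.

```python
def sysrq(command: str, trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Send one SysRq command character to the kernel (requires root)."""
    if len(command) != 1:
        raise ValueError("a SysRq command is a single character")
    with open(trigger_path, "w") as f:
        f.write(command)


# Examples (run as root, and with care):
#   sysrq("l")   # print a stack backtrace for all active CPUs
#   sysrq("s")   # sync all mounted filesystems
#   sysrq("b")   # reboot immediately, without syncing or unmounting
```

The “l” command is the one most relevant to lockups, since it shows what each CPU is executing at that moment.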

What causes nmi watchdog to trigger kernel panic?

Root cause: the NMI watchdog detected a LOCKUP on CPU 14 and triggered the kernel panic. The NMI watchdog crashed the system because IRQs had been disabled for too long. The timeout for the NMI watchdog to trigger is 30 seconds.

Why does watchdog detect hard lockup in the kernel?

Most likely what you are seeing is a hard lockup: as the watchdog_overflow_callback implementation (and any other function from the stack) in the kernel shows, the CPU was blocked while igb was trying to write a log message (most likely because CONFIG_IP_ROUTE_VERBOSE was set).

Why does the watchdog timer Trigger a reboot?

It looks like “writing big files” causes the CPU to hang, so the watchdog timer triggers the reboot. This should not happen, so we want to check whether it is a bug or something else. I am just running scp over the built-in USB interface.