
watchdog hangs on shutdown/restart #4534

Open
rootzoll opened this issue Apr 7, 2024 · 6 comments

Comments

@rootzoll
Collaborator

rootzoll commented Apr 7, 2024

This is an intermittent error: the RaspiBlitz sometimes hangs during shutdown with the message that the watchdog cannot stop.

[Screenshot: signal-2024-04-07-162116_002]

This bug is under investigation, and we need your help finding a way to reproduce it so we can fix it. It's not a showstopper for the release, but it would be nice to get rid of it.

So if you experience it, please report:

  • which SD card image you used (version, release candidate, min or fatpack)
  • in what state and during which action the reboot/shutdown hang happened (during setup, after setup, etc.)
  • which bonus apps you have installed.
@rootzoll rootzoll added the bug - unconfirmed Something isn't working - not (yet) reproduced label Apr 7, 2024
@rootzoll rootzoll changed the title Hangs on shutdown/restart watchdog hangs on shutdown/restart Apr 7, 2024
@rootzoll
Collaborator Author

rootzoll commented Apr 7, 2024

For deeper research, here are the services running on min and fatpack before setup, for comparison:

v1.11.0rc6-min: systemctl list-units --type=service --state=running

  UNIT                      LOAD   ACTIVE SUB     DESCRIPTION
  avahi-daemon.service      loaded active running Avahi mDNS/DNS-SD Stack
  cron.service              loaded active running Regular background program processing daemon
  dbus.service              loaded active running D-Bus System Message Bus
  fail2ban.service          loaded active running Fail2Ban Service
  getty@tty1.service        loaded active running Getty on tty1
  i2pd.service              loaded active running I2P Router written in C++
  ModemManager.service      loaded active running Modem Manager
  NetworkManager.service    loaded active running Network Manager
  nginx.service             loaded active running A high performance web server and a reverse proxy server
  polkit.service            loaded active running Authorization Manager
  redis-server.service      loaded active running Advanced key-value store
  rsyslog.service           loaded active running System Logging Service
  rtkit-daemon.service      loaded active running RealtimeKit Scheduling Policy Service
  smartmontools.service     loaded active running Self Monitoring and Reporting Technology (SMART) Daemon
  ssh.service               loaded active running OpenBSD Secure Shell server
  systemd-journald.service  loaded active running Journal Service
  systemd-logind.service    loaded active running User Login Management
  systemd-timesyncd.service loaded active running Network Time Synchronization
  systemd-udevd.service     loaded active running Rule-based Manager for Device Events and Files
  tor@default.service       loaded active running Anonymizing overlay network for TCP
  triggerhappy.service      loaded active running triggerhappy global hotkey daemon
  user@1000.service         loaded active running User Manager for UID 1000
  user@1001.service         loaded active running User Manager for UID 1001
  vnstat.service            loaded active running vnStat network traffic monitor
  wpa_supplicant.service    loaded active running WPA supplicant

v1.11.0rc6-fat: systemctl list-units --type=service --state=running

  UNIT                          LOAD   ACTIVE SUB     DESCRIPTION
  avahi-daemon.service          loaded active running Avahi mDNS/DNS-SD Stack
  blitzapi.service              loaded active running BlitzBackendAPI
  cron.service                  loaded active running Regular background program processing daemon
  dbus.service                  loaded active running D-Bus System Message Bus
  fail2ban.service              loaded active running Fail2Ban Service
  getty@tty1.service            loaded active running Getty on tty1
  i2pd.service                  loaded active running I2P Router written in C++
  ModemManager.service          loaded active running Modem Manager
  NetworkManager.service        loaded active running Network Manager
  nginx.service                 loaded active running A high performance web server and a reverse proxy server
  polkit.service                loaded active running Authorization Manager
  redis-server.service          loaded active running Advanced key-value store
  rsyslog.service               loaded active running System Logging Service
  rtkit-daemon.service          loaded active running RealtimeKit Scheduling Policy Service
  serial-getty@ttyAMA10.service loaded active running Serial Getty on ttyAMA10
  smartmontools.service         loaded active running Self Monitoring and Reporting Technology (SMART) Daemon
  ssh.service                   loaded active running OpenBSD Secure Shell server
  systemd-journald.service      loaded active running Journal Service
  systemd-logind.service        loaded active running User Login Management
  systemd-timesyncd.service     loaded active running Network Time Synchronization
  systemd-udevd.service         loaded active running Rule-based Manager for Device Events and Files
  tor@default.service           loaded active running Anonymizing overlay network for TCP
  triggerhappy.service          loaded active running triggerhappy global hotkey daemon
  user@1000.service             loaded active running User Manager for UID 1000
  user@1001.service             loaded active running User Manager for UID 1001
  vnstat.service                loaded active running vnStat network traffic monitor
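
The two listings above can also be compared mechanically. This is a minimal sketch, assuming each listing was saved to a text file; the file names `min.txt` and `fat.txt` and the function names `unit_names` and `diff_units` are hypothetical and not part of RaspiBlitz.

```shell
#!/bin/sh
# Print the unit names (first column) found in a saved
# `systemctl list-units --type=service --state=running` listing.
unit_names() {
    awk '$1 ~ /\.service$/ { print $1 }' "$1" | LC_ALL=C sort -u
}

# Show units that appear in only one of two saved listings.
# Unindented lines: only in the first file.
# Tab-indented lines: only in the second file.
diff_units() {
    a=$(mktemp)
    b=$(mktemp)
    unit_names "$1" > "$a"
    unit_names "$2" > "$b"
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Hypothetical usage, with the two listings saved beforehand:
#   diff_units min.txt fat.txt
```

Going by the two rc6 listings above, the only differences are `wpa_supplicant.service` (min only) and `blitzapi.service` plus `serial-getty@ttyAMA10.service` (fat only).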

@rootzoll rootzoll added this to the 1.11.0 Release milestone Apr 7, 2024
@rootzoll
Collaborator Author

rootzoll commented Apr 7, 2024

Something to try out: run sudo nano /etc/systemd/system.conf and activate the option RebootWatchdogSec=3min

Here are some details on this option:

Description: This setting specifies the timeout for the reboot watchdog. If a reboot takes longer than the specified time, the system will be hard-rebooted. This is useful for ensuring that the system recovers from a state where it has begun the reboot process but gets stuck before completion.

Usage: Set to a time value, such as 10min. If a reboot process exceeds this duration, the watchdog triggers a system reboot to recover from potential hang-ups during shutdown or reboot sequences.

The question is: can you fight watchdog with watchdog?
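
For anyone who prefers to script the change instead of editing the file in nano, here is a minimal sketch. The helper name `set_conf_option` is hypothetical. It assumes the file contains only a `[Manager]` section, and that the option is either already present (possibly commented out with a leading `#`) or can simply be appended at the end.

```shell
#!/bin/sh
# Set KEY=VALUE in a systemd-style config file.
# An existing "KEY=..." or commented "#KEY=..." line is rewritten in place;
# if no such line exists, the setting is appended to the end of the file.
set_conf_option() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^#?${key}=" "$file"; then
        tmp=$(mktemp)
        sed -E "s|^#?${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Hypothetical usage on a copy first, so the result can be checked before
# replacing the real file as root:
#   cp /etc/systemd/system.conf /tmp/system.conf
#   set_conf_option /tmp/system.conf RebootWatchdogSec 3min
```

Working on a copy keeps a broken edit from landing in the live config; the new value should be picked up by the next boot at the latest.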

@rootzoll
Collaborator Author

rootzoll commented Apr 8, 2024

OK, activating the watchdog with RebootWatchdogSec in v1.11rc7. Please report if you still see shutdowns/reboots that hang for longer than 3min after this.

@rootzoll rootzoll added final testing was fixed - needs testing and removed bug - unconfirmed Something isn't working - not (yet) reproduced labels Apr 8, 2024
@rootzoll rootzoll removed the final testing was fixed - needs testing label Apr 16, 2024
@rootzoll
Collaborator Author

So far rc7 is stable. Closing for the final release.

@openoms openoms reopened this Apr 24, 2024
@openoms
Collaborator

openoms commented Apr 24, 2024

Reopening as this still happens occasionally, e.g. https://t.me/raspiblitz/142982, and it was also reported by @fusion44.
Some ideas:

For the watchdog problem, we can try reducing the TimeoutStopSec in the lnd service unit:

# An extended timeout period is needed to allow for database compaction
# and other time intensive operations during startup. We also extend the
# stop timeout to ensure graceful shutdowns of lnd.
TimeoutStartSec=1200
TimeoutStopSec=3600

with the command:

sudo systemctl edit --full lnd

The watchdog service is set to RuntimeWatchdogSec=600s in /etc/systemd/system.conf.

I feel that should be closer to 3600 if we want to stay patient with LND.
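
Before changing either value, it can help to read both settings side by side. This is a minimal sketch with a hypothetical helper name, `get_conf_option`. The lnd unit path in the usage comment is an assumption, and values are printed exactly as written in the file, with no unit conversion.

```shell
#!/bin/sh
# Print the value of the last uncommented "KEY=VALUE" line in a
# systemd-style file. Prints nothing if the key is not set.
get_conf_option() {
    awk -F= -v key="$2" '
        $1 == key { sub(/^[^=]*=/, ""); val = $0; found = 1 }
        END       { if (found) print val }
    ' "$1"
}

# Hypothetical usage (the lnd unit path is an assumption):
#   get_conf_option /etc/systemd/system/lnd.service TimeoutStopSec
#   get_conf_option /etc/systemd/system.conf RuntimeWatchdogSec
```

On a running system, `systemctl show lnd --property=TimeoutStopUSec` reports the value systemd actually applies, including any drop-in overrides that a plain file read would miss.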

@LOCHER-21

Raspberry Pi 5 Model B Rev 1.0 is rebooting as expected on 1.11.0.
Raspberry Pi 4 Model B Rev 1.5 had this issue when rebooting until it was updated to 1.11.0.
