Do more with UPS - stop forwards when power's out but keep node up as long as possible! #3972

allyourbankarebelongtous · 2023-06-21T04:47:18Z

Limit forwards while maximising uptime to reduce forced closes during a power outage

This started with me poking around with apcaccess and testing various configurations in /etc/apcupsd/apcupsd.conf to see whether I could optimize my node's uptime during a power outage (it's bad-weather season where I live, and power outages happen).

The overall goal is to minimize forced closures and prevent database corruption during a power outage. The graceful shutdown triggered by the UPS already takes care of the second goal, but how can we minimize forced closures? The first step is to find out how long the UPS can keep the node online given our power draw (this means the node, the router, and the modem are all powered by the UPS). The second step is to stop forwards as soon as a power loss is detected. Together, these should let the node stay online long enough for any pending HTLCs to clear while preventing new ones from coming in. There's a balancing act between staying up as long as possible and shutting down gracefully: stay up too long and the node just dies, risking channel.db integrity. Clearly, some testing is in order!

Changing UPS settings

Edit /etc/apcupsd/apcupsd.conf:
sudo nano /etc/apcupsd/apcupsd.conf
Change the following values as desired (the relevant section starts on line 142):

# the first that occurs will cause the initiation of a shutdown.
#

# If during a power failure, the remaining battery percentage
# (as reported by the UPS) is below or equal to BATTERYLEVEL,
# apcupsd will initiate a system shutdown.
BATTERYLEVEL 95

# If during a power failure, the remaining runtime in minutes
# (as calculated internally by the UPS) is below or equal to MINUTES,
# apcupsd will initiate a system shutdown.
MINUTES 15

# If during a power failure, the UPS has run on batteries for TIMEOUT
# many seconds or longer, apcupsd will initiate a system shutdown.
# A value of 0 disables this timer.
#
#  Note, if you have a Smart UPS, you will most likely want to disable
#    this timer by setting it to zero. That way, your UPS will continue
#    on batteries until either the % charge remaining drops to or below BATTERYLEVEL,
#    or the remaining battery runtime drops to or below MINUTES.  Of course,
#    if you are testing, setting this to 60 causes a quick system shutdown
#    if you pull the power plug.
#  If you have an older dumb UPS, you will want to set this to less than
#    the time you know you can run on batteries.
TIMEOUT 0

Shown above are the default values. I've used a few different settings and run some tests, and for my UPS and power draw I've settled on BATTERYLEVEL 30 and MINUTES 60. I left TIMEOUT 0 as it was.
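If you'd rather script the edit than open nano, a minimal sketch is below. It assumes each directive sits on its own uncommented line starting with the keyword, as in the excerpt above; the helper name set_apcupsd_value is made up for illustration and isn't part of apcupsd.

```bash
#!/bin/bash
# Hypothetical helper (not part of apcupsd): set one directive in apcupsd.conf.
# Assumes the directive appears on an uncommented line such as "MINUTES 15";
# commented lines start with "#" and are left alone.
set_apcupsd_value() {
  local conf="$1" key="$2" value="$3"
  # Replace any line that starts with the key followed by whitespace (GNU sed -i).
  sed -i "s/^${key}[[:space:]].*/${key} ${value}/" "$conf"
}

# Usage (as root), matching the values settled on above:
#   set_apcupsd_value /etc/apcupsd/apcupsd.conf BATTERYLEVEL 30
#   set_apcupsd_value /etc/apcupsd/apcupsd.conf MINUTES 60
```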

Stop and restart the systemd service:
sudo systemctl stop apcupsd.service
sudo systemctl start apcupsd.service

Check your new values using apcaccess
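apcaccess prints a lot of fields. To my knowledge the shutdown thresholds show up as MBATTCHG (BATTERYLEVEL), MINTIMEL (MINUTES) and MAXTIME (TIMEOUT); treat that mapping as an assumption and compare it against your own output. A small filter that pulls out just those lines might look like this (show_thresholds is an illustrative name):

```bash
#!/bin/bash
# Print only the threshold fields from apcaccess output read on stdin.
# Assumed field names: MBATTCHG, MINTIMEL, MAXTIME (check your own output).
show_thresholds() {
  awk -F': ' '
    { key = $1; sub(/ +$/, "", key) }   # strip the padding after the field name
    key == "MBATTCHG" || key == "MINTIMEL" || key == "MAXTIME" { print key "=" $2 }
  '
}

# Usage: apcaccess | show_thresholds
```

After the restart, the values shown should match what you set in apcupsd.conf.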

UPS Test

The first test shows just how capable my UPS is with my node and config. The results are below.

Methodology:

Unplugged the UPS at full charge. Gathered data from the apcaccess command every 60 s for analysis. Monitored node uptime continuously. Let the test run until apcupsd powered off the node.
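The collection script isn't shown here. A minimal sketch of one way to gather the same kind of data is below, assuming apcaccess reports BCHARGE (battery charge, %) and TIMELEFT (estimated runtime, minutes); csv_row and log_ups are illustrative names, not the one used for this test.

```bash
#!/bin/bash
# Hypothetical logger. Assumes apcaccess prints lines such as
# "BCHARGE  : 100.0 Percent" and "TIMELEFT : 42.0 Minutes".

# Turn one apcaccess dump (stdin) into a CSV row: minute,charge,timeleft
csv_row() {
  awk -v minute="$1" -F': ' '
    { key = $1; sub(/ +$/, "", key); split($2, v, " ") }
    key == "BCHARGE"  { charge = v[1] }
    key == "TIMELEFT" { left = v[1] }
    END { print minute "," charge "," left }
  '
}

# Sample the UPS every 60 s until interrupted (or until the node shuts down).
log_ups() {
  local out="${1:-ups-test.csv}" minute=0
  echo "minute,bcharge_percent,timeleft_minutes" > "$out"
  while true; do
    apcaccess | csv_row "$minute" >> "$out"
    sleep 60
    minute=$((minute + 1))
  done
}

# Usage: log_ups /home/admin/ups-test.csv
```

Appending one row per minute means the file is complete up to the last sample even though the node powers itself off at the end of the test.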

Starting data/conditions:

Model: APC UPS Battery Backup and Surge Protector, 600VA Backup Battery Power Supply, BE600M1 Back-UPS with USB Charger Port
UPS Age: about 8 months old
Powering: nuc6i5syh (node), router, modem (fiber)
Initial Charge: UPS was at 100% charge as reported by apcaccess.
Settings: For this test, the UPS was set to shut down the node at an estimated 30 minutes of power remaining or 30% battery charge remaining, whichever came first.

Results:

Time on battery: 77 minutes
Node uptime: the node stayed online until 77 minutes after the test started
Minimum battery charge read before shutdown: 35% (sampled every 60 s)
Shutdown initiated by: Battery charge below low limit (graceful shutdown)
Node recognized offline (channels inactive): 80 minutes

apcupsd logs:

2023-06-19 10:20:00 -0500 Power failure.
2023-06-19 10:20:06 -0500 Running on UPS batteries.
2023-06-19 11:37:50 -0500 Battery charge below low limit.
2023-06-19 11:37:50 -0500 Initiating system shutdown!
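The 77 minutes on battery comes straight from these timestamps: 10:20:00 to 11:37:50 is 77 minutes 50 seconds. To do the same arithmetic on your own logs, a small helper could look like this (elapsed_on_battery is a hypothetical name; it relies on GNU date's -d parsing):

```bash
#!/bin/bash
# Compute the time between two apcupsd log timestamps, printed as "XmYYs".
elapsed_on_battery() {
  local start end secs
  start=$(date -d "$1" +%s)   # GNU date: parse timestamp to epoch seconds
  end=$(date -d "$2" +%s)
  secs=$((end - start))
  printf '%dm%02ds\n' $((secs / 60)) $((secs % 60))
}

# Usage, with the "Power failure." and "Initiating system shutdown!" lines above:
#   elapsed_on_battery "2023-06-19 10:20:00 -0500" "2023-06-19 11:37:50 -0500"
```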

[Figure: Battery power available (%) over time, in minutes]

[Figure: UPS-reported time left (minutes) over time, in minutes]

Discussion

The node stayed online longer than I thought it would and shut down successfully. It's also clear that although the UPS's estimated time remaining goes down predictably, it goes down faster than actual time elapses, so, just like the "miles remaining" display in your car, I wouldn't rely on it. I'm going to set 60 minutes instead of 30 for my final settings.

Stop Forwards when on Battery

I would like to implement a service that stops accepting new forwards once the system recognizes that it's on battery backup. It should be fairly simple: a timer runs a service that checks the power state of the UPS and, if it's on battery power, stops forwards with a bos command. The timer could check every minute or so, and when it detects that power is back, it starts forwards again.

Create a service that stops forwards when activated and allows forwards when deactivated

This relies on the bos limit-forwarding command, so bos must be installed for this to work.

Create a systemd service:
sudo nano /etc/systemd/system/stop-forwards.service
Paste in this code, then save and exit:

[Unit]
Description=Stop all forwards

[Service]
Type=simple
ExecStart=/home/bos/.npm-global/bin/bos limit-forwarding --disable-forwards
User=bos
Group=bos
RemainAfterExit=yes
StandardOutput=append:/var/log/stop-forwards.log
StandardError=append:/var/log/stop-forwards.log

[Install]
WantedBy=multi-user.target

Create a bash file that will turn forwards on or off based on the UPS status and current forwards status

Create check-ups.sh and put it in /home/admin/config.scripts just to keep it with the other raspiblitz scripts:
sudo nano /home/admin/config.scripts/check-ups.sh
Paste in this code, then save and exit:

#!/bin/bash

# determines UPS status and starts/stops forwards accordingly

# get UPS status
status=$(apcaccess | grep STATUS | awk '{print $3}')
echo "UPS status = ${status}" | tee -a /var/log/stop-forwards.log

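# get current forwards status: forwards are enabled (1) unless stop-forwards.service is active (0)
# (systemctl is-active also treats a failed unit as not active, so a crashed bos is retried)
if systemctl is-active --quiet stop-forwards.service; then
  forwards_enabled=0
else
  forwards_enabled=1
fi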
echo "Forwards enabled = ${forwards_enabled}" | tee -a /var/log/stop-forwards.log

if [ "${status}" = "ONBATT" ]; then
  # see if forwards are disabled, and if not, disable them
  if [ ! "${forwards_enabled}" = "0" ]; then
    # disable all forwards
    systemctl start stop-forwards.service
    echo "Stopped Forwards" | tee -a /var/log/stop-forwards.log
  fi
else
  # see if forwards are enabled, and if not, enable them
  if [ "${forwards_enabled}" = "0" ]; then
    # enable all forwards
    systemctl stop stop-forwards.service
    echo "Started Forwards" | tee -a /var/log/stop-forwards.log
  fi
fi
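Stripped of logging, the script's decision reduces to a small truth table. The sketch below isolates it as a pure function (next_action is an illustrative name, not something the script defines) so it can be dry-run without apcaccess or systemd. One consequence worth noticing: any status other than ONBATT, including an empty string if apcaccess fails, re-enables forwards.

```bash
#!/bin/bash
# Illustrative only: the decision made by check-ups.sh as a pure function.
#   $1 = UPS status word from apcaccess (e.g. ONLINE, ONBATT)
#   $2 = forwards_enabled as computed in check-ups.sh (1 = enabled, 0 = stopped)
# Prints the systemctl verb to apply to stop-forwards.service, or "none".
next_action() {
  local status="$1" forwards_enabled="$2"
  if [ "$status" = "ONBATT" ]; then
    # on battery: stop forwards unless they are already stopped
    if [ "$forwards_enabled" != "0" ]; then echo start; else echo none; fi
  else
    # any other status: re-enable forwards if they are stopped
    if [ "$forwards_enabled" = "0" ]; then echo stop; else echo none; fi
  fi
}
```

For example, next_action ONBATT 1 prints start, which corresponds to systemctl start stop-forwards.service in the real script.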

Fix the permissions to make it executable:
sudo chmod 755 /home/admin/config.scripts/check-ups.sh

Create a systemd service and timer that will execute check-ups.sh every five minutes

First create a systemd service to execute the script:
sudo nano /etc/systemd/system/check-ups.service
Copy and paste in the following code, then save and exit:

[Unit]
Description=Check UPS and start/stop forwards based on status

[Service]
ExecStart=/bin/bash /home/admin/config.scripts/check-ups.sh
User=root
Group=root

Now create the timer:
sudo nano /etc/systemd/system/check-ups.timer
Copy and paste in the following code, then save and exit:

# this file will run every five minutes to check the status of UPS and start/stop forwards based on status
[Unit]
Description=Check UPS status and start/stop forwards if required

[Timer]
OnCalendar=*-*-* *:00,05,10,15,20,25,30,35,40,45,50,55:00

[Install]
WantedBy=timers.target

Enable and start timer:

sudo systemctl enable check-ups.timer
sudo systemctl start check-ups.timer
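Five minutes is a trade-off: the initial plan was to check every minute or so, and a five-minute poll means up to five minutes of lag in each direction. Rather than editing check-ups.timer, the interval can be changed with a standard systemd drop-in; the empty OnCalendar= line clears the original schedule before the new one is added. A sketch, with write_timer_override as a made-up helper name:

```bash
#!/bin/bash
# Hypothetical helper: write a systemd drop-in that changes the schedule of
# check-ups.timer without editing the unit file itself.
#   $1 = drop-in directory, normally /etc/systemd/system/check-ups.timer.d
#   $2 = an OnCalendar= expression, e.g. "minutely" or "*:0/5"
write_timer_override() {
  local dir="$1" schedule="$2"
  mkdir -p "$dir"
  # The empty OnCalendar= resets the list inherited from check-ups.timer.
  cat > "$dir/override.conf" <<EOF
[Timer]
OnCalendar=
OnCalendar=${schedule}
EOF
}

# Usage (as root), to check every minute:
#   write_timer_override /etc/systemd/system/check-ups.timer.d minutely
#   systemctl daemon-reload
#   systemctl restart check-ups.timer
```

As a side note, *:0/5 is a shorter way to write the original every-five-minutes schedule.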

Note: Logs for all actions taken by these services (HTLCs blocked, forwards stopped/enabled, etc.) are available in /var/log/stop-forwards.log.

Testing the Service

For this test, I didn't feel the need to take it all the way to shutdown. The intent is to see whether forwards are stopped and then successfully restarted.

Methodology

To ensure I had ample forwards to work with, I set up my test node with one channel to my main node and then set up a recurring payment through that channel to a peer of mine, so that my main node was forwarding at least one payment per minute. I logged all forwards through the node, and logged the blocked forwards in /var/log/stop-forwards.log using the systemd service above. Rather than monitoring battery power, this time I'm monitoring forwards. I will begin forwarding payments, then after 5 minutes unplug the UPS and wait 20 minutes to simulate a 20-minute power outage, then plug the UPS back in and let the test run for an additional 10 minutes. The test will be a success if forwards are stopped within five minutes of power loss while all forwards that already made it into the node complete, and forwards resume within five minutes of power returning to the UPS.

Starting data/conditions

Model: APC UPS Battery Backup and Surge Protector, 600VA Backup Battery Power Supply, BE600M1 Back-UPS with USB Charger Port
UPS Age: about 8 months old
Powering: nuc6i5syh (node), router, modem (fiber)
Initial Charge: UPS was at 100% charge as reported by apcaccess.
Settings: The UPS was set to shut down the node at an estimated 60 minutes of power remaining or 30% battery charge remaining, whichever came first.
Forwards: Forwards were configured as above, 1 forward per minute (plus organic traffic, this is my main node after all).

Results

The test began at 09:01 PM (according to the node) and ended at 09:36 PM. Here are the forwards passing through the node at the time:

6/20/2023 9:02 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:03 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:04 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:05 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:06 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:07 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:08 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:09 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:10 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:32 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:33 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:34 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:35 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:36 PM,allyourbankarebelongtous,1sats.com
6/20/2023 9:37 PM,allyourbankarebelongtous,1sats.com
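The hole between 9:10 PM and 9:32 PM is the window in which no forwards completed. With a longer export, gaps like this are easier to find with a script. Here's a sketch that assumes rows shaped exactly like the ones above (date, 12-hour time with AM/PM, then the channel names) and that all rows fall on the same day; find_gaps is an illustrative name.

```bash
#!/bin/bash
# Report gaps longer than one minute between consecutive forwards.
# Input rows look like: "6/20/2023 9:10 PM,from_peer,to_peer"
find_gaps() {
  awk -F',' '
    {
      split($1, dt, " ")            # dt[1]=date, dt[2]=H:MM, dt[3]=AM or PM
      split(dt[2], hm, ":")
      h = hm[1] % 12                # 12 AM -> 0, 12 PM -> 12 after the next line
      if (dt[3] == "PM") h += 12
      t = h * 60 + hm[2]            # minutes since midnight
      if (NR > 1 && t - prev > 1)
        printf "gap: %s -> %s (%d min)\n", prevlabel, dt[2] " " dt[3], t - prev
      prev = t
      prevlabel = dt[2] " " dt[3]
    }
  ' "$@"
}

# Usage: find_gaps forwards.csv   (or pipe rows in on stdin)
```

Fed the fifteen rows above, it prints a single line: gap: 9:10 PM -> 9:32 PM (22 min).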

You can see that the forwards happen once per minute. At 9:06 PM, the power is pulled. Because the service that checks for power loss runs only every five minutes, and the power was pulled right after it had run, it took an extra five minutes for forwards to stop; as anticipated, forwards stopped at 9:11 PM. Here are the logs from /var/log/stop-forwards.log during the time period of the test:

UPS status = ONLINE
Forwards enabled = 1
UPS status = ONBATT
Forwards enabled = 1
Stopped Forwards
limiting_forwards: true
rejection: NoNewHtlcsAccepted 786603x2054x0 → 786752x1370x8
rejection: NoNewHtlcsAccepted 786603x2054x0 → 791127x1827x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 795089x2361x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 793924x1727x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 792847x1772x1
rejection: NoNewHtlcsAccepted 772235x2122x1 → 737511x806x0
rejection: NoNewHtlcsAccepted 745571x727x0 → 737511x802x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 786752x1370x8
UPS status = ONBATT
Forwards enabled = 0
rejection: NoNewHtlcsAccepted 731852x1133x1 → 742011x1947x2
rejection: NoNewHtlcsAccepted 786603x2054x0 → 786752x1370x8
rejection: NoNewHtlcsAccepted 786603x2054x0 → 791127x1827x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 795089x2361x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 793924x1727x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 784771x1501x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 772235x2122x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 784771x1501x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 745571x727x0
rejection: NoNewHtlcsAccepted 731852x1133x1 → 750638x831x2
UPS status = ONBATT
Forwards enabled = 0
rejection: NoNewHtlcsAccepted 731852x1133x1 → 786320x1357x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 786752x1370x8
rejection: NoNewHtlcsAccepted 786603x2054x0 → 792847x1772x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 780203x740x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 726131x1436x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 774489x1447x0
UPS status = ONBATT
Forwards enabled = 0
rejection: NoNewHtlcsAccepted 731852x1133x1 → 731721x545x1
rejection: NoNewHtlcsAccepted 731852x1133x1 → 754865x1013x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 786752x1370x8
rejection: NoNewHtlcsAccepted 786603x2054x0 → 791127x1827x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 795089x2361x1
rejection: NoNewHtlcsAccepted 786603x2054x0 → 793924x1727x1
rejection: NoNewHtlcsAccepted 795089x2361x1 → 741455x2724x1
rejection: NoNewHtlcsAccepted 745571x727x0 → 759589x2182x0
UPS status = ONLINE
Forwards enabled = 0
Started Forwards
UPS status = ONLINE
Forwards enabled = 1

Again, you can see that the UPS status was ONLINE for the first five minutes of the test. Then it was detected as ONBATT, and forwards were disabled. For the next 20 minutes, forwards stayed disabled. Power came back on at 9:26 PM, and forwards resumed five minutes later at 9:31 PM, after which five minutes of forwards were logged before the test was stopped.
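To tally the log per check instead of scanning it by eye, an awk snippet along these lines works on the format shown above. It only relies on the "UPS status =" and "rejection: NoNewHtlcsAccepted" prefixes; summarise_rejections is an illustrative name.

```bash
#!/bin/bash
# Count the "rejection: NoNewHtlcsAccepted" lines logged after each
# "UPS status = ..." check. Reads the given file(s), or stdin if none.
summarise_rejections() {
  awk '
    /^UPS status = / {
      if (checks) print status ": " count " rejections"   # close previous check
      status = $4; count = 0; checks++
      next
    }
    /^rejection: NoNewHtlcsAccepted/ { count++ }
    END { if (checks) print status ": " count " rejections" }
  ' "$@"
}

# Usage: summarise_rejections /var/log/stop-forwards.log
```

For the log excerpt above, it reports 8, 10, 6 and 8 rejections for the four ONBATT checks and 0 for each ONLINE check.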

Discussion

It seems to work pretty well! The next improvement will be to also stop LNDg rebalance attempts, since I run LNDg and there were a few rebalances during the period when the node should have been sterile. This should be as easy as adding systemctl stop rebalancer-lndg.timer, but clearly more testing is in order!
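The five-minute lag in both directions comes from polling. apcupsd can also react to events directly: to my knowledge its apccontrol script runs an executable named after the event (for example /etc/apcupsd/onbattery and /etc/apcupsd/offbattery) if one exists, and the "Running on UPS batteries." line in the earlier log, six seconds after the power failure, appears to be that onbattery event. A sketch of such a hook follows, assuming that behavior; check your apccontrol first, and note that some distributions already ship onbattery/offbattery scripts, so merge rather than overwrite. check-ups.timer can stay as a safety net, since both mechanisms agree on the desired state.

```bash
#!/bin/bash
# Hypothetical apcupsd event hook (not part of the original setup).
# Install as /etc/apcupsd/onbattery (chmod 755) and symlink
# /etc/apcupsd/offbattery to it; the action is chosen from the name it runs as.

# Overridable so the logic can be dry-run without touching systemd or /var/log.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
LOG="${LOG:-/var/log/stop-forwards.log}"

power_event() {
  case "$1" in
    onbattery)
      # same effect as check-ups.sh on ONBATT
      "$SYSTEMCTL" start stop-forwards.service
      echo "onbattery event: stopped forwards" >> "$LOG"
      ;;
    offbattery)
      # same effect as check-ups.sh once power is back
      "$SYSTEMCTL" stop stop-forwards.service
      echo "offbattery event: started forwards" >> "$LOG"
      ;;
    *)
      echo "unknown event: $1" >&2
      return 1
      ;;
  esac
}

# Act only when invoked under one of the two hook names.
event="${0##*/}"
case "$event" in
  onbattery | offbattery) power_event "$event" ;;
esac
```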

Please share your thoughts/contributions below. Remember that your mileage may vary and you should test your own UPS capability before implementing something that may compromise channel.db integrity in favor of minimizing forced closures. Happy Routing!

feelancer21 · 2023-06-21T20:01:23Z

Very nice. One remark: if you are running circuitbreaker, you have to stop it before starting bos limit-forwarding.

allyourbankarebelongtous (Author) · Jun 21, 2023

Yep, good info for folks running circuitbreaker. To stop the LNDg rebalancers, you just need to add systemctl stop rebalancer-lndg.timer (adjust if your LNDg rebalancer systemd unit is named differently) and systemctl start rebalancer-lndg.timer to check-ups.sh above, under the respective stop-forwards.service commands.
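Putting both remarks together, the two branches of check-ups.sh could call helpers like these. The ordering follows the remark above: circuitbreaker is stopped before bos limit-forwarding starts and, by the same logic, restarted only after bos has stopped. My understanding is that both tools use LND's HTLC interceptor, which accepts one client at a time, but treat that as the likely reason rather than a confirmed one. circuitbreaker.service is an assumed unit name, and disable_forwards/enable_forwards are illustrative names.

```bash
#!/bin/bash
# Sketch of the two branches of check-ups.sh with circuitbreaker and the LNDg
# rebalancer handled too. circuitbreaker.service is an assumed unit name, and
# rebalancer-lndg.timer is the raspiblitz name; adjust both to your install.

# Overridable so the ordering can be dry-run without touching systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

disable_forwards() {
  # circuitbreaker has to let go before bos limit-forwarding starts
  "$SYSTEMCTL" stop circuitbreaker.service
  "$SYSTEMCTL" start stop-forwards.service
  "$SYSTEMCTL" stop rebalancer-lndg.timer
}

enable_forwards() {
  # stop bos first so circuitbreaker can take over again
  "$SYSTEMCTL" stop stop-forwards.service
  "$SYSTEMCTL" start circuitbreaker.service
  "$SYSTEMCTL" start rebalancer-lndg.timer
}
```

In check-ups.sh, disable_forwards would replace the systemctl start stop-forwards.service line and enable_forwards the systemctl stop stop-forwards.service line.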

allyourbankarebelongtous (Author) · 2023-06-23T03:27:48Z

Here's an example of check-ups.sh that stops LNDg's rebalancer and starts it again when appropriate. If your rebalancer systemd timer is named something different, adjust the name in this script; if you used the raspiblitz menu to install LNDg, this is the script to use.

#!/bin/bash

# determines UPS status and starts/stops forwards accordingly

# get UPS status
status=$(apcaccess | grep STATUS | awk '{print $3}')
echo "UPS status = ${status}" | tee -a /var/log/stop-forwards.log

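# get current forwards status: forwards are enabled (1) unless stop-forwards.service is active (0)
# (systemctl is-active also treats a failed unit as not active, so a crashed bos is retried)
if systemctl is-active --quiet stop-forwards.service; then
  forwards_enabled=0
else
  forwards_enabled=1
fi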
echo "Forwards enabled = ${forwards_enabled}" | tee -a /var/log/stop-forwards.log

if [ "${status}" = "ONBATT" ]; then
  # see if forwards are disabled, and if not, disable them
  if [ ! "${forwards_enabled}" = "0" ]; then
    # disable all forwards
    systemctl start stop-forwards.service
    # stop rebalances
    systemctl stop rebalancer-lndg.timer
    echo "Stopped Forwards" | tee -a /var/log/stop-forwards.log
  fi
else
  # see if forwards are enabled, and if not, enable them
  if [ "${forwards_enabled}" = "0" ]; then
    # enable all forwards
    systemctl stop stop-forwards.service
    # start rebalances
    systemctl start rebalancer-lndg.timer
    echo "Started Forwards" | tee -a /var/log/stop-forwards.log
  fi
fi

ashjkl159 · 2023-12-15T09:30:07Z

This is a very good idea, stopping forwards when on UPS battery, but how do I stop lnd.service and bitcoind.service before the UPS initiates a computer shutdown?

rootzoll (Maintainer) · 2023-12-18T20:28:23Z

Great research. I created an issue here, #4326, which can put this into release planning. Any PR referring to this issue is welcome.
