Do more with UPS - stop forwards when power's out but keep node up as long as possible! #3972
-
Very nice. One remark: If you are running |
-
Here's an example of check-ups.sh that stops lndg's rebalancer and starts it again when appropriate. If your rebalancer systemd timer is named something different, adjust the name of the service in this script. If you used the raspiblitz menu to install lndg, this is the appropriate script to use.
-
This is a very good idea to stop forwards when on UPS battery, but how do I stop lnd.service and bitcoind.service before the UPS initiates computer shutdown?
-
Great research. I created an issue here, #4326, which can put this into release planning. Any PR referring to this issue is welcome.
-
Limit forwards while maximising uptime to reduce forced closes during power outage
This started as me poking around with apcaccess and doing some testing with the various configurations in /etc/apcupsd/apcupsd.conf to see if I could optimize my node's uptime during a power outage (bad weather season where I live, power outages happen).
The overall goal is to minimize forced closures and prevent db corruption during a power outage. The graceful shutdown from UPS already takes care of the second goal, but how can we minimize forced closures? The first step is to see how long we can go with our UPS and our power draw requirements to keep the node online (this means the node, the router, and the modem are all powered by UPS). The second step is to stop forwards as soon as a power loss is detected. The combination of these two will hopefully allow the node to stay online long enough for any pending HTLCs to clear while preventing new ones from coming in. There's a balancing act between staying up as long as possible and a graceful shutdown - too long and your node just dies, risking the channel.db integrity. Clearly, some testing is in order!
Changing UPS settings
Edit /etc/apcupsd/apcupsd.conf:
sudo nano /etc/apcupsd/apcupsd.conf
Change the following values to what you desire (relevant code starts on line 142):
Shown above are the default values. I've used a few different settings and run some tests, and for my UPS and power draw I've settled on BATTERYLEVEL 30 and MINUTES 60. I left TIMEOUT 0 as it was.
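If you would rather script the change than edit the file by hand, something like the following could work. It is only a minimal sketch, assuming GNU sed and the usual `KEY value` layout of apcupsd.conf; `set_apc_value` and `show_apc_values` are hypothetical helper names, not part of apcupsd.

```shell
#!/usr/bin/env bash
# Sketch: update one "KEY value" line in an apcupsd-style config file.
# set_apc_value / show_apc_values are hypothetical helpers, not apcupsd tools.
set_apc_value() {
    local file="$1" key="$2" value="$3"
    # Only rewrite an uncommented line that starts with the key.
    sed -i -E "s/^(${key})[[:space:]]+.*/\1 ${value}/" "$file"
}

# Print the current shutdown thresholds from the given config file.
show_apc_values() {
    grep -E '^(BATTERYLEVEL|MINUTES|TIMEOUT)[[:space:]]' "$1"
}

# Example (run as root against the real file):
#   set_apc_value /etc/apcupsd/apcupsd.conf BATTERYLEVEL 30
#   set_apc_value /etc/apcupsd/apcupsd.conf MINUTES 60
#   show_apc_values /etc/apcupsd/apcupsd.conf
```

Commented-out lines are left alone, so the explanatory comments in the config file survive the edit.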
Stop and restart the systemd service:
sudo systemctl stop apcupsd.service
sudo systemctl start apcupsd.service
Check your new values using
apcaccess
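The full apcaccess report is long, so it can help to pull out just the fields of interest. A minimal sketch, assuming the report uses `NAME : value` lines (as with STATUS, BCHARGE and TIMELEFT); `apc_field` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# apc_field NAME: read an apcaccess-style report on stdin and print the
# value of the first line whose name matches NAME exactly.
apc_field() {
    awk -v key="$1" '
    {
        sep = index($0, ":")            # split on the first colon only
        if (sep == 0) next
        name  = substr($0, 1, sep - 1)
        value = substr($0, sep + 1)
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", name)
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", value)
        if (name == key) { print value; exit }
    }'
}

# Example:
#   apcaccess | apc_field STATUS
#   apcaccess | apc_field BCHARGE
```

Splitting on the first colon only keeps values that contain colons themselves, such as timestamps, intact.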
UPS Test
I ran this first test to see just how capable my UPS was with my node and config. The test results are below.
Methodology:
Unplugged the UPS at full charge. Gathered data from the apcaccess command every 60s for analysis. Monitored node uptime continuously. Let the test run until the node was powered off by apcupsd.
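The 60-second sampling could be done with a small loop like the one below. This is a sketch rather than the script actually used for the test: `ups_csv_row` and the ups-test.csv file name are hypothetical, and it assumes the BCHARGE and TIMELEFT lines look like `BCHARGE  : 87.0 Percent`.

```shell
#!/usr/bin/env bash
# ups_csv_row STAMP: read an apcaccess report on stdin and print one CSV
# row "STAMP,battery_percent,minutes_left".
ups_csv_row() {
    local stamp="$1"
    awk -v stamp="$stamp" '
        /^BCHARGE[[:space:]]*:/  { charge = $3 }   # third field is the number
        /^TIMELEFT[[:space:]]*:/ { left = $3 }
        END { printf "%s,%s,%s\n", stamp, charge, left }
    '
}

# Example loop (stop it with Ctrl-C, or let the shutdown end it):
#   while true; do
#       apcaccess | ups_csv_row "$(date '+%F %T')" >> ups-test.csv
#       sleep 60
#   done
```

Appending one row per minute to a file on disk means the data survives up to the last sample before the node powers off.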
Starting data/conditions:
Results:
Time on battery: 77 minutes
Node downtime: Node stayed online until 77 minutes after test start
Min battery power read prior to shutdown: 35%
Shutdown initiated by: Battery charge below low limit (graceful shutdown)
Node recognized offline (channels inactive): 80 minutes
apcupsd logs:
2023-06-19 10:20:00 -0500 Power failure.
2023-06-19 10:20:06 -0500 Running on UPS batteries.
2023-06-19 11:37:50 -0500 Battery charge below low limit.
2023-06-19 11:37:50 -0500 Initiating system shutdown!
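As a cross-check of the 77-minute figure, the time on battery can be computed from the first and last timestamps in the log above. A minimal sketch, assuming GNU date (for the `-d` option); `elapsed_minutes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# elapsed_minutes START END: whole minutes between two timestamps that
# GNU date can parse, e.g. the ones written by apcupsd.
elapsed_minutes() {
    local start end
    start=$(date -d "$1" +%s)
    end=$(date -d "$2" +%s)
    echo $(( (end - start) / 60 ))
}

# Power failure to shutdown, taken from the log lines above:
elapsed_minutes "2023-06-19 10:20:00 -0500" "2023-06-19 11:37:50 -0500"   # prints 77
```

The two timestamps are 1 h 17 min 50 s apart, which rounds down to the 77 minutes reported in the results.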
[Chart: battery power available (%) over time in minutes]
[Chart: UPS reported time left (in minutes) over time in minutes]
Discussion
The node stayed online longer than I thought it would and shut down successfully. It's also clear that although the UPS estimated time remaining does go down predictably, it goes down faster than the actual time, so - just like the "miles remaining" in your car - I wouldn't rely on it. I'm going to set 60 minutes instead of 30 for my final settings.
Stop Forwards when on Battery
I would like to implement a service that stops accepting new forwards once the system recognizes that it's on battery backup. It should be fairly simple - a timer that runs a service that checks the power state of the UPS machine and if it's on battery power, stops forwards with a bos command. This timer could check every minute or so and when it detects power back up it starts forwards again.
Create a service that stops forwards when activated and allows forwards when deactivated
This relies on the bos limit-forwarding command, so bos must be installed for this to work.
Create a systemd service:
sudo nano /etc/systemd/system/stop-forwards.service
Paste in this code, then save and exit:
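For illustration only, a unit along these lines would fit the description: starting it blocks forwards, stopping it allows them again. This is a sketch and not necessarily the unit used in this post. The bos binary path, the `bos` user and the `--disable-forwards` flag of `bos limit-forwarding` are all assumptions to check against your own install, and `write_stop_forwards_unit` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: write a candidate stop-forwards.service to a chosen path.
# Assumptions (not confirmed by the post): the bos binary location, the
# bos user, and the --disable-forwards flag of `bos limit-forwarding`.
write_stop_forwards_unit() {
    local dest="$1"
    local bos_bin="${2:-/home/bos/.npm-global/bin/bos}"   # assumed path
    cat > "$dest" <<EOF
[Unit]
Description=Block new forwards while the UPS is on battery
After=lnd.service

[Service]
User=bos
ExecStart=${bos_bin} limit-forwarding --disable-forwards
StandardOutput=append:/var/log/stop-forwards.log
StandardError=append:/var/log/stop-forwards.log
EOF
}

# Example (as root):
#   write_stop_forwards_unit /etc/systemd/system/stop-forwards.service
#   systemctl daemon-reload
```

There is deliberately no [Install] section: the unit is meant to be started and stopped on demand by the UPS check, not enabled at boot.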
Create a bash file that will turn forwards on or off based on the UPS status and current forwards status
Create check-ups.sh and put it in /home/admin/config.scripts just to keep it with the other raspiblitz scripts:
sudo nano /home/admin/config.scripts/check-ups.sh
Paste in this code, then save and exit:
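The logic described above is: on battery and not yet blocking, start stop-forwards.service; back on mains and still blocking, stop it; otherwise do nothing. It could be sketched like this. It is not necessarily the exact script from this post; `decide_action` and `main` are hypothetical names, and the sketch assumes apcaccess prints a `STATUS : ONLINE` or `STATUS : ONBATT` line.

```shell
#!/usr/bin/env bash
# decide_action UPS_STATUS SERVICE_STATE -> prints start | stop | none
# SERVICE_STATE is the output of `systemctl is-active stop-forwards.service`.
decide_action() {
    local ups_status="$1" service_state="$2"
    case "$ups_status" in
        *ONBATT*)   # on battery: make sure forwards are blocked
            [ "$service_state" = "active" ] && echo none || echo start ;;
        *ONLINE*)   # mains power: make sure forwards are allowed
            [ "$service_state" = "active" ] && echo stop || echo none ;;
        *)          # unknown status (e.g. COMMLOST): change nothing
            echo none ;;
    esac
}

main() {
    local status state action
    status=$(apcaccess | awk -F': *' '/^STATUS/ {print $2}')
    state=$(systemctl is-active stop-forwards.service)
    action=$(decide_action "$status" "$state")
    echo "$(date '+%F %T') UPS=$status service=$state action=$action" \
        >> /var/log/stop-forwards.log
    [ "$action" = "none" ] || systemctl "$action" stop-forwards.service
}

# Only act when apcupsd's client tool is actually installed.
if command -v apcaccess >/dev/null 2>&1; then
    main
else
    echo "apcaccess not found; nothing to do" >&2
fi
```

Keeping the decision in its own function makes it possible to check every status combination without unplugging anything.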
Fix the permissions to make it executable:
sudo chmod 755 /home/admin/config.scripts/check-ups.sh
Create a systemd service and timer that will execute check-ups.sh every five minutes
First create a systemd service to execute the script:
sudo nano /etc/systemd/system/check-ups.service
Copy and paste in the following code, then save and exit:
Now create the timer:
sudo nano /etc/systemd/system/check-ups.timer
Copy and paste in the following code, then save and exit:
Enable and start timer:
sudo systemctl enable check-ups.timer
sudo systemctl start check-ups.timer
Note: Logs for all actions taken by these services (HTLCs blocked, forwards stopped/enabled, etc.) are available in /var/log/stop-forwards.log
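For reference, a service and timer pair matching the description (a oneshot service that runs check-ups.sh, triggered every five minutes) could look roughly like the following. This is a sketch, not necessarily the units used here: the `OnBootSec`/`OnUnitActiveSec` schedule is an assumption, and `write_check_ups_units` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: write candidate check-ups.service and check-ups.timer
# files into a directory (defaults to /etc/systemd/system).
write_check_ups_units() {
    local dir="${1:-/etc/systemd/system}"
    cat > "$dir/check-ups.service" <<'EOF'
[Unit]
Description=Check UPS status and toggle forwards

[Service]
Type=oneshot
ExecStart=/home/admin/config.scripts/check-ups.sh
EOF
    cat > "$dir/check-ups.timer" <<'EOF'
[Unit]
Description=Run check-ups.service every five minutes

[Timer]
# Assumed schedule: first run one minute after boot, then every 5 minutes.
OnBootSec=1min
OnUnitActiveSec=5min
Unit=check-ups.service

[Install]
WantedBy=timers.target
EOF
}

# Example (as root):
#   write_check_ups_units
#   systemctl daemon-reload
```

A shorter `OnUnitActiveSec` would shrink the window in which forwards are still accepted after power loss, at the cost of running the check more often.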
Testing the Service
For this test I don't feel the need to take it all the way to shutdown. The intent is to see if forwards are stopped and then successfully started.
Methodology
To ensure that I had plenty of forwards to work with, I set up my test node with one channel to my main node and then set up a recurring payment through that channel to a peer of mine, so that my main node was forwarding at least one payment per minute. I logged all forwards through the node, and logged the blocked forwards in /var/log/stop-forwards.log using the systemd service above. Rather than monitoring battery power, this time I'm monitoring forwards. The plan: begin forwarding payments, unplug the UPS after 5 minutes, wait 20 minutes to simulate a 20-minute power outage, then plug the UPS back in and let the test run for an additional 10 minutes. The test is a success if forwards are stopped within five minutes of power loss, all forwards that already made it into the node complete, and forwards resume within five minutes of power returning to the UPS.
Starting data/conditions
Results
The test began at 09:01 PM (according to the node) and ended at 09:36 PM. Here are the forwards passing through the node at the time:
You can see that the forwards are happening once per minute. At 9:06 PM, the power was pulled. Because the service that checks for power loss runs every five minutes, and because the power was pulled immediately after it ran, it took the full five minutes for the forwards to stop, but as anticipated, forwards stopped at 9:11 PM. Here are the logs from /var/log/stop-forwards.log during the time period of the test:
Again, you can see that the UPS status was ONLINE for the first five minutes of the test. Then it was detected to be ONBATT, and forwards were disabled. For the next 20 minutes, forwards were disabled for the node. Power was back on at 9:26 PM, and forwards resumed five minutes later at 9:31 PM, after which five minutes of forwards were logged before stopping the test.
Discussion
It seems to work pretty well! The next improvement will be to stop LNDg rebalance attempts as well, since I run LNDg and a few rebalances happened during the period when the node should have been sterile. This should be as easy as including
systemctl stop rebalancer-lndg.timer
but clearly more testing is in order!
Please share your thoughts/contributions below. Remember that your mileage may vary and you should test your own UPS capability before implementing something that may compromise channel.db integrity in favor of minimizing forced closures. Happy Routing!