Description:
To perform maintenance on a storage node, every chunk server (CS) on that node must be switched to maintenance mode before the node is shut down:
# vstorage -c <cluster_name> rm-cs <CS_ID> --maintenance
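Because the command takes a single CS ID, a node that hosts several CSs needs one call per CS. Below is a minimal sketch of a loop. It assumes you already know the CS IDs on the node and pass them as arguments. The function name cs_maintenance_all and the VSTORAGE variable, which lets you substitute a stub for a dry run, are hypothetical helpers and not part of vstorage:

```shell
# Hypothetical helper (not part of vstorage): put each given CS into
# maintenance mode, one rm-cs call per CS.
# Override VSTORAGE (e.g. with a stub function) for a dry run.
VSTORAGE=${VSTORAGE:-vstorage}

cs_maintenance_all() {
    # Need a cluster name plus at least one CS ID
    if [ $# -lt 2 ]; then
        echo "usage: cs_maintenance_all <cluster_name> <CS_ID>..." >&2
        return 1
    fi
    cluster=$1
    shift
    for cs_id in "$@"; do
        # Same command as above; stop at the first failure
        $VSTORAGE -c "$cluster" rm-cs "$cs_id" --maintenance || return 1
    done
}
```

For example, cs_maintenance_all <cluster_name> 1025 1026 issues the command once for each of the two CS IDs.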
Symptoms:
By default, the maintenance window for a node is 30 minutes.
If the node remains unavailable on the storage network for longer than that, the chunks stored on it are excluded from replication. When the node is turned on again, its CSs are reset to 0 and start replicating from scratch.
What should be done if the maintenance takes more than 30 minutes, for example, when the server has to be shut down and physically relocated?
Resolution:
Before starting, increase the maintenance timeout mds.wd.offline_tout_mnt. The value is set in milliseconds, and the maximum allowed value is 24 hours (86400000). However, make sure that no other failures occur during this time, to avoid data loss.
Once a CS is switched to maintenance mode, it becomes read-only and writing to it stops. Proceed with the maintenance, and restore the default values after the operations are complete.
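Since mds.wd.offline_tout_mnt is expressed in milliseconds (this article uses 1800000 for 30 minutes and 25200000 for 7 hours), it can help to compute the value from the planned window length. Below is a minimal sketch. The helper name tout_minutes_to_ms is hypothetical, and the 1440-minute cap mirrors the 24-hour maximum mentioned above:

```shell
# Hypothetical helper: convert a maintenance window given in whole minutes
# into the millisecond value for mds.wd.offline_tout_mnt.
# Rejects empty/non-numeric input, leading zeros, and anything above 24 hours.
tout_minutes_to_ms() {
    minutes=$1
    case $minutes in
        ''|0*|*[!0-9]*)
            echo "minutes must be a positive whole number" >&2
            return 1 ;;
    esac
    if [ "$minutes" -gt 1440 ]; then
        echo "maximum allowed window is 1440 minutes (24 hours)" >&2
        return 1
    fi
    echo $((minutes * 60 * 1000))
}
```

For example, tout_minutes_to_ms 420 prints 25200000 (7 hours), and tout_minutes_to_ms 30 prints the default 1800000.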
Steps:
- Check the current timeout value, then set the new one:
# vstorage -c <cluster_name> get-config | grep -F offline_tout_mnt
To change the maintenance timeout to 7 hours (25200000 ms):
# vstorage -c <cluster_name> set-config mds.wd.offline_tout_mnt=25200000
- Put every CS on the hardware node (HN) into maintenance mode:
# vstorage -c <cluster_name> rm-cs --maintenance <CS_ID>
- Known restriction: each CS must reach the state solid=1 before the node is switched off.
To verify the state after the CSs have been put into maintenance:
# vstorage -c <cluster_name> stat --xml | grep solid -B3
- Shut down and relocate the node. When the maintenance is complete, boot the node and verify that the storage network is operational. The vstorage client should already be working.
- Take the CSs out of maintenance mode:
# vstorage -c <cluster_name> rm-cs --maintenance --revert <CS_ID>
- Restore the default timeout (30 minutes):
# vstorage -c <cluster_name> set-config mds.wd.offline_tout_mnt=1800000
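When several CSs were put into maintenance, the last two steps can be combined in the same way. Below is a minimal sketch. It assumes the CS IDs are passed as arguments. The function finish_maintenance and the VSTORAGE override are hypothetical helpers, and 1800000 is the default value used in the step above:

```shell
# Hypothetical helper (not part of vstorage): take each given CS out of
# maintenance mode, then restore the default 30-minute timeout.
# Override VSTORAGE (e.g. with a stub function) for a dry run.
VSTORAGE=${VSTORAGE:-vstorage}

finish_maintenance() {
    # Need a cluster name plus at least one CS ID
    if [ $# -lt 2 ]; then
        echo "usage: finish_maintenance <cluster_name> <CS_ID>..." >&2
        return 1
    fi
    cluster=$1
    shift
    for cs_id in "$@"; do
        # Stop at the first failed revert without touching the timeout
        $VSTORAGE -c "$cluster" rm-cs --maintenance --revert "$cs_id" || return 1
    done
    # Only reached when every CS was reverted successfully
    $VSTORAGE -c "$cluster" set-config mds.wd.offline_tout_mnt=1800000
}
```

The default timeout is restored only after every revert succeeds. If one revert fails, the extended window stays in place while you investigate.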