Automation Suite Storage Reclamation Patch
Issue Description:
UiPath optimizes Automation Suite cluster storage by reclaiming unused storage to maintain high performance and prevent outages. However, in a few cases, actively used storage may be reclaimed, resulting in service impact on multi-node setups and potential data loss on single-node clusters.
*This does not affect the AKS or EKS versions, as Longhorn is not present there.
Affected Versions:
- All versions from 21.10.3 through 21.10.10
- Version 22.4.8 and below
- Version 22.10.7 and below
- Version 23.4.2 and below
- No versions in 23.10.x (provided the migration off Longhorn has occurred)
Applying the Patch:
- Copy the script found under the Patch Script section below.
- Place it on one of the Automation Suite server nodes and name it storageReclamationPatch.sh
- The script can be copied by simply copying the file contents below and pasting them into vi on the Linux machine
- Alternatively, create the file on your local machine and transfer it to the Linux machine
- Change the permissions to make the script executable:

chmod 755 storageReclamationPatch.sh
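If you want to confirm the permission change took effect, ls -l should show the execute bits. A minimal stand-alone check (the touch is only there so the example is self-contained; on the node the file already exists):

```shell
touch storageReclamationPatch.sh       # stand-in: on the node the file already exists
chmod 755 storageReclamationPatch.sh   # same permission change as above
ls -l storageReclamationPatch.sh       # mode column should read -rwxr-xr-x
```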
- Execute the script:
- First, start the 'script' utility so that the execution and output of the patch script are captured in a log file. If any issue is encountered, this log will help our support team diagnose it. ('script' is a standard Linux program.)
- After the patch script has finished executing, be sure to exit the 'script' utility.
- Here is how to start the 'script' utility and then execute the patch script:

script storageReclamationPatch.log
./storageReclamationPatch.sh
- Then, after the script has executed successfully (or, if an issue was encountered, after any debugging has been completed), execute the 'exit' command to leave the 'script' utility:

exit
- After executing the exit command, a log file called storageReclamationPatch.log will be generated. If any issues were encountered, please share it with support.
- Here is an example of what this would look like (the line starting with // is just a comment about what occurs between executing the patch script and exiting the 'script' utility):

[admin_1@autosuite storageReclamationPatch]$ script storageReclamationPatch.log
Script started, file is storageReclamationPatch.log
[admin_1@autosuite storageReclamationPatch]$ ./storageReclamationPatch.sh
//Script executes, maybe some debugging is done
[admin_1@autosuite storageReclamationPatch]$ exit
exit
Script done, file is storageReclamationPatch.log
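The same capture workflow can be tried out with a harmless command in place of the patch script; here 'script -c' runs the command and exits in one step (demo.log and the echo text are placeholders for this illustration, not part of the patch procedure):

```shell
# script(1) records the command's terminal output into the named log file,
# along with the "Script started"/"Script done" banners.
script -c 'echo "patch output would appear here"' demo.log

# The captured output is now in the log file.
grep "patch output" demo.log
```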
- The script will print the following if it executes successfully:

Checking that the patch was applied
 Patch was applied successfully
- The script only needs to be executed on one server node per environment
- If the script fails or an unresolvable issue is encountered, share storageReclamationPatch.log with support.
Patch Script:
#!/bin/bash
echo ""
echo "Starting Storage Reclamation Patch"
echo ""
echo "Checking that this is a server node"
if [ "$(sudo systemctl is-enabled rke2-server 2>/dev/null)" = "enabled" ]; then
echo " This is a server node"
echo ""
else
echo " FATAL: This is not a server node"
echo " This script should only be run on a server node"
echo "Exiting script"
echo ""
exit 1
fi
echo "Generating patch.yaml file at: /tmp/patch.yaml"
if [ -f /tmp/patch.yaml ]; then
echo " FATAL: Patch file: /tmp/patch.yaml file already exists"
echo " Remove existing /tmp/patch.yaml file and re-run script"
echo " Command to remove file: sudo rm -rf /tmp/patch.yaml"
echo "Exiting script"
echo ""
exit 1
fi
# Quoted heredoc ('EOF') so that $PATH, $INSTALLER_PATH, and ${LONGHORN_DISK_PATH}
# are left for the container to expand, not expanded here at generation time.
sudo tee /tmp/patch.yaml > /dev/null << 'EOF'
spec:
  template:
    spec:
      containers:
      - name: longhorn-replica-folder-cleanup
        args:
        - /host
        - /bin/bash
        - -ec
        - |
          while true;
          do
            set -o pipefail
            export KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin:$INSTALLER_PATH
            which kubectl >> /dev/null || {
              echo "kubectl not found"
              exit 1
            }
            which jq >> /dev/null || {
              echo "jq not found"
              exit 1
            }
            directories=$(find ${LONGHORN_DISK_PATH}/replicas/ -maxdepth 1 -mindepth 1 -type d)
            for dir in $directories;
            do
              basename=$(basename "$dir")
              volume_name=${basename%-*}
              replica_name=$(kubectl -n longhorn-system get replicas.longhorn.io -o json | jq --arg dir "$basename" '.items[] | select(.spec.dataDirectoryName==$dir) | .metadata.name')
              if kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" &>/dev/null;
              then
                if [[ -z ${replica_name} ]];
                then
                  robust_status=$(kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" -o jsonpath='{.status.robustness}')
                  if [[ "${robust_status}" == "healthy" || "${robust_status}" == "degraded" ]];
                  then
                    echo "Replica not found but Volume found with a valid status (robust status ${robust_status}). Data directory $dir can be deleted"
                    rm -rf $dir
                  else
                    echo "Replica not found but Volume found with robust status ${robust_status}. Need to check if there is still a valid replica before deleting data directory $dir so that the directory is not required for recovery"
                  fi
                else
                  echo "Volume found and there is a replica using the data directory $dir"
                fi
              else
                if kubectl -n longhorn-system get volumes.longhorn.io "$volume_name" 2>&1 | grep "NotFound";
                then
                  echo "Volume object not found. Data directory $dir can be deleted."
                  rm -rf $dir
                else
                  echo "Could not fetch volume for $dir"
                fi
              fi
            done
            sleep 600
          done
EOF
# Checker to see if patch.yaml file was created
if [ -f /tmp/patch.yaml ]; then
echo " /tmp/patch.yaml file created"
echo ""
else
echo " FATAL: /tmp/patch.yaml file not created"
echo " Previous command did not run successfully. Try running: sudo touch /tmp/patch.yaml to see why the command failed to generate the file /tmp/patch.yaml"
echo " If help is needed, please contact UiPath Support"
echo "Exiting script"
echo ""
exit 1
fi
echo "Applying patch.yaml file"
echo ' Executing the command: sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n kube-system patch daemonset longhorn-replica-folder-cleanup --patch "$(cat /tmp/patch.yaml)"'
sudo /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n kube-system patch daemonset longhorn-replica-folder-cleanup --patch "$(cat /tmp/patch.yaml)" 1>/dev/null
exit_code=$?
echo ""
echo "Checking that the patch was applied"
if [ $exit_code -eq 0 ]; then
echo " Patch was applied successfully"
echo ""
else
echo " FATAL: Patch was not applied successfully"
echo " Previous command did not run successfully. Check previous errors and Contact UiPath Support for help."
echo " Before re-running script please run this command: sudo rm -f /tmp/patch.yaml"
echo "Exiting script"
echo ""
exit 1
fi
echo "Removing /tmp/patch.yaml file"
echo ""
sudo rm -rf /tmp/patch.yaml
echo "System is patched. Make sure to run this tool in all environments on any master node"
echo ""