Pre-checks before performing the upgrade task

Required Pre-checks before performing major upgrades of Automation Suite

Scenario:

This article covers useful prerequisite checks that can be performed on an Automation Suite cluster prior to an upgrade, with the assumption that:


a) The cluster is being upgraded directly to v23.4.X from a prior version


b) The cluster currently uses Ceph in-cluster storage and will continue to do so


Substantive Checks:


Always take an ON-DEMAND BACKUP before engaging in any significant cluster activity, such as an upgrade.


Check backup status (backups are taken by Velero):


/path/to/installerdir/configureUiPathAS.sh snapshot list


Check Pods and Application health:


Run the following to check the health of all Pods and Applications; all should be in a healthy state.

If any critical pod, such as a longhorn, rook-ceph, or application-specific pod, is in a CrashLoopBackOff or Terminated state, address that issue before proceeding with the upgrade.


export KUBECONFIG="/etc/rancher/rke2/rke2.yaml" && export PATH="$PATH:/usr/local/bin:/var/lib/rancher/rke2/bin"

kubectl get pods -A -o wide

kubectl get application -A
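
A quick way to surface only the problem pods is to filter the `kubectl get pods -A` listing for anything not Running or Completed. The sketch below runs the filter against sample output for illustration; in practice you would pipe the live `kubectl get pods -A` output into it:

```shell
# Print namespace, name, and status of any pod that is not Running/Completed.
# The here-doc is sample output standing in for `kubectl get pods -A`.
filter_unhealthy() {
  # Columns: NAMESPACE NAME READY STATUS RESTARTS AGE (header skipped)
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1, $2, $4 }'
}

filter_unhealthy <<'EOF'
NAMESPACE         NAME                     READY   STATUS             RESTARTS   AGE
longhorn-system   longhorn-manager-abc12   1/1     Running            0          12d
rook-ceph         rook-ceph-osd-0-xyz      1/1     Running            0          12d
uipath            orchestrator-pod-def34   0/1     CrashLoopBackOff   7          3h
EOF
# → uipath orchestrator-pod-def34 CrashLoopBackOff
```

An empty result from the filter means no pod is in an unexpected state.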


Check Required Space for Ceph:


The following commands show how much space Ceph is currently using:

ceph_object_size=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status --format json | jq -r '.pgmap.data_bytes')

echo "You need '$(numfmt --to=iec-i $ceph_object_size)' storage space"


The current Ceph raw disk should have extra buffer space to accommodate growth. Knowing the consumed space is also useful for estimating how much capacity to provision for a potential migration to an S3-compatible external objectstore.
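
To turn the consumed bytes into a provisioning estimate with headroom, a simple calculation can be sketched. The 30% buffer below is an illustrative assumption, not a UiPath-mandated figure, and the sample byte count stands in for the `data_bytes` value returned by `ceph status --format json`:

```shell
# Illustrative sizing helper: add a buffer (30% here, an assumption)
# on top of Ceph's reported data_bytes to estimate required capacity.
ceph_object_size=107374182400   # sample value; in practice read from ceph status JSON
buffered=$(( ceph_object_size + ceph_object_size * 30 / 100 ))
echo "Consumed: $(numfmt --to=iec-i "$ceph_object_size")"
echo "Provision at least: $(numfmt --to=iec-i "$buffered")"
# → Consumed: 100Gi
# → Provision at least: 130Gi
```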


Check Volume Robustness:


Prior to initiating the upgrade, verify the robustness of all volumes. If any volume is in a non-healthy state, it must be addressed before proceeding with the upgrade.

kubectl get volume.longhorn.io -n longhorn-system



All volumes should be attached and healthy. Unexpected behavior can be investigated in the Rancher console at https://monitoring.
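
For a scripted pass over the same check, the Longhorn volume resources expose a `status.robustness` field (healthy/degraded) that can be filtered with jq. The sketch below runs against a trimmed sample object standing in for `kubectl get volume.longhorn.io -n longhorn-system -o json`:

```shell
# List any Longhorn volume whose robustness is not "healthy".
# The sample JSON stands in for live kubectl output.
sample='{"items":[{"metadata":{"name":"pvc-aaa"},"status":{"robustness":"healthy"}},{"metadata":{"name":"pvc-bbb"},"status":{"robustness":"degraded"}}]}'
echo "$sample" | jq -r '.items[] | select(.status.robustness != "healthy") | "\(.metadata.name): \(.status.robustness)"'
# → pvc-bbb: degraded
```

An empty result means every volume reports healthy robustness.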


Check Longhorn Replicas:


kubectl get replicas.longhorn.io -A -o wide



All replicas should be in the Running state.


Check Volumes Attachments:


kubectl get volumeattachments -A


All volume attachments should show ATTACHED as true.


Check Ceph Health/PG Status:


kubectl -n rook-ceph exec -i deploy/rook-ceph-tools -- ceph status



The status should show HEALTH_OK, and all PGs should be active+clean.
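

For automation, the JSON form of `ceph status` can be parsed with jq instead of eyeballing the text output. The sketch below uses a trimmed sample document standing in for the real `ceph status --format json` output:

```shell
# Extract overall health and PG state counts from ceph status JSON.
# The sample document stands in for live `ceph status --format json` output.
sample_status='{"health":{"status":"HEALTH_OK"},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":233}]}}'
echo "$sample_status" | jq -r '.health.status'
echo "$sample_status" | jq -r '.pgmap.pgs_by_state[] | "\(.state_name): \(.count)"'
# → HEALTH_OK
# → active+clean: 233
```

Any state other than active+clean in `pgs_by_state` warrants investigation before the upgrade.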



Check Ceph OSD Status:


kubectl -n rook-ceph exec -i deploy/rook-ceph-tools -- ceph osd df


All OSDs should show a status of "up".


Ceph Health Detail:


kubectl -n rook-ceph exec -i deploy/rook-ceph-tools -- ceph health detail


Expect the output to report HEALTH_OK.


Check Connectivity Between Nodes:


all_ingress_data=$(kubectl -n istio-system get pod -l app=istio-ingressgateway -o json)

all_ingress_ips=$(jq -r '.items[] | .status.podIP' <<< "${all_ingress_data}")

all_ingress_pod_name=$(jq -r '.items[] | .metadata.name' <<< "${all_ingress_data}")

while IFS= read -r pod_name
do
  echo "FROM: ${pod_name}"
  while IFS= read -r pod_ip
  do
    echo "To: ${pod_ip} HTTP_CODE: $(kubectl -n "istio-system" exec "${pod_name}" -- curl -m 10 -w "%{http_code}\n" --silent --output /dev/null "${pod_ip}:15021/healthz/ready")"
  done <<< "${all_ingress_ips}"
done <<< "${all_ingress_pod_name}"



Every check should return HTTP code 200, which indicates that all nodes have proper connectivity.