Troubleshooting an Automation Suite (AS) 23.4 Cluster after an Ungraceful Shutdown
Scenario: An Automation Suite 23.4 cluster had an ungraceful shutdown.
Symptoms:
- Pods and applications report Helm pull errors saying that images in the registry are not found.
Excerpt below:
"rpc error: code = Unknown desc = 'helm pull oci:///registry……/helm/alerts/version 2023.4.2 --destination /tmp/…' failed exit status 1: Error: registry…… not found"
See attached sample screenshots:
The registry hostname is still resolvable at this point, which suggests the problem is missing registry content rather than name resolution.
- Resolution: re-run the registry upload command used for new installations:
./configureUiPathAS.sh registry upload --offline-bundle /uipath/tmp/as.tar.gz --offline-tmp-folder /uipath/tmp
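
Before re-running the upload, it can help to confirm that the offline bundle and temp folder are actually usable on the node, since an ungraceful shutdown may have left them missing or truncated. The following is a minimal sketch under that assumption; check_offline_bundle is a hypothetical helper name and is not part of configureUiPathAS.sh.

```shell
# Hypothetical pre-flight helper (not part of configureUiPathAS.sh).
# Succeeds only if the bundle is a readable, non-empty file and the
# temp folder is an existing, writable directory.
check_offline_bundle() {
  local bundle="$1" tmp_dir="$2"
  if [ ! -s "$bundle" ] || [ ! -r "$bundle" ]; then
    echo "bundle missing, empty, or unreadable: $bundle" >&2
    return 1
  fi
  if [ ! -d "$tmp_dir" ] || [ ! -w "$tmp_dir" ]; then
    echo "temp folder missing or not writable: $tmp_dir" >&2
    return 1
  fi
  echo "ok: $bundle"
}

# Example usage with the paths from the command above:
# check_offline_bundle /uipath/tmp/as.tar.gz /uipath/tmp && \
#   ./configureUiPathAS.sh registry upload --offline-bundle /uipath/tmp/as.tar.gz --offline-tmp-folder /uipath/tmp
```

The check only looks at the files on disk; it does not validate the bundle contents or the registry itself.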
- Redis secret not found.
The second issue encountered after the restart was related to Redis.
- Multiple failed pods across different applications showed identical events: volumes could not be attached or mounted, which led to timeouts, and the redb-redis-cluster-db secret was not found.
- Resolution: delete Redis and resync it via ArgoCD. This is a known, documented problem (Redis Probe Failure). The deletion commands are:
kubectl delete redb -n redis-system redis-cluster-db --force --grace-period=0 &
kubectl delete rec -n redis-system redis-cluster --force --grace-period=0 &
kubectl patch redb -n redis-system redis-cluster-db --type=json -p '[{"op":"remove","path":"/metadata/finalizers","value":"finalizer.redisenterprisedatabases.app.redislabs.com"}]'
kubectl patch rec redis-cluster -n redis-system --type=json -p '[{"op":"remove","path":"/metadata/finalizers","value":"redbfinalizer.redisenterpriseclusters.app.redislabs.com"}]'
kubectl delete job redis-cluster-db-job -n redis-system
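
Because the delete commands above run in the background, it may be worth confirming that the Redis resources are really gone before triggering the resync in ArgoCD. The sketch below is one possible way to do that; resource_gone and wait_until_gone are hypothetical helper names, and the attempt count and delay are arbitrary defaults. It relies only on kubectl get with --ignore-not-found and -o name.

```shell
# Hypothetical helper: returns 0 if the resource no longer exists,
# 1 if it is still present, 2 if kubectl itself failed (e.g. API unreachable).
resource_gone() {
  local kind="$1" name="$2" ns="$3" out
  out="$(kubectl get "$kind" "$name" -n "$ns" --ignore-not-found -o name)" || return 2
  [ -z "$out" ]
}

# Hypothetical helper: poll until the resource is gone or attempts run out.
# attempts and delay (seconds) are illustrative defaults, not recommended values.
wait_until_gone() {
  local kind="$1" name="$2" ns="$3" attempts="${4:-30}" delay="${5:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if resource_gone "$kind" "$name" "$ns"; then
      echo "$kind/$name removed"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $kind/$name" >&2
  return 1
}

# Example usage after the delete and patch commands:
# wait_until_gone redb redis-cluster-db redis-system
# wait_until_gone rec redis-cluster redis-system
```

After the ArgoCD resync, the same resource_gone helper could be used the other way round: assuming the secret lives in the redis-system namespace, resource_gone secret redb-redis-cluster-db redis-system should return non-zero once the secret has been recreated.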