Automation Suite: Upgrade from 22.10 to 23.10 fails with 'Longhorn is not ready' error
Issue description:
During the Automation Suite upgrade from 22.10 to 23.10, the upgrade script gets stuck and then fails with the error 'Longhorn is not ready'.
Resolution:
The upgrade script cordons one of the nodes. While that node stays cordoned, new pods and jobs cannot be scheduled on it, which leads to the 'Longhorn is not ready' error.
1. Validate that the Longhorn instance manager pod is running on all nodes
kubectl get pods -n longhorn-system | grep manager
2. Check which node each Longhorn pod is running on and whether any node shows SchedulingDisabled. If so, uncordon that node
kubectl get nodes
kubectl uncordon <node-name>
3. The 'Longhorn is not ready' error should now clear. Validate the system upgrade pods and their logs
kubectl get pods -n system-upgrade
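
The check in step 2 can be scripted when there are many nodes. The following is a minimal sketch, not part of the official procedure: the helper name cordoned_nodes is hypothetical, and it assumes the usual kubectl get nodes layout, where the node name is the first column and a cordoned node shows SchedulingDisabled in the second (STATUS) column.

```shell
#!/bin/sh
# Hypothetical helper: print the names of cordoned nodes.
# Reads `kubectl get nodes --no-headers` style output on stdin and
# assumes column 1 is the node name and column 2 is the status.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}
```

Against a live cluster it could be used as: kubectl get nodes --no-headers | cordoned_nodes. Each name it prints can then be passed to kubectl uncordon. Review the list before uncordoning, since a node may have been cordoned deliberately for other maintenance.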