Issues with Ceph connectivity after power disruption

Hello, I've also submitted this via regular support. As usual, I'm checking who is faster, support or the community: Case # 01155757

After an unexpected power-down, the AI Center pods fail to start fully. The event log shows issues with Ceph connectivity. The Ceph status cannot be verified; the ceph command in the rook-ceph-tools pod returns no results (it just hangs). Multiple pods report failures mounting the PVCs that were created.
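
For reference, a minimal sketch of how I tried to verify the Ceph status (the pod name is from my cluster; the timeout wrapper is just my way of keeping the shell from hanging forever):

# Query Ceph from the toolbox pod; without the timeout the command hangs indefinitely
kubectl -n rook-ceph exec -it rook-ceph-tools-7f7f7b84d-f9p9l -- timeout 30 ceph status
kubectl -n rook-ceph exec -it rook-ceph-tools-7f7f7b84d-f9p9l -- timeout 30 ceph health detail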

kubectl describe pod -n kurl registry-6fffbb9895-26rnt

Events:
Type     Reason       Age                     From     Message
----     ------       ----                    ----     -------
Warning  FailedMount  31m (x17 over 3h32m)    kubelet  Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-pki registry-htpasswd default-token-td5bl registry-data registry-config]: timed out waiting for the condition
Warning  FailedMount  19m (x18 over 171m)     kubelet  Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[registry-htpasswd default-token-td5bl registry-data registry-config registry-pki]: timed out waiting for the condition
Warning  FailedMount  10m (x17 over 3h18m)    kubelet  Unable to attach or mount volumes: unmounted volumes=[registry-data], unattached volumes=[default-token-td5bl registry-data registry-config registry-pki registry-htpasswd]: timed out waiting for the condition
Warning  FailedMount  5m7s (x103 over 3h32m)  kubelet  MountVolume.MountDevice failed for volume "pvc-b5548c8d-3da7-401a-b824-48d9547069b5" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-3e9fcdea-f230-11eb-a611-8e83aec171b4 already exists
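
If I understand it correctly, the "an operation with the given Volume ID … already exists" error means a CSI operation lock went stale after the hard power-off. Restarting the RBD CSI pods is a commonly suggested workaround; a sketch, assuming the standard rook-ceph pod labels:

# Delete the RBD CSI plugin and provisioner pods so the operator recreates them and the stale lock is cleared
kubectl -n rook-ceph delete pod -l app=csi-rbdplugin
kubectl -n rook-ceph delete pod -l app=csi-rbdplugin-provisioner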

Very interesting… After a week of sitting untouched, Kubernetes resolved the issue by itself… Ceph is operating again, but it took a significant number of restarts…

kubectl get pods -n rook-ceph

NAME                                                 READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-7m8f4                               3/3     Running     6          13d
csi-cephfsplugin-provisioner-9588459f8-5tb98         6/6     Running     3570       13d
csi-cephfsplugin-provisioner-9588459f8-v2fgp         0/6     Pending     0          13d
csi-rbdplugin-mhd68                                  3/3     Running     6          13d
csi-rbdplugin-provisioner-5557b4599d-kgghc           6/6     Running     3570       13d
csi-rbdplugin-provisioner-5557b4599d-lkdzt           0/6     Pending     0          13d
rook-ceph-crashcollector-ai-center-789848c67-wk2xd   1/1     Running     2          13d
rook-ceph-mds-rook-shared-fs-a-54f8d69d57-zwncl      1/1     Running     0          13d
rook-ceph-mds-rook-shared-fs-b-7cb7d58645-4l9xb      1/1     Running     0          13d
rook-ceph-mgr-a-74448c68cb-fr5qn                     1/1     Running     1          13d
rook-ceph-mon-a-68d9d64d65-p4pb5                     1/1     Running     2          13d
rook-ceph-operator-5cd499f5b4-wvtbg                  1/1     Running     0          13d
rook-ceph-osd-0-668f68b66b-mtlx6                     1/1     Running     143        13d
rook-ceph-osd-1-76c8558c9-b867p                      1/1     Running     0          13d
rook-ceph-osd-prepare-ai-center-vwvvn                0/1     Completed   0          13d
rook-ceph-rgw-rook-ceph-store-a-78477cc457-4wkh4     1/1     Running     166        13d
rook-ceph-tools-7f7f7b84d-f9p9l                      1/1     Running     2          13d
rook-discover-whfzl                                  1/1     Running     2          13d
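
The two Pending provisioner replicas are presumably expected on a single-node install (my assumption is that pod anti-affinity keeps the second replica from scheduling). What I did check were the heavily restarted pods, roughly like this:

# Confirm why the second provisioner replica stays Pending (anti-affinity, presumably)
kubectl -n rook-ceph describe pod csi-rbdplugin-provisioner-5557b4599d-lkdzt
# Look at the last crash of osd-0, which restarted 143 times
kubectl -n rook-ceph logs rook-ceph-osd-0-668f68b66b-mtlx6 --previous --tail=50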
