How can I fix a deployment that won't start because there's not enough storage for its volume to mount in Longhorn?
Issue Description
When Longhorn cannot schedule the replicas for a volume, the pods in the deployment that use that volume fail to start. The problem typically shows up in two places:
- First, the Longhorn UI shows the following error for the volume behind the persistent volume claim (PVC): "Scheduling Failure. Replica Schedule Failure."
- Second, running the kubectl describe pod command on the affected pod shows the error "Unable to attach or mount volumes." Together, these symptoms indicate that the volume's replicas could not be scheduled, so the volume cannot be attached and mounted.
Failed pod event log:
Warning FailedMount 2m29s kubelet Unable to attach or mount volumes: unmounted volumes=[061d493d-8881-4e91-a8da-c780b66dcbf3-storage], unattached volumes=[localstorage 061d493d-8881-4e91-a8da-c780b66dcbf3-storage secret-volume pipstorage cert-trust-store shm-volume tmpstorage workspacestorage domain-cert-config additional-ca-cert-config cert-location temp-location]: timed out waiting for the condition
Warning FailedAttachVolume 45s (x12 over 9m5s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-9c793e62-a7a6-45dc-b60c-3fd1d363b8fb" : rpc error: code = Aborted desc = volume pvc-9c793e62-a7a6-45dc-b60c-3fd1d363b8fb is not ready for workloads
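When a pod mounts several volumes, it helps to pull the failing volume name out of the event log so it can be looked up in the Longhorn UI. The following is a minimal Python sketch that assumes the kubectl describe pod output has been saved as text; the function name failed_volumes and the regular expression are illustrative, not part of kubectl or Longhorn.

```python
import re

# Matches names of the form pvc-<UUID>, as seen in the event log above.
PVC_PATTERN = re.compile(r"pvc-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def failed_volumes(describe_output: str) -> list[str]:
    """Return the distinct pvc-... names mentioned in FailedAttachVolume events."""
    names: list[str] = []
    for line in describe_output.splitlines():
        if "FailedAttachVolume" not in line:
            continue  # only attach failures are of interest here
        for name in PVC_PATTERN.findall(line):
            if name not in names:
                names.append(name)
    return names


if __name__ == "__main__":
    # Sample line copied from the event log in this article.
    sample = (
        "Warning FailedAttachVolume 45s (x12 over 9m5s) attachdetach-controller "
        'AttachVolume.Attach failed for volume "pvc-9c793e62-a7a6-45dc-b60c-3fd1d363b8fb" : '
        "rpc error: code = Aborted desc = volume "
        "pvc-9c793e62-a7a6-45dc-b60c-3fd1d363b8fb is not ready for workloads"
    )
    print(failed_volumes(sample))
```

The returned name can then be used to search for the volume on the Longhorn 'Volume' page.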
Resolution
- Free Up Space or Adjust Settings:
Confirm that the disk used by Longhorn has enough free space for the volume's replicas. If it does not, either free up space or lower the 'Minimal Available Percentage' setting. For instance, you can reduce it from the default 30% to 20%, or to as low as 10% if Longhorn uses a dedicated disk.
- If freeing up space is not an option, or does not free enough, expand your storage capacity by adding disks to the cluster.
- The remaining steps in this article assume that disk space can be made available.
- Edit Node and Disks Settings:
Navigate to the 'Node' tab within the Longhorn web interface.
- Click the 'Operations' button for the affected node, that is, the node whose disk is short on space.
- Select 'Edit Node and Disks'.
- In the 'datadisk' section (the disk whose path field has the value "/datadisk"), set Storage Reserved to 20% of that disk's storage maximum.
- Click 'Save'.
- After you save these changes, Longhorn should be able to schedule the replicas. The pod's mount errors should stop recurring and the PVC status should become healthy.
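The 20% figure in the steps above has to be converted into a size for the affected disk, so it can help to work the number out beforehand. The following is a minimal Python sketch, not Longhorn's own code: the function names, the 500 GiB disk, and the 160 GiB and 50 GiB figures are hypothetical, and the scheduling check is a simplified reading of the Minimal Available Percentage rule (a replica fits only if the space left after placing it stays above that percentage of the disk's maximum). Other scheduling constraints are ignored.

```python
GIB = 1024 ** 3


def storage_reserved_bytes(storage_maximum: int, reserved_percent: int) -> int:
    """Return reserved_percent of the disk's storage maximum, in bytes."""
    return storage_maximum * reserved_percent // 100


def replica_fits(storage_available: int, storage_maximum: int,
                 replica_size: int, minimal_available_percent: int) -> bool:
    """Simplified check: space left after the replica must exceed the threshold."""
    remaining = storage_available - replica_size
    # Integer comparison avoids floating-point rounding on large byte counts.
    return remaining * 100 > storage_maximum * minimal_available_percent


if __name__ == "__main__":
    disk_max = 500 * GIB  # hypothetical disk size

    reserved = storage_reserved_bytes(disk_max, 20)
    print(f"Storage Reserved at 20%: {reserved // GIB} GiB")

    # Hypothetical disk with 160 GiB available and a 50 GiB replica to place.
    for percent in (30, 20):
        ok = replica_fits(160 * GIB, disk_max, 50 * GIB, percent)
        print(f"Minimal available {percent}%: replica fits = {ok}")
```

With these made-up numbers the replica is rejected at 30% but accepted at 20%, which is the effect that lowering the threshold is meant to have.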
Best Practices:
- Keep the Minimal Available Percentage value at around 20-25% if Longhorn runs on the root disk. If Longhorn has a dedicated disk, you can lower the value to 10%.
- Regularly monitor disk space usage and adjust settings as necessary to prevent scheduling issues.
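Regular monitoring can be as simple as a scheduled script on each storage node. The following is a minimal Python sketch using only the standard library; the helper names and the 20% threshold are assumptions for illustration, and the free space reported by the operating system may differ slightly from what Longhorn shows for the same disk.

```python
import shutil


def free_percent(path: str) -> float:
    """Return the free space on the filesystem containing path, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total * 100


def below_threshold(path: str, minimal_available_percent: float) -> bool:
    """True when free space has dropped under the chosen percentage."""
    return free_percent(path) < minimal_available_percent


if __name__ == "__main__":
    # Replace "/" with the Longhorn disk path on the node, for example /datadisk.
    disk_path = "/"
    threshold = 20  # assumed value; match it to your Longhorn setting

    print(f"{disk_path}: {free_percent(disk_path):.1f}% free")
    if below_threshold(disk_path, threshold):
        print("Warning: free space is below the threshold; new replicas may not schedule.")
```

Running a check like this from cron, or feeding the value into an existing alerting system, gives early warning before replicas stop scheduling.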