Resolving OSD Down Due To "Failed To Fetch Mon Config (--no-mon-config to skip)" Error

How to resolve the OSD down issue with the error message "Failed to fetch mon config (--no-mon-config to skip)" in a Ceph-based storage cluster managed by Rook?

Issue Description

An OSD (Object Storage Daemon) is down in a Ceph-based storage cluster managed by Rook. The OSD cannot fetch the monitor configuration and reports the error "Failed to fetch mon config (--no-mon-config to skip)". The OSD therefore cannot start, which can disrupt the storage cluster.


Resolution

To resolve the OSD down issue with the "Failed to fetch mon config (--no-mon-config to skip)" error, follow these steps:

  1. Retrieve the OSD key from the disk

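List the Rook data directory, read the OSD keyring stored there, and then replace the old key in the OSD auth file (/tmp/osd0.auth in this example) with the key from the keyring. The key values in the sed command are examples; substitute your own: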

  • ls -l /var/lib/rook/rook-ceph

  • cat /var/lib/rook/rook-ceph//keyring

  • sed -i 's/AQCL67BHTCCJIBAAyiljN4QJnpuooiKwcCvGxB==/AQCL5SVl3PCJIBAAyiljN4QJnpuooiKwcCvGxA==/g' /tmp/osd0.auth
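Ceph keys are base64 text, and base64 can contain the "/" character, which would break a sed expression that uses "/" as its delimiter. The following is a minimal sketch of a safer replacement, assuming a POSIX-like shell with sed and mktemp available. The function name replace_osd_key is hypothetical and is not part of Rook or Ceph.

```shell
# Hypothetical helper (not a Rook or Ceph command): replace one key with
# another inside an auth file.
# Usage: replace_osd_key OLD_KEY NEW_KEY FILE
replace_osd_key() {
  local old_key="$1"
  local new_key="$2"
  local file="$3"
  local tmp

  tmp=$(mktemp) || return 1

  # '|' is used as the sed delimiter because it never occurs in base64
  # text, whereas '/' can.
  if sed "s|${old_key}|${new_key}|g" "$file" > "$tmp"; then
    # Write back through the original path so its permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

It would be called as replace_osd_key OLD_KEY NEW_KEY /tmp/osd0.auth, with the two keys taken from your own cluster.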

  2. Append Keyring to initContainer

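Use kubectl to edit the deployment for the affected OSD, for example kubectl -n rook-ceph edit deploy rook-ceph-osd-0 (replace rook-ceph-osd-0 with the name of your OSD deployment). Within the deployment YAML, locate the initContainers section and append the following command to the end of the command list for the activate initContainer: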

  • cat $OSD_DATA_DIR/keyring
 

Here is an example snippet:

initContainers:
  - command:
      - /bin/bash
      - '-c'
      - |
        # ...
        # Existing commands
        # ...
        # Append this command to the end
        cat $OSD_DATA_DIR/keyring

    

This prints the keyring to the log of the activate initContainer, where it can be read once the container exits.

  3. Save and Exit

Save the changes to the deployment YAML and exit the text editor.

  4. Retrieve the OSD Key

Use the following command to retrieve the OSD key from the logs after the activate container exits (replace rook-ceph-osd-0 with the name of your OSD deployment):

  • kubectl -n rook-ceph logs deploy/rook-ceph-osd-0 -c activate | tail -n 3

Note the OSD key that is printed in the logs.
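To avoid copying the key by hand, the key value can be pulled out of the log output with a small filter. This is a sketch only, assuming the printed keyring uses the standard Ceph layout with a line of the form "key = <base64 value>". The function name extract_keyring_key is hypothetical.

```shell
# Hypothetical helper (not a Rook or Ceph command): read keyring text on
# stdin and print the value of the first "key = ..." line.
extract_keyring_key() {
  awk '$1 == "key" && $2 == "=" { print $3; exit }'
}

# Possible usage against the activate container log (not run here):
#   kubectl -n rook-ceph logs deploy/rook-ceph-osd-0 -c activate \
#     | tail -n 3 | extract_keyring_key
```

If the keyring in your logs is formatted differently, the awk condition would need to be adjusted.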

  5. Restart the OSD Pod

Exit the toolbox, if you are using it, and restart the pod for the affected OSD (replace 0 with the ID of your OSD):

  • kubectl -n rook-ceph delete pod -l app=rook-ceph-osd,ceph-osd-id=0

The OSD should now start correctly with the updated auth configuration.