Handling Rook-Ceph Clock Skew

Issue Description: How to detect and resolve clock skew between the nodes of a Rook-Ceph cluster.

Root Cause: Ceph monitors depend on closely synchronized clocks across nodes. Clock skew between nodes can degrade rook-ceph and cause a range of issues, so any skew on the cluster must be corrected.

Diagnosing/Resolving

  1. Clock skew usually produces no obvious symptom. To check for it, run the following command on one of the cluster nodes:
    1. First enable kubectl, as described in Accessing Automation Suite - Enable kubectl.
    2. kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
  2. If there is skew, the output contains a health warning that clock skew was detected (for example, "clock skew detected on mon.<id>").
  3. On each node, the Linux command date shows that node's current time. Running it on several nodes at about the same moment gives a rough measure of the skew.
  4. On Red Hat, the time service is typically chronyd. When skew is detected, check the following:
    1. Check that the time service is running: systemctl status chronyd
      • If it is not running, start it: systemctl start chronyd
    2. Check the configuration to make sure a time source is configured: view /etc/chrony.conf
  5. As a workaround, the clocks can be aligned manually:
    1. Choose one node to hold the 'correct' time. Let's call it masterNode.
    2. On each of the other nodes, run the following command as root: date --set="$(ssh <user>@<IP of masterNode> 'date -u')"
    3. Once the command has run on every node except masterNode, all nodes are set to masterNode's time.
  6. This is only a workaround. When this issue occurs, contact your system administrator and ask them to fix the underlying time synchronization. This is part of general Linux administration.
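
The manual comparison in step 3 can be done from a single node. The following is a minimal sketch, not part of the product tooling: the function names abs_diff and report_skew are hypothetical, and it assumes SSH access to every node with a user that can run date. It only detects skew of about a second or more, and SSH latency adds some error; chronyc tracking on each node gives a finer reading.

```shell
#!/usr/bin/env bash
# Sketch: report how far each node's clock is from the local clock.
# abs_diff and report_skew are illustrative names, not existing tools.

# Print the absolute difference, in seconds, between two epoch timestamps.
abs_diff() {
  local a=$1 b=$2
  local d=$(( a - b ))
  # Strip a leading minus sign, if any, to get the absolute value.
  echo "${d#-}"
}

# Print one line per node with its offset from the local clock.
# Usage: report_skew <ssh-user> <node> [<node> ...]
report_skew() {
  local user=$1
  shift
  local node remote local_now
  for node in "$@"; do
    # Read the remote clock as seconds since the epoch (UTC).
    if ! remote=$(ssh "${user}@${node}" 'date -u +%s'); then
      echo "${node}: unreachable"
      continue
    fi
    local_now=$(date -u +%s)
    echo "${node}: $(abs_diff "$remote" "$local_now")s"
  done
}
```

After sourcing the file, a call might look like report_skew <user> <IP of node 1> <IP of node 2>, with the placeholders replaced by real values. Any node reporting more than a second or two is a candidate for the chronyd checks in step 4.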