How to Monitor Remaining Storage in Automation Suite and Resolve ObjectStore PVC Capacity Issues in Rook-Ceph

After ML packages or datasets are deleted to reduce Ceph usage, the capacity reported by ceph df does not appear to decrease. When a package, pipeline, or similar item is deleted from AI Center or Document Understanding (DU), is it not also deleted from Ceph? If it has to be deleted from Ceph directly, which commands are needed?

Issue Description

  • Scenario:
    • A Rook-Ceph sync issue was identified. According to ceph df, 230 GB is still available, but ceph health detail reports the following error:
    • 3 Full OSD; 8 pool full.
    • Some data in Ceph needs to be deleted, but attempts to delete datasets from UiPath AI Center failed because the system was already in a full state.
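
The apparent contradiction in this scenario (free space in ceph df, yet full OSDs) is usually explained by Ceph evaluating fullness per OSD rather than across the whole cluster. The sketch below classifies a single OSD's utilization against Ceph's default ratios (nearfull 0.85, backfillfull 0.90, full 0.95). The function name osd_state and the sample values are illustrative only, and a given cluster may be configured with different ratios.

```shell
# Classify one OSD's utilization against Ceph's default ratios.
# Assumption: the cluster uses the defaults (85% / 90% / 95%).
# osd_state is a hypothetical helper, not part of any Ceph tooling.
osd_state() {
  local used_kb="$1" total_kb="$2" pct
  # Integer percentage; truncation is safe because thresholds are whole numbers.
  pct=$(( used_kb * 100 / total_kb ))
  if [ "$pct" -ge 95 ]; then
    echo "full"
  elif [ "$pct" -ge 90 ]; then
    echo "backfillfull"
  elif [ "$pct" -ge 85 ]; then
    echo "nearfull"
  else
    echo "ok"
  fi
}

# Example: an OSD with 96 of 100 units used is classified as "full",
# even if other OSDs in the cluster still have free space.
# osd_state 96 100
```

Once any OSD reaches the full ratio, Ceph blocks further writes to the affected pools, which is consistent with the failed dataset deletions described above.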

Resolution

Check the ObjectStore PVC Status

  • SSH into the server node and verify that the rook-ceph-tools pod is in the Running state:
    • kubectl -n rook-ceph get pod

Access the Pod for Diagnosis

  • Start an interactive bash session in the rook-ceph-tools pod, completing the pod name with the suffix shown in the output of the previous step:
    • kubectl -n rook-ceph exec -it rook-ceph-tools- -- bash
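
To avoid copying the pod name by hand, the lookup can be scripted. This is a minimal sketch that assumes the toolbox pod carries the label app=rook-ceph-tools, which is the usual Rook convention but should be confirmed on the cluster. The function name tools_pod is hypothetical.

```shell
# Print the name of the first rook-ceph-tools pod.
# Assumption: the pod is labelled app=rook-ceph-tools.
tools_pod() {
  kubectl -n rook-ceph get pod -l app=rook-ceph-tools \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage (run on the server node):
# kubectl -n rook-ceph exec -it "$(tools_pod)" -- bash
```

Targeting the deployment directly with deploy/rook-ceph-tools, as the garbage collection step in this article does, is an alternative that needs no lookup.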

Monitor Storage and Object Usage

  • Check Ceph cluster health and storage status:
    • ceph status
    • ceph df
  • Focus on high-usage pools like rook-ceph.rgw.log and rook-ceph.rgw.buckets.data.

List and Analyze Bucket Usage

  • Identify buckets consuming the most space:
    • radosgw-admin bucket stats | jq -r '["BucketName","NoOfObjects","SizeInKB"], (.[] | [.bucket, .usage."rgw.main"."num_objects", .usage."rgw.main".size_kb_actual]) | @tsv' | column -ts $'\t'
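
When there are many buckets, sorting the report by size makes the largest consumers easier to spot. The helper below is a sketch that expects the tab-separated output of the command above, including its header row, taken before the final column step. The name top_buckets and the default of five rows are illustrative.

```shell
# Print the N largest buckets from TSV input with the columns
# BucketName, NoOfObjects, SizeInKB (header row expected on line 1).
# top_buckets is a hypothetical helper name.
top_buckets() {
  local n="${1:-5}" tab
  tab="$(printf '\t')"
  # Drop the header, sort numerically on column 3 in descending order.
  tail -n +2 | sort -t "$tab" -k3,3nr | head -n "$n"
}

# Usage: pipe the jq output (without the trailing "| column ...") into it:
# radosgw-admin bucket stats | jq -r '...' | top_buckets 3
```

Buckets that report no usage produce an empty size field, which the numeric sort treats as zero, so they fall to the end of the list.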

Delete Unnecessary Objects

  • List the objects in a bucket, supplying the bucket name after --bucket=:
    • radosgw-admin bucket list --bucket= | jq -r '.[] | {name: .name, size: .meta.size, mtime: .meta.mtime, owner: .meta.owner} | [.name, .size, .mtime, .owner] | @tsv' | column -ts $'\t'
  • Delete objects that are no longer needed, backing them up first if necessary.
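
A single object can be removed with radosgw-admin object rm. The wrapper below is a cautious sketch: by default it only prints the command it would run, and it executes the removal only when DRY_RUN=0 is set. The function name rgw_object_rm and the bucket and object names in the usage line are placeholders, not values from this environment.

```shell
# Remove one object from a bucket via radosgw-admin.
# With DRY_RUN=1 (the default) the command is printed, not executed.
# rgw_object_rm is a hypothetical helper name.
rgw_object_rm() {
  local bucket="$1" object="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "radosgw-admin object rm --bucket=${bucket} --object=${object}"
  else
    radosgw-admin object rm "--bucket=${bucket}" "--object=${object}"
  fi
}

# Usage inside the rook-ceph-tools pod (placeholder names):
# rgw_object_rm example-bucket path/to/object            # preview only
# DRY_RUN=0 rgw_object_rm example-bucket path/to/object  # actually delete
```

Space held by removed objects is typically released only after the RGW garbage collector has processed them, which is why the next step runs it explicitly.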

Run Garbage Collector to Free Space

  • Execute the garbage collector to reclaim space immediately. Run the following script on the server node (exit the rook-ceph-tools pod first), because it calls kubectl:
      if [[ "$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin gc list --include-all | jq 'length')" -gt 0 ]]; then
        echo "[$(date +'%Y-%m-%dT%H:%M:%S%z')] Running gc process"
        kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin gc process --include-all
        echo "[$(date +'%Y-%m-%dT%H:%M:%S%z')] Completed gc process with exit code: '$?'"
      else
        echo "[$(date +'%Y-%m-%dT%H:%M:%S%z')] GC run not required"
      fi

Set a Monitoring Routine

  • Regularly monitor storage utilization to prevent issues with full PVCs.
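
A routine check can be as simple as a script that compares current usage with a threshold and returns a non-zero exit code when the threshold is reached, so that a scheduler or alerting tool can react. The sketch below takes total and available bytes as arguments. The 75% default threshold and the function names are assumptions chosen for illustration.

```shell
# Percentage of raw capacity in use, from total and available bytes.
used_pct() {
  local total="$1" avail="$2"
  echo $(( (total - avail) * 100 / total ))
}

# Print a status line; return 1 when usage is at or above the threshold.
# The default threshold of 75 is an illustrative assumption.
check_usage() {
  local total="$1" avail="$2" threshold="${3:-75}" pct
  pct="$(used_pct "$total" "$avail")"
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: storage ${pct}% used (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: storage ${pct}% used"
}

# Usage: total and available bytes can be read from the output of
# "ceph df" and passed in, for example:
# check_usage 1000000000000 230000000000 80
```

Cluster-wide usage alone can miss the situation described in this article, where individual OSDs fill up before the cluster does, so checking ceph health detail alongside it is advisable.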