How to deal with operation not permitted or permission denied errors in a container on Automation Suite Embedded.
Issue Description:
Containers on Automation Suite Embedded fail to start or run because of "Operation not permitted" or "Permission denied" errors.
Example errors:
File "/opt/redislabs/sbin/generate_gossip_envoy_default_conf.py", line 10, in main
generate_gossip_envoy_config()
File "cnm/services/gossip_envoy.py", line 90, in generate_gossip_envoy_config
IOError: [Errno 13] Permission denied: '/etc/opt/redislabs/gossip_envoy.yaml'
Command '/bin/bash -c export PYTHONPATH=:/opt/redislabs/lib/cnm:/opt/redislabs/lib/cnm/python; /opt/redislabs/bin/python2.7 -O /opt/redislabs/sbin/generate_gossip_envoy_default_conf.py' exited with status: 1
chown: changing ownership of '/var/run/saslauthd': Operation not permitted
Command 'chown -R redislabs:redislabs /var/run/saslauthd' exited with status: 1
/opt/redislabs/sbin/supervisord_prestart_script.sh: line 42: type: getenforce: not found
chmod: changing permissions of '/var/run/redis': Operation not permitted
Command 'chmod 0775 /var/run/redis' exited with status: 1
Background:
These errors can occur in a container when the host machine interferes with container operations, for one of the following reasons:
- The container image file system permissions were changed from the host machine.
- fapolicyd is enabled but not configured.
- SELinux is enabled and is interfering with the container.
Resolution:
- Check if fapolicyd is enabled.
  - If it is enabled, make sure it is configured as described in our docs: Automation Suite - Step 8: Configuring kernel and OS level settings
  - cat /etc/fapolicyd/rules.d/69-rke2.rules
  - If it is not configured, run the command in our docs to configure it. If it is configured, please capture a screenshot of the output.
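The fapolicyd check above can be sketched as a small script. This is a minimal sketch, not an official tool: the function name `fapolicyd_verdict` and its three result keywords are assumptions made for this example, while the rule file path is the one given in this article.

```shell
#!/usr/bin/env bash
# Sketch: decide what to do about fapolicyd on a node.
# fapolicyd_verdict and its keywords are hypothetical names for this example.

fapolicyd_verdict() {
    local state="$1" rules_file="$2"
    if [ "$state" != "active" ]; then
        # Service is not running, so it cannot be blocking the container
        echo "not-active"
    elif [ -s "$rules_file" ]; then
        # Rule file exists and is non-empty: capture its contents for support
        echo "configured"
    else
        # Service runs without the RKE2 rules: configure it as per the docs
        echo "needs-configuration"
    fi
}

# "systemctl is-active" prints "active" only when the unit is running.
# If systemctl is unavailable, the state falls back to "unknown".
state="$(systemctl is-active fapolicyd 2>/dev/null || true)"
fapolicyd_verdict "${state:-unknown}" /etc/fapolicyd/rules.d/69-rke2.rules
```

A result of `needs-configuration` corresponds to the case where the command from the docs should be run. A result of `configured` corresponds to the case where the rule file output should be captured.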
- Check if SELinux is enabled.
  - sestatus
  - If the status is enforcing, temporarily switch it to permissive mode on all nodes and see if the issue persists.
    - sudo setenforce 0
  - If the issue is resolved after disabling SELinux enforcement, it can be concluded that SELinux caused the issue.
    - If the environment is air-gapped, the rke2-selinux package needs to be installed manually.
      - On a non-air-gapped machine, download the rpm package with the following command:
        sudo dnf download rke2-selinux
      - Once the package is downloaded, transfer it to the air-gapped machine.
      - Then install the transferred package with rpm -ivh
      - After this, SELinux can be re-enabled.
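The SELinux check above can be sketched as follows. This is a minimal sketch under stated assumptions: it uses `getenforce` (which prints only the current mode) instead of `sestatus`, and the function names `selinux_next_step` and `rke2_selinux_installed` are hypothetical names introduced for this example.

```shell
#!/usr/bin/env bash
# Sketch: map the SELinux mode on the host to the next troubleshooting step.
# Function names and keywords are hypothetical, chosen for this example.

selinux_next_step() {
    case "$1" in
        Enforcing)  echo "set-permissive-and-retest" ;;  # sudo setenforce 0 on all nodes
        Permissive) echo "check-file-permissions" ;;     # SELinux is not blocking anything
        Disabled)   echo "check-file-permissions" ;;
        *)          echo "mode-unknown" ;;               # getenforce missing or unexpected output
    esac
}

# rpm -q exits non-zero when the package is not installed
rke2_selinux_installed() {
    rpm -q rke2-selinux >/dev/null 2>&1
}

mode="$(getenforce 2>/dev/null || true)"
echo "SELinux mode: ${mode:-unknown} -> $(selinux_next_step "${mode:-unknown}")"

if rke2_selinux_installed; then
    echo "rke2-selinux is installed"
else
    echo "rke2-selinux is not installed (or rpm is unavailable)"
fi
```

Run this on the host, not inside the container: as the example errors show, `getenforce` may not exist in the container image.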
- If the issue persists with SELinux disabled, the host file permissions have most likely changed.
  - It is difficult to check the permissions because the affected file could be any file in the directory. In this scenario, rebuild the file system.
  - Steps to rebuild the file system:
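1. Drain and Stop the Node: Begin by draining the node to move workloads to other nodes (if possible) and prevent data loss. On recent kubectl versions, the --delete-local-data flag has been replaced by --delete-emptydir-data.
- kubectl drain <node-name> --ignore-daemonsets --delete-local-data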
2. Run the Cleanup Script: Stop the RKE2 agent and run rke2-killall.sh to terminate the remaining RKE2 processes and containers on the node.
- sudo systemctl stop rke2-agent
- sudo /usr/local/bin/rke2-killall.sh
3. Delete the Containerd Directory: Once the node is safely stopped, delete the containerd directory to clear any corrupted files.
- sudo rm -rf /var/lib/rancher/rke2/agent/containerd
4. Restart the RKE2 Agent: Start the RKE2 agent to rebuild the containerd directory and initialize a clean runtime environment.
- sudo systemctl start rke2-agent
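The four steps above can be sketched as a single script. This is a minimal sketch, not a supported tool: `DRY_RUN`, `NODE_NAME`, the `run` helper, the `rebuild_containerd` function and the `example-node` default are all hypothetical names introduced for this example. By default it only prints the commands, because step 3 deletes data. It also assumes `kubectl` is usable from the machine where the script runs.

```shell
#!/usr/bin/env bash
# Sketch: run the rebuild steps in order, stopping at the first failure.
# DRY_RUN, NODE_NAME, run and rebuild_containerd are hypothetical names.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_containerd() {
    local node="$1"
    if [ -z "$node" ]; then
        echo "usage: rebuild_containerd <node-name>" >&2
        return 1
    fi
    # Step 1: drain the node (flags as listed in the steps above)
    run kubectl drain "$node" --ignore-daemonsets --delete-local-data || return 1
    # Step 2: stop the agent and run the cleanup script
    run sudo systemctl stop rke2-agent || return 1
    run sudo /usr/local/bin/rke2-killall.sh || return 1
    # Step 3: delete the containerd directory
    run sudo rm -rf /var/lib/rancher/rke2/agent/containerd || return 1
    # Step 4: restart the agent so the directory is rebuilt
    run sudo systemctl start rke2-agent
}

rebuild_containerd "${NODE_NAME:-example-node}"
```

Review the printed commands first, then rerun with `DRY_RUN=0` and `NODE_NAME` set to the real node name once the output looks right.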
- If none of the above works, raise a support ticket. Please include the RKE2 support bundle and any other artifacts gathered.