New Dataset Creation: Security concern

AI Center is set up on-prem, on a CentOS 7 system behind the corporate proxy.
To avoid a Personal Data leak (we will upload Personal Data into a Dataset and train on it), I deleted the previously created Datasets and blocked the proxy before uploading any data.
I then got an error while creating a new Dataset. Logs:
kubectl logs -n aifabric ai-trainer-deployment-86d89b69df-kb2js -f

2021-07-29 08:46:18 [http-nio-8086-exec-8] INFO - Validate dataset by name: ProxyTest
2021-07-29 08:46:18 [http-nio-8086-exec-8] INFO - Get Project by id 5b2a218d-898b-40be-b49d-9bade9f69c9f called
2021-07-29 08:46:18 [http-nio-8086-exec-8] WARN  o.a.http.impl.auth.HttpAuthenticator.generateAuthResponse - NEGOTIATE authentication error: Invalid name provided (Mechanism level: KrbException: Cannot locate default realm)
2021-07-29 08:46:18 [http-nio-8086-exec-8] ERROR - Error while uploading 5b2a218d-898b-40be-b49d-9bade9f69c9f/28f2b138-4c3e-455e-9c82-2519c039c68d/ of contentType null to 5b2a218d-898b-40be-b49d-9bade9f69c9f/28f2b138-4c3e-455e-9c82-2519c039c68d/ Proxy Authentication Required (Service: Amazon S3; Status Code: 407; Error Code: 407 Proxy Authentication Required; Request ID: null; S3 Extended Request ID: null; Proxy:
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(


2021-07-29 08:33:42 [http-nio-8086-exec-10] ERROR c.u.m.c.ControllerExceptionsHandler.handleBaseException - An error has occurred. Exception code: 10401, Exception Message: Failed to upload files for content type null, Exception status: 500 Failed to upload files for content type null

  1. I do not understand why, and which component, wants to go out to the internet through the proxy.
  2. Why is a connection to AWS being initiated? Why does AI Center try to connect to AWS?

Hi Robert
Does the same action work with the proxy enabled? I wonder whether something is actually missing from no_proxy.
Are you only creating a new empty dataset here?


Correct… I tried again afterwards with the proxy activated (I block it at the iptables level) and zoink… the same error.

  1. Yes, I am checking where I missed no_proxy when switching users and adding entries… however, what needs to go into no_proxy? Not clear…
  2. The AWS part is still confusing… however, this might be a misinterpretation on my side. I believe you migrated from cloud to on-prem in Feb 2021, so this might just be the name of a service…
  1. I believe this is because the rook-ceph service is being called by IP instead of by service name, so you need to add this IP to no_proxy.
    If I recall correctly the service is this one:
    kubectl get service rook-ceph-rgw-rook-ceph-store -n rook-ceph
    output will look like this:
    rook-ceph-rgw-rook-ceph-store ClusterIP <none> 80/TCP 118d

In that case, the service IP needs to be added to no_proxy.
The best way to be sure is to upload an ML Package. You can use this one: (11.1 KB)
Then inspect the network call to see which IP is used for storage. It should correspond to a rook-ceph pod/service.

  2. The AWS entries in the logs appear because we use the S3 API, which was originally developed by Amazon, to interact with storage. This traffic does not go outside.

The pod rook-ceph-rgw-rook-ceph-store-a-55b7fbcd89-74qvx (1/1 Running 21 10d) is included in no_proxy.

I believe this is the pod, not the service. Is that what you see in the network call while uploading the ML Package?

Bingo… Pods are 10.32.0.*, but services are 10.96.*.*. I had added the whole pod IP range to no_proxy to be sure everything was covered, and missed the services…
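
For anyone hitting the same thing, here is a rough sketch of how one might check whether a given IP is covered by a no_proxy list before touching the cluster. The helper names (ip_to_int, ip_in_cidr, ip_covered) are made up for this example, and the CIDR blocks 10.32.0.0/24 and 10.96.0.0/16 are only my reading of the ranges mentioned above. Note that whether CIDR notation in no_proxy is honored at all depends on the client, so this checks what the list is meant to cover, not how any particular client interprets it.

```shell
# Hypothetical helpers, POSIX sh. Assumes well-formed IPv4 addresses and
# treats any no_proxy entry containing "/" as an IPv4 CIDR block.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if IP ($1) lies inside the CIDR block ($2), e.g. 10.96.0.0/16.
ip_in_cidr() {
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  [ $(( $ip_int & $mask )) -eq $(( $net_int & $mask )) ]
}

# Succeed if IP ($1) is covered by a comma-separated list ($2) through
# an exact match or a CIDR block. Hostname entries are ignored.
ip_covered() {
  target=$1
  old_ifs=$IFS; IFS=,
  set -f            # keep entries such as *.local from being glob-expanded
  set -- $2
  set +f
  IFS=$old_ifs
  for entry in "$@"; do
    case $entry in
      */*) ip_in_cidr "$target" "$entry" && return 0 ;;
      "$target") return 0 ;;
    esac
  done
  return 1
}

# Possible usage against the live cluster (not run here):
#   svc_ip=$(kubectl get service rook-ceph-rgw-rook-ceph-store -n rook-ceph \
#     -o jsonpath='{.spec.clusterIP}')
#   ip_covered "$svc_ip" "$no_proxy" || echo "service IP $svc_ip is not in no_proxy"
```

With only the pod range in the list, a 10.96.x.x service address is reported as not covered, which matches the situation described above. Adding the service range (or the single ClusterIP) makes the check pass.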
