Understanding Dataset Accessibility in UiPath AI Center and Orchestrator
Abstract:
This article clarifies how datasets in UiPath AI Center behave when used with the Machine Learning Extractor Trainer activity. Specifically, it addresses the distinction between public and private datasets and how each can be accessed across tenants.
Introduction:
UiPath AI Center offers powerful capabilities for machine learning integration. Part of this integration involves creating datasets in AI Center and updating them from UiPath Studio with the Machine Learning Extractor Trainer activity. However, it is essential to understand how public and private datasets differ and how that difference affects accessibility when multiple tenants are involved.
Public Datasets:
A public dataset in UiPath AI Center is accessible by all tenants connected to the same Orchestrator instance. This means that any process running in UiPath Studio, regardless of the tenant it is connected to, can access and use the public dataset. Public datasets are helpful when you want to share data across different processes or when working in a multi-tenant environment.
Private Datasets:
In contrast, a private dataset is accessible only within the tenant in UiPath AI Center where it was created. Any process running in UiPath Studio that is connected to a different tenant won't have access to it. Private datasets are designed to maintain data privacy and are suitable for scenarios where data should be limited to a specific project or team within the same tenant.
Inter-Tenant Accessibility and its Impact:
When using the Machine Learning Extractor Trainer activity to update datasets within a Document Understanding (DU) process, it is crucial to consider which tenant the process is connected to. If the dataset is marked as public in AI Center, a DU process can access and update it successfully even when the process is connected to a different tenant from the one that holds the dataset.
However, if the dataset is marked as private in AI Center, a DU process running in a different tenant will not have access to it. This is expected behavior, because private datasets are meant to be exclusive to the tenant in which they are created.
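The access rule described above can be summarized in a small Python sketch. This is only an illustrative model of the behavior, not UiPath code: the `Dataset` class, its field names, the tenant names, and the `can_access` function are all hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model of an AI Center dataset (not a UiPath type)."""
    name: str
    owner_tenant: str  # tenant in which the dataset was created
    is_public: bool    # True if the dataset is marked as public


def can_access(dataset: Dataset, process_tenant: str) -> bool:
    """Return True if a process connected to process_tenant can use the dataset.

    Public datasets are reachable from any tenant; private datasets
    are reachable only from the tenant that owns them.
    """
    return dataset.is_public or dataset.owner_tenant == process_tenant


# Illustrative datasets, both created in a tenant called "TenantA".
public_ds = Dataset(name="invoices-shared", owner_tenant="TenantA", is_public=True)
private_ds = Dataset(name="invoices-internal", owner_tenant="TenantA", is_public=False)

for tenant in ("TenantA", "TenantB"):
    print(tenant, can_access(public_ds, tenant), can_access(private_ds, tenant))
```

In this model, a DU process connected to the hypothetical "TenantB" can reach only the public dataset, which matches the expected behavior described above.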
Solutions and Best Practices:
- Use Public Datasets for Cross-Tenant Accessibility:
If you require the DU process to run across different tenants and access the same dataset, ensure that the dataset is marked as public in AI Center. This way, any process can successfully access and update the dataset, regardless of the tenant it is connected to.
- Separate Datasets for Private Projects:
If data privacy is crucial and datasets should not be shared between tenants, use private datasets for projects within the same tenant. This ensures that data remains confined to the specific project or team.
- Endpoint Access for Public Datasets:
To access a public dataset externally (outside Orchestrator), you can use the dataset's endpoint. This allows you to fetch data from the dataset without requiring a robot connected to the tenant.
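As a rough illustration of the endpoint approach, the following Python sketch fetches data over HTTP using only the standard library. The URL is a hypothetical placeholder, not a real AI Center address, and the helper names `build_dataset_request` and `fetch_dataset` are made up for this example. The sketch also assumes a plain GET request and sends no credentials; whether your endpoint requires authentication or a different request shape is not covered here, so treat this as a starting point only.

```python
import urllib.request

# Hypothetical placeholder: replace with the actual endpoint of your public dataset.
DATASET_ENDPOINT = "https://example.invalid/datasets/my-public-dataset"


def build_dataset_request(endpoint_url: str) -> urllib.request.Request:
    """Build a GET request for the dataset endpoint (assumed request shape)."""
    return urllib.request.Request(endpoint_url, method="GET")


def fetch_dataset(endpoint_url: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body."""
    request = build_dataset_request(endpoint_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch_dataset(DATASET_ENDPOINT)` would perform the request; with the placeholder URL it fails, so substitute your dataset's real endpoint first.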