Storage Bucket - download everything or export to Azure

Hello,

We have a Document Understanding process, and all the related files are stored in a Storage Bucket on Automation Cloud. What we want to do now is use that process data for some analytics, as the Action Center exports are not great. Is there any way to easily download everything in the Storage Bucket without looping through every file in Studio or via the API?
I was thinking about setting up Azure as a storage provider and exporting there, but I really don't want to touch the live process, and I could not find any way to actually duplicate the bucket.

Thanks for any ideas!

If you are using a Storage Bucket in Orchestrator, then there isn't really much choice but to use the API, as the metadata of the blobs is maintained in the database.

If you were to look at the underlying storage, it will not be clear which GUIDs and blob references belong to which buckets / files.

For example, in my test environment the default storage for Orchestrator is configured as FileSystem and the Location is an NFS share.

In the image, Storage is part of the base URI as defined by Storage.Location in the Orchestrator configuration file.

Orchestrator-xxxxxxx is a unique Tenant ID.
BlobFilePersistence is the base folder under which Storage Buckets are stored.
The next GUID is a unique ID for the specific bucket (in my case named “Test”), and the file contents sit below that.

Structure, etc. may be different depending on your Storage Type for Orchestrator and which Storage Provider you configured for your Bucket.


Thanks for the reply, this would be great, but we are on cloud, so I am a bit more limited in what I can do. I guess we don't have any other option then, just looping through the API.
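In case it helps anyone landing here later, below is a rough sketch of what that loop could look like against the cloud Orchestrator API. Treat it as a starting point only: the org/tenant, folder ID, bucket ID and token are placeholders, and the GetFiles / GetReadUri endpoint names, parameters and response fields are written from memory, so verify them against your tenant's Swagger page before relying on them.

```python
import os
import requests

# Placeholders - fill in for your own tenant. Endpoint and field names below
# should be confirmed in your tenant's Swagger (…/orchestrator_/swagger).
BASE = "https://cloud.uipath.com/MyOrg/MyTenant/orchestrator_"
TOKEN = os.environ["UIPATH_TOKEN"]            # e.g. a personal access token
HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "X-UIPATH-OrganizationUnitId": "123456",  # Orchestrator folder that owns the bucket
}
BUCKET_ID = 42                                # Id taken from GET /odata/Buckets
OUT_DIR = "bucket_dump"


def list_files(directory="/"):
    """List the items in one directory of the bucket via the GetFiles endpoint."""
    url = f"{BASE}/odata/Buckets({BUCKET_ID})/UiPath.Server.Configuration.OData.GetFiles"
    resp = requests.get(url, headers=HEADERS, params={"directory": directory})
    resp.raise_for_status()
    return resp.json().get("value", [])


def download_file(path):
    """Ask Orchestrator for a short-lived pre-signed URL for one file and save it locally."""
    url = f"{BASE}/odata/Buckets({BUCKET_ID})/UiPath.Server.Configuration.OData.GetReadUri"
    resp = requests.get(url, headers=HEADERS, params={"path": path, "expiryInMinutes": 5})
    resp.raise_for_status()
    read_uri = resp.json()["Uri"]  # field name assumed - check the response model

    target = os.path.join(OUT_DIR, path.lstrip("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(requests.get(read_uri).content)


# Walks only the root directory here; subfolders would need GetDirectories
# (or a recursive option, if your Orchestrator version exposes one).
for item in list_files("/"):
    if not item.get("IsDirectory"):
        download_file(item.get("FullPath") or item.get("Name"))
```

Running this from a plain script rather than Studio activities also makes it easier to add retries or parallel downloads if the bucket is large.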

Hi, I also have the same issue. Can you help me with how you resolved it?

Hi @Madhav_Jha1 - please open a new topic specific to your challenge, with details of what you are trying to do, the steps you have taken already, and any other relevant information such as errors.

Yeah, if you are using Orchestrator as your provider, then you have no control over the background technology being used.

However, if you are using Azure Storage or Amazon S3 as your Storage Bucket provider, then these resources belong to you, and as such you should have more flexibility in accessing them directly. But again, I imagine the metadata for these is still stored in Orchestrator, so you would still need the API to identify their location and other metadata.
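For the case where the bucket provider is your own Azure Storage account, the direct-access route is essentially a plain container walk. A minimal sketch, assuming the azure-storage-blob Python package, a connection string you control, and a placeholder container name (whatever container you pointed the bucket at):

```python
import os
from azure.storage.blob import ContainerClient

# Placeholders: your own storage connection string and the bucket's container.
container = ContainerClient.from_connection_string(
    conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container_name="du-bucket",
)

# Download every blob in the container, preserving its path locally.
for blob in container.list_blobs():
    target = os.path.join("bucket_dump", blob.name)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(container.download_blob(blob.name).readall())
```

Whether the blob names in the container match the file names you see in Orchestrator, or are opaque identifiers that still need the API to resolve, is worth confirming on a test bucket first.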


Thank you Tim. I tried the “easy” route and just used the available Studio activities to enumerate all the storage bucket items, but unfortunately that does not work; it just times out after running for an hour or so.
I guess I will drop this idea and start looking into setting up a bucket on our Azure. Too bad that moving the data is not possible either.