How to integrate Databricks with UiPath

Hey

I am new to UiPath and I have a use case mentioned below:
I want to scrape data from a website with UiPath, store it as JSON, and send it to my Databricks notebook, where I will create a DataFrame and apply transformations to it.

What I have done so far:
I created a process in UiPath Studio that scrapes the website, extracts the data table, and writes it to an Excel sheet.

Need help on:
How can I solve the use case mentioned above?
What tools and applications do I need, and what flow should I follow?
Are there any resources that can help me solve this?

Let me know if you require any more info

Hi @a.k ,

There are several ways to accomplish this; which one fits depends on what you have available:

  1. UiPath’s storage buckets integrate with S3/MinIO, so you can set up a bucket and have Databricks pull the file from it, either straight from your S3 bucket via AWS or through the Orchestrator API (Buckets endpoint)
    About Storage Buckets (uipath.com)
    Working with data in Amazon S3 | Databricks on AWS

  2. Have UiPath write to a NoSQL database such as MongoDB
    MongoDB C#/.NET Driver — MongoDB Drivers

  3. Are you using DBFS? If so, consider calling the Databricks API from UiPath
    DBFS API 2.0 | Databricks on AWS
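
To make option 3 a bit more concrete, here is a rough sketch of what a DBFS API 2.0 `put` call looks like. It is written in Python only to show the shape of the request; from UiPath you would send the same request with an HTTP activity. The host, token, and file path are made-up placeholders, and the sketch builds the request without sending it. As far as I know, the inline `contents` field is limited to about 1 MB, so larger files need the streaming endpoints or one of the other options.

```python
import base64
import json
import urllib.request


def build_dbfs_put_request(host, token, dbfs_path, json_text):
    """Build (but do not send) a DBFS API 2.0 'put' request for a small JSON file."""
    payload = {
        "path": dbfs_path,
        # The API expects the file contents as a base64-encoded string.
        "contents": base64.b64encode(json_text.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/dbfs/put",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_dbfs_put_request(
    host="https://example.cloud.databricks.com",
    token="dapi-EXAMPLE",
    dbfs_path="/tmp/uipath/scraped.json",
    json_text='{"name": "widget", "price": 9.99}',
)
# To actually upload the file: urllib.request.urlopen(request)
```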

What would I personally do? My choice is #1, as it makes my life easier and works with my current infrastructure setup.

Think of it as a pipeline:
UiPath Extraction → Storage Media (UiPath/3rd party) → API Request → Databricks
UiPath Extraction → API Request → Storage Media (Databricks) → Databricks
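
Whichever pipeline you pick, the file that lands in storage has to be in a shape Spark can read. By default `spark.read.json` expects JSON Lines, that is, one JSON object per line. The sketch below shows that format for a small made-up table; the column names, rows, and DBFS path are placeholders, and in your case the serialization would happen in the UiPath workflow rather than in Python.

```python
import json


def rows_to_json_lines(headers, rows):
    """Serialize table rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(dict(zip(headers, row))) for row in rows)


# Hypothetical scraped table, for illustration only.
headers = ["name", "price"]
rows = [["widget", 9.99], ["gadget", 19.5]]
json_lines = rows_to_json_lines(headers, rows)
print(json_lines)

# In a Databricks notebook, where `spark` is predefined, a file holding this
# text could then be loaded and transformed with something like:
#   df = spark.read.json("dbfs:/tmp/uipath/scraped.json")
#   df = df.filter(df.price > 10)
```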

Finally, an extensive list of resources
PySpark Documentation — PySpark 3.3.0 documentation (apache.org)
REST API (latest) | Databricks on AWS
Working with data in Amazon S3 | Databricks on AWS
MongoDB C#/.NET Driver — MongoDB Drivers
How to Execute a REST API call on Apache Spark the Right Way | by James S Hocking | Geek Culture | Aug, 2021 | Medium

Hope this helps :robot:

Thanks a lot, Edwin!

Appreciate your reply

It resolved my questions, but I need to know a few more things:

  1. I created the workflow in UiPath Studio and published it to my workspace in UiPath Cloud. How do I start this workflow? Are there any APIs that start a workflow on UiPath?

  2. Do we need a desktop application that has to be running while this workflow executes?

Well, you can create a scheduled trigger in Orchestrator

Or you can call the Orchestrator API to start a job; please refer to this post

Alternatively, you can start a job from UiPath Assistant

Yes, you will need to install the UiPath Robot on the machine(s) that will be executing the job
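
For the Orchestrator API route, here is a rough sketch of the start-job request, shown in Python just to illustrate its shape. The endpoint path, the `startInfo` fields, and the `X-UIPATH-OrganizationUnitId` folder header reflect my understanding of the Orchestrator OData API, so please verify them against the Swagger page of your own Orchestrator. The URL, token, folder ID, and release key below are made-up placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_start_job_request(orchestrator_url, token, folder_id, release_key):
    """Build (but do not send) an Orchestrator request that starts one job."""
    payload = {
        "startInfo": {
            "ReleaseKey": release_key,      # identifies the published process
            "Strategy": "ModernJobsCount",  # assumption: a modern folder
            "JobsCount": 1,
            "InputArguments": "{}",         # workflow arguments as a JSON string
        }
    }
    endpoint = "/odata/Jobs/UiPath.Server.Configuration.OData.StartJobs"
    return urllib.request.Request(
        url=orchestrator_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Tells Orchestrator which folder the process lives in.
            "X-UIPATH-OrganizationUnitId": str(folder_id),
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
job_request = build_start_job_request(
    orchestrator_url="https://cloud.uipath.com/my-org/my-tenant/orchestrator_",
    token="EXAMPLE-ACCESS-TOKEN",
    folder_id=12345,
    release_key="00000000-0000-0000-0000-000000000000",
)
# To actually start the job: urllib.request.urlopen(job_request)
```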