Extract data from Online PDF without downloading

6027ae06be5a67a04d29acc18 · August 6, 2022, 11:05am

Hello, can someone please advise on how to extract data from a webpage PDF without downloading the file and using Read PDF activities?

I’m unable to extract text from an online PDF and haven’t found a solution on the forum besides downloading, which is not efficient for this use case because there will be lots of files per run.

Get Text extracts only the first paragraph of the page when I try the AA or UIA framework (the default framework doesn’t work). I cannot use table extraction because the content is not in a table.

The docs usually have at least 20 pages, and an API isn’t an option.
Here’s an example of how it looks in the Chrome browser.

Thx

Nithinkrishna · August 6, 2022, 4:44pm

Hey @6027ae06be5a67a04d29acc18

Check the browser’s network tab or the inspect panel for clues that may help…

Otherwise you will need to go with UiAutomation, as there is no other option left.

Read PDF only reads local files.

Thanks
#nK

Rahul_Unnikrishnan · August 6, 2022, 5:29pm

Hello @6027ae06be5a67a04d29acc18

I think only UiAutomation will work here. You can also try Computer Vision, for example the CV Extract Table activity.

6027ae06be5a67a04d29acc18 · August 6, 2022, 5:30pm

Alright, thanks

6027ae06be5a67a04d29acc18 · August 6, 2022, 5:30pm

Noted, thanks

Naveen_Mohandas · August 7, 2022, 3:00am

@6027ae06be5a67a04d29acc18 I am attaching a solution that worked for me. It uses a .NET WebRequest and its response to read the PDF as a stream, and Spire PDF to extract the text. Let me know whether it works in your scenario.
Main.xaml (5.8 KB)
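
The contents of Main.xaml are not shown in the thread, so the following is only a minimal sketch of the idea described above: request the URL, read the response body into an in-memory stream, and hand that stream to a PDF library. It is written in Python rather than the .NET and Spire PDF calls the attachment uses. The function name `fetch_pdf_to_memory`, the timeout value, and the `%PDF-` header check are assumptions made for illustration.

```python
import io
import urllib.request


def fetch_pdf_to_memory(url: str, timeout: float = 30.0) -> io.BytesIO:
    """Read a PDF from a URL into an in-memory buffer (hypothetical helper).

    Nothing is written to disk; the caller gets a file-like object.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()

    # Assumption: a usable PDF starts with the "%PDF-" marker. Anything
    # else (an HTML error page, a login redirect) is rejected early.
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")

    return io.BytesIO(data)
```

The returned buffer can be passed to any PDF library that accepts a file-like object. As one option not mentioned in the thread, the third-party pypdf package's `PdfReader` accepts such a stream.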

6027ae06be5a67a04d29acc18 · August 7, 2022, 12:44pm

Hi Naveen, thanks for the example. It worked with your sample URL but not mine. I get an exception: "Invoke code: Exception has been thrown by the target of an invocation."

Could you please check if it works when you replace with this url:
efile.fara.gov/docs/7070-Exhibit-AB-20220113-1.pdf

muhammedyuzuak · August 7, 2022, 7:32pm

Hi @6027ae06be5a67a04d29acc18,

If you only need to read a specific field, you can try using selectors.

Regards,
MY

Naveen_Mohandas · August 8, 2022, 12:01am

I tested my solution with your URL and I am getting an exception as well. I tried a couple of things, such as decompression and using DeflateStream (your PDF is written using FlateDecode, so I had to try this option). Sadly, neither option worked. Hope you find a solution.
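
The thread does not say why the DeflateStream attempt failed. One possible cause, offered here only as an assumption, is framing: FlateDecode data in a PDF is normally zlib-wrapped (a 2-byte header and a 4-byte checksum around the deflate data), and a raw-deflate decoder rejects that header. Note also that only the individual stream objects inside a PDF are compressed, not the file as a whole. The sketch below, in Python with a hypothetical helper name `inflate_pdf_stream`, tries both framings on a single stream body.

```python
import zlib


def inflate_pdf_stream(data: bytes) -> bytes:
    """Decode one FlateDecode stream body (hypothetical helper).

    Tries the zlib-wrapped framing first, then falls back to raw deflate.
    """
    try:
        # Default wbits expects a zlib header and trailing checksum.
        return zlib.decompress(data)
    except zlib.error:
        # Negative wbits tells zlib to expect raw deflate with no header.
        raw_decoder = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
        return raw_decoder.decompress(data)


# Illustration with made-up content stream bytes, not taken from the thread.
content = b"BT /F1 12 Tf (Hello) Tj ET"
wrapped = zlib.compress(content)   # zlib-wrapped framing
raw = wrapped[2:-4]                # same data without header and checksum

decoded_wrapped = inflate_pdf_stream(wrapped)
decoded_raw = inflate_pdf_stream(raw)
```

Both calls return the original bytes, which shows that the two framings differ only in the wrapper. This covers only the stream body between the `stream` and `endstream` keywords; locating those bodies and turning the decoded operators into text is the job of a PDF library.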

Naveen_Mohandas · August 9, 2022, 7:44pm

@6027ae06be5a67a04d29acc18 Can you try this solution? It worked for me.
Main.xaml (8.0 KB)

6027ae06be5a67a04d29acc18 · August 10, 2022, 1:49am

Yes, thank you!!! It works. This is very helpful

Naveen_Mohandas · August 10, 2022, 2:06am

You are welcome:)

