Extract data from Online PDF without downloading

Hello, can someone please advise on how to extract data from a webpage PDF without downloading the file and using Read PDF activities?

I’m unable to extract text from an online PDF and haven’t found a solution on the forum besides downloading, which is not efficient for this use case because there will be lots of files per run.

Get Text only extracts the first paragraph of the page when I try it with the AA or UIA framework (the default framework doesn’t work). I cannot use table extraction because the content is not in a table.

The docs usually have at least 20 pages, and an API isn’t an option.
Here’s an example of how it looks on Chrome browser.
[image]

Thx

Hey @6027ae06be5a67a04d29acc18

Check the Network tab or the Inspect panel in the browser’s developer tools for any clue that may help…

Otherwise you will have to go with UiAutomation, as there is no other option left.

Read PDF only reads local files.

Thanks
#nK

Hello @6027ae06be5a67a04d29acc18

I think only UiAutomation will work here. You can also try Computer Vision, for example the CV Extract Table activity.

Alright, thanks

Noted, thanks

@6027ae06be5a67a04d29acc18 I am attaching a solution. It worked for me. I am using .NET WebRequest and its response to read the PDF as a stream, and Spire.PDF to extract the text. Let me know whether it works in your scenario.
Main.xaml (5.8 KB)
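
As a rough illustration of the idea in the post above (reading the PDF over HTTP into an in-memory stream instead of saving it), here is a minimal Python sketch. It is not the code from the attached workflow. The function name `fetch_pdf_stream`, the User-Agent header, and the timeout value are assumptions made for illustration, and text extraction is left to whichever PDF library accepts a file-like object.

```python
import io
import urllib.request


def fetch_pdf_stream(url: str, timeout: float = 30.0) -> io.BytesIO:
    """Read a PDF from a URL into memory without writing it to disk.

    Hypothetical helper: the header and timeout are illustrative choices.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    # Every PDF file starts with the "%PDF-" marker; reject anything else
    # (for example an HTML error page returned by the server).
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")
    # A seekable in-memory stream that a PDF library can read from.
    return io.BytesIO(data)
```

In a real run the function would be called with the full `https://` address of the document, and the returned stream would be passed to the text extractor, so nothing touches the disk.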

Hi Naveen, thanks for the example. It worked with your sample URL but not mine. I get an exception: "Invoke code: Exception has been thrown by the target of an invocation."

Could you please check if it works when you replace the URL with this one:
efile.fara.gov/docs/7070-Exhibit-AB-20220113-1.pdf

Hi @6027ae06be5a67a04d29acc18,

If you only need to read a specific field, you can try using selectors.

Regards,
MY

I tested my solution with your URL and I am getting an exception as well. Sadly, the couple of things I tried, such as decompression and using DeflateStream (your PDF is written using FlateDecode, so I had to try this option), did not work either. Hope you find a solution :pensive:
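
For context on the FlateDecode attempt: FlateDecode stream data is normally zlib-wrapped (a 2-byte header plus an Adler-32 trailer around the deflate data), while a raw-deflate decoder such as .NET's DeflateStream expects the deflate data without that wrapper. Whether this caused the failure here is not confirmed. The sketch below, with the hypothetical helper name `inflate_flate_stream`, only shows the difference between the two formats in Python.

```python
import zlib


def inflate_flate_stream(raw: bytes) -> bytes:
    """Decompress the bytes of a PDF stream that uses /FlateDecode.

    Hypothetical helper: tries the zlib-wrapped form first, then raw deflate.
    """
    try:
        # Usual case: zlib header + deflate data + Adler-32 checksum.
        return zlib.decompress(raw)
    except zlib.error:
        # Fallback: headerless raw deflate (negative wbits disables the wrapper).
        return zlib.decompress(raw, -zlib.MAX_WBITS)


# Illustrative payload, not taken from the PDF discussed in the thread.
payload = b"BT /F1 12 Tf (Hello) Tj ET"
wrapped = zlib.compress(payload)   # zlib-wrapped, as FlateDecode usually is
raw_deflate = wrapped[2:-4]        # strip the 2-byte header and 4-byte trailer
```

Both `inflate_flate_stream(wrapped)` and `inflate_flate_stream(raw_deflate)` return the original payload. A decoder that only understands raw deflate would need the wrapper stripped first, as in `raw_deflate`.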

Unfortunately, the process requires the text from all the pages. I guess I might just have to use the download workaround: download the file, Read PDF, then delete it.
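
The download workaround mentioned above (save, read, delete) could be sketched as follows in Python. The helper name `with_temp_pdf` and the `read_pdf` callback are hypothetical; the callback stands in for whatever reads text from a local PDF path, such as a Read PDF step.

```python
import os
import tempfile
from typing import Callable


def with_temp_pdf(pdf_bytes: bytes, read_pdf: Callable[[str], str]) -> str:
    """Write PDF bytes to a temporary file, read it, and always delete it.

    Hypothetical helper: read_pdf is any function that takes a file path
    and returns the extracted text.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        # Close the file before reading so other tools can open it.
        with os.fdopen(fd, "wb") as handle:
            handle.write(pdf_bytes)
        return read_pdf(path)
    finally:
        # Clean up even if reading fails, so files do not pile up per run.
        os.remove(path)
```

Because the cleanup sits in `finally`, a run that handles many files leaves nothing behind even when one of the reads raises an error.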

No problem, thank you for taking time to try it.

@6027ae06be5a67a04d29acc18 Can you try this solution? It worked for me.
Main.xaml (8.0 KB)

Yes, thank you!!! It works. This is very helpful :slight_smile:

You are welcome:)
