Hello, can someone please advise how to extract data from a PDF opened in a web page, without downloading the file and using the Read PDF activities?
I'm unable to extract text from an online PDF and haven't found a solution on the forum other than downloading, which is not efficient for this use case because there will be lots of files per run.
Get Text only extracts the first paragraph of the page when I try the AA or UIA framework (the default framework doesn't work). I cannot use table extraction because the content is not in a table.
The documents usually have at least 20 pages, and an API isn't an option.
Here's an example of how it looks in the Chrome browser.
Check the Network tab or the Inspect panel for any clue that may help.
Otherwise, you will have to go with UI Automation, as there is no other option left.
Read PDF is meant for reading local files.
I think only UI Automation will work here. You can also try Computer Vision, for example the CV Extract Table activity.
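If the PDF is embedded in a page rather than opened directly, the Inspect panel usually shows an `<embed>`, `<iframe>` or `<object>` element whose attribute points at the actual file, and that URL is what a stream-based approach needs. Below is a minimal sketch of the same search outside the browser, assuming you already have the page HTML as a string. It uses Python's standard `html.parser`; the tag list, the `.pdf` / `type="application/pdf"` heuristic and the `find_pdf_urls` name are illustrative assumptions, not anything from this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfSourceFinder(HTMLParser):
    """Collects URLs of elements that look like they point at a PDF."""

    # Assumption: the viewer uses one of these tags; maps tag -> attribute holding the URL
    CANDIDATE_ATTRS = {"embed": "src", "iframe": "src", "object": "data", "a": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.CANDIDATE_ATTRS.get(tag)
        if attr_name is None:
            return
        attrs = dict(attrs)
        value = attrs.get(attr_name)
        if not value:
            return
        # Heuristic: explicit PDF MIME type, or ".pdf" somewhere in the URL
        is_pdf_type = (attrs.get("type") or "").lower() == "application/pdf"
        looks_like_pdf = ".pdf" in value.lower()
        if is_pdf_type or looks_like_pdf:
            # Resolve relative links against the page URL
            self.pdf_urls.append(urljoin(self.base_url, value))


def find_pdf_urls(html, base_url):
    """Hypothetical helper: return every PDF-looking URL found in the HTML."""
    finder = PdfSourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.pdf_urls
```

For a PDF opened directly in Chrome, the address bar already shows the file URL, so this step only matters when the PDF sits inside an embedded viewer.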
@6027ae06be5a67a04d29acc18 I am attaching a solution that worked for me. It uses .NET WebRequest and its response to read the PDF as a stream, and Spire.PDF to extract the text. Let me know whether it works in your scenario.
Main.xaml (5.8 KB)
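The idea behind that approach (fetch the PDF over HTTP and hand the bytes to a PDF library without ever writing to disk) can be sketched in Python with the standard library. This is not a translation of the attached .xaml, whose contents are not shown here. `fetch_pdf_in_memory` is a hypothetical helper, and the User-Agent header is an assumption, added because some servers reject requests that lack one. The `%PDF-` check makes it obvious when a URL returns an HTML viewer or error page instead of the file.

```python
import io
import urllib.request

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def fetch_pdf_in_memory(url, timeout=30):
    """Download a PDF into a BytesIO buffer without writing anything to disk."""
    # Assumption: a browser-like User-Agent; it is not taken from the attached workflow
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    if not data.startswith(PDF_MAGIC):
        raise ValueError("response does not look like a PDF (missing %PDF- header)")
    return io.BytesIO(data)
```

The returned `BytesIO` can then be passed to a PDF library that accepts file-like objects (pypdf's `PdfReader` does, for example), which plays the role Spire.PDF plays in the .NET version.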
Hi Naveen, thanks for the example. It worked with your sample URL but not with mine. I get an exception: "Invoke code: Exception has been thrown by the target of an invocation."
Could you please check whether it works when you replace it with this URL:
If you only need to read a specific field, you can try using selectors.
I tested my solution with your URL and am getting the exception as well. Sadly, neither of the two things I tried worked: decompression, and using DeflateStream (your PDF is compressed with FlateDecode, so I had to try this option). Hope you find a solution.
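One possible reason DeflateStream fails on FlateDecode data (an assumption, not confirmed in this thread): PDF FlateDecode streams are normally stored in the zlib format (RFC 1950), with a 2-byte header and an Adler-32 trailer, while .NET's DeflateStream reads raw RFC 1951 deflate data, so the header has to be skipped first. Even when decompression succeeds, the result is a content stream of PDF operators (for example `(Hello) Tj`), often with font-specific encodings, not plain text; that is why a library such as Spire.PDF is still needed for real extraction. The sketch below uses Python's `zlib` to show the two formats side by side; the regex-based stream scanning is deliberately simplified, and `inflate` and `flate_streams` are hypothetical names.

```python
import re
import zlib

# Simplified: assumes each stream's data is followed by an end-of-line and "endstream",
# and that the stream dictionary sits between "N 0 obj" and the "stream" keyword.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate(raw):
    """Decode FlateDecode bytes: zlib-wrapped data first, raw deflate as a fallback."""
    try:
        # RFC 1950 zlib format: 2-byte header, deflate data, Adler-32 checksum
        return zlib.decompress(raw)
    except zlib.error:
        # Raw RFC 1951 deflate without a header, the format .NET's DeflateStream reads
        decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
        return decompressor.decompress(raw) + decompressor.flush()


def flate_streams(pdf_bytes):
    """Return the decoded contents of every FlateDecode stream found in the file."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        # The stream dictionary lies between the last "obj" keyword and "stream"
        dict_start = pdf_bytes.rfind(b"obj", 0, match.start())
        dictionary = pdf_bytes[dict_start:match.start()] if dict_start >= 0 else b""
        if b"/FlateDecode" in dictionary:
            decoded.append(inflate(match.group(1)))
    return decoded
```

In .NET terms, the equivalent of the fallback branch is skipping the first two bytes before wrapping the data in a DeflateStream.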
@6027ae06be5a67a04d29acc18 Can you try this solution? This one worked for me.
Main.xaml (8.0 KB)
Yes, thank you!!! It works. This is very helpful.