Hello, can someone please advise on how to extract data from a PDF displayed on a webpage without downloading the file and using the Read PDF activities?
I'm unable to extract text from an online PDF and haven't found a solution on the forum besides downloading, which is not efficient for my use case as there will be lots of files per run.
Get Text only extracts the first paragraph of the page when I try the AA or UIA framework (the default framework doesn't work at all). I cannot use table extraction because the content is not in a table.
The docs usually have at least 20 pages, and an API isn't an option.
Here's an example of how it looks in the Chrome browser.
@6027ae06be5a67a04d29acc18 I am attaching a solution. It worked for me. I am using .NET WebRequest and WebResponse to read the PDF as a stream, and Spire.PDF to extract the text. Let me know whether it works in your scenario. Main.xaml (5.8 KB)
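The attached workflow is not reproduced here. As a rough illustration of the same idea (read the PDF into memory instead of saving it to disk), here is a minimal Python sketch using only the standard library. The function name `fetch_pdf_stream`, the `User-Agent` header, and the `%PDF-` sanity check are assumptions made for this example and are not taken from Main.xaml. The text extraction step itself would still need a PDF library, which plays the role Spire.PDF has in the workflow and is not shown.

```python
import io
import urllib.request


def fetch_pdf_stream(url: str, timeout: float = 30.0) -> io.BytesIO:
    """Download a PDF into memory and return it as a seekable stream.

    Nothing is written to disk. The header and the sanity check below
    are illustrative assumptions, not part of the original workflow.
    """
    # Some servers refuse requests without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()

    # A PDF file normally starts with "%PDF-". If the URL returns an HTML
    # viewer page or a login page instead, this check fails early.
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")

    return io.BytesIO(data)
```

If the check raises for a given URL, it may mean the address opens a viewer page rather than the PDF file itself, which would be one possible reason a solution works for one URL and not another.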
Hi Naveen, thanks for the example. It worked with your sample URL but not mine. I get an exception: "Invoke code: Exception has been thrown by the target of an invocation."
I tested my solution with your URL and I'm getting an exception as well. Sadly, the couple of things I tried, decompression and using DeflateStream (your PDF is written using FlateDecode, so I had to try this option), did not work either. Hope you find a solution.
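For anyone who wants to experiment with the FlateDecode part: FlateDecode streams are zlib-wrapped deflate data, while .NET's DeflateStream reads raw deflate, so the two-byte zlib header usually has to be skipped before it will decompress. Whether that was the cause of the failure here is not confirmed. Below is a minimal Python sketch, assuming the PDF bytes are already in memory. The helper name `inflate_flate_streams` and the regex-based stream search are simplifications for illustration; a real parser would use each stream's `/Length` and `/Filter` entries. Note that the inflated output is PDF content-stream operators, not plain text, so a PDF library is still needed to get readable text.

```python
import re
import zlib
from typing import List

# Captures the raw bytes between the "stream" and "endstream" keywords.
# This is a simplification: it ignores the /Length entry of the stream.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_flate_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the inflated body of every stream that zlib can decompress."""
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        # decompressobj tolerates the trailing end-of-line bytes that
        # sit between the compressed data and "endstream".
        decompressor = zlib.decompressobj()
        try:
            data = decompressor.decompress(raw)
        except zlib.error:
            # Not zlib data (for example an image using another filter).
            continue
        if decompressor.eof:
            inflated.append(data)
    return inflated
```

A quick way to try it is to pass in the bytes of a downloaded PDF and print the first few hundred bytes of each result to see whether text operators such as `Tj` appear.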