Reading an HTML file and extracting information

Hi community,

I have an issue with a project. I need to extract information from a Coupa file that comes in HTML format, which the machine opens in the browser. I am not sure whether there is an activity that extracts the information from it the way Read PDF does for a PDF.
As a workaround I converted the HTML files to PDFs by pressing CTRL+P and saving the file. However, sometimes the selectors or hotkeys do not work properly and the file is not saved.
Is there a tool, activity, or method I can use that avoids relying on selectors for this?

I'd appreciate any help or advice.

Hey @RaVillalobos

This can be done programmatically, as long as you have clear rules for which data to extract.

Install the HtmlAgilityPack .NET library. It parses the HTML and lets you read element values much as you would with selectors.
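The reply above gives no code, so here is the same idea (parse the HTML, then pick out element values by an attribute, as a selector would) sketched in Python with the standard library's `html.parser`. This is a swapped-in tool, not HtmlAgilityPack's API; in a .NET workflow you would write the equivalent with HtmlAgilityPack. The class names `invoice-number` and `total` are hypothetical. Inspect the real Coupa HTML to find the attributes that actually mark each field, and note the depth counting assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # nesting depth inside the current matching element
        self.parts = []   # text fragments of the current match
        self.values = []  # one cleaned string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a tag nested inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace, roughly as a browser renders them.
            self.values.append(" ".join(" ".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_by_class(html, css_class):
    """Return the text of each element carrying css_class, in document order."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical markup; the real Coupa invoice will use its own class names.
sample = """
<div class="header">
  <span class="field invoice-number">INV-001</span>
  <span class="field total"> $1,234.00 </span>
</div>
"""
print(extract_by_class(sample, "total"))  # ['$1,234.00']
```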

Thanks
#nK

Hello @RaVillalobos ,

What data are you scraping from the web page? Table data or field data?
Are you facing any challenges with web page automation?

I ask because sending hotkeys and converting to PDF is not reliable and can fail. If you can use web page automation instead, that will be better.
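If the data is tabular (for example invoice line items), the rows can be read straight from the HTML `<table>` without printing to PDF. A minimal sketch, again using Python's stdlib `html.parser` as a stand-in for a .NET HTML parser, and assuming simple, non-nested tables with explicit closing tags; the sample table is invented for illustration.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collects every <tr> as a list of cleaned <td>/<th> cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None   # cells of the row currently being read
        self.cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Join the cell's fragments and collapse whitespace.
            self.row.append(" ".join(" ".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.cell = None  # drop any cell left unclosed in this row

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def extract_rows(html):
    """Return all table rows in the document as lists of cell strings."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented example table, not real Coupa output.
lines_table = """
<table>
  <tr><th>Item</th><th>Qty</th><th>Price</th></tr>
  <tr><td>Widget</td><td>2</td><td>$5.00</td></tr>
</table>
"""
for row in extract_rows(lines_table):
    print(row)
```

The first row holds the headers, so the result maps naturally onto a data table with those column names.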

Open the HTML file in the Chrome browser, then use the Print option and save it as PDF.

Thanks for replying,

Is this the one?

I am trying to read the text from an invoice in HTML format, just like the Read PDF activity does for a PDF.

Thanks sir,

I am trying to extract the details from an invoice in HTML format, using something like Read PDF but without converting the file to PDF first.
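As a rough analogue of Read PDF for an HTML file, the visible text can be pulled out directly, skipping `<head>`, `<script>` and `<style>` content that a browser does not display. This is a minimal sketch using Python's stdlib `html.parser`, not a UiPath activity; it emits one text node per line, so the layout will not match what a PDF reader would produce.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping content that a browser would not display."""

    SKIP_TAGS = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside <head>, <script> or <style>
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            text = " ".join(data.split())  # collapse whitespace
            if text:
                self.lines.append(text)


def html_to_text(html):
    """Return the visible text of an HTML document, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


# Invented sample document for illustration.
invoice = """<html><head><title>Coupa</title><style>p {color: red}</style></head>
<body><h1>Invoice</h1><p>Total:  <b>$10.00</b></p>
<script>var x = 1;</script></body></html>"""
print(html_to_text(invoice))
```

To read a saved file, pass the result of `open(path, encoding="utf-8").read()` to `html_to_text`; the UTF-8 encoding is an assumption, so check the file's `<meta charset>`. Character references such as `&amp;` are decoded automatically because `HTMLParser` defaults to `convert_charrefs=True`.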

Open it in the browser and use activities like you would for any normal web page.

Hey @Randy_Villalobos

Yes, the first one.

Thanks
#nK