How to extract Open Graph metadata from a webpage?

I'm learning RPA with UiPath and am happily extracting on-screen data from websites, processing it, using it, etc. Navigating in browsers, clicking links and all that is going fine.

However, some information on the page isn't visible but is in the source, e.g. Open Graph meta tags:

<meta property="og:image" content="https://example.com/foo.jpg" />

What options do I have for extracting this with UiPath? I gather there's an ExtractMetaData flag on ExtractData, but I've yet to find a tutorial I can follow at this stage :confused:

I found this forum thread, but the attachment someone provided as a solution throws errors when I open it in UiPath Studio. This feels common enough that surely someone has done it before and can give me some basic steps?

Many thanks,

Mark

Off the top of my head I can think of a manual way to do it. You would:

  1. Right click on the page and select “View Page Source”.
  2. Copy the text into memory.
  3. Use the extract text snippet (it uses regex) to pull the actual values out of the text.
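To illustrate step 3, here is a rough Python sketch (not a UiPath activity) of pulling `og:` values out of page source that is already in a string. The pattern and the `extract_og_tags` name are my own, not taken from the snippet mentioned above. It is deliberately naive: it only matches tags where `property` comes before `content` and both are quoted, which is the usual layout but not guaranteed.

```python
import re

# Naive pattern: assumes property="og:..." appears before content="..." in the same
# <meta> tag and that both values are quoted. Reordered attributes will be missed.
OG_META_RE = re.compile(
    r"""<meta\b[^>]*?\sproperty\s*=\s*(["'])(og:[^"']+)\1[^>]*?\scontent\s*=\s*(["'])(.*?)\3""",
    re.IGNORECASE,
)


def extract_og_tags(page_source: str) -> dict[str, str]:
    """Return a mapping such as {"og:image": "https://..."} from raw page source."""
    tags: dict[str, str] = {}
    for match in OG_META_RE.finditer(page_source):
        # Group 2 is the property name, group 4 the content value between matching quotes.
        prop, value = match.group(2).lower(), match.group(4)
        tags.setdefault(prop, value)  # keep the first occurrence of each property
    return tags


if __name__ == "__main__":
    source = '<meta property="og:image" content="https://example.com/foo.jpg" />'
    print(extract_og_tags(source)["og:image"])
```

The backreferences (`\1`, `\3`) make each value end at the same kind of quote it started with, so an apostrophe inside a double-quoted value doesn't cut it short.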

Other than that, HTML is similar to XML (though real pages often aren't well-formed XML), so you could repurpose this to parse the copied text and get your meta tags out.
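A more forgiving variant of the parsing idea is to use an actual HTML parser, since HTML5 `<meta>` tags, for instance, don't need a closing slash and so break strict XML parsers. Below is a minimal sketch using Python's standard-library `html.parser`. This swaps in an HTML parser for the XML approach rather than showing that approach itself, and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags while parsing HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names and unescapes values like &amp;.
        # Self-closing tags such as <meta ... /> also end up here via handle_startendtag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = (attr_map.get("property") or "").lower()
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            self.og.setdefault(prop, content)  # first occurrence wins


def extract_og_tags(page_source: str) -> dict[str, str]:
    """Parse raw HTML and return its Open Graph properties, e.g. {"og:image": "..."}."""
    parser = OpenGraphParser()
    parser.feed(page_source)
    parser.close()
    return parser.og
```

Unlike a regex, this doesn't care about attribute order, letter case or quoting style.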

You could make getting the text more robust by downloading the page source directly instead of copying it from the browser, using something like this.
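Downloading the source directly might look roughly like this with Python's standard-library `urllib.request`. The function name, the timeout and the `User-Agent` value are assumptions (some sites reject the default Python user agent). It returns the HTML the server sends before any JavaScript runs, which is usually where Open Graph tags are, since link-preview crawlers generally read the raw HTML.

```python
import urllib.request


def fetch_page_source(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML of a page, without rendering it or running JavaScript."""
    # Assumption: a browser-like User-Agent avoids some sites blocking the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header when present, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can then go through the same regex or parsing step as copied text, e.g. `fetch_page_source("https://example.com/")`.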

Does either of those suit your needs, or do you need some other ideas?