Extract specific data from .html file

Hi all, can anyone please help me extract specific data, such as the “Invoice posted” values, from a .html file after I open it? As you can see in the picture, there are commas and spaces between the data. I need to extract this data and paste it into SAP for processing. This process will run every day. The file layout will always be the same; only the invoice numbers will differ. How can I extract them? I tried Data Scraping and saving the result to a CSV file, but the content is not usable because of those commas and spaces, and I don’t know how to solve this. :frowning:

Thank you very much for your help!

Do you want to extract these blue highlighted numbers? If so, read the HTML file into a text variable and then use the following regex:
(?<=Invoice posted:\s?)\d+(?=/)

This can extract your desired values.
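Outside UiPath, the same extraction can be tried in a few lines of Python. One caveat: the look-behind `(?<=Invoice posted:\s?)` is variable-width, which the .NET regex engine used by UiPath accepts but Python's `re` module rejects, so this sketch uses a capture group instead. The sample text is invented for illustration, since the real report is only shown as a picture.

```python
import re

# Python's re only allows fixed-width look-behinds, so the optional space (\s?)
# is matched normally and the digits are returned through a capture group.
PATTERN = re.compile(r"Invoice posted:\s?(\d+)/")


def extract_invoice_numbers(text):
    """Return every run of digits between 'Invoice posted:' and the next '/'."""
    return PATTERN.findall(text)


# Hypothetical sample; note the second entry has no space after the colon.
sample = "Invoice posted: 1200000123/2021, Invoice posted:1200000456/2021"
print(extract_invoice_numbers(sample))  # ['1200000123', '1200000456']
```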

No, I need only the white fields; as you can see, they repeat on each line. That is the data I need to extract.

Actually, yes, the blue highlighted numbers. Sorry!

Will it always be 9 characters?

Actually, it's 10 characters, not 9. I wrote it wrong.

Okay… if it's always 10 characters, you can try a pattern that fetches exactly 10 characters.

In that case, follow my first reply and use the following regex

(?<=Invoice posted:\s?)\d{10}(?=/)
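To see what the `\d{10}` quantifier changes, here is the same idea in Python (again with a capture group instead of the variable-width look-behind, which Python's `re` does not support). Because the digits must be followed directly by `/`, only runs of exactly ten digits match; the sample lines are made up.

```python
import re

# Exactly ten digits between "Invoice posted:" (optional space) and "/".
TEN_DIGIT_PATTERN = re.compile(r"Invoice posted:\s?(\d{10})/")

sample = (
    "Invoice posted: 1200000123/2021\n"   # 10 digits -> kept
    "Invoice posted: 120000012/2021\n"    # 9 digits  -> skipped
    "Invoice posted: 12000001234/2021\n"  # 11 digits -> skipped, '/' must follow the 10th digit
)
print(TEN_DIGIT_PATTERN.findall(sample))  # ['1200000123']
```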

From what I see, do I have to write something in the “Test text” field or not? I have never used this type of activity, so I’m sorry for so many questions. Could you please give me an example?

Thank you very much!

@ovidiu_2088 - Please check this workflow below…

  1. Read your HTML file using “Read Text File” activity and save the output as “StrInput”
  2. In the “Matches” activity, provide the input as “StrInput” and the pattern as shown here, and save the result as “IEnRegEx”.
  3. In the For Each loop, add the code as shown here…
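The three steps above map roughly onto the following Python sketch: reading the file corresponds to “Read Text File”, `re.finditer` to the “Matches” activity, and the `for` loop to For Each. The function name is hypothetical, the UTF-8 encoding is an assumption about the report file, and the capture-group form of the pattern replaces the look-behind because Python's `re` does not allow variable-width look-behinds.

```python
import re
from pathlib import Path


def read_invoice_numbers(html_path):
    """Return the 10-digit invoice numbers found in an HTML report file."""
    # Step 1: read the whole file as plain text (like "Read Text File" -> StrInput).
    # UTF-8 is an assumption; adjust if the report uses another encoding.
    str_input = Path(html_path).read_text(encoding="utf-8")

    # Step 2: find all matches (like the "Matches" activity -> IEnRegEx).
    matches = re.finditer(r"Invoice posted:\s?(\d{10})/", str_input)

    # Step 3: loop over the matches (like For Each) and keep each captured number.
    numbers = []
    for match in matches:
        numbers.append(match.group(1))
    return numbers
```

For example, `read_invoice_numbers("Report output.html")` would return the numbers in document order, assuming the file sits in the working directory.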

Hope this helps…

Which type argument do I have to choose?

@ovidiu_2088 - Your Type Argument is not correct… Please recheck the screenshot above… I have also given the snippet below…

Have you tried normal selectors? Once the file is open, it’s no different from a web page you have browsed to.

Now I chose the correct argument and it works perfectly. :slight_smile: I saw that the table also contains other 10-character data that I don’t need. How can I get only the numbers that interest me? All my numbers start with 12 followed by 8 more characters, 10 in total, and that prefix is specific to the invoice numbers. How can I select only those?

Please share some sample text and tell us the expected output. We can tweak the pattern for you…
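A tweak of that kind would typically anchor the prefix in the pattern itself; in UiPath's .NET regex the earlier pattern would presumably become `(?<=Invoice posted:\s?)12\d{8}(?=/)`. The Python sketch below takes a slightly different approach and matches any standalone 10-digit run that starts with a given prefix, using fixed-width `(?<!\d)` and `(?!\d)` guards so longer digit runs are not cut into pieces. The function name and `prefix` parameter are hypothetical; 12 and 51 are the two prefixes mentioned in this thread, and the sample text is invented.

```python
import re


def invoice_numbers_with_prefix(text, prefix="12"):
    """Return standalone 10-digit numbers that start with `prefix`."""
    rest = 10 - len(prefix)  # digits still needed after the prefix
    # (?<!\d) and (?!\d) stop the match from starting or ending inside a longer number.
    pattern = re.compile(r"(?<!\d)" + re.escape(prefix) + r"\d{" + str(rest) + r"}(?!\d)")
    return pattern.findall(text)


sample = "Invoice posted: 1200000123/2021 Ref 9876543210 Doc 5100000777"
print(invoice_numbers_with_prefix(sample))        # ['1200000123']
print(invoice_numbers_with_prefix(sample, "51"))  # ['5100000777']
```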

I attached the file. As you know, I need only the numbers starting with 51, to copy and paste them into SAP for processing. The issue is that I need to process them one by one, so maybe they should first be saved into an Excel file and then copied one by one. Report output.html (16.8 KB)

Thank you very much for your support!

It looks like the main building blocks were already mentioned. Let's summarize:

What you sent is perfect, but I think I have an issue with Data Scraping :frowning: The result is not the same as yours.

The quick prototype was done with the following extract config:

<extract>
	<row exact="1">
		<webctrl tag="tbody"/>
	</row>
	<column exact="1" name="Column1" attr="text">
		<webctrl tag="tbody"/>
		<webctrl tag="tr" idx="5"/>
		<webctrl tag="td" idx="1"/>
		<webctrl tag="font" idx="2"/>
		<webctrl tag="nobr" idx="1"/>
	</column>
	<column exact="1" name="Column2" attr="text">
		<webctrl tag="tbody"/>
		<webctrl tag="tr" idx="4"/>
		<webctrl tag="td" idx="1"/>
		<webctrl tag="font" idx="4"/>
		<webctrl tag="nobr" idx="1"/>
	</column>
</extract>
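The extract config drills down tbody > tr > td > font > nobr and reads the text of the innermost element. As a rough standard-library analogue (not a reproduction of UiPath's Data Scraping, and ignoring the `idx` positions), the Python sketch below collects the text of every `<nobr>` element in document order. The class name and the HTML fragment are hypothetical, since the attached report is not reproduced here.

```python
from html.parser import HTMLParser


class NobrTextCollector(HTMLParser):
    """Collect the text of every <nobr> element, in document order."""

    def __init__(self):
        super().__init__()
        self._inside_nobr = False  # True while the parser is inside a <nobr> element
        self._buffer = []          # text fragments of the current <nobr> element
        self.texts = []            # finished, whitespace-stripped <nobr> texts

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case, so "NOBR" also matches.
        if tag == "nobr":
            self._inside_nobr = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "nobr" and self._inside_nobr:
            self.texts.append("".join(self._buffer).strip())
            self._inside_nobr = False

    def handle_data(self, data):
        if self._inside_nobr:
            self._buffer.append(data)


# Hypothetical fragment shaped like the tbody > tr > td > font > nobr path above.
sample = """
<table><tbody>
  <tr><td><font>Doc</font><font><nobr>Invoice posted: 1200000123/2021</nobr></font></td></tr>
  <tr><td><font>Doc</font><font><nobr>Invoice posted: 1200000456/2021</nobr></font></td></tr>
</tbody></table>
"""

collector = NobrTextCollector()
collector.feed(sample)
collector.close()
print(collector.texts)
```

The collected strings could then be run through one of the regex patterns from earlier in the thread to isolate the invoice numbers.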

Thank you very much @ppr. Now, can you please tell me how to paste this data into an Excel file so I can start processing it in SAP? I think this step is necessary before processing further, or not?
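If an intermediate file is wanted, one simple option outside UiPath is to write the extracted numbers to a CSV file, which Excel opens directly. This Python sketch assumes the numbers are already in a list (for example from one of the regex sketches); the function name, the `InvoiceNumber` header, and any file name you pass are hypothetical. Inside UiPath itself, the more natural route would be its Excel or CSV activities, which are not reproduced here.

```python
import csv


def write_invoices_csv(numbers, path):
    """Write one invoice number per row, under a header row, to a CSV file."""
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["InvoiceNumber"])
        for number in numbers:
            writer.writerow([number])
```

For example, `write_invoices_csv(["1200000123", "5100000777"], "invoices.csv")` produces a header row followed by one number per row. Whether the file is needed at all is a design choice: the For Each loop could also hand each number to SAP directly.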