How to simplify multiple For Each loops when using the regex Matches activity and add the results to a datatable

I am getting multiple types of information from the response of an HTTP Request activity and then saving all of them into an Excel sheet. The result is fine, but the process is messy and very hard to combine with further steps. Can anyone help, please?
Step A:
1. get the response using http request
   (I want to save the date, file name and file link for each file on the page)

Step B:
2. build a datatable
3. match: use regex to find “date” for each file I need
4. for each item: get each string from the array returned by the match activity
5. inside the for each loop, add a data row to datatable

Step C:
6. build a datatable2
7. match: use regex to find “filename” for each file I need
8. for each item: get each string from the array returned by the match activity
9. inside the for each loop, add a data row to datatable2

Step D:
10. build a datatable3
11. match: use regex to find “urllink” for each file I need
12. for each item: get each string from the array returned by the match activity
13. inside the for each loop, add a data row to datatable3

Step E:
14. in excel scope, write range datatable to A1
15. in excel scope, write range datatable2 to B1
16. in excel scope, write range datatable3 to C1

My questions are:

  1. How can I save all three types of data directly into one datatable? For example, with only one datatable, I would save the data from Step B into column 1, then the data from Step C into column 2, and so on.
  2. Is there any way to write all three types of data at once, rather than one column first and then another column later? Could I combine Steps B, C and D completely? How would the loop work in that case?

I attached my xaml here… could anyone help please? Thanks.
@MiaS - I am looking at it. I am thinking of creating a group regex and writing everything in one shot…

First let me finish my official work and come back to yours :slight_smile:

Thanks. Looking forward to seeing it. :grin:

@MiaS - I looked at it… but I am not sure how to add the results of 3 different regex patterns to the same datatable (and in a loop, too).

@ppr @yoichi @ptrobot - maybe you can help.

I had a quick look at it (some activities were unresolved), but some open questions about the flow remain.

Maybe below can help:

@ppr - Thanks… the thing I am stuck on while solving this for @MiaS is that each regex pattern returns multiple items, so I tried to solve it using your other approach from here. But it didn’t work. That’s why I tagged you.

Hi @MiaS @prasath17 ,

Hi @Yoichi - This is simply wow… :vulcan_salute:

I actually have a LINQ query from ppr which takes groups and writes them to a datatable… but in this case I didn’t attempt a group regex…
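
For readers following along, here is a minimal sketch of the group-regex idea discussed in this thread: one pattern with three named groups, so each match already holds one complete row. It is written in Python rather than as a UiPath workflow, and the HTML snippet, the pattern and the variable names are simplified assumptions, not the real page or the attached xaml.

```python
import re

# Hypothetical, simplified HTML; the real page in this thread is more complex.
html = """
<tr>
  <td class="nowrap">01 Jan 2021</td>
  <td><a href="/company/123/filing/abc?format=pdf">View PDF Annual accounts</a></td>
</tr>
<tr>
  <td class="nowrap">15 Feb 2021</td>
  <td><a href="/company/123/filing/def?format=pdf">View PDF Confirmation statement</a></td>
</tr>
"""

# One pattern, three named groups: every match carries date, link and name together.
row_pattern = re.compile(
    r'<td class="nowrap">(?P<uploaddate>[^<]+)</td>'   # date cell
    r'.*?<a href="(?P<pathlink>[^"]+)"'                # link in the same table row
    r'[^>]*>View PDF\s*(?P<pdfname>[^<]+)',            # visible file name
    re.DOTALL,
)

# A single loop builds complete rows, so the three columns cannot drift apart.
rows = [
    (m["uploaddate"].strip(), m["pathlink"], m["pdfname"].strip())
    for m in row_pattern.finditer(html)
]

for row in rows:
    print(row)
```

In UiPath the same shape would presumably be one Matches activity followed by one For Each over the matches, reading `item.Groups("uploaddate").Value` and the other two groups inside a single Add Data Row.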

Just out of curiosity… Say we can’t group the items, so we have to write a different regex for each field, and each regex returns multiple matches… how can we write them to the dt in a more efficient way? Please advise.

I think it’s a challenge, as a datatable is basically row-oriented. In some cases, it might work to iterate over an index number, for example with the “Enumerable.Range” method. However, we need to take care that the data in each row is semantically correct, i.e. that the values at the same index really belong to the same file.
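
The index-based approach can be sketched roughly as follows. This is a Python illustration, not UiPath code: `combine_by_index` is a hypothetical helper, and the three input lists stand in for the results of three separate Matches activities. The length check is the part that guards against rows becoming semantically wrong when one pattern misses an element.

```python
from typing import List, Tuple


def combine_by_index(
    dates: List[str], links: List[str], names: List[str]
) -> List[Tuple[str, str, str]]:
    """Combine three per-field match lists into rows by position."""
    # If one regex missed an element, positions no longer line up,
    # so refuse to build rows instead of silently shifting a column.
    if not (len(dates) == len(links) == len(names)):
        raise ValueError(
            f"match counts differ: {len(dates)} dates, "
            f"{len(links)} links, {len(names)} names"
        )
    # Same role as iterating Enumerable.Range(0, count) in .NET.
    return [(dates[i], links[i], names[i]) for i in range(len(dates))]


# Assumed sample data, one list per regex.
dates = ["01 Jan 2021", "15 Feb 2021"]
links = ["/company/123/filing/abc", "/company/123/filing/def"]
names = ["Annual accounts", "Confirmation statement"]

rows = combine_by_index(dates, links, names)
```

Equal counts do not prove the rows are correct (two misses can cancel out), which is why a single grouped pattern is usually the safer choice when the page allows it.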



Thanks. It works very well.

It seems more accurate, as my original method could lead to wrong matches if one element is not extracted.

Hi @Yoichi @prasath17
Thanks for your help with this issue.
Regarding the group regex: the file names are sometimes not bold (no ‘strong’ tag), and sometimes there are bullets underneath (so there are multiple lines before the fixed term ‘ - link opens in a new window’).
1) I tried to adjust the pattern to cover all these situations, but for the 3rd group (file name) I couldn’t extract the ones with bullets underneath. Could you please help?
2) Also, could I strip the ‘strong’ tag directly from the 3rd group in the regex match result, rather than using a Replace activity afterwards? Thanks.

(?<uploaddate>(?<=<tr>\r?\n?\s+<td class="nowrap">\r?\n?\s+).*(?=\n?</td>))[\s\S]*?(?<pathlink>(?<=<a href="/company/\r?).*(?=\n?format=pdf&amp))[\s\S]*?(?<pdfname>(?<=PDF\r?\n?\s+<span class="visuallyhidden">\r?\n?\s+<?s?t?r?o?n?g?>?).*(?=\r?\n - link opens in a new window ))
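
On question 2), one possible approach is to match the tag as an optional non-capturing group placed outside the named group, so it is consumed but never ends up in the captured value. Note that `<?s?t?r?o?n?g?>?` makes every character optional on its own, so it would also accept fragments such as `<sg`; `(?:<strong>)?` treats the tag as a single unit, and that syntax is valid in .NET regex as well. The sketch below is Python with made-up sample strings and a simplified pattern, not the pattern from this thread.

```python
import re

# Hypothetical samples: one name wrapped in <strong>, one without.
samples = [
    '<span class="visuallyhidden"><strong>Annual accounts</strong>'
    ' - link opens in a new window</span>',
    '<span class="visuallyhidden">Confirmation statement'
    ' - link opens in a new window</span>',
]

name_pattern = re.compile(
    r'<span class="visuallyhidden">\s*'
    r'(?:<strong>)?'                    # optional opening tag, not captured
    r'(?P<pdfname>[^<]+?)'              # the file name itself
    r'\s*(?:</strong>)?'                # optional closing tag, not captured
    r'\s*- link opens in a new window'  # fixed trailing text
)

names = [name_pattern.search(s)["pdfname"] for s in samples]
```

Both samples yield a clean name with no tag left over, so no Replace step is needed for this particular variation. The bullet-list case from question 1) is not covered by this sketch.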

html sample
Hi… @MiaS - You mean the one below, the bullet list case… and you want to extract all the lines?

I think this is getting very complex (for me)… because I see there are a lot of variations… with span and div tags coming in between the text…

Hi @prasath17
I only want
“the termination of appointment” if possible,
or “the termination of appointment of andrew higginson as a director”,
not the other lines. Thanks.

@MiaS - Regex_NewPattern.txt (229 Bytes) - Please check this… it captures all 25 PDF names… but the result is not clean… you can use a .Replace call to clean out the unwanted tags…

Check the link below, where I ran the pattern for just the PDF names, and it captures all 25 names…

Please try and let us know…
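
As a rough illustration of the clean-up step, the unwanted tags and line breaks can often be removed with two substitutions instead of a chain of separate replaces. This is a Python sketch; `clean_name` and the sample string are hypothetical. In UiPath, `System.Text.RegularExpressions.Regex.Replace` in an Assign activity should be able to play the same role.

```python
import re


def clean_name(raw: str) -> str:
    """Strip HTML tags and collapse whitespace in a captured file name."""
    # Replace anything that looks like a tag with a space so words stay apart.
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    # Collapse runs of spaces, tabs and line breaks into one space.
    return re.sub(r"\s+", " ", no_tags).strip()


# Assumed raw capture containing tags and a line break.
raw = "<strong>Termination of appointment</strong>\r\n   of a director"
cleaned = clean_name(raw)
```

This only handles tags and whitespace; other characters that need removing would still need their own rule.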

The method you provided works very well!
I kept trying to use Replace activities to clean the result.
Finally, after several hours… I removed the new lines, “/”, “:” etc…
It is exhausting for a beginner, but fun.
Thanks for your xaml file. It works well!
It is much tidier than what I did… I really wish I had noticed it earlier… :laughing:
Good night! @prasath17

