How to extract several pieces of information from a single text field

Hello friends, I have another challenge. In a web system I need to extract several pieces of information that are all in a single text field, as shown in the image below. The data I need the robot to copy are the highlighted ones. What is the best method to do this? A big hug and thank you.

Hi Diego,

I would suggest my favourite tool, regex. Since the input is very regular, you can do this with one “base pattern” in which you substitute the key parts that differ. Such a base pattern could be (?<=^{0}.* : ).*$, where {0} is the (first) substitution placeholder. It reads: “if the line starts with {0}, followed by any number of any characters and then a colon with a space on each side, match everything up to the end of the line”. Because the lookbehind demands the exact sequence “ : ”, the match begins right after the first occurrence of that sequence on the line, so putting “Resultado” as {0} will properly match “000012”.

In addition, keep a list of key strings somewhere, either in the workflow or in an external file, one for each field that should be read from the input. In your case they could be Empresa, Pesquisa, Resultado, … Each key should be long enough to uniquely identify the line you want to read. Then loop over the keys, and in your Regex Match calls/activities pass String.Format(pattern, item) as the pattern expression to replace the {0}. You could also assign the result to a local variable and pass that instead. The regex search must have Multiline mode enabled so that the ^ and $ anchors work per line as intended.

Since you are looping, it is by far easiest to add the match results to a collection such as a dictionary or datatable. I give these as examples because you should definitely also store the key string that was used to identify each result.
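For anyone who wants to try the idea outside UiPath, here is a minimal sketch in Python rather than in a workflow. It is an illustration under assumptions only: Python's re module does not accept the variable-length lookbehind used in the pattern above, so the sketch swaps it for a capture group with a lazy .*?, and the sample text and the names BASE_PATTERN, SAMPLE and extract_fields are made up (only the keys and the value 000012 come from this thread).

```python
import re

# Adapted base pattern: {0} is the key placeholder, the capture group
# holds everything after the first " : " on the matching line.
BASE_PATTERN = r"^{0}.*? : (.*)$"

# Hypothetical sample input; the real field content is only in the screenshot.
SAMPLE = "Empresa : ACME Ltda\nPesquisa : Cadastro : 2019\nResultado : 000012\n"


def extract_fields(text, keys):
    """Return a dict mapping each key to the value found on its line."""
    results = {}
    for key in keys:
        # Substitute the key into the base pattern (escaped, in case it
        # contains regex metacharacters).
        pattern = BASE_PATTERN.format(re.escape(key))
        # MULTILINE makes ^ and $ match at line boundaries.
        match = re.search(pattern, text, flags=re.MULTILINE)
        # Store the key together with its result; None if no line matched.
        results[key] = match.group(1).strip() if match else None
    return results


fields = extract_fields(SAMPLE, ["Empresa", "Pesquisa", "Resultado"])
print(fields["Resultado"])
```

Keeping the results in a dictionary keyed by the search string means each value stays tied to the key that found it, which is the point made above.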


Hi Diego,

The following approach can be useful if the file format stays the same and only the data changes.

  1. Split the file content on new lines.
  2. Split each line that contains the “:” character on that character.
  3. Skip the first two strings that contain “:”.
  4. Loop over the remaining lines to parse the file and collect the output the robot needs.
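The steps above can be sketched roughly as follows. This is Python for illustration only, not a UiPath workflow, and it rests on assumptions: the function name split_fields, the skip parameter and the sample text are hypothetical, and each line is split only at its first “:” so that values containing a colon stay intact.

```python
def split_fields(text, skip=0):
    """Split text into (key, value) pairs, one per line that contains ':'."""
    pairs = []
    for line in text.splitlines():          # step 1: split on new lines
        if ":" not in line:
            continue                        # ignore lines without a colon
        key, value = line.split(":", 1)     # step 2: split at the first ':'
        pairs.append((key.strip(), value.strip()))
    return pairs[skip:]                     # step 3: drop the first `skip` pairs


# Hypothetical input: two colon lines to skip, one line without a colon.
sample = "Titulo: x\nData: y\nEmpresa : ACME Ltda\nsome note\nResultado : 000012"
for key, value in split_fields(sample, skip=2):   # step 4: loop over the rest
    print(key, "->", value)
```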

Let me know if you need a demo project for this.

Hi Vivek,
I have a similar case: I need to get data from Notepad and enter it into Excel.
Can you please suggest something?
The scenario is:

I have data in Notepad that follows a specific pattern, for example:

INDIAN CRICKET TEAM
LIST
PLAYERS
1 DHONI/MS CSK 7 CAPTAIN
INDIA
2 SHARMA/ROHIT MI 23 CAPTAIN
INDIA
3 KOHLI/VIRAT RCB 14 CAPTAIN
INDIA

I want to enter this data into Excel as follows:
Column1 Column2 Column3 Column4 Column5 Column6
1 DHONI/MS CSK 7 CHENNAI INDIA
2 SHARMA/ROHIT MI 23 MUMBAI INDIA
3 KOHLI/VIRAT RCB 14 BENGALURU INDIA

I don't know how to ignore the first 3 lines, or how to get the data for Column6.

Thanks in advance.

Hi Ronak,

  1. You can ignore the first 3 lines in the loop that reads the text file, e.g. with a condition such as counter > 2, so that processing starts from the 4th line.
  2. Since you want to insert the data into the Excel file column by column, build a data table and insert the data into it.
    The country name is on the line after each player line, so merge the two lines and add them to the data table as a single row.
  3. Reading data from a specific column is easy: add a Read Range activity and pass the data table name, then use For Each Row, get the row item and pass the required column name.
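As a rough illustration of steps 1 and 2, here is a small Python sketch (not the attached workflow). It assumes the layout from the question: three header lines, then pairs of lines where the player line is followed by the country line. The names RAW and build_rows are hypothetical, and the fifth token is copied as it appears in the input.

```python
RAW = """INDIAN CRICKET TEAM
LIST
PLAYERS
1 DHONI/MS CSK 7 CAPTAIN
INDIA
2 SHARMA/ROHIT MI 23 CAPTAIN
INDIA
3 KOHLI/VIRAT RCB 14 CAPTAIN
INDIA"""


def build_rows(text, header_lines=3):
    """Skip the header lines, then merge each player line with its country line."""
    # Drop empty lines so the player/country pairing stays aligned.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    body = lines[header_lines:]
    rows = []
    # Even positions hold player lines, odd positions hold the country.
    for player_line, country in zip(body[0::2], body[1::2]):
        rows.append(player_line.split() + [country])
    return rows


rows = build_rows(RAW)
for row in rows:
    print(row)
```

Each row then has six values, one per target column, and can be written to a data table or spreadsheet as a single record.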

Please see the attached project and let me know if you have any doubts.

Thanks
Vivek S.

Thank you so much, @vivek_shiv, for your response.

Yes, I am able to skip the lines.

I am unable to see the attached project. Can you please attach it?

Thanks

Main.xaml (29.9 KB)

Hi Ronak,

Sorry, I missed the attachment.
Please refer to it.

++
TempFlow.zip (12.8 KB)

Hi Ronak,

  1. In an Assign activity:
    arrCricInfo = strCricInfo.Split(Environment.NewLine.ToArray, StringSplitOptions.RemoveEmptyEntries)
  2. Check which array index holds the line you need. For example, “1 DHONI/MS CSK 7 CAPTAIN” has array index 3.
  3. Now use the Split function on that element, e.g. arrCricInfo(3).Split(" "c)(0), etc.
    Note: you can also split on a string, e.g.
    arrCricInfo(3).Split({"CSK"}, StringSplitOptions.None)(1)
    which returns the text after “CSK”. Try variations with the Split function.

Thanks.