Can't extract specific string from excel cell

Hello,

I am trying to extract specific information for each cell in an excel table.

I tried using regex but it isn’t working.

Could you give a little more details/context about what you’re trying to achieve? What is an example of text in an excel cell you want to pull out? What did you try to determine that regex wouldn’t work?

Of course!

Each cell contains the following info:

Consultor 1: (name
Entrevistado: (name)
Cargo: (job position)
Onda: (type)
Cliente: (client)

I need to extract the name after “Entrevistado” for each cell in the excel table.

After extracting I did as follows:

image

If the cell contains all of that, then I do think regex is your best bet over splitting the string or using substring. I can’t see what you tried for your Regex though?

This regex pattern should work: (?<=^Entrevistado: )[a-zA-Z.\- ]+
Assumptions: Names can only include lower case letters, upper case letters, periods, hyphens, and spaces. The string is contained on multiple lines.

First, go into Imports tab and add System.Text.RegularExpressions
Then update your assign activity so NomeCandidato = Regex.Match(Cell,"(?<=^Entrevistado: )[a-zA-Z.\- ]+",RegexOptions.Multiline).Value

That should pull out the name

EDIT: I see you asked the same/similar question here: Help - Trouble Selecting only certain information from an Excel cell

Based on the picture of the excel spreadsheet you posted, it looks like the problem is not with the Regex solution, but instead with your ‘Cell’ variable you are assigning. You are taking the 1st column (column A) with row.item(0).ToString. Instead, you should be taking the 8th column (column H) which is row.item(7).ToString. Alternatively, instead of using the index of 7 you can use the column name instead. I can’t read the whole thing but it is row.item("YourColumnName").ToString

Thank you for your help! The issue is that all the info is within one cell.

@gabimlobo Did the regex solution I provided not work? It worked just fine on my end and expects everything to be in one cell.

Did you see my edit above? If you use a write line activity, what do you get when you put in the Cell variable? I am guessing you are getting “8:00” because you are extracting information from Column A instead of Column H

@Dave when I put a write line activity nothing shows up.

my cell variable to extract the string from the cell should be row.item(7).ToString even if all the information is in the same cell?

That is your issue, not the regex.

row.item(7).ToString means it is looking at the current row, column #8 and converting the value to a string. I would highly recommend changing it so you say the column name instead of the index. So if Column H was named “Names” you could say row.item(“Names”).ToString

If you sent me the .xaml and excel sheet I could help more easily as well

Segunda-Feira.xaml (20.8 KB)
I can’t send the excel sheet because of privileged information, but here is the .xaml

I am having some trouble following along because I’m not sure what all is happening in the very last for each row loop.

I would erase all of that and only do the following. NOTE: replace “YourColumnName” with the actual name of the column

For each row in agendasegunda
Assign NomeCandidato = Regex.Match(row.item("YourColumnName").ToString,"(?<=^Entrevistado: )[a-zA-Z.\- ]+",RegexOptions.Multiline).Value
Message Box NomeCandidato

Right now I believe it is getting the cell information from the last column, which is blank. Doing it the way i mentioned is much simpler and will only get it from the correct column.

Thanks, Dave! But there is more than one column… What should I do then?

You want to grab the text from more than one column? Just use multiple assign activities. Change the column name so it is row.item(“ColumnName”).tostring. do that for every column that you want information from

Hello, @Dave!

I have another quick question. Is there any way for regex to also read special characters? for instance, one of the data that it needs to read is candidato gestão, but it’s only reading candidato gest. Is there any way to fix that?

Right now it is only catching the characters within the square brackets [ ]. This is the [a-zA-Z.- ] portion of the regex statement. You can add individual characters in the square brackets and this will catch it. It looks like you’re getting portuguese words, so you could add all of the portuguese special characters like this: [a-zA-Z.\- áâãàçéêíóôõú] and it should work. If i missed any, just add them in the brackets.