Writing Regex in Matches Activity

Hi,
Below is my text. I want to extract a particular piece of it using a regular expression and store it in a variable.
For example: the Cashier Name should be extracted and stored in a variable,
i.e., only Sanju should be extracted (the name can be any length).

“CASH PAID CASH PAID C.\r\nHALDIRAM?'S\r\nRajendrakumar Agr\r\nAnjuman Complex, SatJar Nagpur\r\n255 7812, 2523896\r\nGSTIN : 27AABHR3060GIZC\r\nFASSAI No: 115 18055000586\r\n*** TAX INVOICE *\r\nSHOWROOM\r\n Card Bill\r\nOrder No.\r\n: C209\r\nInvoice No. : 19/20HRKA0614063\r\nDate & Time : 11/16/19 5:57:04 PM\r\nCashier Name: Sanju\r\nPOS: CRKO3\r\nSales Person: VIKKI PILEY\r\nDescription\r\nQty\r\nRate Amount\r\nCASH PA\r\naDryfruit Laddu 50 1\r\n243. 25 243. 25\r\nHSN: 21069099\r\n5% GST\r\nTaxable Amount\r\n243. 25\r\nGST Amount\r\n12. 16\r\nCASH PAI\r\notal Amount\r\n255. 415\r\nGST %\r\nBase Ant.\r\nCGST\r\nSGST\r\n243. 25\r\n6. 08\r\n6. 08:\r\nTotal Incl of GST 255. 41”

Please help me with this scenario.
Thanks in advance.

Hi,

You can get it using the Matches activity with the following settings.

Pattern : "(?<=Cashier Name:\s*).*?(?=[\r\n]+)"

The Matches activity returns a variable of type IEnumerable<Match>, so you need to iterate over it (e.g. using a For Each activity) and read the value of each match.
If you know there is only one target in your string, you can also get it with the following expression in an Assign activity.

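strResult = System.Text.RegularExpressions.Regex.Match(strData,"(?<=Cashier Name:\s*).*?(?=[\r\n]+)").Value.Trim()

(Trim is needed because \s* inside the lookbehind can match zero characters, so the match begins right after the colon and includes the leading space.)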
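
For anyone who wants to try the idea outside UiPath, here is a minimal Python sketch. It is not the expression above: Python's re module does not accept a variable-width lookbehind such as (?<=Cashier Name:\s*), so the sketch captures the value in a group instead. The variable names and the shortened receipt text are assumptions for illustration.

```python
import re

# Shortened excerpt of the receipt text from the question (illustrative only).
receipt = (
    "Date & Time : 11/16/19 5:57:04 PM\r\n"
    "Cashier Name: Sanju\r\n"
    "POS: CRKO3\r\n"
)

# Group 1 holds everything after "Cashier Name:" up to the end of the line.
pattern = re.compile(r"Cashier Name:[ \t]*([^\r\n]*)")

# Only one expected target: take the first hit, like the Assign expression.
first = pattern.search(receipt)
cashier = first.group(1) if first else ""

# Several possible targets: iterate, like For Each over the Matches output.
all_names = [m.group(1) for m in pattern.finditer(receipt)]

print(cashier)
print(all_names)
```

The search call corresponds to the single-match case, and the finditer loop corresponds to iterating over the output of the Matches activity.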

Regards,

Thanks for your answer, Yoichi. It's working fine.
Now here is an extension to my question:
what would the expression be if I want to extract multiple things, like Invoice No., Date & Time, Sales Person, Cashier Name, POS, Order No., HSN, and Taxable Amount?
I want to get the result with a single expression.

If all of your lines are separated by \r\n, I’d probably split the text first using the String.Split function. Each line then becomes one string in the list, so you’d end up with:

Card Bill
Order No.
: C209
Invoice No. : 19/20HRKA0614063
Date & Time : 11/16/19 5:57:04 PM

with each line as a separate item in the list.
Then you can search the list (or use an index number if the structure never changes) to get the item that contains the field you want, such as Invoice No.
Once you have the item, a single string that reads “Invoice No. : 19/20HRKA0614063”, you know the key and value are separated by a colon, so you can find the first colon and take the substring that starts just after it to get the value for that item.
Regex can be hard to maintain, and anyone who has to change it later may spend a long time working out what it does, so if there’s an easier way you may as well try it.
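
The split-and-substring steps above can be sketched as follows. This is Python rather than a UiPath expression, and the helper name value_for and the shortened receipt text are hypothetical, chosen only for illustration.

```python
# Shortened excerpt of the receipt text from the question (illustrative only).
receipt = (
    " Card Bill\r\n"
    "Order No.\r\n"
    ": C209\r\n"
    "Invoice No. : 19/20HRKA0614063\r\n"
    "Date & Time : 11/16/19 5:57:04 PM\r\n"
)


def value_for(label, text):
    """Return the text after the first colon on the line containing label, or None."""
    for line in text.split("\r\n"):
        if label in line and ":" in line:
            # partition splits at the first colon only, so the colons inside
            # a time such as 5:57:04 stay part of the value.
            _, _, value = line.partition(":")
            return value.strip()
    return None


invoice_no = value_for("Invoice No.", receipt)
date_time = value_for("Date & Time", receipt)
# "Order No." sits on its own line with the value on the next line,
# so this simple per-line lookup finds nothing for it.
order_no = value_for("Order No.", receipt)

print(invoice_no, date_time, order_no)
```

Fields whose value wraps onto the next line, such as Order No. in this receipt, would need extra handling, for example looking at the following list item.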

Hi,

We can extract them using a single regex expression. However, handling the result afterwards might be complicated, so I recommend using a For Each loop and a Dictionary, as in the following sample. Can you try it?

Sample20200110.zip (10.6 KB)
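
The contents of the attached sample are not reproduced in this thread. As a rough sketch of the general idea only (loop over the key names and store each result in a dictionary), something like the following Python could be used. The pattern, the variable names, and the shortened receipt text are assumptions, not the logic of the attached workflow.

```python
import re

# Shortened excerpt of the receipt text from the question (illustrative only).
receipt = (
    "Order No.\r\n: C209\r\n"
    "Invoice No. : 19/20HRKA0614063\r\n"
    "Date & Time : 11/16/19 5:57:04 PM\r\n"
    "Cashier Name: Sanju\r\n"
    "POS: CRKO3\r\n"
    "Sales Person: VIKKI PILEY\r\n"
    "HSN: 21069099\r\n"
    "5% GST\r\n"
    "Taxable Amount\r\n243. 25\r\n"
)

keys = [
    "Invoice No.", "Date & Time", "Sales Person", "Cashier Name",
    "POS", "Order No.", "HSN", "Taxable Amount",
]

results = {}
for key in keys:
    # re.escape keeps characters such as "." and "&" literal.
    # [\s:]* skips any whitespace, line breaks, and colons between key and value.
    match = re.search(re.escape(key) + r"[\s:]*([^\r\n]*)", receipt)
    if match:
        results[key] = match.group(1).strip()

for key, value in results.items():
    print(key, "=>", value)
```

Because the separator pattern also skips line breaks, a key with an empty value would pick up the next line instead, so this only suits receipts where every key has a value.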

Regards,

One last question from my side: my invoices may have multiple items. What would the regex be for multiple items when I don’t know the number of items in an invoice?
For example: (image)

You can use the text from my previous scenario to explain this…

i.e: “CASH PAID CASH PAID C.\r\nHALDIRAM?'S\r\nRajendrakumar Agr\r\nAnjuman Complex, SatJar Nagpur\r\n255 7812, 2523896\r\nGSTIN : 27AABHR3060GIZC\r\nFASSAI No: 115 18055000586\r\n*** TAX INVOICE * \r\nSHOWROOM\r\n Card Bill\r\nOrder No.\r\n: C209\r\nInvoice No. : 19/20HRKA0614063\r\nDate & Time : 11/16/19 5:57:04 PM\r\nCashier Name: Sanju\r\nPOS: CRKO3\r\nSales Person: VIKKI PILEY\r\nDescription\r\nQty\r\nRate Amount\r\nCASH PA\r\naDryfruit Laddu 50 1\r\n243. 25 243. 25\r\nHSN: 21069099\r\n5% GST\r\nTaxable Amount\r\n243. 25\r\nGST Amount\r\n12. 16\r\nCASH PAI\r\notal Amount\r\n255. 415\r\nGST %\r\nBase Ant.\r\nCGST\r\nSGST\r\n243. 25\r\n6. 08\r\n6. 08:\r\nTotal Incl of GST 255. 41”

Hi,

I’ll attach a sample for the invoice as follows (see Main2.xaml and Sample2.txt).

Sample20200110-2.zip (13.0 KB)

We need some common rule to extract the keys and values if we don’t know the key names. The invoice seems to follow a simple common rule. However, the earlier text does not seem to follow a common rule for its key and value strings, so we need to solve that.
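
To illustrate what a common rule could look like, here is a small Python sketch that assumes one "key : value" pair per line, split at the first colon. The rule itself and the shortened receipt text are assumptions for illustration, not the logic of the attached sample.

```python
import re

# Shortened excerpt of the earlier receipt text (illustrative only).
receipt = (
    "Order No.\r\n"
    ": C209\r\n"
    "Invoice No. : 19/20HRKA0614063\r\n"
    "Date & Time : 11/16/19 5:57:04 PM\r\n"
    "Cashier Name: Sanju\r\n"
    "POS: CRKO3\r\n"
    "Description\r\n"
)

# Assumed rule: key, optional spaces, first colon, optional spaces, value,
# all on one line. Key names do not need to be known in advance.
pair = re.compile(r"^[ \t]*([^:\r\n]+?)[ \t]*:[ \t]*([^\r\n]+)", re.MULTILINE)

pairs = {m.group(1): m.group(2).strip() for m in pair.finditer(receipt)}

for key, value in pairs.items():
    print(key, "=>", value)
```

On this text the rule finds Invoice No., Date & Time, Cashier Name, and POS, but not Order No., whose key and value sit on separate lines. That is the kind of irregularity that has to be handled separately.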

Regards,

I’m stuck on an error saying “Index was outside the bounds of the array”, and I’m not familiar with Split. Could you please give some details on it?
I have also attached the project files.

Split_Error_1.zip (39.6 KB)

Hi,

My previous sample assumed that the input string contains a continuous “-” separator such as “--------”. However, your text file has no such separator, so you don’t need the Assign activity where the error occurs. Can you try disabling it?
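
To show why a missing separator leads to this kind of error, here is a minimal Python sketch. The exact Assign expression in the attached workflow is not shown in this thread, so the dash pattern and the variable names are assumptions.

```python
import re

# Text that contains no "-----" separator line (illustrative only).
text_without_separator = "Cashier Name: Sanju\r\nPOS: CRKO3\r\n"

# Splitting on a separator that never occurs returns the whole text
# as a single element.
parts = re.split(r"-{3,}", text_without_separator)

# parts[1] would raise IndexError here, the Python counterpart of
# "Index was outside the bounds of the array", so check the length first.
body = parts[1] if len(parts) > 1 else parts[0]

print(len(parts))
```

The same length check can be applied in a workflow before taking element (1) of a split result.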

Regards,

As the separator “-----” doesn’t appear in your text, maybe you can try this:

System.Text.RegularExpressions.Regex.Split(Result,"\d{1,2}:\d{1,2}:\d{1,2} .M\r\n")(1)
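
As a rough Python counterpart of the expression above, the sketch below splits on the time stamp at the end of the Date & Time line and keeps what follows it. The shortened receipt text and the variable names are assumptions for illustration.

```python
import re

# Shortened excerpt of the receipt text from the question (illustrative only).
receipt = (
    "Invoice No. : 19/20HRKA0614063\r\n"
    "Date & Time : 11/16/19 5:57:04 PM\r\n"
    "Cashier Name: Sanju\r\n"
    "POS: CRKO3\r\n"
)

# Same pattern as in the expression above: h:m:s, a space, AM or PM, line break.
parts = re.split(r"\d{1,2}:\d{1,2}:\d{1,2} .M\r\n", receipt)

# Element 1 is everything after the time stamp, if the time stamp was found.
after_time = parts[1] if len(parts) > 1 else ""

print(after_time)
```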

regexOutputVariable(0)

Write it like this.

It’s not working…

If my guess is right, the text you extracted is in quite a different format from the text I extracted, which may be why the expression fails to fetch the Items, Qty, and Price data. Could you please verify the regex in the sample below?
Sample_3.zip (42.1 KB)

Hi,

I modified your workflow (Split_1.xaml) as follows. (Note: there seems to be no Qty for each item, so this sample has no logic for it.)

Split_Error_1_v2.zip (40.9 KB)

Hope it helps you.

Regards,

Hi Yoichi,

I just wanted to know what kind of OCR you used to get the text, because I was unable to extract “QTY” from my text using Microsoft Azure, whereas you were able to extract it in your sample.

Hi,

Actually, the text file in the sample was not extracted by OCR but by hand input. From my experience, I suppose Google Cloud Vision is better than Microsoft Vision. I think it’s worth a try.

Regards,

Could you please elaborate on “Hand Input”?
Is it an activity?

Hi,

Sorry, my English was wrong. I meant that it was entered manually.

Regards,

I need one last bit of help from you: I would also like Qty to be displayed.
Sample20200131.zip (14.4 KB)