Hi,
The text below is my input. I want to extract particular values using a regular expression and store each one in a variable. For example, the Cashier Name should be extracted and assigned to a variable,
that is, only "Sanju" should be extracted (the name can be of any length).
You can get it using the Matches activity with the following settings:
Pattern : "(?<=Cashier Name:\s*).*?(?=[\r\n]+)"
The Matches activity returns a variable of type IEnumerable<Match>, so you need to iterate over it (e.g., using a For Each activity) and read the value of each match.
If you know there is only one target in your string, you can also get it with the following expression in an Assign activity.
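For anyone who wants to test the pattern outside UiPath, here is a minimal Python sketch of the same idea, assuming a hypothetical receipt string (the original input text is not shown in the thread). Python's `re` module rejects variable-width look-behind such as `(?<=Cashier Name:\s*)`, so this version swaps in a capture group. `finditer` plays the role of iterating the Matches result, and `search` covers the single-target case.

```python
import re
from typing import List, Optional

# Hypothetical sample text; the real receipt from the thread is not shown.
TEXT = (
    "Invoice No. : 19/20HRKA0614063\r\n"
    "Cashier Name: Sanju\r\n"
    "POS: 3\r\n"
)

# Capture group instead of the .NET look-behind: Python's re does not allow
# (?<=Cashier Name:\s*) because \s* makes the look-behind variable-width.
CASHIER = re.compile(r"Cashier Name:\s*(.*?)(?=[\r\n]+)")


def all_cashier_names(text: str) -> List[str]:
    """Collect every match, similar to a For Each over the Matches result."""
    return [m.group(1) for m in CASHIER.finditer(text)]


def first_cashier_name(text: str) -> Optional[str]:
    """Return only the first match, for when exactly one target is expected."""
    m = CASHIER.search(text)
    return m.group(1) if m else None


print(all_cashier_names(TEXT))   # ['Sanju']
print(first_cashier_name(TEXT))  # Sanju
```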
Thanks for your answer, Yoichi. It's working fine.
Now here is an extension to my question.
What would the expression be if I want to extract multiple fields, such as Invoice No., Date & Time, Sales Person, Cashier Name, POS, Order No., HSN, and Taxable Amount?
I want to get the result with a single expression.
If all of your lines are separated by \r\n, I'd probably split the text first using the String.Split function. Each line then becomes one string in the resulting list, so you'd end up with:
Card Bill
Order No.
: C209
Invoice No. : 19/20HRKA0614063
Date & Time : 11/16/19 5:57:04 PM
Each of these lines is a separate item in the list.
Then you can search the list (or use an index number if the structure never changes) to get the item that contains the field you want, such as Invoice No.
Once you have the item, a single string that reads “Invoice No. : 19/20HRKA0614063”, you know the key and value are separated by a colon, so you can find the position of the colon and take the substring after it to get the value for that item.
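The split-then-substring approach above can be sketched in Python as follows. The input string just reproduces the example lines, and `value_for` is a hypothetical helper name. It takes everything after the *first* colon, which matters for Date & Time because the time itself contains colons.

```python
from typing import Optional

# The example lines from above, joined with \r\n as in the original text.
TEXT = (
    "Card Bill\r\n"
    "Order No.\r\n"
    ": C209\r\n"
    "Invoice No. : 19/20HRKA0614063\r\n"
    "Date & Time : 11/16/19 5:57:04 PM\r\n"
)


def value_for(text: str, key: str) -> Optional[str]:
    """Find the line containing `key` and return the trimmed text after its first colon."""
    for line in text.split("\r\n"):
        if key in line:
            colon = line.find(":")  # first colon only
            if colon != -1:
                return line[colon + 1:].strip()
    return None


print(value_for(TEXT, "Invoice No."))  # 19/20HRKA0614063
print(value_for(TEXT, "Date & Time"))  # 11/16/19 5:57:04 PM
```

In this layout "Order No." has its value on the following line, so `value_for(TEXT, "Order No.")` returns None; that line would need special handling.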
Regex can be hard to maintain: if anyone else has to change it later, they may spend a long time working out what your regex does. So if there's a simpler way, it's worth trying.
We can extract them all with a single regular expression. However, handling the result afterwards may be complicated, so I recommend using a For Each loop and a Dictionary, as in the following sample. Can you try it?
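The sample referred to above is not reproduced here, so the following is only a Python sketch of the same "loop over keys, fill a dictionary" idea, assuming a hypothetical key list and input string. Each key gets its own small pattern. Allowing whitespace (including a line break) before the colon also picks up "Order No.", whose value sits on the next line.

```python
import re
from typing import Dict, List

# Hypothetical key list taken from the question; adjust to the real receipt.
KEYS = ["Invoice No.", "Date & Time", "Sales Person", "Cashier Name",
        "POS", "Order No.", "HSN", "Taxable Amount"]

# Hypothetical input text.
TEXT = (
    "Card Bill\r\n"
    "Order No.\r\n"
    ": C209\r\n"
    "Invoice No. : 19/20HRKA0614063\r\n"
    "Date & Time : 11/16/19 5:57:04 PM\r\n"
    "Cashier Name: Sanju\r\n"
)


def extract_fields(text: str, keys: List[str]) -> Dict[str, str]:
    """Build a key -> value dictionary; keys not found in the text are skipped."""
    fields: Dict[str, str] = {}
    for key in keys:
        # Escaped key, optional whitespace (even a newline), a colon,
        # optional spaces/tabs, then the rest of that line as the value.
        m = re.search(re.escape(key) + r"\s*:[ \t]*([^\r\n]*)", text)
        if m:
            fields[key] = m.group(1).strip()
    return fields


print(extract_fields(TEXT, KEYS))
```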
One last question from my side: my invoice may contain multiple items. What would the regex be for multiple items when I don't know how many items an invoice has?
For example:
We need a common rule to extract these keys and values if we don't know the key names in advance. The invoice appears to follow a simple common rule. However, the earlier text doesn't seem to follow any common rule for its key and value strings, so we need to work that out first.
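When the number of items is unknown, a common approach is one regex that describes a single item line, applied with a "find all matches" call so it returns as many matches as there are items. The sketch below assumes a hypothetical item layout of name, quantity and price separated by spaces, since the actual invoice example is not shown; the pattern would need adapting to the real layout.

```python
import re
from typing import Dict, List

# Hypothetical invoice body: a header, any number of item lines, and a total.
TEXT = (
    "Item Qty Price\n"
    "Shirt 2 499.00\n"
    "Jeans Slim 1 1299.00\n"
    "Total 1798.00\n"
)

# One item per line: name (may contain spaces), integer quantity, price.
# Header and total lines do not fit this shape, so they are not matched.
ITEM_LINE = re.compile(
    r"^(?P<item>.+?)[ \t]+(?P<qty>\d+)[ \t]+(?P<price>\d+(?:\.\d{2})?)[ \t]*$",
    re.MULTILINE,
)


def extract_items(text: str) -> List[Dict[str, str]]:
    """Return one dict per matching line, however many item lines there are."""
    return [m.groupdict() for m in ITEM_LINE.finditer(text)]


for row in extract_items(TEXT):
    print(row)
```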
I'm stuck with an error saying “Index was outside the bounds of the array”, and I don't know much about Split. Could you please give some details on it?
I have also attached the project files.
My previous sample assumes the input string contains a separator made of consecutive “-” characters, like “--------”. Your text file has no such separator, so you don't need the Assign activity where the error occurs. Can you try disabling it?
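To illustrate why the error appears: splitting a string on a separator that is not present returns a single-element array, so reading element 1 fails (in .NET this surfaces as "Index was outside the bounds of the array"; Python raises IndexError). The original Assign expression is not shown in the thread, so the Python sketch below is a hypothetical reconstruction that checks the number of parts before indexing, as an alternative to disabling the step.

```python
# Hypothetical separator, matching the "--------" line the earlier sample expected.
SEPARATOR = "--------"


def section_after_separator(text: str) -> str:
    """Return the text after the first separator, or the whole text if there is none."""
    parts = text.split(SEPARATOR, 1)
    # With no separator, split returns [text]; parts[1] would raise IndexError.
    if len(parts) < 2:
        return text
    return parts[1]


print("no separator here".split(SEPARATOR))  # ['no separator here']
print(section_after_separator("head\n--------\nitems"))
```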
If my guess is right, the text you extracted is formatted quite differently from the text I extracted, which may be why the expression fails to fetch the Items, Qty, and Price data. Could you please verify the regex in the sample below? Sample_3.zip (42.1 KB)
I just wanted to know which OCR engine you used to get the text, because I was unable to extract “QTY” from my text using Microsoft Azure, whereas you were able to extract it in your sample.
Actually, the text file in the sample was not produced by OCR; I typed it in by hand. In my experience, Google Cloud Vision works better than Microsoft Vision, so I think it's worth a try.