Hey all,
I need to extract the GSTIN of SpiceJet Limited, the Invoice No, the GSTIN / UIN of Customer, and the Total Invoice Value from a PDF using regular expressions.
I’ve built this XAML file, but couldn’t remove the colon from the extracted data. Sequence7.xaml (9.3 KB)
@Manjuts90 @lakshman @ClaytonM @n I used (?>UIN of Customer:) (.*)(?=Place of Supply) and got the value of GSTIN / UIN of Customer. But if I use (?>Invoice No:)(.*), it fetches the entire row.
I need only the Invoice Number, not the Original Invoice Number.
How can that be extracted?
@Manjuts90 Can you please explain this expression which you used: ReadPDF.Split({"GSTIN of SpiceJet Limited : "},StringSplitOptions.RemoveEmptyEntries)(1).Split({Environment.NewLine},StringSplitOptions.RemoveEmptyEntries)(0).Trim
The first Split call divides the whole PDF text into an array of two elements: the first element contains the text before "GSTIN of SpiceJet Limited : " and the second contains the text after it.
So I took the second element, index (1), which contains the required text, for further processing. Because that element spans multiple lines, I split it again on the newline, which gives a new array with one line per element.
The value is on the first line of that new array, so I took index (0). Trim removes any extra spaces at the start and end of the string.
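To make the steps above concrete, here is a minimal sketch of the same split, index, split, trim chain. It is written in Python rather than VB.NET purely so it can be run on its own. The sample text, the placeholder values in it, and the helper name value_after_label are all assumptions for illustration and do not come from the actual PDF.

```python
# Hypothetical stand-in for the string returned by the Read PDF activity.
# The values are placeholders, not real invoice data.
read_pdf = (
    "Tax Invoice\n"
    "GSTIN of SpiceJet Limited : GSTIN-PLACEHOLDER \n"
    "Invoice No: INV-PLACEHOLDER\n"
)


def value_after_label(text, label):
    """Return the rest of the line that follows the first occurrence of label."""
    # Step 1: split on the label and drop empty pieces
    # (the analogue of StringSplitOptions.RemoveEmptyEntries).
    parts = [p for p in text.split(label) if p]
    # Step 2: index 1 is the text that comes after the label.
    after_label = parts[1]
    # Step 3: split that text into lines, again dropping empty ones.
    lines = [ln for ln in after_label.splitlines() if ln]
    # Step 4: the value is on the first line; strip surrounding spaces (Trim).
    return lines[0].strip()


gstin = value_after_label(read_pdf, "GSTIN of SpiceJet Limited : ")
print(gstin)
```

Note that index 1 only holds the wanted text when something precedes the label. If the text started with the label, the empty first piece would be removed and the value would sit at index 0 instead.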
@Manjuts90 Can you explain how you got this:
System.Text.RegularExpressions.Regex.Match(ReadPDF,"(?<=Invoice No: ).+").ToString
I also tried it, but since the text "Invoice No:" appears twice, how did it select the second "Invoice No:" and not the first?
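A small experiment may help with this question. Regex.Match returns the first (leftmost) match in the input, so which occurrence is picked depends on the order in which the two labels appear in the text extracted from the PDF. The sketch below uses Python's re module as a stand-in for the .NET call. The sample text and its values are assumptions, since the real PDF layout is not shown in the thread.

```python
import re

# Hypothetical extracted text; the real order of the two labels may differ.
sample = (
    "Original Invoice No: OLD-PLACEHOLDER\n"
    "Invoice No: NEW-PLACEHOLDER\n"
)

# Plain lookbehind: the leftmost position preceded by "Invoice No: " wins.
# "Original Invoice No: " also ends with "Invoice No: ", so it matches first here.
first_match = re.search(r"(?<=Invoice No: ).+", sample).group()

# A negative lookbehind skips any label that is preceded by "Original ".
# The value is captured in group 1, up to the end of the line.
invoice_only = re.search(r"(?<!Original )Invoice No: (.+)", sample).group(1)

print(first_match)
print(invoice_only)
```

If the extracted text lists the plain "Invoice No:" line before the "Original Invoice No:" line, the first pattern alone would already return the wanted value, which could explain the behavior seen above.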