PDF Data Extraction using Regular Expression

Hey all,
I need to extract the GSTIN of SpiceJet Limited, the Invoice No, the GSTIN / UIN of Customer and the Total Invoice Value from a PDF using regular expressions.
I’ve built this XAML file, but I couldn’t remove the colon from the extracted data: Sequence7.xaml (9.3 KB)

Thanks in advance.

@NiranjanKN Try using the Replace method of the string to remove the colon.
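
For illustration, here is a minimal sketch of that idea, written in Python rather than the VB.NET used in the workflow (there, the equivalent calls would be the string's Replace and Trim methods). The sample value is made up and not taken from the PDF.

```python
# Hypothetical extracted value with a leading colon (not from the real PDF)
extracted = ": 06AAAAA0000A1Z5"

# Remove the colon, then trim the whitespace left around the value
cleaned = extracted.replace(":", "").strip()

print(cleaned)
```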

@NiranjanKN Check the XAML file below.

Sequence7.xaml (12.6 KB)

Output file:

Out_Sample.xlsx (7.2 KB)

@Manjuts90 @lakshman @ClaytonM @n I used (?>UIN of Customer:) (.*)(?=Place of Supply) and got the value of GSTIN / UIN of Customer. But if I use (?>Invoice No:)(.*), it fetches the entire row.
I need only the Invoice Number, not the Original Invoice Number.
How can that be extracted?

@Manjuts90 Can you please explain this expression that you used: ReadPDF.Split({"GSTIN of SpiceJet Limited : "},StringSplitOptions.RemoveEmptyEntries)(1).Split({Environment.NewLine},StringSplitOptions.RemoveEmptyEntries)(0).Trim

@NiranjanKN

The first Split call splits the whole PDF text into an array of two elements: the first element contains the text before “GSTIN of SpiceJet Limited :” and the second contains the text after it.
So I took the second element, index (1), which contains the required text, for further processing. Since that element spans multiple lines, I split it on the newline, which gives a new array with one line per element.
The value is in the first element of the new array, so I took index (0). Trim removes extra spaces at the start and end of the string.
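
The same three steps can be sketched in Python to make them easier to follow. This is only an illustration: the sample text and the GSTIN value are hypothetical, and the real PDF layout may differ.

```python
# Hypothetical PDF text; the real document layout may differ
read_pdf = (
    "Tax Invoice\n"
    "GSTIN of SpiceJet Limited : 06AAAAA0000A1Z5\n"
    "Place of Supply : Haryana\n"
)

# Step 1: split on the label; element 1 is everything after it
after_label = read_pdf.split("GSTIN of SpiceJet Limited : ")[1]

# Step 2: split that remainder into lines, dropping empty ones
# (mirrors StringSplitOptions.RemoveEmptyEntries)
lines = [line for line in after_label.splitlines() if line]

# Step 3: the first line holds the value; strip surrounding spaces
gstin = lines[0].strip()

print(gstin)
```

Note that the approach depends on the label appearing in the text exactly as typed, including the spaces around the colon.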

If you still have any doubts, let me know.

@Manjuts90 Can you explain how you got this:
System.Text.RegularExpressions.Regex.Match(ReadPDF,"(?<=Invoice No: ).+").ToString
I also tried it, but since there were two places matching Invoice No:, how did it select the second Invoice No: and not the first?

@NiranjanKN I wrote the condition as shown below. In the pattern I put a space after “No:”, whereas in the first occurrence there is no space after the “:”, so only the second one matches.

"(?<=Invoice No: ).+"
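
A small sketch of why the trailing space matters, written in Python for illustration (the lookbehind syntax is the same as in .NET here). The sample text is an assumed layout in which the first label has no space after the colon, not the real PDF content.

```python
import re

# Hypothetical text: the first label has no space after the colon,
# the second one does (assumed layout, not the real PDF)
read_pdf = (
    "Original Invoice No:NA\n"
    "Invoice No: INV12345\n"
)

# The lookbehind requires "Invoice No: " (with the trailing space)
# immediately before the match, so the first label is skipped
match = re.search(r"(?<=Invoice No: ).+", read_pdf)
invoice_no = match.group(0) if match else ""

print(invoice_no)
```

In .NET the dot does not match a line feed but does match a carriage return, so with Windows line endings a Trim on the result may still be needed.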
