How to extract data from a PDF using regex? I tried but failed

Hi team, I need to extract the data that is marked in blue.


Invoice.pdf (323.5 KB)
I hope I can get a solution.
I have already extracted a few things, such as scored (3617), the purchase order, and the item number,

but I don't know how to extract the UOM, the QTY, and 2000.

I hope someone can help. I have shared a sample file above.

thanks
Chethan P

Hi @copy_writes,
For UOM and QTY:

What you have to do is
take the whole matched string and split it into an array, like this:

  • After getting the regex match, store it in a string variable ("Variable").
  • Apply another regex

    and store the result in a string variable "Str2".
  • Assign a string array: Arr_Str = Split(Str2.Trim, " ")
  • For UOM: Arr_Str(2)
  • For Qty: Arr_Str(3)
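The split-and-index steps above can be sketched in Python (the thread itself uses VB-style expressions, so this is only an illustration). The sample line and the function name `uom_and_qty` are hypothetical; the real invoice text is not shown here, so the indexes may differ for the actual file.

```python
def uom_and_qty(matched_line: str) -> tuple[str, str]:
    """Return the tokens at indexes 2 and 3 of a space-separated line."""
    # Same idea as Split(Str2.Trim, " "): trim, then split on single spaces.
    arr_str = matched_line.strip().split(" ")
    return arr_str[2], arr_str[3]


# Hypothetical sample line (assumption, not taken from Invoice.pdf).
sample = " 10 3617 EA 2000 "
uom, qty = uom_and_qty(sample)
print(uom, qty)
```

Splitting on a single space produces empty entries wherever two spaces occur together, so the indexes shift if the columns are padded with extra spaces.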

I think you are confused. I need the items that are marked in blue.

This will get the UOM and QTY

Does this field always stay in the same place? Will it always be on the 4th line?

Yes, it is on the 4th line, or sometimes on the 5th line.

Hello @copy_writes

Here is the regex for the 4th or 5th line:
Link: regex101: build, test, and debug regex

You can assign the match to a variable after trimming it.

Regards
Sudharsan


Can you please explain how?

You are asking about the last one, right?

this one

Yeah, sure.

When you use the regex I mentioned before this step,
it takes the whole first line as the regex match, like this:

If we split this string by spaces, it takes "Str" as the 1st array element, "huge" as the 2nd, and so on.

If we use the "\S\s\s" pattern and replace each match with "#", the multi-word text stays together as one complete array element when we split on "#".

That way we get the UOM and Qty in the 2nd and 3rd array elements.
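As a rough Python illustration of the "#" trick: replace each wide gap between columns with "#", then split on "#". The sample row and the name `split_columns` are hypothetical. Note that this sketch uses `\s{2,}` (two or more whitespace characters) rather than the pattern quoted above, because a pattern that begins with `\S` would also replace the last character before the gap.

```python
import re


def split_columns(line: str) -> list[str]:
    """Split a row into columns separated by two or more whitespace characters."""
    # Turn every wide gap into a single "#" marker, then split on the marker.
    marked = re.sub(r"\s{2,}", "#", line.strip())
    return marked.split("#")


# Hypothetical row: a multi-word description, then UOM, then Qty.
row = "Steel pipe large   EA   2000"
columns = split_columns(row)
description, uom, qty = columns
print(columns)
```

Single spaces inside the description are left alone, so the description stays in one element and the UOM and Qty land in the 2nd and 3rd elements. This assumes "#" never appears in the invoice text itself.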

Regards
Sudharsan


Can we use (.*\n)\n(?=Contact) instead of (.*)(?:.*\n){2}.*Contact?
Because with that one we have to trim all the extra spaces?


Yeah, you can use this one as well.
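For comparison, here is a small Python sketch of the two patterns, assuming they read `(.*\n)\n(?=Contact)` and `(.*)(?:.*\n){2}.*Contact` (that is, each lone `.` in the post stands for `.*`). The sample text is hypothetical and only assumes that a blank line separates the wanted line from the "Contact" line.

```python
import re

# Hypothetical text (assumption): wanted line, blank line, then "Contact".
text = "Invoice header\nItem 3617  EA  2000\n\nContact: Jane Doe\n"

# Lookahead version: captures the line, including its trailing newline,
# that is followed by an empty line and then "Contact".
lookahead = re.search(r"(.*\n)\n(?=Contact)", text)

# Skip-lines version: captures the line two lines above the "Contact" line.
skip_lines = re.search(r"(.*)(?:.*\n){2}.*Contact", text)

line_a = lookahead.group(1)
line_b = skip_lines.group(1)
print(repr(line_a), repr(line_b))
```

On this sample both patterns find the same line. The lookahead version keeps the trailing newline in the captured group, so it still needs trimming before use.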

Thanks for sharing, brother.
