Extract Double-Lined Field from PDF Invoice to Excel

Hi there, I would like to capture the supplier name as shown below using Regex Matches. However, I’m only able to capture “UNITED SPACY PLUS EQUIPMENT PTE”; the “LTD” is always left out because it is on a separate line. I have tried using “\n” in my Regex, but it doesn’t seem to work. When I run the cursor from left to right on the PDF file, it runs from “UNITED SPACY PLUS EQUIPMENT PTE” to “LTD” and then moves on to “Invoice Date”.

The current Regex Matches statement I’m using is “Name:\s+([@\w.\s-&()']+)\s+Invoice”.

Please help and offer some advice if you know the solution, thanks.
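For reference, here is a minimal Python sketch of a capture that spans the line break and then joins the two lines. The sample text is hypothetical (the real extracted text is not shown in this thread), and Python's re module is only a stand-in for whatever engine the Regex Matches activity uses, so behaviour may differ slightly. The hyphen is moved to the end of the character class so that Python reads it as a literal.

```python
import re

# Hypothetical extracted text; the real layout of the PDF text is an assumption.
sample = (
    "Name: UNITED SPACY PLUS EQUIPMENT PTE\n"
    "LTD\n"
    "Invoice Date: 01/02/2020\n"
)

# \s inside the class lets the capture run across the newline;
# the lazy quantifier stops at the first whitespace followed by "Invoice".
pattern = re.compile(r"Name:\s+([@\w.\s&()'-]+?)\s+Invoice")

match = pattern.search(sample)
# Collapse the embedded newline (and any extra spaces) into single spaces.
supplier = " ".join(match.group(1).split()) if match else None
print(supplier)
```

The capture itself still contains the newline; the join on whitespace is what produces a single-line value suitable for an Excel cell.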

Hi @jerryl27

I have already solved the same query. Below is the thread for it:

Have a look through it.

Mark as solution and like it if this helps you :slight_smile:

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

@jerryl27

Can you write the extracted text to a text file and share the file, so that we can check it and try to help with the regex?

Thanks

Yes, it is a brilliant concept! I’m amazed… it must have taken a fair bit of effort to figure this out. Many thanks!

Hi @jerryl27

Yeah, it is. I also learned it only at that time, while solving the issue in that post.

You are welcome :innocent:

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Thanks for offering your help. It has been solved :slight_smile:


Hi Pratik, I have run into a situation. I realised that not all the names are double-lined. When the bot runs into a single-lined name, it also captures the next line (the Address line). I think this requires a change in my Regex: "(?<=Name:)(.*\s.+)(?=Invoice Number:\s.+)(.+)(\s.+)". Will you be able to help?

Double-lined Name:

Single-lined Name:
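One possible way to cover both cases is to stop the capture at the next label instead of counting lines. The Python sketch below assumes the name is always followed by a label starting with "Address" or "Invoice"; both sample texts and the helper name extract_name are hypothetical, since the actual PDF text is not available here.

```python
import re

# Assumption: the name ends where the next label ("Address" or "Invoice") begins.
# re.DOTALL lets "." cross the line break of a double-lined name.
NAME_PATTERN = re.compile(r"Name:\s*(.+?)\s*(?=Address|Invoice)", re.DOTALL)

def extract_name(text):
    """Return the supplier name on one line, or None if nothing matches."""
    m = NAME_PATTERN.search(text)
    return " ".join(m.group(1).split()) if m else None

# Hypothetical extracted text for the two cases.
double_lined = "Name: UNITED SPACY PLUS EQUIPMENT PTE\nLTD\nInvoice Number: 12345\n"
single_lined = "Name: ACME TRADING PTE LTD\nAddress: 1 Example Road\nInvoice Number: 67890\n"

print(extract_name(double_lined))
print(extract_name(single_lined))
```

A supplier name that itself contains the word "Address" or "Invoice" would be cut short by this pattern, so the stop labels may need tightening against the real data.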

Hi @jerryl27

If possible, can you share the PDF?

Without the actual data it's hard to work out the exact regex.

Please also let me know all the inputs and the expected output for each.

Another thing: just try using | (pipe), i.e. OR.

If your regex for now is: (?<=Name:)(.*\s.+)(?=Invoice Number:\s.+)(.+)(\s.+)
then try the following:

(?<=Name:)(.*\s.+)(?=Invoice Number:\s.+)(.+)(\s.+)|(?<=Name:).+(?=Invoice Number:)
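To illustrate the pipe idea, here is a small Python sketch with simplified stand-in patterns (not the exact regex above). It assumes a hypothetical layout in which the name is followed directly by a line starting with "Invoice Number:"; the group names and the helper supplier_name are made up for the example.

```python
import re

# Simplified stand-in patterns, not the exact regex from this thread.
# Assumed layout: the name is followed by a line starting with "Invoice Number:".
TWO_LINES = r"(?<=Name:)(?P<two>.*\n.+)(?=\nInvoice Number:)"
ONE_LINE = r"(?<=Name:)(?P<one>.+)(?=\nInvoice Number:)"

# The left alternative is tried first; the right one is only used if it fails.
PATTERN = re.compile(TWO_LINES + "|" + ONE_LINE)

def supplier_name(text):
    """Return the name from whichever alternative matched, on one line."""
    m = PATTERN.search(text)
    if m is None:
        return None
    raw = m.group("two") or m.group("one")
    return " ".join(raw.split())

# Hypothetical extracted text for the two cases.
double_lined = "Name: UNITED SPACY PLUS EQUIPMENT PTE\nLTD\nInvoice Number: 12345\n"
single_lined = "Name: ACME TRADING PTE LTD\nInvoice Number: 67890\n"

print(supplier_name(double_lined))
print(supplier_name(single_lined))
```

Because the engine takes the first alternative that matches, the two-line branch has to fail on single-lined names for the fallback to apply. If an extra line (such as an Address line) sits between the name and the label, the two-line branch would still match and capture it.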

Happy Automation :raised_hands:

Best Regards
Er Pratik Wavhal :robot::man_technologist:t4: :computer:

Thanks, what you have proposed is something I have not thought of. Will try it out.

For the time being, this pattern that I tried seems to work (but I don't know why it works :slightly_smiling_face:):
“(?<=Name:)(.*\s.+)(?=Invoice Number:\s.+)(.+)(\s\D+)Address”
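One way to see why a pattern matches is to print what each capture group picked up. The Python sketch below does that for the pattern above. The sample text is invented purely so that the pattern matches (the real layout is unknown), and dump_groups is a hypothetical helper, so the groups on the actual invoices will likely look different.

```python
import re

# The pattern from the post above, unchanged.
PATTERN = re.compile(r"(?<=Name:)(.*\s.+)(?=Invoice Number:\s.+)(.+)(\s\D+)Address")

# Hypothetical extracted text; the real layout is an assumption.
sample = (
    "Name: UNITED SPACY PLUS EQUIPMENT PTE\n"
    "LTD Invoice Number: 12345\n"
    "Tel\n"
    "Address: 1 Example Road\n"
)

def dump_groups(pattern, text):
    """Return (group number, captured text) pairs for the first match."""
    m = pattern.search(text)
    if m is None:
        return []
    return [(i, m.group(i)) for i in range(1, pattern.groups + 1)]

# repr() makes newlines and leading or trailing spaces visible.
for number, value in dump_groups(PATTERN, sample):
    print(number, repr(value))
```

On this invented sample, group 1 holds the two name lines, while the remaining groups consume the text up to the literal Address, which is what anchors the end of the match.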
