Read PDF Text cannot get the values correctly

Hi all,

We regularly receive many invoices and debit notes in PDF format and want to read them with RPA.
However, I have run into the following problem, which makes my RPA process fail. Could anyone help?

I use “Read PDF Text” to read the PDFs.

All PDFs are in the following format:
[image]

However, when I copy and paste the text of these PDFs into Notepad, the actual layout varies:

Some are…

Vendor Code: 00001
Invoice No.: 11111
Page 1 of 1

But some are…

00001
Invoice No.: 11111
Page 1
of 1
Vendor Code:

And some are…

Vendor Code:
Invoice No.:
Page
00001
11111
1 of 1

There are many more variations…

How can I get the Vendor Code and Invoice No. correctly?

Thanks so much

Can you share a sample PDF?

Hello,
I have run into this problem today.
Did you find a solution?

@yejin_kang1 - please set Preserve Format to true in the Read PDF Text activity. After that, share the text file… we can help you with the Regex to extract the details…
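
To give an idea of the Regex step, here is a minimal sketch in Python. It assumes that, with formatting preserved, each label ends up on the same line as its value, as in the first layout from the original post ("Vendor Code: 00001"). The names extract_fields, VENDOR_RE and INVOICE_RE are made up for this illustration and are not part of any UiPath activity. The patterns only allow spaces or tabs between label and value, so a label whose value landed on another line returns None instead of picking up the wrong number.

```python
import re

# Hypothetical patterns for illustration: label, optional spaces/tabs, then digits
# on the SAME line. [ \t]* (rather than \s*) stops the match from crossing a newline.
VENDOR_RE = re.compile(r"Vendor Code:[ \t]*(\d+)")
INVOICE_RE = re.compile(r"Invoice No\.:[ \t]*(\d+)")


def extract_fields(text):
    """Return the vendor code and invoice number, or None for a field not found."""
    vendor = VENDOR_RE.search(text)
    invoice = INVOICE_RE.search(text)
    return {
        "vendor_code": vendor.group(1) if vendor else None,
        "invoice_no": invoice.group(1) if invoice else None,
    }


# Sample text copied from the first layout in the original post.
sample = "Vendor Code: 00001\nInvoice No.: 11111\nPage 1 of 1\n"
print(extract_fields(sample))
```

The same two patterns should carry over to a .NET Regex if the text comes out in this shape. A None result is a signal that the document still has one of the scrambled layouts and needs separate handling.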

Thank you for helping me! It still doesn’t work, but I’ll keep trying.