I need to extract all the details from invoice PDFs, including each line item's description and quantity and all the other fields, and I need to do this for every PDF file in a folder.

Please help me out
InvoiceDataTemplate.xlsx (12.2 KB)
VendorA - 1.pdf (309.1 KB)

Hi @aparna3010

To loop through the PDFs in a folder, use an Assign activity to set a variable to Directory.GetFiles(Yourfolderpath).
You can use Document Understanding to extract the required data from the PDFs.
Please refer to this link:
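For readers who want to see the folder loop outside Studio, here is a minimal Python sketch of the same idea. Python is an assumption (the thread itself only uses UiPath activities), and `list_pdf_files` is a hypothetical helper name. It keeps only files whose extension is .pdf, which is similar in spirit to the .NET overload Directory.GetFiles(folder, "*.pdf").

```python
from pathlib import Path
from typing import List


def list_pdf_files(folder: str) -> List[Path]:
    """Return every file directly inside `folder` whose extension is .pdf (any case)."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        # skip sub-folders and non-PDF files such as the Excel template
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

Each returned path would then be handed to the PDF-reading step inside the loop.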

I want to do this using regex. I don't want any human intervention.

DataExtrationExample
Use two For Each loops: the first loops over the files returned by GetFiles for the folder,
and the second extracts the data from each PDF.

  1. In the first loop, use Read PDF Text and create an output variable in the Read PDF Text activity.
  2. Then split the string into a list: create a List-type variable and store the split data in it.
  3. Pass the List variable to the second For Each loop.
  4. Inside the second loop, perform the data extraction.
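The two-loop steps above can be sketched in Python roughly as follows. This is an illustration under assumptions: `read_pdf_text` is a hypothetical callable standing in for the Read PDF Text activity, and the `Label: value` line format is made up for the example; real invoices need patterns that fit their own layout.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical "Label: value" line format; real invoices need their own patterns.
FIELD_LINE = re.compile(r"^\s*(?P<label>[A-Za-z ]+?)\s*:\s*(?P<value>.+?)\s*$")


def extract_fields(pdf_text: str) -> Dict[str, str]:
    """Second loop: split one PDF's text into lines and keep the 'Label: value' pairs."""
    fields: Dict[str, str] = {}
    for line in pdf_text.splitlines():  # string -> list of lines
        match = FIELD_LINE.match(line)
        if match:
            fields[match.group("label")] = match.group("value")
    return fields


def extract_all(
    pdf_paths: Iterable[str], read_pdf_text: Callable[[str], str]
) -> Dict[str, Dict[str, str]]:
    """First loop: one iteration per file; read_pdf_text stands in for Read PDF Text."""
    return {path: extract_fields(read_pdf_text(path)) for path in pdf_paths}
```

Lines that do not look like `Label: value` (for example "Thank you") are simply skipped.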

The PDF is scanned, so Read PDF Text does not extract its text.

Hi,
You can use Document Understanding for scanned documents without human intervention as well.

If you still want to use regex, first extract the PDF data with OCR (it is a scanned copy, so the content is an image), then apply your regex to the OCR output.

Please let me know if you still face any issues.

Thanks and Regards,
Geetishree Rao


Could you please help me out with the code for regex?

Do you have the digitized text? Please share it and I will try with that.
Is there a specific reason or problem that keeps you from using Document Understanding?

You can use Read PDF With OCR and pass ABBYY OCR inside that activity so that it extracts all the text. Hope this helps.

Thanks & Regards
Pawan Rajpurohit

Aparna,
Meanwhile, go through this video on regex; it may help:
How to extract data from PDF's with RegEx in UiPath - Full Tutorial - YouTube

Mark it as solved if it helps.

I have already shared the PDF and Excel files.

taxonomy.json (8.7 KB)
Invoice.xlsx (10.5 KB)

Dear Aparna,

Attaching the XAML and other files for your reference.
I extracted the data from the invoice using Document Understanding and the Form Extractor, without human validation.

Just add your Document Understanding API key to the variable strAPIKey in the XAML below.

Hope it helps
ForumInvoice.xaml (23.5 KB)

If you still want regex, try these:
Invoice No:
INVOI\s[#]\sC\s(\d+)\sE\s(\d+)\s*

Invoice Date:
ACCELIRATE\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Date:\s+(\w{3})(\s)(\d+)(,)(\s)(\d{4})\s+

Due Date:
Due\ Date:\s+(\w{3})(\s)(\d+)(,)(\s)(\d{4})\s+
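The Date pattern above spells out every space the OCR produced, so it breaks if the spacing changes. A sketch of one alternative, in Python: collapse repeated whitespace first, then match with single spaces. This is a swapped-in technique rather than the approach used in the attached workflow; it drops the "ACCELIRATE" anchor and instead uses a negative lookbehind so the invoice-date pattern does not also match inside "Due Date:". `normalize_ocr_text` and `find_date` are hypothetical helper names, and month names are assumed to be three-letter English abbreviations such as "Jan".

```python
import re
from datetime import date, datetime
from typing import Optional


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of spaces/tabs inside each line and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# After normalization a single space is enough between tokens.
# (?<!Due ) stops the invoice-date pattern from matching inside "Due Date:".
INVOICE_DATE = re.compile(r"(?<!Due )Date:\s*([A-Za-z]{3}) (\d{1,2}), (\d{4})")
DUE_DATE = re.compile(r"Due Date:\s*([A-Za-z]{3}) (\d{1,2}), (\d{4})")


def find_date(pattern: "re.Pattern[str]", text: str) -> Optional[date]:
    """Return the first date the pattern finds, e.g. 'Jan 5, 2021' -> date(2021, 1, 5)."""
    match = pattern.search(text)
    if match is None:
        return None
    month, day, year = match.groups()
    # %b parses an abbreviated month name such as "Jan" in the default C locale.
    return datetime.strptime(f"{month} {day} {year}", "%b %d %Y").date()
```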

Email:
(\w+)(@)(\w+)(\.)(\w+)\s+

Phone:
(\+)(\d)(\s)(0-9?)?(\(?[0-9]{3}\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})
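A quick way to try the Email and Phone patterns before putting them in a Matches activity is a small Python check like the one below. It is a sketch, not the exact patterns above: the email pattern is a slightly more permissive variant (it allows dots and hyphens, so addresses like "billing.dept@vendor-a.co.uk" match), and the phone pattern escapes "+" and the optional parentheses and leaves out the "(0-9?)?" group. The sample text is made up.

```python
import re
from typing import List

# Permissive variant of the Email pattern: escaped dot, multi-part domains allowed.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Simplified variant of the Phone pattern: "+<digit> " then an optionally
# parenthesized 3-digit area code, then 3+4 digits or 7 alphanumerics.
PHONE = re.compile(r"\+\d\s\(?\d{3}\)?[ -]?(?:\d{3}[ -]?\d{4}|[A-Za-z0-9]{7})")


def find_emails(text: str) -> List[str]:
    """All email-like substrings, in order of appearance."""
    return EMAIL.findall(text)


def find_phones(text: str) -> List[str]:
    """All phone-like substrings, in order of appearance."""
    return PHONE.findall(text)
```

Because every group is non-capturing, `findall` returns the whole match rather than a tuple of groups.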

I will try the line items as well and let you know.

I am getting this issue when I open the workflow. How do I resolve it?

OK, thanks. Please let me know.

Add the UiPath OCR or OmniPage OCR activity and add the Document Understanding API key.
That OCR activity is missing there.

I have extracted all the line items as well.
You can check the tabs in the Excel file.
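For a regex-only route to the line items, one option is to test each line of the extracted text against a single line-item pattern, as in the Python sketch below. The row layout `<description> <qty> $<unit price> $<amount>` is an assumption for illustration, not the layout of the attached PDF. Keeping "$" outside the capture groups means the captured prices already come without the symbol.

```python
import re
from typing import Dict, List

# Hypothetical line-item layout: "<description> <qty> $<unit price> $<amount>".
# The "$" sits outside the capture groups, so captured prices have no symbol.
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+(?P<quantity>\d+)\s+"
    r"\$(?P<unit_price>[\d,]+\.\d{2})\s+\$(?P<amount>[\d,]+\.\d{2})\s*$"
)


def extract_line_items(text: str) -> List[Dict[str, str]]:
    """Return one dict per line that looks like a line item; other lines are skipped."""
    items = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append(match.groupdict())
    return items
```

Header and total lines do not fit the pattern, so they drop out without extra filtering.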

Add these packages from Manage Packages > All Packages:
  1. Document Understanding package
  2. Intelligent OCR
  3. OmniPage OCR activities and extended activities

You have to add the taxonomy as well. Please go through a Document Understanding video on YouTube or in UiPath Academy; it will give you an overall idea of how to set up the project.

The document structure has to be added in the Taxonomy Manager.

OK, I have installed all the packages, but I am still getting that missing-activity error.

Hi @aparna3010
Try installing the OCR activities package, Uipath.OCR.Activities.


Thanks a lot, I got the output. I would like to know how you removed the $ symbol. I also want the item table in the first sheet only; how can I do that? And I want to do this whole thing for the other PDFs in the folder as well, whose format is different. Please help.
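One possible way to handle the $ symbol and the differing layouts, sketched in Python. This is not necessarily how the attached workflow does it: `to_amount` simply deletes "$" and thousands separators before converting to Decimal, and the vendor markers and patterns in `LAYOUTS` are hypothetical placeholders to be replaced with a string and patterns that actually appear in each vendor's PDFs.

```python
import re
from decimal import Decimal
from typing import Dict, Optional


def to_amount(text: str) -> Decimal:
    """Turn a currency string such as "$1,240.00" into Decimal("1240.00")."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


# Hypothetical per-layout patterns: each vendor gets its own invoice-number regex,
# selected by a marker string assumed to appear in that vendor's extracted text.
LAYOUTS: Dict[str, "re.Pattern[str]"] = {
    "VendorA": re.compile(r"Invoice\s*#\s*(\d+)"),
    "VendorB": re.compile(r"Invoice No\.?\s*:\s*(\w+)"),
}


def invoice_number(text: str) -> Optional[str]:
    """Use the pattern of the first layout whose marker appears in the text."""
    for marker, pattern in LAYOUTS.items():
        if marker in text:
            match = pattern.search(text)
            return match.group(1) if match else None
    return None  # unknown layout: route the file for manual handling
```

Adding a new vendor format then means adding one entry to `LAYOUTS` rather than changing the loop.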