How to extract particular data from multiple PDFs that have the same format

I want to extract the invoice number, date, and file name from multiple PDFs. The PDFs are tagged. Which method should I use to get the relevant text data? Later I want to write the data to an Excel file.
@raja.arslankhan
Sequence.xaml (13.7 KB)
Invoice1.pdf (92.9 KB)
Invoice2.pdf (93.1 KB)
Invoice3.pdf (93.1 KB)
Invoice4.pdf (93.1 KB)
project.json (1.6 KB)

Hi @Vidhi_Patel

You can try a regular expression (regex).
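As a rough illustration of the regex idea outside UiPath, here is a minimal Python sketch. It assumes the PDF text has already been read into a string; the sample text, the labels "Invoice No" and "Date", and the helper name `extract_fields` are all hypothetical, since the actual invoice layout is not shown in this thread.

```python
import re

# Hypothetical text, standing in for what a PDF text-extraction step returns.
# The real invoices in this thread may use different labels.
SAMPLE_TEXT = """ACME Ltd
Invoice No: INV-1001
Date: 12/03/2022
Total: 450.00
"""

# Assumed label formats; adjust these to the labels in the actual PDFs.
INVOICE_NO = re.compile(r"Invoice\s*No\.?\s*[:#]?\s*([A-Za-z0-9-]+)", re.IGNORECASE)
INVOICE_DATE = re.compile(r"Date\s*:?\s*(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})", re.IGNORECASE)


def extract_fields(text, file_name):
    """Return file name, invoice number and date; missing fields are None."""
    number = INVOICE_NO.search(text)
    date = INVOICE_DATE.search(text)
    return {
        "FileName": file_name,
        "InvoiceNo": number.group(1) if number else None,
        "Date": date.group(1) if date else None,
    }


row = extract_fields(SAMPLE_TEXT, "Invoice1.pdf")
print(row)
```

Because all the PDFs share one template, the same two patterns can be applied to each file's text in a loop, which is the same idea the workflow uses.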

Check out the attached XAML file.

ExtractDatafromPDFRegex.xaml (10.7 KB)

Output

PdfFile1.xlsx (7.3 KB)
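To show the shape of such an output outside the workflow, here is a small Python sketch that writes one row per PDF. It is only an assumption about the layout: the column names and row values are hypothetical, and it produces CSV text (which Excel can open) rather than a real .xlsx file, because that needs only the standard library.

```python
import csv
import io

# Hypothetical rows, one per PDF; real values would come from the extraction step.
rows = [
    {"FileName": "Invoice1.pdf", "InvoiceNo": "INV-1001", "Date": "12/03/2022"},
    {"FileName": "Invoice2.pdf", "InvoiceNo": "INV-1002", "Date": "15/03/2022"},
]


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["FileName", "InvoiceNo", "Date"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

Saving `csv_text` to a file with a `.csv` extension gives a sheet with one header row and one line per invoice.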

Regards
Gokul

@Vidhi_Patel your PDFs share the same template, so I recommend using a PDF package and extracting the data through OCR.
Tell me which output data you want to extract.

Thank you, Gokul, but I don't understand regular expressions well. Let me know if there is any other way.

Which OCR engine should I use? UiPath Screen OCR?

You can easily achieve this with a regular expression; it is a simple way to get the output. Have you tested the XAML file? @Vidhi_Patel

@Vidhi_Patel yes, try that as well. It's a good option when you are not able to write the regex.

Hi @Vidhi_Patel

You can easily learn regular expressions from the tutorial.

You can also work on this

Regards
gokul

I am getting this error while scraping: "Read PDF With OCR: Error performing OCR: Invalid API key specified UiPathOCRInvalidApiKey". What should I write inside the API property?

Thanks a lot, Gokul! I understood regex and solved the problem as well. @Gokul001
