Data extraction from PDF file

How can I extract specific data from a PDF file that contains plain text rather than tables?


Hey @Hridhya

Mostly it’s done by matching labels or patterns in the text…



Can you share a sample?


Hey @Charbel1

I don’t have a sample handy, but if you have any PDF samples, maybe I can help you with a small POC.


Hello @Hridhya,

One of the most common tools is “Regex”.

Also: Document Understanding - AI Document Processing | UiPath

Hope it helps! 🙂
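To make the regex suggestion concrete, here is a minimal sketch of label-based extraction. It is written in Python rather than as a UiPath workflow, and the sample text, the field names, and the `extract_fields` helper are all made up for illustration. It assumes the PDF text has already been read into a string and that each value sits on the same line as its label.

```python
import re

# Hypothetical text, standing in for what a PDF text-extraction step might return.
SAMPLE_TEXT = """ACME Supplies
Invoice No: INV-2041
Invoice Date: 12/03/2023
Total Due: $1,250.50
"""

# One pattern per field; each captures the value that follows its label.
# The labels and value formats here are assumptions about the document layout.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s+No\s*:\s*(\S+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Invoice\s+Date\s*:\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "total_due": re.compile(r"Total\s+Due\s*:\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict of field name -> captured value, or None when a label is not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
```

The same patterns should carry over to a regex activity in a workflow more or less unchanged, though the labels would need adjusting to whatever the real PDF uses.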

Hey @Charbel1 @Hridhya

Try this example.
Put the PDF file into the “PDF PATH” folder to try it.
In it, I used OCR to extract all the plain text from the PDF, and then regex to pull the specific data out of the OCR output.

Main.xaml (24.8 KB)
invo1.pdf (93.3 KB)
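For anyone who cannot open the attachment, the second half of that approach (regex over OCR output) might look roughly like the sketch below. This is Python, not the contents of Main.xaml, and the OCR step itself is not shown. The sample string and both helper names are hypothetical. OCR text often comes back with irregular spacing and blank lines, so the sketch cleans that up before matching.

```python
import re


def normalize_ocr_text(raw):
    """Collapse runs of spaces/tabs to one space, trim each line, and drop blank lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def value_after_label(text, label):
    """Return what follows `label` on the same line (optional ':' or '-' separator), else None."""
    pattern = re.compile(re.escape(label) + r"[ \t]*[:\-]?[ \t]*(.+)", re.IGNORECASE)
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Hypothetical OCR output with the kind of uneven spacing OCR tends to produce.
RAW_OCR = "Customer   Name :   Jane Doe  \n\n\nOrder  Id -  A-7731\n"

if __name__ == "__main__":
    cleaned = normalize_ocr_text(RAW_OCR)
    print(value_after_label(cleaned, "Customer Name"))
    print(value_after_label(cleaned, "Order Id"))
```

Normalizing first keeps the patterns simple. Without it, every label pattern would have to allow for arbitrary spacing between words.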