Extracting Multiple Text Values from a PDF

Hello UiPath Community :wave:,

I hope you’re all doing well! I’ve got a bit of a challenge and could use some guidance. I need to extract multiple text values from PDF files and store them neatly in an Excel sheet. :scroll:

Are there any seasoned automators who can share some wisdom or point me in the right direction? Your expertise would be a game-changer for me! :rocket:

Thanks a bunch in advance! :sparkles:

Hi @Muhammad_Anas_Baloch

->Use the Read PDF with OCR activity if the files are scanned copies; if they are normal (digital) PDFs, use the Read PDF Text activity instead.
->Use the Matches activity, passing it the regex pattern and the output of the Read PDF activity, to get the line items.
->Use a For Each activity to iterate through each line item, and apply a second regex pattern to the CurrentItem.
->After extracting the data with the regex patterns, store the extracted values in their respective variables.
->Use the Write Cell activity to write the extracted data into the respective cells.
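To make the Matches → For Each → per-item regex flow concrete, here is a minimal Python sketch of the same logic. It is not UiPath code: `SAMPLE_TEXT` is a made-up stand-in for the string returned by Read PDF Text / Read PDF with OCR, the `Item:` / `Qty:` / `Price:` labels are assumed layout details, and CSV output (which Excel opens) stands in for Write Cell. Adjust both patterns to match your actual PDFs.

```python
import csv
import io
import re

# Hypothetical stand-in for the text returned by Read PDF Text / Read PDF with OCR.
SAMPLE_TEXT = """Invoice No: INV-1001
Item: Widget A   Qty: 2   Price: 9.99
Item: Widget B   Qty: 5   Price: 1.50
Total: 27.48
"""

# "Matches" step: one match per line item (assumes each item sits on its own line).
LINE_ITEM = re.compile(r"^Item:.*$", re.MULTILINE)

# Per-item regex: named groups pull the individual fields out of one line item.
FIELDS = re.compile(
    r"Item:\s*(?P<name>.+?)\s+Qty:\s*(?P<qty>\d+)\s+Price:\s*(?P<price>\d+(?:\.\d+)?)"
)


def extract_line_items(text):
    """Return one dict of field values per line item found in the text."""
    rows = []
    for match in LINE_ITEM.finditer(text):  # the "For Each" over the matches
        current_item = match.group(0)
        fields = FIELDS.search(current_item)
        if fields:  # skip lines that do not fit the expected layout
            rows.append(fields.groupdict())
    return rows


def write_rows(rows, stream):
    """Write a header row plus one row per line item (stand-in for Write Cell)."""
    writer = csv.DictWriter(stream, fieldnames=["name", "qty", "price"])
    writer.writeheader()
    writer.writerows(rows)


rows = extract_line_items(SAMPLE_TEXT)
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

Writing a whole table at once like this is usually simpler than one Write Cell per value; in UiPath the equivalent would be building a DataTable and using Write Range.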

Regards


Hi @Muhammad_Anas_Baloch

→ To extract the text from the PDFs, use the Read PDF Text activity for digital PDFs that contain a text layer, and the Read PDF with OCR activity for scanned or image-based PDFs.
→ Once the text is stored in a String variable, use regular expressions to extract the required values.
→ Then use the Excel activities to write the extracted values to the Excel sheet.
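Since the question mentions multiple PDF files, here is a rough Python sketch of the "one regex per field, one Excel row per file" approach. It is only an illustration under assumptions: `PDF_TEXTS` is hypothetical sample text standing in for what Read PDF Text would return for each file, the `Invoice No:` / `Date:` / `Total:` labels are invented, and CSV output stands in for the Excel activities. A missing field becomes an empty cell instead of failing the whole run.

```python
import csv
import io
import re

# Hypothetical extracted text per file (stand-in for Read PDF Text output).
PDF_TEXTS = {
    "invoice_01.pdf": "Invoice No: INV-1001\nDate: 2024-01-15\nTotal: 27.48\n",
    "invoice_02.pdf": "Invoice No: INV-1002\nDate: 2024-02-03\n",  # no total present
}

# One pattern per field; the labels are assumptions about the document layout.
PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*(\d+(?:\.\d+)?)"),
}


def extract_fields(text):
    """Return one dict per document; fields that are not found become ''."""
    row = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[field] = match.group(1) if match else ""
    return row


def build_sheet(texts):
    """Build CSV text with a header row and one row per PDF file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["file", *PATTERNS])
    writer.writeheader()
    for file_name, text in texts.items():
        writer.writerow({"file": file_name, **extract_fields(text)})
    return buffer.getvalue()


sheet = build_sheet(PDF_TEXTS)
print(sheet)
```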

Or

If you are interested in Document Understanding, you can use it together with AI Center to extract the required data.

Hope it helps!!
