How to extract data from a PDF and store it in Excel?

Hey guys, I need help extracting data from a PDF file. The file has 1 to 15 pages, and I need specific transaction data from it.
For example, on page 2 of this PDF there is a transaction after the words "Transaction history". I need to extract it and store it in an Excel file.
I have attached one PDF file. Please help me create a flow without using Document Understanding.
Thank you

This is an image of the example PDF.
new.txt (69.1 KB) This is the text file after conversion.

@Kalimuthu_Kalimuthu

Please highlight the data you want to extract.

Hi @Kalimuthu_Kalimuthu

Please specify which text you want to extract. We will help you with the regex.

Regards


This is not limited to a single page. The same data needs to be extracted wherever the words "Transaction history" appear; I want the table that follows them.

Hi @Kalimuthu_Kalimuthu

Your image has not been uploaded properly.

Regards

Hi @Kalimuthu_Kalimuthu

Try this regex:

\d{1,}\/\d{1,}\s*[\<A-Za-z].*\d*\,?\d*\.\d*

=> Use the above regex in the Find Matching Patterns activity. Its input is the output of Read PDF Text.

Refer to the workflow below for a better understanding:

Verify the output against your line items:
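If you want to sanity-check the pattern outside UiPath first, here is a minimal Python sketch. The regex is the one from this reply, unchanged. The sample text is made up for illustration and only stands in for the output of Read PDF Text; the real PDF layout may differ.

```python
import re

# Pattern from the reply above, unchanged: a date like 01/02, a description
# starting with a letter or "<", and a trailing amount containing a ".".
LINE_ITEM = re.compile(r"\d{1,}\/\d{1,}\s*[\<A-Za-z].*\d*\,?\d*\.\d*")

# Hypothetical sample standing in for the output of Read PDF Text.
sample_text = """Transaction history
Date Description Amount
01/02 Card payment GROCERY STORE 45.10
03/02 Salary ACME LTD 1,250.00
Total amount 1,295.10
"""

# findall returns the full match for each line item (the pattern has no groups).
matches = LINE_ITEM.findall(sample_text)
for line in matches:
    print(line)
```

The header and total lines are skipped because they contain no date of the form digits/digits.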
Output Text.txt (24.6 KB)

Hope it helps!!
Regards

@Kalimuthu_Kalimuthu

(?<=Transaction history)[\s\S]*?((?=Transaction history))|(?<=Transaction history)[\s\S]*?(?=Total amount)

Please try this. I hope it helps.

Use the Find Matching Patterns activity.
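To see what this kind of marker-based pattern captures, here is a small Python sketch. It uses the lazy quantifier `[\s\S]*?` between the markers, since a single optional character could not span a whole table. The sample text is hypothetical and not taken from the attached PDF.

```python
import re

# Capture everything between one "Transaction history" marker and the next,
# or between the last marker and "Total amount".
SECTION = re.compile(
    r"(?<=Transaction history)[\s\S]*?(?=Transaction history)"
    r"|(?<=Transaction history)[\s\S]*?(?=Total amount)"
)

# Hypothetical two-page sample standing in for the text read from the PDF.
sample_text = (
    "Page 1\nTransaction history\n01/02 Coffee 3.50\n"
    "Page 2\nTransaction history\n03/02 Books 20.00\nTotal amount 23.50\n"
)

# One entry per section, with surrounding whitespace trimmed.
sections = [m.group(0).strip() for m in SECTION.finditer(sample_text)]
for section in sections:
    print(repr(section))
```

Note that the first section also picks up the "Page 2" line that sits before the next marker, so a section match like this usually still needs a per-line filter to keep only the transaction rows.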

One line item is missing its description; otherwise the output you gave is OK.
I will share the example PDF. Please extract the data and send me the XAML.
Thank you
test.pdf (333.2 KB)

It's not working, @rlgandu.

@Kalimuthu_Kalimuthu

Please mention where the data you want to extract starts and where it ends.

Highlighting it in the PDF will make this easier.

Do not copy and paste the regex from above; type the expression as it appears in the screenshot.

I will add a sample PDF.
In this PDF, there is a transaction table after "Transaction history".
There are also other tables in the PDF. I want to extract all of the tables and store them in an Excel file.
Thank you, @rlgandu
test.pdf (333.2 KB)

@Kalimuthu_Kalimuthu

I hope this works.

Hi @Kalimuthu_Kalimuthu

=> Use Read PDF Text to read the PDF and store the output in a variable, say str_Text.
=> Use the Write Text File activity to write the data extracted from the PDF to a text document.
=> Use the Find Matching Patterns activity with the regex below:

(?<=\n\s*)\d{1,}\/\d{1,}\s*[\<A-Za-z].*\d*\,?\d*\.\d*

Properties of Find Matching Patterns:


=> Use a For Each loop to iterate through the Matches variable and use each match wherever needed.
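The steps above can be sketched in Python as a rough cross-check. Several things here are assumptions and not part of the workflow: the sample text is invented, the `FIELDS` splitter and both helper functions are hypothetical, and the output is CSV (which Excel opens) instead of a real workbook. Python's `re` module does not accept the variable-width lookbehind `(?<=\n\s*)`, so the sketch anchors at the start of each line with `^` and `re.MULTILINE` instead.

```python
import csv
import io
import re

# Line-start anchored variant of the line-item regex (see note above).
LINE_ITEM = re.compile(
    r"^[ \t]*(\d{1,}\/\d{1,}\s*[\<A-Za-z].*\d*\,?\d*\.\d*)", re.MULTILINE
)
# Hypothetical splitter: date, free-text description, trailing amount.
FIELDS = re.compile(r"^(\d+/\d+)\s+(.*?)\s+(\d[\d,]*\.\d+)$")


def extract_rows(text):
    """Return [date, description, amount] for every line item in text."""
    rows = []
    for match in LINE_ITEM.finditer(text):
        parts = FIELDS.match(match.group(1).strip())
        if parts:
            rows.append(list(parts.groups()))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Date", "Description", "Amount"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical stand-in for the str_Text variable from Read PDF Text.
str_text = """Transaction history
  01/02 Card payment GROCERY STORE 45.10
Ref 11/22 not a line item 9.99
03/02 Salary ACME LTD 1,250.00
Total amount 1,295.10
"""

rows = extract_rows(str_text)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Anchoring at the line start is what keeps the "Ref 11/22 ..." line out of the result, even though it contains a date-like token and an amount.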
Workflow:

Happy to help if you have any questions.

Regards

Can you look into this,
@mkankatala?
