Text PDF automation


I am new to working with PDFs.
I am stuck with PDF automation. My PDF is system-generated and its format is fixed. I have to extract the data into a clean, structured format, because afterwards I have to enter the extracted data into the fields of a web application form. For now I want to extract the same data into Excel in a clean format.

I am attaching a sample PDF.

testpdf.pdf (228.0 KB)

And here is the Excel output I am trying to produce. Please note there are two sheets: OutputResult.xlsx (8.6 KB)


Hi, try using the ‘Read PDF Text’ activity and then use string operations such as Split to get the required data from the PDF text.
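A minimal sketch of the Split idea, written in Python rather than as a UiPath workflow, assuming the extracted text puts each label and its value on the same line. The sample text, the values, and the helper name `value_after_label` are hypothetical; only the labels "Requester:", "Client no" and "Request name" come from this thread. The same two steps (split once on the label, keep the first line of the remainder) can be done with String.Split in an Assign activity.

```python
# Hypothetical text standing in for the Read PDF Text output; the real layout
# of testpdf.pdf is not shown in this thread.
pdf_text = """Purchase Request
Requester: John Smith
Client no: 100245
Request name: Office supplies
"""


def value_after_label(text: str, label: str) -> str:
    """Return the rest of the line that follows `label`, or '' if the label is absent."""
    if label not in text:
        return ""
    # Split once on the label, then keep only the first line of what follows it.
    lines = text.split(label, 1)[1].splitlines()
    return lines[0].strip() if lines else ""


print(value_after_label(pdf_text, "Client no:"))  # 100245
```

This only works for fields that have a label in front of them, which is exactly the limitation discussed further down the thread.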

I have no idea how to do that. I did try reading it with the Read PDF Text activity. I'm waiting for an example or any breakthrough.

Can you please share the Excel file showing how you want to arrange the data and what you need from the PDF? @indrajit.shah

Sequence.xaml (9.3 KB)

I am attaching a sample file where you can get the output using the Matches activity and string manipulation.
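The Matches activity runs a regular expression over the text and returns every match. A rough Python equivalent, assuming each header field appears as a "Label: value" line, collects all such pairs in one pass. The sample text and the pattern name `LABEL_LINE` are hypothetical; basic regex syntax like this behaves the same in Python and .NET, so the pattern can be tried on Regex101.com before moving it into the workflow.

```python
import re

# Hypothetical header text; the real PDF layout may differ.
pdf_text = """Requester: John Smith
Client no: 100245
Request name: Office supplies
"""

# One match per "Label: value" line: group 1 is the label (up to the first
# colon), group 2 is the rest of that line.
LABEL_LINE = re.compile(r"^([A-Za-z][A-Za-z .]*?):[ \t]*(.*)$", re.MULTILINE)

header = {m.group(1).strip(): m.group(2).strip() for m in LABEL_LINE.finditer(pdf_text)}
print(header)
```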

I am planning to extract into one spreadsheet with two sheets: sheet1 will hold the header part and sheet2 the body details.

For every PDF there will be one or more body records.
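One way to model this, sketched in Python under stated assumptions: one header record per PDF goes to the HDR sheet, and each body record goes to the DTL sheet with the header's key repeated on every row so the two sheets can be joined later. Using "Client no" as that key, the column names and the values are all hypothetical. To stay self-contained the sketch writes CSV text with the standard `csv` module; in UiPath each list would become a DataTable written with a Write Range activity to its own sheet.

```python
import csv
import io

# Hypothetical values: one header record and one or more detail records per PDF.
header = {"Client no": "100245", "Requester": "John Smith"}
details = [
    {"Pos": "1", "Desc": "A4 paper", "Quantity": "10", "Unit": "PAC"},
    {"Pos": "2", "Desc": "Stapler", "Quantity": "2", "Unit": "PCE"},
]


def to_csv(rows, fieldnames):
    """Render a list of dicts as CSV text (one 'sheet')."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Repeat the (assumed) header key on every detail row so DTL links back to HDR.
key = header["Client no"]
dtl_rows = [{"Client no": key, **row} for row in details]

hdr_sheet = to_csv([header], list(header))
dtl_sheet = to_csv(dtl_rows, list(dtl_rows[0]))
print(hdr_sheet)
print(dtl_sheet)
```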

Please find attached the Excel file in which I have created two sheets, HDR and DTL: OutputResult.xlsx (8.6 KB)

I am trying on my side but haven’t had any breakthrough.

So you want to extract the PDF data and update it in the Excel file as you mentioned?


Yes sir. Actually, I have to put the extracted data into a web form. For now I have to keep the records in Excel, so that when I enter the data into the web system I can retrieve it from Excel.

Thank you @Vijay_RPA for the reply, I appreciate your effort, sir.
But it won’t help me get the details I am looking for.

Please see the attached Excel file. It has two sheets, and that is the layout I want the extracted data in: OutputResult.xlsx (8.6 KB)

Yes, I made a sample as a base. You can modify the Matches activity with your own regex; try Regex101.com or search the forum, like this.
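A sketch of using a label such as “Request name” as the reference point when the value sits on the line(s) below the label rather than next to it. This layout, the sample text, and the helper name `block_after` are assumptions for illustration. The lazy `(.*?)` with `re.DOTALL` captures everything after the label line up to the first blank line (or the end of the text), so a value that wraps over several lines is still picked up.

```python
import re

# Hypothetical layout: label on its own line, value on the following line(s),
# terminated by a blank line.
pdf_text = """Request name
Office supplies for
the Berlin branch

Requester
John Smith
"""


def block_after(text: str, anchor: str) -> str:
    """Return the text between the `anchor` line and the next blank line, on one line."""
    pattern = rf"^{re.escape(anchor)}[ \t]*\n(.*?)(?:\n[ \t]*\n|\Z)"
    m = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    # Collapse internal line breaks and extra spaces into single spaces.
    return " ".join(m.group(1).split()) if m else ""


print(block_after(pdf_text, "Request name"))  # Office supplies for the Berlin branch
```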

Got your point, sir. Correct me if I am wrong: here you are using “Request name” as a reference point to find the related item, but in my PDF such reference points are very limited, and not all the data has this kind of reference.

Could you give us a screenshot of the extracted text so we can see what form it takes?

@qlka08, please find the attached Excel file. It has two sheets showing the data I am looking to extract.
OutputResult.xlsx (8.6 KB)

It seems that the Company and Address fields do not have reference labels and may not be extracted correctly. For the others, you can use the Mid and IndexOf functions. I have attached a sample xaml, PdfExtract.xaml (7.3 KB), which extracts the client no.
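The IndexOf/Mid approach finds where a label starts, skips past it, and cuts out the characters up to an end marker. A Python sketch of the same idea uses `str.find` (returns -1 when not found, like IndexOf) and slicing (like Mid/Substring, but 0-based). The sample line, the "Date:" end marker, and the helper name `between` are hypothetical; only the "Client no" label comes from the thread.

```python
# Hypothetical text; only the "Client no" label is taken from the thread.
pdf_text = "Order 4711\nClient no: 100245 Date: 01.02.2020\n"


def between(text: str, start_label: str, end_marker: str = "\n") -> str:
    """Return the trimmed text between `start_label` and the next `end_marker`."""
    start = text.find(start_label)        # like IndexOf; -1 if the label is missing
    if start == -1:
        return ""
    start += len(start_label)             # skip past the label itself
    end = text.find(end_marker, start)    # search only after the label
    if end == -1:
        end = len(text)                   # no end marker: take the rest of the text
    return text[start:end].strip()        # like Mid(text, start, end - start)


print(between(pdf_text, "Client no:", "Date:"))  # 100245
```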

Thank you for your valuable input.

It won’t help, man. Where there is a reference label I can already get the details, with regex as well, but what about the part below?

First, extract the item details section from the PDF text. Then loop through each “Pos no” to extract each row. You can use a regex pattern for the description and quantity.
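A sketch of the row-by-row regex, assuming each item is one line laid out as Pos, Desc., Quantity, Quantity Unit (the column headings mentioned in the thread); the sample rows and the pattern name `ROW` are hypothetical. The description group is lazy (`.+?`), so it grows only until the rest of the line can match "quantity, unit, end of line"; that is what lets descriptions of any length work. `[ \t]` is used instead of `\s` so a match can never run across a line break.

```python
import re

# Hypothetical item section; the column order is assumed from the headings
# Pos / Desc. / Quantity / Quantity Unit named in the thread.
items_text = """Pos. Desc. Quantity Quantity Unit
10 Copy paper A4 white 5 PAC
20 Stapler 2 PCE
30 Toner cartridge black, high capacity 1,5 PCE
"""

ROW = re.compile(
    r"^[ \t]*(\d+)[ \t]+"            # Pos no at the start of the line
    r"(.+?)[ \t]+"                   # Desc: lazy, so it stops before the quantity
    r"(\d+(?:[.,]\d+)?)[ \t]+"       # Quantity, allowing a decimal point or comma
    r"([A-Za-z]+)[ \t]*$",           # Quantity Unit at the end of the line
    re.MULTILINE,
)

# Loop over every matching row; the heading line does not start with a digit.
rows = [
    {"Pos": m[1], "Desc": m[2], "Quantity": m[3], "Unit": m[4]}
    for m in ROW.finditer(items_text)
]
for row in rows:
    print(row)
```

Each dictionary maps directly onto one row of the DTL sheet.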

You are saying the Request name will not be consistent across all PDFs, right?

Yes, “Requester:” will not be constant, but the parts below will be constant.

And extracting the detail part is also important.

I tried, but it’s not working for me. When I extract with reference to Pos., Desc., Quantity or Quantity Unit, it returns nothing. Plus, the Desc. part is not fixed: it changes from PDF to PDF and its length varies.
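If a long Desc. wraps onto extra lines, a single-line pattern misses the continuation text. A hedged sketch for that case, assuming the first line of each item carries Pos, the start of the description, Quantity and Unit, and any following lines that do not look like a new item belong to the previous description. The layout, sample text, `ROW` and `parse_items` are hypothetical. Repeated "Pos." heading lines (for example after a page break) are skipped. A continuation line that happens to end in "number word" would be misread as a new item, so check this against the real extracted text.

```python
import re

# Hypothetical layout: the first line of each item holds Pos, the start of the
# description, Quantity and Unit; wrapped description lines follow it.
items_text = """Pos. Desc. Quantity Quantity Unit
10 Copy paper A4 white 5 PAC
   500 sheets per pack,
   80 g/m2
20 Stapler 2 PCE
"""

ROW = re.compile(r"^[ \t]*(\d+)[ \t]+(.+?)[ \t]+(\d+(?:[.,]\d+)?)[ \t]+([A-Za-z]+)[ \t]*$")


def parse_items(text):
    """Group item lines into records, appending wrapped lines to the last Desc."""
    items = []
    for line in text.splitlines():
        if line.strip().startswith("Pos."):   # table heading, possibly repeated
            continue
        m = ROW.match(line)
        if m:                                 # a new position starts on this line
            items.append({"Pos": m[1], "Desc": m[2], "Quantity": m[3], "Unit": m[4]})
        elif items and line.strip():          # continuation of the previous description
            items[-1]["Desc"] += " " + line.strip()
    return items


items = parse_items(items_text)
for item in items:
    print(item)
```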

Test1.xaml (19.8 KB)

Please check this xaml.

Got all the information.
