Extracting data from PDF to Excel

Hello Team,

I have a PDF with 160 pages of data. The scenario: the same set of column names repeats throughout, and I need to extract the data for 11 different columns from all 160 pages.
1. model
value
cost
description… and so on.
2. model
value
cost
description… and so on.
I have tried many ways to extract this to Excel. Can anyone please help me with that?

Hi @sasi_kiran
=> Use Read PDF Text or Read PDF with OCR to read the PDF and store the output in a variable.
=> Use the Generate Data Table from Text activity and pass that variable. Store the output in a variable of type DataTable.
=> Use Write Range Workbook activity and write the data into Excel.

Hope it helps!!

No, it is not working.
The text does not have any clearly defined columns, so no data is being extracted.

@sasi_kiran

If possible, share the input PDF so that I can help you.

Regards,

I can’t share the PDF with you since it is confidential.
But I will share a blueprint showing how the data is laid out and how I need it extracted.
The first 13 pages are theory.

Model :value

Module :value

Code :value

Description:value

BTC :value

Shops :value

Date :value

Inspection:value

Estimated:value

Part’s :value

Test :value

Repairs:value

Major :value

Minor :value


Model :value

Module :value

Code :value

Description:value

BTC :value

Shops :value

Date :value

Inspection:value

Estimated:value

Part’s :value

Test :value

Repairs:value

Major :value

Model :value

Module :value

Code :value

Description:value

BTC :value

Shops :value

Date :value

Inspection:value

Estimated:value

Part’s :value

Test :value

Repairs:value

Major :value

Minor :value


Model :value

Module :value

Code :value

Description:value

BTC :value

Shops :value

Date :value

Inspection:value

Estimated:value

Part’s :value

Test :value

Repairs:value

Major :value

Minor :value


Model :value

Module :value

Code :value

Description:value

BTC :value

Shops :value

Date :value

Inspection:value

Estimated:value

Part’s :value

Test :value

Repairs:value

Major :value

Minor :value

The same pattern repeats across all 160 pages.

I need to extract Model, Module, Code, Description, BTC, Shops, Date, Inspection, Estimated, Parts, Test… into Excel.
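To make the target layout concrete, here is a minimal sketch of the parsing logic, written in Python outside UiPath purely for illustration. It assumes the text read from the PDF has one `Key :value` pair per line and that every record starts at a `Model` line, as in the blueprint above. The names `FIELDS`, `parse_records`, and `to_csv` are hypothetical, and the sample text is made up.

```python
import csv
import io
import re

# Field names copied from the blueprint in this thread (assumption: the
# real PDF uses the same labels; "Part’s" is normalised to "Parts").
FIELDS = ["Model", "Module", "Code", "Description", "BTC", "Shops", "Date",
          "Inspection", "Estimated", "Parts", "Test", "Repairs", "Major", "Minor"]

# One "Key :value" pair per line; the key is a single word.
LINE_RE = re.compile(r"^\s*([A-Za-z’']+?)\s*:\s*(.*?)\s*$")

def parse_records(text):
    """Return one dict per record; a new record starts at each 'Model' line."""
    records = []
    current = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # blank lines and free text (e.g. the theory pages)
        key = m.group(1).replace("’", "").replace("'", "")
        if key not in FIELDS:
            continue  # some other "word: text" line, not a known field
        if key == "Model":
            current = dict.fromkeys(FIELDS, "")  # missing fields stay empty
            records.append(current)
        if current is not None:
            current[key] = m.group(2)
    return records

def to_csv(records):
    """Render the records as CSV text, one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# Made-up sample: the second record has no "Minor" line.
sample = """Intro text from the theory pages
Model :M1
Module :A
Minor :x
Model :M2
Part’s :P9
"""
rows = parse_records(sample)
print(to_csv(rows))
```

Grouping by record rather than by column keeps each row intact even when a record is missing a field, as the second record in the blueprint is (it has no Minor line). The CSV output can be opened directly in Excel.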

Hi @sasi_kiran ,
Can you share your file?
You can try Read PDF with OCR to get the data, then use Generate Data Table from Text, then write the result to Excel.
regards,

Hello,

Sorry, I can’t share the file.
But the blueprint above shows how my file is laid out. Can you please help? I tried what you said, but it is not working.

Thanks,
Bhavya

Hi @sasi_kiran ,
It is difficult to give a solution because I don’t know what your file looks like. I have a solution from a topic similar to yours:
https://forum.uipath.com/t/question-how-to-convert-pdf-to-excel/582752/6?u=nguyen_van_luong1
Hope it helps.
To be more exact, I would need to see your file. You can create a sample or send me a message.
Regards,

Hi @sasi_kiran

You can try using regex to extract the data from the PDF to Excel.

@sasi_kiran

Looking at the structure, each value is associated with a key and each key-value pair is on its own line, so please try the following:

  1. Read the PDF into a variable of type String (called str below)
  2. Use regex to extract the values as below

Modelvar = System.Text.RegularExpressions.Regex.Matches(str,"(?<=Model\s*:).*")

Modelvar is of type MatchCollection (a collection of Match objects).

You can loop through it with a For Each activity and read each item's Value. Apply Trim to the value to drop any surrounding whitespace.

Similarly, replace Model with each of the other keywords to get the remaining values. Please try the same.
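The per-keyword approach can be sketched outside UiPath as follows, in Python and only as an illustration. Python's `re` module rejects the variable-width lookbehind `(?<=Model\s*:)` that .NET accepts, so this sketch uses a capture group instead. The helper name `values_for` and the sample text are hypothetical.

```python
import re

def values_for(key, text):
    """Return every value that follows 'key :' on its own line, trimmed."""
    # A capture group replaces the .NET lookbehind; \r? tolerates Windows
    # line endings, which would otherwise end up inside the value.
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]*:[ \t]*(.*?)[ \t]*\r?$",
        re.MULTILINE,
    )
    return pattern.findall(text)

# Made-up sample: the second record has no "Minor" line.
sample = "Model :M1\r\nMinor :x\r\nModel : M2\r\nModel :M3\r\nMinor :z\r\n"

models = values_for("Model", sample)   # ['M1', 'M2', 'M3']
minors = values_for("Minor", sample)   # ['x', 'z']
print(models, minors)
```

Note that the two lists have different lengths here. When one list is collected per keyword, a record that lacks a field shifts every later value up by one position, so pairing the lists by index would put `z` next to `M2`. If some records in the PDF can omit a field, it may be safer to split the text at each Model line first and then look up the keywords inside each chunk.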

Cheers