PDF to Excel - Extract structured data

How can I extract a table from a PDF and move it to an Excel file?
I used the “Extract Structured Data” activity, but it does not capture the table; the activity comes back empty. Can someone help me?

Hi @Gabrielle_Rodrigues,

If it’s a structured data table, use the Extract Structured Data activity; otherwise use the **screen scraping** activity.
For a structured data table:
1. Open the PDF file.
2. Then use Extract Structured Data to get the datatable.

Regards,
Arivu


Hi @arivu96
I did just that, but the activity is not capturing anything.

Hi @Gabrielle_Rodrigues
If the table is on, say, the 3rd page of the PDF, you need to go to that page first and only then extract the data.

Make sure you have navigated to the correct page in the PDF.
Refer to the XAML file below to extract the data.

pdfdataExtraction.xaml (21.5 KB)

Regards,
Arivu

I’ll try. Thank you!!

Hi @Gabrielle_Rodrigues

I want to move unstructured data (e.g. the data contained in invoices) into Excel. How could I do that?

Hi Shaista

Unfortunately, I could not.

Hi @SHAISTA,

@Gabrielle_Rodrigues

You need to split the string on Environment.NewLine, \t, spaces, a fixed string length, or whatever special character separates the values in your text. Then add each resulting row to a datatable and use the Write Range activity to write the datatable to an Excel sheet.

Regards,
Arivu
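The splitting approach suggested above can be sketched outside UiPath to show the logic. This is only an illustration in Python, not the workflow itself: the function names `text_to_rows` and `write_rows` are made up for this example, the sample text is invented, and a CSV file stands in for the Excel sheet that Write Range would produce.

```python
import csv
import re

def text_to_rows(text):
    """Split raw text into rows (one per line) and cells (tab or 2+ spaces)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: columns are separated by a tab or by two or more spaces.
        cells = [c.strip() for c in re.split(r"\s{2,}|\t", line)]
        rows.append(cells)
    return rows

def write_rows(rows, path):
    """Write the rows to a CSV file, which Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# Invented sample text standing in for what was read from a PDF.
sample = "Item\tQty\tPrice\nWidget  2   9.50\n\nGadget\t1\t20.00"
rows = text_to_rows(sample)
```

Calling `write_rows(rows, "invoice.csv")` would then save the three rows. Which separator to split on depends entirely on how the text comes out of the PDF, so it is worth printing the raw string first.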

I am using data scraping to extract the data from PDFs and put it into an Excel sheet, but the output contains only the data of the first PDF. I have added a For Each loop, but it still reads only the first PDF.

Shaista, after data scraping you are writing the scraped data to the Excel file. What output do you get in the Excel file after Write Range? Are you getting 4 rows?
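The workflow was not shared, so the cause is only a guess. One possible explanation is that the loop body still points at the first file, or that each iteration overwrites the previous result instead of adding to it. A minimal Python sketch of the intended loop shape follows; `collect_rows` and the stand-in extractor are hypothetical names used for illustration, and no real PDF is read.

```python
def collect_rows(pdf_paths, extract):
    """Run `extract` on every path and gather all rows, tagged by source file."""
    all_rows = []
    for path in pdf_paths:
        # Use the loop variable here; a hard-coded path would re-read one file.
        for row in extract(path):
            # Append to one shared list so earlier files are not overwritten.
            all_rows.append([path] + list(row))
    return all_rows

# Stand-in extractor for illustration only; a real one would scrape the PDF.
fake_data = {"a.pdf": [["1", "x"]], "b.pdf": [["2", "y"], ["3", "z"]]}
result = collect_rows(["a.pdf", "b.pdf"], lambda p: fake_data[p])
```

In workflow terms, the equivalent check is that the file path inside the loop comes from the loop item, and that the scraped rows are appended to the sheet rather than written to the same starting cell each time.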

Hi Arivu,

Could you please help me with a PDF to Excel task?

I would like to know how to find the Excel data in a PDF and update it in Excel if any changes need to be made.

I can explain it better on a call. If you don’t mind, please call me on 9966215635. :grinning:

Thanks for all your support.

Regards,
Gopi Krishna

Hi @arivu96 and everyone,

I have a scenario where Table A is on page 6 of PDF 1, and I want to use this workflow for a list of PDFs (i.e. PDF 2, PDF 3, …).

However, my problem is that Table A could be on a different page in PDF 2, and it doesn’t have a table name. How can I go to the correct page AND extract Table A from that page?

Hope to hear from anyone on this.

Thanks in advance!
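One possible way to approach the question above, assuming the text of each page can be read separately and the table's column headers are the same in every PDF, is to search each page for those headers and use the first page where all of them appear. The Python sketch below shows only that search logic; `find_table_page` and the sample page texts are invented for illustration.

```python
def find_table_page(page_texts, header_words):
    """Return the 1-based number of the first page containing every header word, or None."""
    wanted = [w.lower() for w in header_words]
    for number, text in enumerate(page_texts, start=1):
        lowered = text.lower()
        if all(w in lowered for w in wanted):
            return number
    return None

# Invented page texts; in practice each string would be one page read from the PDF.
pages = [
    "Introduction",
    "Summary of terms",
    "Item  Qty  Unit Price\nWidget  2  9.50",
]
page = find_table_page(pages, ["Item", "Qty", "Unit Price"])
```

Once the page number is known, the workflow can navigate to that page before running the extraction. This only works if the headers are distinctive enough not to appear together elsewhere in the document.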

Hi arivu96,

I tried the given sample workflow to extract structured data from the PDF. The challenge I found is that I am not able to select the table as the “selector”; it selects the whole PDF instead.

Can you please explain how this selector “” can be extracted from the PDF file?

Can you please guide me?

Thanks in advance.

Hi, can you share your project with me? I have been trying to do the same for a week and it keeps failing.


Hi @arivu96, I have tried your XAML file, but the output is not in the correct place. How can I make the data follow exactly the same rows and columns as in the PDF? Thanks.