Extract multiple lines from PDF

Hi all, I am having trouble extracting multiple lines from the same column of a PDF using the Read PDF Text activity.

The text extracted by the Read PDF Text activity is as follows:

Period ended 31/10/2021 Period ended Period ended Period ended
ASSETS 31/12/2020 LIABILITIES 30/09/2021 31/12/2020
Gross Dep/Prov NET NET NET NET

Kindly find attached a screenshot of the PDF.

I appreciate the help.

Hi @mounir.mohsen

Try using the “Document Understanding” technique to extract data from specific fields in the PDF.

Thanks

Hi @mounir.mohsen ,

Is the PDF a digital PDF? If so, have you tried enabling the PreserveFormatting property and checking the extraction?

Hi @supermanPunch, thank you for replying.

The PDF is digital, and I have tried the PreserveFormatting property. It worked: the extracted text keeps the structure of the PDF, which is good. However, how can I process this text so that the lines below become column headers?


                                                                   Period ended 31/10/2021                   Period ended                                                        Period ended      Period ended
                     ASSETS                                                                                  31/12/2020                         LIABILITIES                      30/09/2021         31/12/2020
                                                        Gross            Dep/Prov             NET               NET                                                                 NET                NET

How can I create a header row in a CSV file from this data?

@mounir.mohsen, could you also share the expected output you need for this input PDF data? A screenshot would be helpful, so that we can analyse what needs to be done and understand whether we really need to switch to the Document Understanding method.
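
In the meantime, here is a rough sketch of one way to turn stacked header lines into single column names once PreserveFormatting keeps the column positions. It is written in Python only to illustrate the logic; it is not a UiPath activity, and the function names (`fragments`, `merge_header_lines`, `header_row_as_csv`) and the sample layout are made up for this example. It assumes that header pieces belonging to the same column overlap horizontally and that columns are separated by at least two spaces. A header such as "Period ended 31/10/2021" that spans several sub-columns (Gross, Dep/Prov, NET) would need an extra rule, which depends on the expected output.

```python
import csv
import io
import re

def fragments(line):
    """Return (start, end, text) for each piece of text separated by 2+ spaces."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"\S+(?: \S+)*", line)]

def merge_header_lines(lines):
    """Group fragments whose character spans overlap and join them top to bottom."""
    columns = []  # each entry: [start, end, [texts from top to bottom]]
    for line in lines:
        for start, end, text in fragments(line):
            for col in columns:
                if start < col[1] and end > col[0]:  # spans overlap
                    col[0] = min(col[0], start)
                    col[1] = max(col[1], end)
                    col[2].append(text)
                    break  # simplification: attach to the first overlapping column
            else:
                columns.append([start, end, [text]])
    columns.sort(key=lambda col: col[0])  # left-to-right order
    return [" ".join(col[2]) for col in columns]

def header_row_as_csv(headers):
    """Render the merged headers as one CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(headers)
    return buffer.getvalue()

# Simplified, made-up layout (not the real PDF): columns start at 0, 12 and 27.
sample = [
    "".ljust(12) + "Period ended".ljust(15) + "Period ended",
    "ASSETS".ljust(12) + "31/10/2021".ljust(15) + "31/12/2020",
    "".ljust(12) + "NET".ljust(15) + "NET",
]

headers = merge_header_lines(sample)
print(header_row_as_csv(headers), end="")
```

For this simplified sample, the three stacked lines collapse into one header row with the columns "ASSETS", "Period ended 31/10/2021 NET" and "Period ended 31/12/2020 NET". The same overlap check could probably be reproduced inside a workflow, but whether it fits depends on the output you actually need.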