Hi all, I have an issue extracting text that spans multiple lines within the same column of a PDF using the Read PDF Text activity.
The text extracted by the activity is as follows:
Period ended 31/10/2021 Period ended Period ended Period ended
ASSETS 31/12/2020 LIABILITIES 30/09/2021 31/12/2020
Gross Dep/Prov NET NET NET NET
Please find attached a screenshot of the PDF.
I'd appreciate any help.
Hi @mounir.mohsen
Try using the “Document Understanding” technique to extract the data from specific fields in the PDF.
Thanks
Hi @mounir.mohsen,
Is the PDF a digital PDF? If so, have you tried enabling the PreserveFormatting property and checking the extraction?
Hi @supermanPunch, thank you for replying.
The PDF is digital, and I have tried the PreserveFormatting property. It works: the extracted text keeps the layout of the PDF, which is good. However, how can I process the following text so that it becomes the column headers?
Period ended 31/10/2021 Period ended Period ended Period ended
ASSETS 31/12/2020 LIABILITIES 30/09/2021 31/12/2020
Gross Dep/Prov NET NET NET NET
How can I create a header row in a CSV file from this data?
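One possible way to turn the PreserveFormatting output into headers is to treat each header line as text at fixed character positions: split every line into cells wherever there are two or more spaces, assign each cell to every column whose character range it overlaps, and join the pieces top to bottom. The sketch below is written in Python purely for illustration (it is not UiPath workflow code), and it assumes that PreserveFormatting separates columns with at least two spaces. The thread does not show the raw spacing, so the sample lines and the column offsets are invented; with a fixed PDF template you would measure the real offsets once from your own output. The function names `combine_header_lines` and `to_csv_header` are hypothetical.

```python
import csv
import io
import re

# A "cell" is a run of words separated by single spaces.
# Assumption: PreserveFormatting pads neighbouring columns with 2+ spaces.
CELL_PATTERN = re.compile(r"\S+(?: \S+)*")


def combine_header_lines(lines, boundaries):
    """Merge stacked header lines into one name per column.

    boundaries is a list of (start, end) character offsets, one per column
    (hypothetical values, measured from the preserved-format text).
    A cell is added to every column it overlaps, so a header centred over
    several columns (e.g. "Period ended 31/10/2021") is copied to each.
    """
    parts = [[] for _ in boundaries]
    for line in lines:
        for match in CELL_PATTERN.finditer(line):
            for i, (col_start, col_end) in enumerate(boundaries):
                if match.start() < col_end and match.end() > col_start:
                    parts[i].append(match.group())
    return [" ".join(p) for p in parts]


def to_csv_header(names):
    """Render the merged names as a single CSV header row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(names)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented four-column layout; ljust() keeps the character offsets exact.
    header_lines = [
        "ASSETS".ljust(16) + "Period ended 31/10/2021".ljust(30) + "Period ended",
        " " * 46 + "31/12/2020",
        " " * 16 + "Gross".ljust(15) + "NET".ljust(15) + "NET",
    ]
    columns = [(0, 15), (15, 30), (30, 45), (45, 60)]
    print(to_csv_header(combine_header_lines(header_lines, columns)), end="")
    # ASSETS,Period ended 31/10/2021 Gross,Period ended 31/10/2021 NET,Period ended 31/12/2020 NET
```

Copying a spanning cell into every column it overlaps mimics merged header cells in the original table. If a header is centred so that it does not reach one of the columns it belongs to, that column's offsets would need widening.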
@mounir.mohsen, could you also share the expected output you need for this input PDF data? A screenshot would help us analyse what needs to be done, and whether we really need to switch to the Document Understanding method.