Create a datatable from unstructured PDF file

Fer · October 30, 2019, 4:26pm

Hi,

I have a PDF with the following structure:

I want to convert the PDF into a datatable. I tried the following activities, without success:
i) Data Scraping
ii) Screen Scraping with the Native, FullText and OCR options
iii) Read PDF With OCR
iv) Read PDF Text with Generate Data Table

Currently, I get the information I need from the PDF through string manipulation with String.Split(). I will be reading multiple PDF files whose text length varies; only the titles shown in the image (the bold text) stay the same. In addition, UiPath's matching on the extracted text is case-sensitive, so I do not consider this a reliable solution.
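For illustration, here is a minimal Python sketch of the split-on-titles idea. It is not a UiPath workflow; the same logic would have to be ported to a VB.NET expression or an Invoke Code activity. It assumes the bold titles (Moeda, Montante, Reforços in the desired table) always start a line of the extracted text, and it matches them case-insensitively, so a title extracted as "MONTANTE" or "reforços" is still recognized. The helper name split_on_headings and the sample string (built from the values in the desired table, with the title case deliberately altered) are hypothetical.

```python
import re


def split_on_headings(text, headings):
    # Hypothetical helper: split extracted PDF text into (title, content) pairs.
    # Longest titles first, so a title that is a prefix of another cannot win.
    ordered = sorted(headings, key=len, reverse=True)
    pattern = re.compile(
        # A title at the start of a line, optionally followed by a colon.
        r"^[ \t]*(" + "|".join(re.escape(h) for h in ordered) + r")\b[ \t]*:?",
        re.IGNORECASE | re.MULTILINE,  # case-insensitive; ^ matches at every line
    )
    # Map the lower-cased title back to the spelling wanted in Column 1.
    canonical = {h.lower(): h for h in headings}
    matches = list(pattern.finditer(text))
    rows = []
    for i, match in enumerate(matches):
        # The content runs from the end of this title to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        content = " ".join(text[match.end():end].split())  # collapse line breaks
        title = canonical.get(match.group(1).lower(), match.group(1))
        rows.append((title, content))
    return rows


sample = (
    "Moeda\nEUR\n"
    "MONTANTE\nO montante mínimo de constituição é de 250,00EUR...\n"
    "reforços\nSão admitidos reforços em qualquer momento, devendo obedecer...\n"
)
for title, content in split_on_headings(sample, ["Moeda", "Montante", "Reforços"]):
    print(title, "|", content)
```

Because each row's content is simply whatever lies between one title and the next, the length of the text under each title no longer matters. The weak point is a content line that itself begins with a title word, which this sketch would wrongly treat as the start of a new row.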

Could someone tell me how I can convert this PDF into a datatable?

I would like the result to look like this:

               Column 1               Column 2
Row 1          Moeda                  EUR
Row 2          Montante               O montante mínimo de constituição é de 250,00EUR...
Row 3          Reforços               São admitidos reforços em qualquer momento, devendo obedecer...
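One way to reach the two-column layout above, assuming each bold title sits alone on its own line in the extracted text, is to walk the lines one by one: a line that equals a known title (ignoring case and a trailing colon) opens a new row, and every other line is appended to the current row's content. The Python sketch below is illustrative only; the helper name group_lines_by_heading and the sample lines are hypothetical, and in UiPath each resulting row would still have to be added to a DataTable, for example with the Add Data Row activity.

```python
def group_lines_by_heading(lines, headings):
    # Hypothetical helper: walk the extracted lines and build
    # [Column 1, Column 2] rows, assuming every title sits on a line of its own.
    canonical = {h.casefold(): h for h in headings}  # case-insensitive lookup
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines left by the extraction
        key = line.rstrip(":").strip().casefold()
        if key in canonical:
            rows.append([canonical[key], ""])  # a title opens a new row
        elif rows:
            # Any other line is appended to the content of the current row.
            rows[-1][1] = (rows[-1][1] + " " + line).strip()
        # Lines before the first title belong to no row and are skipped.
    return rows


lines = [
    "Moeda", "EUR", "",
    "MONTANTE:", "O montante mínimo de constituição é de 250,00EUR...",
    "Reforços", "São admitidos reforços em qualquer momento,", "devendo obedecer...",
]
for row in group_lines_by_heading(lines, ["Moeda", "Montante", "Reforços"]):
    print(row)
```

Since only the titles are compared, the amount of text under each title can vary freely between files, and a content line that wraps over several lines is rejoined into a single cell.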

Thanks

AshwinS2 · October 30, 2019, 5:12pm

Hi @Fer
Use String.Split with the newline as the separator, e.g. yourText.Split(Environment.NewLine.ToCharArray, StringSplitOptions.RemoveEmptyEntries)
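A minimal sketch of what a newline split does to scraped text, written in Python purely for illustration (in UiPath it would be a VB.NET expression). Splitting on each carriage-return and line-feed character and discarding the empty pieces gives the same result as a Split on Environment.NewLine.ToCharArray with StringSplitOptions.RemoveEmptyEntries. The helper name split_lines and the sample string are hypothetical.

```python
def split_lines(text):
    # Split on every "\r" and "\n" and drop empty pieces. This mirrors
    # text.Split(Environment.NewLine.ToCharArray, StringSplitOptions.RemoveEmptyEntries)
    # in a VB.NET expression on Windows, where Environment.NewLine is "\r\n".
    return [part for part in text.replace("\r", "\n").split("\n") if part]


scraped = "Moeda\r\nEUR\r\n\r\nMontante\r\nO montante mínimo de constituição é de 250,00EUR...\r\n"
print(split_lines(scraped))
```

On its own this only produces a list of lines; deciding which lines are titles and which are content is still needed to get the two-column table.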

Thanks
ashwin S

Fer · October 30, 2019, 5:22pm

@AshwinS2, I don't understand your solution. Where are you suggesting I use String.Split?

AshwinS2 · October 31, 2019, 4:23am

Hi @Fer

After you scrape the text, try splitting it.

Thanks
Ashwin.S

Fer · October 31, 2019, 10:44am

@AshwinS2, that's what I am doing at the moment. However, I do not consider it a reliable solution, since the text varies a lot.

I was asking for another solution.

Related topics

Topic | Category | Tags | Replies | Views | Activity
How to get datatable from string | Studio | studio, question, project_panel | 5 | 609 | March 6, 2023
Trying to extract columns from unaligned PDF data | Help | datatable, pdf, studio, string, question | 11 | 1760 | January 22, 2021
I have issues creating an excel table from a pdf table | Help | studio | 6 | 1415 | April 12, 2019
Pdf text to excel | Help | - | 4 | 920 | June 13, 2019
How to extract un-structured data from PDF | Studio | studio, question, project_panel | 6 | 797 | March 12, 2023
