How to extract data from different tables in a scanned PDF

Table Example - Fixed.pdf (152.7 KB)
Hi All,

This is a scanned PDF containing tables with different columns, but I want to extract only the first three column values from each table. How can I achieve this?

@RAKESH_KUMAR_Tiwari

Read the data using the Read PDF Text activity.

Then use str.Split({“sales years in thousands”},2,StringSplitOptions.None())(1). Split({“complex table”},2,StringSplitOptions.None())(0)

This is for one table. Then use Generate Data Table with space as the column separator and new line as the row separator.

Repeat the same for all tables.

Note: the strings provided are for illustration; use the exact strings from your document.

Cheers

Hi @Anil_G ,

Since it is a scanned PDF, I am using Read PDF With OCR, but this activity gives an error saying that this version of the PDF activities package supports v6.5 or above, and when I try to update the package version I can see only this version.

@RAKESH_KUMAR_Tiwari

Can you check the OCR version, or change the PDF package version and try again, or try a different OCR engine?

Cheers

I am getting the error below.

@RAKESH_KUMAR_Tiwari

Please replace the inverted commas after copying…

Also, I see the inverted comma in the second Split is in the wrong place; it should be after the curly brace.

Cheers

Do you want me to use double quotes (“”) in place of inverted commas?

@RAKESH_KUMAR_Tiwari

There are two types… here on the forum the inverted comma looks like this “ but in code it looks like this "

To be precise

str.Split({"sales years in thousands"},2,StringSplitOptions.None)(1). Split({"complex table"},2,StringSplitOptions.None)(0)

cheers
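For anyone who wants to try the logic outside UiPath, here is a minimal Python sketch of what the expression above does. The helper name `between` and the sample text are made up for illustration and are not taken from the attached PDF. The `2` passed to the VB.NET `Split` caps the result at two pieces, which corresponds to `maxsplit=1` in Python.

```python
def between(text: str, start: str, end: str) -> str:
    """Return the text after the first `start` and before the next `end`."""
    after_start = text.split(start, 1)[1]  # like .Split({start}, 2, ...)(1)
    return after_start.split(end, 1)[0]    # like .Split({end}, 2, ...)(0)


# Hypothetical stand-in for the string returned by the PDF read step
sample = (
    "header\n"
    "sales years in thousands\n"
    "2019 10 20\n"
    "2020 30 40\n"
    "complex table\n"
    "rest"
)

print(between(sample, "sales years in thousands", "complex table").strip())
```

If either marker is missing from the text, the index lookup fails, so the markers have to match the extracted text exactly.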

pls check this

firstTable.Split({‘‘Simple Table’’},2,StringSplitOptions.None())(1).Split({‘‘Complex Tables’’},2,StringSplitOptions.None())(0)

@RAKESH_KUMAR_Tiwari

Remove the brackets after None

Check the above modified comment

Cheers

firstTable.Split({‘‘Simple Table’’},2,StringSplitOptions.None)(1).Split({‘‘Complex Tables’’},2,StringSplitOptions.None)(0)

See this, it still gives an error.

@RAKESH_KUMAR_Tiwari

I see you have used single quotes… you have to use double quotes. Also, when you copy and paste, the quotes need to be deleted and retyped in the code, because the double quotes shown here and the ones the code expects are different characters.

I tried the same expression you gave and only replaced the single quotes with double quotes in the code.

cheers
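To illustrate the quote problem, here is a small Python sketch that converts the typographic (curly) quotes a forum or word processor inserts into the straight quotes that code expects. The function name `straighten_quotes` and the pasted string are illustrative assumptions; retyping the quotes by hand in the expression editor achieves the same thing.

```python
# Typographic quotes mapped to their straight equivalents
CURLY_TO_STRAIGHT = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def straighten_quotes(expr: str) -> str:
    """Replace curly quotes in a pasted expression with straight ones."""
    return expr.translate(str.maketrans(CURLY_TO_STRAIGHT))


# Hypothetical expression as it might arrive when copied from a web page
pasted = "str.Split({\u201csales years in thousands\u201d},2,StringSplitOptions.None)(1)"
print(straighten_quotes(pasted))
```

This only straightens the characters. A pair of single quotes used in place of a double quote still has to be replaced with a real double quote, as described above.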

Yes, that error is gone,

but it gave the error below.

I declared this variable as String and used a Message Box to print the value.

Let's say this is the first table; what output would the above expression give?

@RAKESH_KUMAR_Tiwari

You have to read the data and assign the string data to the firstTable variable.

Then, when you apply this expression, it extracts the string between "Simple Table" and "Complex Tables", which you then pass on to get the datatable out.

Instead of "Simple Table", it's better to use the next line as the start marker, since we do not need that line either.

Cheers

OK, so I am trying to understand this: we now have the firstTable variable, which holds the values of the 1st and 2nd tables, right?
And now you are saying “You have to read the data and assign the string data to the firsttable variables” and then “Then when you do this expression it extracts the string between simple table and complex table…which you will pass to build datatable to get the datatable out”.
So my question is: do I need to use Build Data Table and pass the column names of the first and second tables?

@RAKESH_KUMAR_Tiwari

Sorry for the confusion… You have to use Generate Data Table…

What the expression does is extract, from the full string, the text between the two given strings… Since the columns are separated by spaces and the rows by new lines, the table can then be generated from the extracted string.

cheers
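As a rough illustration of the parsing step (space as column separator, new line as row separator) and of keeping only the first three columns, here is a minimal Python sketch. The function name `first_three_columns` and the sample text are hypothetical. It also assumes no cell contains a space, which may not hold for real OCR output.

```python
def first_three_columns(table_text: str) -> list:
    """Split text into rows by line and columns by whitespace, keeping 3 columns."""
    rows = []
    for line in table_text.splitlines():
        cells = line.split()        # whitespace-separated columns
        if cells:                   # skip blank lines
            rows.append(cells[:3])  # keep only the first three columns
    return rows


# Hypothetical text already extracted between the two marker strings
extracted = "Year Q1 Q2 Q3 Q4\n2019 10 20 30 40\n\n2020 50 60 70 80\n"

for row in first_three_columns(extracted):
    print(row)
```

If a column value contains spaces (for example a two-word header), it will be split into two cells, so the separators would need adjusting for that table.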

As suggested, I used the Generate Data Table activity, passed the firstTable variable as input, and created a DataTable to store the result. Then I used Output Data Table to view it and a Message Box to print it.

But I got the error "object reference not set to an instance".

@RAKESH_KUMAR_Tiwari

May I know where you are getting this error?

Your flow should look like this:

  1. Read PDF Text (give the file path and read the data into firstTable)
  2. Assign activity with the expression you are using
  3. Generate Data Table (firstTable as input, and create a DataTable variable for the output)

cheers

In the Assign activity, I am getting an error.

@RAKESH_KUMAR_Tiwari

Can you check with a Log Message how the strings "simple table" and "second table" actually appear in the text you read? They should be exactly the same as in your expression… You can either pause and check in the Locals panel or use a Log Message.

Cheers
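The check suggested above can be sketched in Python as follows. The helper `check_markers` and the sample OCR text are assumptions for illustration. The idea is to test each marker for an exact match before splitting, and to flag markers that only match once case and spacing are ignored, which is a common symptom with OCR output.

```python
def check_markers(text: str, markers: list) -> dict:
    """Report whether each marker matches exactly, loosely, or not at all."""
    normalized_text = " ".join(text.split()).lower()
    report = {}
    for marker in markers:
        if marker in text:
            report[marker] = "exact match"
        elif " ".join(marker.split()).lower() in normalized_text:
            report[marker] = "differs in case or whitespace"
        else:
            report[marker] = "not found"
    return report


# Hypothetical OCR output with a doubled space and a lowercase letter
ocr_text = "Simple  table\n1 2 3\nComplex Tables\n"

print(repr(ocr_text))  # repr makes hidden whitespace visible
print(check_markers(ocr_text, ["Simple Table", "Complex Tables", "Totals"]))
```

Only markers reported as an exact match are safe to use in the Split expression as written.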