How to extract a table from structured-format PDF text

Raja.G · October 14, 2022, 10:52pm

Hi Team,

I need to extract a table from structured-format PDF text, but I am not able to pick out the uppercase name. Could anyone please help me with this?

Input:
Abbbcc Yhhhh RAM GANESH N 12345678901 021
ARUN KUMAR R Y 44455566677 980
Rkkll Jkkkiii KAMAL KUMAR S N 55577788814 888

Note:
The numbers are always a fixed number of digits.
The "N" or "Y" just before the 11-digit number is not an initial.

Output:
Name XX Number code
RAM GANESH 12345678901 021
ARUN KUMAR R 44455566677 980
KAMAL KUMAR S 55577788814 888
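Given the notes above (the number is always 11 digits, and the single Y/N before it is a flag rather than an initial), one way to isolate the uppercase name is to read each row from the right. The Python sketch below assumes every row ends with `<Y/N flag> <11-digit number> <3-digit code>` and that the unwanted leading words always contain lowercase letters (like "Abbbcc Yhhhh"); `parse_row` is a hypothetical helper for illustration, not part of any UiPath activity.

```python
def parse_row(line: str):
    """Split one text row into (name, flag, number, code), reading from the right.

    Assumption: each row ends with a Y/N flag, an 11-digit number and a 3-digit code.
    Returns None if the row does not fit that layout.
    """
    tokens = line.split()  # split() with no argument also collapses long runs of spaces
    if len(tokens) < 4:
        return None
    *rest, flag, number, code = tokens
    if flag not in ("Y", "N") or len(number) != 11 or not number.isdigit() or not code.isdigit():
        return None
    # Walk left from the flag, keeping the run of all-uppercase words;
    # the first word containing lowercase letters (e.g. "Yhhhh") ends the name.
    name_words = []
    for word in reversed(rest):
        if not word.isupper():
            break
        name_words.append(word)
    name_words.reverse()
    return " ".join(name_words), flag, number, code


if __name__ == "__main__":
    sample = [
        "Abbbcc Yhhhh RAM GANESH N 12345678901 021",
        "ARUN KUMAR R Y 44455566677 980",
        "Rkkll Jkkkiii KAMAL KUMAR S N 55577788814 888",
    ]
    for row in sample:
        print(parse_row(row))
```

The flag is returned separately so it can be dropped or kept as its own column. If a real prefix word were ever fully uppercase, it would be swallowed into the name, so this assumption should be checked against more rows from the actual PDF.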

Regards,
Raja G

supermanPunch · October 15, 2022, 11:03am

Hi @Raja.G ,

Could you provide us with a sample PDF document that we can work on or test our approach with?

Also, the input from the text file has N and Y in it, but these are not seen in the PDF table shown in the image. Do these characters appear only after reading the PDF as text, or is it an entirely different PDF?

We would need confirmation of the pattern of the data in the PDF so that we can suggest a concrete solution.

sreejith.ss · October 15, 2022, 11:12am

Hey @Raja.G, please provide the sample input file.

Regards

Sreejith S S

Raja.G · October 15, 2022, 11:37am

Hi @sreejith.ss / @supermanPunch ,

That PDF is confidential, so I am not able to share it. Sorry about this.

Regards,
Raja G

supermanPunch · October 15, 2022, 11:42am

@Raja.G , we asked for a sample or similar type of PDF if possible, since most of the time the data is confidential.

We would also like some clarification about the single-letter values mentioned.

Raja.G · October 15, 2022, 11:50am

Hi @supermanPunch ,

It only comes out in the above format after converting the PDF to text. I want to use string manipulation to split out just the name.

Regards,
Raja G

Raja.G · October 15, 2022, 12:07pm

Hi @supermanPunch

Please help me with this.

Regards,
Raja G

supermanPunch · October 15, 2022, 12:30pm

@Raja.G ,

I am not completely sure the method below is appropriate, as we still do not know the complete pattern of the PDF data, but you could try the following:

Using regex, we could separate the data as required.

We could then populate this data into a DataTable.
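The regex used in the attached workflow is not reproduced in the thread, so the pattern below is an assumption, not the one inside Regex_Table_Extraction.xaml. This Python sketch captures the uppercase name, the Y/N flag, the 11-digit number and the 3-digit code with named groups, then collects the Name, Number and code columns into a list of dicts that stands in for a DataTable. `ROW_PATTERN` and `extract_table` are hypothetical names.

```python
import re

# Hypothetical pattern: an all-uppercase name, a Y/N flag, an 11-digit number
# and a 3-digit code at the end of each line. [ \t]+ allows long runs of spaces
# without matching across line breaks.
ROW_PATTERN = re.compile(
    r"\b(?P<name>[A-Z]+(?:[ \t]+[A-Z]+)*?)[ \t]+(?P<flag>[YN])"
    r"[ \t]+(?P<number>\d{11})[ \t]+(?P<code>\d{3})[ \t]*$",
    re.MULTILINE,  # let $ match at the end of every line, not only the end of the text
)


def extract_table(text: str) -> list[dict[str, str]]:
    """Return one dict per matched row, using the columns from the expected output."""
    return [
        {"Name": m["name"], "Number": m["number"], "code": m["code"]}
        for m in ROW_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    text = (
        "Abbbcc Yhhhh RAM GANESH N 12345678901 021\n"
        "ARUN KUMAR R Y 44455566677 980\n"
        "Rkkll Jkkkiii KAMAL KUMAR S N 55577788814 888\n"
    )
    for row in extract_table(text):
        print(row)
```

The lazy `*?` on the name means the last single uppercase letter is given to the flag group only when the 11-digit number follows it, so an initial such as the R in "ARUN KUMAR R" stays part of the name. In a UiPath workflow, each match would typically be added as a row to a DataTable instead of a dict.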

Check the workflow below and let us know whether it also meets the requirement for other samples of your PDF documents.
Regex_Table_Extraction.xaml (9.1 KB)

I have used a text file to read the input data as provided above in your post.

Raja.G · October 15, 2022, 12:39pm

Hi @supermanPunch ,

Yes, that is correct: it is a long space. Please give a solution for the link below.

Regards,
Raja G

supermanPunch · October 15, 2022, 2:17pm

@Raja.G ,

Does the solution I provided above not give your required output?

Could you let us know whether the input is a text file/PDF or an Excel file? To avoid confusion, you could send the input data and the expected output data again.

Rahul_Unnikrishnan · October 15, 2022, 2:18pm

Hello @Raja.G

Does that mean that, while reading the table, some unnecessary values are being appended and you need to split the text to get the required data? Is this correct, or is your requirement something different?

Thanks

system · October 18, 2022, 2:18pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
