How to extract a table from structured-format PDF text

Hi Team,

I need to extract a table from structured-format text taken from a PDF, but I am not able to pick out the uppercase names. Could anyone please help me with this?

Abbbcc Yhhhh RAM GANESH N 12345678901 021
ARUN KUMAR R Y 44455566677 980
Rkkll Jkkkiii KAMAL KUMAR S N 55577788814 888

The numbers are always a fixed 11 digits.
The "N" or "Y" that appears just before the 11-digit number is not an initial.

Name XX Number code
RAM GANESH 12345678901 021
ARUN KUMAR R 44455566677 980
KAMAL KUMAR S 55577788814 888


Raja G
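For reference, the rule above (the single Y/N letter right before the 11-digit number is a flag, not an initial) can be sketched without regex by splitting each line on whitespace. This is a Python sketch, not a UiPath activity, and it assumes the names are all-uppercase words, the flag is always exactly `Y` or `N`, and the code follows the number. `split_row` is a hypothetical helper name.

```python
FLAGS = {"Y", "N"}  # assumption: the marker before the number is always Y or N


def split_row(line: str):
    """Return (name, flag, number, code) for one text line, or None if it doesn't fit."""
    tokens = line.split()  # split() with no argument also absorbs long runs of spaces
    # Locate the fixed-length 11-digit number
    number_idx = next(
        (i for i, tok in enumerate(tokens) if len(tok) == 11 and tok.isdigit()), None
    )
    if number_idx is None or number_idx == 0:
        return None
    flag = tokens[number_idx - 1]
    if flag not in FLAGS:
        return None
    code = tokens[number_idx + 1] if number_idx + 1 < len(tokens) else ""
    # Walk left from the flag and keep all-uppercase words (initials included);
    # stop at the first mixed-case word such as "Yhhhh"
    name_tokens = []
    j = number_idx - 2
    while j >= 0 and tokens[j].isalpha() and tokens[j].isupper():
        name_tokens.append(tokens[j])
        j -= 1
    name = " ".join(reversed(name_tokens))
    return name, flag, tokens[number_idx], code


if __name__ == "__main__":
    for row in [
        "Abbbcc Yhhhh RAM GANESH N 12345678901 021",
        "ARUN KUMAR R Y 44455566677 980",
        "Rkkll Jkkkiii KAMAL KUMAR S N 55577788814 888",
    ]:
        print(split_row(row))
```

Because the flag is always taken from the token immediately before the number, a real initial such as `R` in "ARUN KUMAR R Y" stays in the name.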

Hi @Raja.G,

Could you provide us with a sample PDF document that we can work on or test our approach with?

Also, the input from the text file has N and Y in it, but these are not seen in the PDF table provided in the image. Do these characters appear only after reading the PDF as text, or is it an entirely different PDF?

We would need confirmation of the pattern of the data in the PDF so that we can suggest a concrete solution.

Hey @Raja.G, please provide the sample input file.


Sreejith S S

Hi / @supermanPunch,

That PDF is confidential, so I can't share it. Sorry about that.

Raja G

@Raja.G, we did ask for a sample or similar type of PDF if possible, since the data is confidential most of the time.

We would also like some clarification about the single-letter values (N and Y) mentioned at the start.

Hi @supermanPunch,

After converting the PDF to text, it comes out in the format above. I want to use string manipulation to split out only the name.

Raja G

Hi @supermanPunch,

Please help me with this.

Raja G

@Raja.G,

I'm not completely sure the method below is appropriate, as we still do not know the complete pattern of the PDF data, but you could try the following:

Using Regex, we can separate the data as required:
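One pattern along these lines, sketched with Python's `re` module (not necessarily the exact pattern in the attached workflow): consecutive all-uppercase words, then a `Y`/`N` flag, the 11-digit number and a 3-digit code. The 3-digit code length and ASCII-only names are assumptions based on the three sample rows. Greedy matching first lets the name swallow the flag, then the engine backtracks so the last single letter before the number goes to the flag group, which keeps initials like `R` in the name. In a UiPath Matches activity (.NET regex), named groups are written `(?<name>...)` instead of Python's `(?P<name>...)`.

```python
import re

# Assumed row layout: [mixed-case noise] UPPERCASE NAME WORDS  Y|N  11-digit number  3-digit code
ROW_PATTERN = re.compile(
    r"\b(?P<name>[A-Z]+(?:[ \t]+[A-Z]+)*)"  # consecutive all-uppercase words (initials included)
    r"[ \t]+(?P<flag>[YN])"                 # the Y/N marker that is NOT part of the name
    r"[ \t]+(?P<number>\d{11})"             # fixed 11-digit number
    r"[ \t]+(?P<code>\d{3})\b"              # 3-digit code
)


def extract_rows(text: str):
    """Return one dict per matched row; [ \\t] keeps a match from spanning two lines."""
    return [m.groupdict() for m in ROW_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = (
        "Abbbcc Yhhhh RAM GANESH N 12345678901 021\n"
        "ARUN KUMAR R Y 44455566677 980\n"
        "Rkkll Jkkkiii KAMAL KUMAR S N 55577788814 888"
    )
    for row in extract_rows(sample):
        print(row["name"], row["number"], row["code"])
```

The leading `\b` stops a match from starting inside a mixed-case word such as "Yhhhh", since `[A-Z]+` must then be followed directly by whitespace.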

We can then populate this data into a DataTable.
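As a rough analogue of filling a DataTable, the matches can be collected as rows and written out as CSV. This Python sketch keeps only the Name, Number and Code values shown in the expected output (what goes in the `XX` column isn't shown, so it is left out); `to_table` and `to_csv` are hypothetical names. Numbers and codes stay strings so leading zeros such as `021` survive.

```python
import csv
import io
import re

# Name, then an uncaptured Y/N flag, then the 11-digit number and 3-digit code
ROW_PATTERN = re.compile(
    r"\b([A-Z]+(?:[ \t]+[A-Z]+)*)[ \t]+[YN][ \t]+(\d{11})[ \t]+(\d{3})\b"
)


def to_table(text: str):
    """Return rows as [Name, Number, Code] lists: a stand-in for DataTable rows."""
    return [list(m.groups()) for m in ROW_PATTERN.finditer(text)]


def to_csv(rows) -> str:
    """Serialize the rows with a header line, e.g. for a later Read CSV step."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "Number", "Code"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    text = "ARUN KUMAR R Y 44455566677 980\nRkkll Jkkkiii KAMAL KUMAR S N 55577788814 888"
    print(to_csv(to_table(text)))
```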

Check the workflow below and let us know whether it meets the requirement for other samples of your PDF documents as well.
Regex_Table_Extraction.xaml (9.1 KB)

I have used a text file to read the input data provided in your post above.
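Reading the same data from a text file line by line could look like the sketch below. It is a Python illustration assuming UTF-8 text with one record per line; lines that contain an 11-digit number but do not fit the assumed layout are returned separately, which makes it easier to check the approach against other samples of the PDF. The function names are hypothetical.

```python
import re

# name, Y/N flag, 11-digit number, 3-digit code (assumed layout)
ROW_PATTERN = re.compile(
    r"\b([A-Z]+(?:[ \t]+[A-Z]+)*)[ \t]+([YN])[ \t]+(\d{11})[ \t]+(\d{3})\b"
)
HAS_NUMBER = re.compile(r"\b\d{11}\b")


def extract_from_lines(lines):
    """Split lines into parsed rows and suspicious lines that need a manual look."""
    rows, unmatched = [], []
    for line in lines:
        match = ROW_PATTERN.search(line)
        if match:
            rows.append(match.groups())
        elif HAS_NUMBER.search(line):
            # Looks like a record but didn't fit the assumed layout
            unmatched.append(line.rstrip("\n"))
        # Anything else (headers, footers, blank lines) is skipped
    return rows, unmatched


def extract_from_file(path):
    """Iterate the file lazily so large text exports are not loaded at once."""
    with open(path, encoding="utf-8") as handle:
        return extract_from_lines(handle)
```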


Hi @supermanPunch,

Yes, that's correct, but there are long spaces between the values. Please give a solution for the link below.

Raja G
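If "long space" means the values are separated by several spaces, tabs or non-breaking spaces (common in PDF-to-text output), one option is to collapse every whitespace run to a single space before matching. This Python sketch assumes that interpretation; `normalize` and `names` are hypothetical helpers.

```python
import re

# Any run of whitespace except newlines (spaces, tabs, non-breaking spaces) -> one space
WHITESPACE_RUN = re.compile(r"[^\S\n]+")
# After normalization, single spaces are enough between the fields
ROW_PATTERN = re.compile(r"\b([A-Z]+(?: [A-Z]+)*) [YN] (\d{11}) (\d{3})\b")


def normalize(text: str) -> str:
    """Collapse whitespace runs on each line while keeping the line breaks."""
    return "\n".join(WHITESPACE_RUN.sub(" ", line).strip() for line in text.split("\n"))


def names(text: str):
    """Return only the name part of every row that fits the assumed layout."""
    return [m.group(1) for m in ROW_PATTERN.finditer(normalize(text))]
```

Normalizing first keeps the matching pattern simple, at the cost of one extra pass over the text.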

@Raja.G,

Is the solution I provided above not giving you the required output?

Could you let us know whether the input is a text file/PDF or an Excel file? To avoid confusion, you could send the input data and the expected output data again.


Hello @Raja.G

Does that mean that while reading the table, some unnecessary values are being appended and you need to split the text to get the required data? Is that correct, or is your requirement something different?

