Extract Table from PDF to Excel without DU

Hello folks,

I am trying to get this table data from a PDF into Excel using activities, not Document Understanding.

This is the sample data; there are many more files in the same format.
Single FIle Sample.pdf (390.4 KB)

I am using:
1. Read PDF Text to get the text as a string.
2. String manipulation.
3. Generate Data Table to convert the string to a DataTable.
4. Write Range to write the data to Excel.

I am not able to get the values in the structured format they have in the PDF. Can anyone help me with that as soon as possible? Please share the expression, as that will help me; I am new to this domain.

You can try that with Regex.
Have you tried it?

@Sudharsan_AIT Thanks for your response. I am not from a technical or coding background; I moved over from the management domain, which is why I am looking for help here. I don't know regex yet, I am still in the learning phase :smiley:

If you could help me with that I can try that out.

Thanks.

Hi @Rajesh_Shet

Here is the workflow

Sample.xaml (7.0 KB)
and Excel file
Sample.xlsx (9.0 KB)

The steps are the same as the ones you listed:
1. Read PDF Text to get the text as a string (Properties -> Preserve Formatting = True).
3. Generate Data Table to convert the string to a DataTable.
4. Write Range to write the data to Excel.
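For readers who cannot open the xaml, here is a rough Python sketch of the idea behind these steps, not the workflow itself. It assumes that text read with formatting preserved keeps columns apart with runs of two or more spaces; the sample text and the `text_to_rows` helper are made up for illustration.

```python
import re

# Hypothetical sample of layout-preserved text: columns are separated
# by runs of spaces. The values are invented, not taken from the PDF.
sample_text = """\
ID    Name                 Amount
1     ABCD_EFGH_1234567    100.50
2     WXYZ_QRST_7654321    2,300.00
"""

def text_to_rows(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows

rows = text_to_rows(sample_text)
header, data = rows[0], rows[1:]
```

Splitting on two or more spaces keeps a cell that contains a single space in one piece, but it breaks if two columns are separated by only one space.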

Let me know if this works

Regards
Sudharsan


Thanks a lot @Sudharsan_AIT, let me try that.

If you have more files in the folder, the steps are:
1) For Each item in Directory.GetFiles("The path of the folder that contains your files", "*.pdf"). This gets all the PDF files in the folder as an array of strings (Properties -> TypeArgument = String).

The steps inside the loop are the same as above.
Sample.xaml (7.9 KB)
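As a plain illustration of the folder loop (a Python sketch, not the attached workflow), the same idea looks roughly like this. `list_pdf_files` is a hypothetical helper name, and the folder path is whatever you pass in.

```python
from pathlib import Path

def list_pdf_files(folder):
    """Return the paths of the .pdf files directly inside `folder`, sorted by name."""
    return sorted(str(p) for p in Path(folder).glob("*.pdf"))

# Usage idea: run the same read -> table -> write steps once per file.
# for pdf_path in list_pdf_files("path/to/your/folder"):
#     ...  # per-file steps go here
```

Only files matching `*.pdf` are returned, so other files in the same folder are skipped.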

Hi @Sudharsan_AIT,

Please avoid attaching only xaml files as responses to questions.

What is the approach you used? Show the person asking the question why the suggested approach is a good / reliable approach.

People who read this thread will benefit from a description of your approach. Xaml files alone are not really helpful. Let the readers review different approaches.

At the minimum, try to attach a screenshot of your workflow with some annotations.


Yes @jeevith Sir, sure.

Hi @Rajesh_Shet,

Welcome to the forum.

As you are just starting with Studio as a tool, I suggest you review similar topics already solved in the forum.

  1. @Cristian_Negulescu has a very good video on PDF extraction which you can refer to in this thread.
    Brainstorming Solutions for Editing Data in PDFs - Help / Activities - UiPath Community Forum

  2. You can also use the Search and Advanced Search to find similar topics and filter to solved. It is a powerful / quick way of navigating to solved queries in the forum.



  3. If you remember the author's username, you can also search with @ForumUSERNAME.


Thanks for the detailed guidance. I appreciate your time and effort, @jeevith.

Firstly, thank you for your help.

This worked well with the sample I provided, because that file was generated from Excel. The actual file I have is a PDF report exported from another piece of software. The details are confidential, so I cannot share the actual PDF with you, but the format looks the same in that PDF as well.

It seems that this PDF does not retain the formatting: the first and second columns are merged into one and written to column 1, so the output is messy.

  1. The first column contains only integers.
  2. The second column ("Name") contains 17 characters, with underscores in between.
  3. Similarly, the other columns vary in character count.

Is there a better way to do that?

In parallel, I'm looking into what was shared by @jeevith.
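One possible direction, sketched in Python only to show the logic: since the first column is an integer and the "Name" column is always 17 characters, a regex can pull the two apart even when they land in the same cell. The real PDF text is not available, so the pattern, the `split_row` helper, and the sample values in the comments are assumptions and would need checking against the actual output.

```python
import re

# Assumed row layout, based only on the description above:
#   group 1: an integer (first column)
#   group 2: exactly 17 non-space characters (the "Name" column)
#   group 3: the rest of the line (remaining columns, still unsplit)
ROW_PATTERN = re.compile(r"^\s*(\d+)\s*(\S{17})\s+(.*)$")

def split_row(line):
    """Return [number, name, rest] for a data row, or None if the line does not match."""
    match = ROW_PATTERN.match(line)
    if match is None:
        return None
    number, name, rest = match.groups()
    return [number, name, rest.strip()]

# Invented examples:
# split_row("1ABCD_EFGH_1234567   100.50")
# split_row("12 WXYZ_QRST_7654321  2,300.00  OK")
```

Lines that do not start with an integer, such as a header row, return None and can be handled separately. The pattern uses only common regex syntax, so it may carry over to a regex step in Studio, but that is untested here.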
