Convert a PDF file into an Excel file

Hello everyone,

I’m currently facing an issue with converting my PDF file to an Excel sheet for some data manipulations. After searching, I resorted to converting it to a text file, which I now aim to transform into an Excel sheet. Despite trying various methods, I haven’t achieved the desired results. In the attached images, you can see the original PDF file converted to a text file and the intended format for the Excel file.

**The desired tabular form in Excel**

I would greatly appreciate it if someone could provide a detailed solution for this conversion. Alternatively, if there is an easier approach to convert PDF to Excel, please share your insights.

Thank you all for your cooperation and assistance.

Hello @sasadrinksprite, welcome to the UiPath Community Forum.

For PDF files (structured or unstructured), UiPath offers the Document Understanding package, which lets you extract data after labelling samples and training the pre-built model.
Document Understanding - DU

Could you attach the PDF here, if possible, so we can look at alternative approaches? There is no straightforward method.

You can also have a look at this forum post: PDF to Excel

Thanks

Hi,

Can you try the following sample?

mc = System.Text.RegularExpressions.Regex.Matches(strData,"(?m)^(?<Order_ID>\S+)\s+(?<Order_Date>\S+)\s+(?<Customer_Name>.*?)\s+(?<Country>\S+)\s+(?<Item>\S+)\s+(?<Price>\$\s+[.\d]+)\s+(?<Quantity>\d+)\s*$")

arrCol = dt.Columns.Cast(Of DataColumn).Select(Function(dc) dc.ColumnName).ToArray()

In AddDataRow

arrCol.Select(Function(c) CurrentItem.Groups(c.Replace(" ","_")).Value).ToArray()

Sample
Sample20240126-1.zip (4.0 KB)
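
For anyone who wants to try the pattern outside UiPath Studio, here is a minimal Python sketch of the same idea. It is an illustration only: the sample lines and the column names are made up, and Python writes named groups as `(?P<name>...)` where .NET uses `(?<name>...)`. As in the expressions above, each column name is turned into a group name by replacing spaces with underscores.

```python
import re

# Python port of the .NET pattern above; only the named-group syntax changes.
PATTERN = re.compile(
    r"(?m)^(?P<Order_ID>\S+)\s+(?P<Order_Date>\S+)\s+(?P<Customer_Name>.*?)\s+"
    r"(?P<Country>\S+)\s+(?P<Item>\S+)\s+(?P<Price>\$\s+[.\d]+)\s+(?P<Quantity>\d+)\s*$"
)

# Hypothetical column headers; spaces become underscores to find the group.
COLUMNS = ["Order ID", "Order Date", "Customer Name", "Country", "Item", "Price", "Quantity"]


def parse_rows(text):
    """Return one list of cell values per matched line, in COLUMNS order."""
    rows = []
    for m in PATTERN.finditer(text):
        groups = m.groupdict()
        rows.append([groups.get(c.replace(" ", "_"), "") for c in COLUMNS])
    return rows


# Made-up sample text, not the poster's real data.
SAMPLE = (
    "ORD-2024-A1 01/02/2024 John Smith USA Laptop $ 1200.50 2\n"
    "ORD-2024-B7 03/02/2024 Maria Garcia Lopez Spain Mouse $ 25.00 10\n"
)

rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

The lazy `Customer_Name` group is what lets names with two or three words land in a single cell, because the fields after it are each a single token.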

Regards,

Thank you for your response. The workflow runs, but there are two issues that I hope you can help me with.

In the screenshot below:

  • The [Item Price] column is missing (red line)
  • There is unwanted text in a few cells of the [Customer Name] column (yellow line)

Please help me solve this, and thank you in advance.

Note that I’m using the full text file, not the sample that I shared with you!

Hi,

Can you share the input text as file?
A subset of the lines is fine, as long as it reproduces the problem.

Regards,

Hi,

Can you try the following expression? (Use it in place of the previous one.)

mc = System.Text.RegularExpressions.Regex.Matches(strData,"(?<Order_ID>[A-Za-z]+-\d+-\w+)\s+(?<Order_Date>\S+)\s+(?<Customer_Name>.*?)\s+(?<Country>\S+)\s+(?<Item>\S+)\s+(?<Price>\$\s+[.\d]+)\s+(?<Quantity>\d+)\s*")
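
To see what the stricter `Order_ID` group changes, here is a small Python comparison of the two patterns. The full input file isn't visible in the thread, so the line below is an invented example of stray leading text (such as a page label left over from the PDF export); treat it as a guess at the kind of input that pollutes [Customer Name], not as the actual cause. Python named groups are written `(?P<name>...)`.

```python
import re

# Shared tail of both patterns, ported from the .NET expressions in this thread.
TAIL = (
    r"\s+(?P<Order_Date>\S+)\s+(?P<Customer_Name>.*?)\s+(?P<Country>\S+)\s+"
    r"(?P<Item>\S+)\s+(?P<Price>\$\s+[.\d]+)\s+(?P<Quantity>\d+)\s*"
)

# First version: any non-space token at the start of a line counts as an order ID.
LOOSE = re.compile(r"(?m)^(?P<Order_ID>\S+)" + TAIL + "$")

# Second version: the order ID must look like letters-digits-word, anywhere in the text.
STRICT = re.compile(r"(?P<Order_ID>[A-Za-z]+-\d+-\w+)" + TAIL)

# Made-up line with stray text in front of the real record.
LINE = "Page 1 ORD-2024-A1 01/02/2024 John Smith USA Laptop $ 1200.50 2"

loose = LOOSE.search(LINE).groupdict()
strict = STRICT.search(LINE).groupdict()

print(loose["Order_ID"], "|", loose["Customer_Name"])
print(strict["Order_ID"], "|", strict["Customer_Name"])
```

With the loose pattern the stray token is taken as the order ID and the real ID and date are pushed into the customer name; with the strict pattern the match starts at the real order ID.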

Regards,

Oh, it’s my bad. How about the following?

Main.zip (2.4 KB)
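
The contents of the attachment aren't reproduced in the thread. One plausible cause of an empty [Item Price] column, judging only from the expressions posted earlier, is a name mismatch: the column name "Item Price" becomes "Item_Price" after `Replace(" ","_")`, but the pattern's group is called `Price`, and in .NET asking a match for a group name that doesn't exist gives an empty string rather than an error. The Python sketch below checks for that kind of mismatch; the column list is an assumption, since only [Item Price] and [Customer Name] are named in the thread.

```python
import re

# Python port of the pattern posted above (named groups use (?P<name>...) in Python).
PATTERN = re.compile(
    r"(?P<Order_ID>[A-Za-z]+-\d+-\w+)\s+(?P<Order_Date>\S+)\s+(?P<Customer_Name>.*?)\s+"
    r"(?P<Country>\S+)\s+(?P<Item>\S+)\s+(?P<Price>\$\s+[.\d]+)\s+(?P<Quantity>\d+)\s*"
)

# Assumed DataTable headers; only "Item Price" and "Customer Name" appear in the thread.
COLUMNS = ["Order ID", "Order Date", "Customer Name", "Country", "Item", "Item Price", "Quantity"]

# Columns whose underscore form is not a named group in the pattern.
unmatched = [c for c in COLUMNS if c.replace(" ", "_") not in PATTERN.groupindex]
print(unmatched)
```

If that is the cause, renaming either the group or the column so the two agree would fill the column.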

Regards,

It works, thank you for your time!

Dear Yoichi,
Sorry for asking so many questions.
Could you help me add a column named “Total Price” and calculate the total price for each order (Total Price = Item Price x Quantity)? How can I achieve this?

Hi,

There are several ways to achieve this. One of them is to use the AddDataColumn activity and set an expression for the new column. Can you try the following?

MainV2.zip (2.8 KB)

Please note that the dollar sign is removed in the above workflow so that Item Price x Quantity can be calculated.
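
The arithmetic itself can be sketched in a few lines of Python. This is not the expression used in the attached workflow, only an illustration of the same steps: strip the dollar sign, convert the text to numbers, multiply. The function name and the sample values are made up.

```python
from decimal import Decimal


def total_price(item_price, quantity):
    """Item Price x Quantity, with the dollar sign and spaces stripped first."""
    amount = Decimal(item_price.replace("$", "").strip())
    return amount * int(quantity)


# Made-up values in the same shape as the extracted text ("$ 1200.50", "2").
total = total_price("$ 1200.50", "2")
print(total)
```

`Decimal` is used instead of `float` so that currency amounts keep their exact cents.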

Regards,

May I know what is causing this error, please? I searched for it but couldn’t find a solution.
(Two screenshots of the error were attached.)

Hi,

There is probably no data in ItemPrice or Quantity in some rows. Can you check by temporarily removing this Assign?

It may be necessary to check each value then calculate total price row by row.
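
A row-by-row check of that kind might look like the Python sketch below. It is only an illustration under assumptions: the rows are invented, and the column names follow the ItemPrice and Quantity names mentioned above. A row with a blank or malformed value gets no total instead of stopping the whole run.

```python
from decimal import Decimal, InvalidOperation


def safe_total(item_price, quantity):
    """Return Item Price x Quantity, or None when either value is blank or malformed."""
    try:
        amount = Decimal(item_price.replace("$", "").strip())
        count = int(quantity.strip())
    except (InvalidOperation, ValueError):
        return None
    return amount * count


# Made-up rows: one complete, one with no price, one with no quantity.
rows = [
    {"ItemPrice": "$ 25.00", "Quantity": "10"},
    {"ItemPrice": "", "Quantity": "3"},
    {"ItemPrice": "$ 9.99", "Quantity": ""},
]

totals = [safe_total(r["ItemPrice"], r["Quantity"]) for r in rows]
print(totals)
```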

Regards,

Thanks! It works now :smiley: