Merging Data from Multiple PDFs into a Single PDF per Account Number in UiPath

Question:

Hello everyone,

I’m working on a UiPath automation where I need to extract and merge data from two PDF files, PDF A and PDF B, into a single PDF file for each account number. The files are input in a non-fixed order and contain scattered data across various pages.

Details:

  • PDF A: Contains account numbers and related data.
  • PDF B: Contains account numbers, related data, and a program date.
  • The data related to account numbers are not consistently located on the same pages in either PDF.

the challenge is how to effectively consolidate this scattered data from both PDFs into one coherent PDF file per account number.

my logic: * I use a DataTable to capture the program date from PDF B, intending to merge this with the data from PDF A for the corresponding account number.

However, I am encountering a significant challenge: the DataTable is nullified when the workflow processes the second PDF, which leads to data loss from the first processed PDF.

Is there an alternative approach or strategy that could better handle the data consolidation from these randomly paginated PDFs?

hi @samantha_shah,

Your general approach of using datatable to capture information from PDF B and then PDF A is not wrong.

Can you elaborate more on your step by step approach?

If your datatable is a variable, check the scope.
If your datatable is an argument, make sure it’s set to in/out.

Hi @Keegan_Kosasih thanks for the reply

please see my workflow steps:

data on both the pdfs are different and we want all of that data, only common thing is account number.
**naming convention : sample: **
FD_044554_Customerlnvoice_________Final Pricing May 2024_________x_

where 0445544 is the account number and May 2024 is the program date.

1.) both my pdfs are run one by one, (order is not fixed)in the same sequence((using regex)).
2.)for example: once the pdf(B) is there in the sequence → AccountNumber is extracted, Program date is extracted(using regex)and page numbers are counted(using counter).
3.)all this data is added to a datatable DT. (DT has multiple entries for same account number since the data in the pdf is scattered and not present in single page)
if you can suggest a better way here.for step 3.
4.) once i have my DT ready, i am running a for each row in DT loop.
for every row in DT where account number is same , I extract the Programdate, PageNumber and that common Accountnumber and try to create a pdf (using Extract pdf).
the range of the Extract pdf is given as for example(currentrow(pagenumber))
5.) once the complete DT is iterated a single pdf file is created per account number as above naming convention. in sample
6.) the pdf is moved to completed number

now when the second pdf starts running, the datatable is reintialized at the start of sequence because this workflow is invoked again, from the main file
and
all the above steps are repeated
but i donot have the program date for the account number now, because only pdf B has that data.

My doubts:
1.)so how can i utilise the previosuly creatted pdf files naming convention where i can get the program date ??
2.)or else how can i use the previosuly created datatable to extract program date for that account number?
but program date is only available in one of the pdf-(PDF B).

3.) or how can i use your idea or expertise ??

Hi @samantha_shah

BlankProcess21.zip (1.1 MB)

Please replace the invoice with your invoice.

Hope it helps!!

Hi @Parvathy
thanks for the reply , I checked the workflow, However the only consents i have is

  1. how can i maintain data on different pages, we can merge them i saw that

but the question arises is the Program date found on pdf of my workflow

how can i keep that constant throughout my complete run of two pdfs. or how can i utilize UiPath in such a way that program date is not a problem.

Hi @Keegan_Kosasih

thanks for the reply

please see my workflow steps:

data on both the pdfs are different and we want all of that data, only common thing is account number.
**naming convention : sample: **
FD_044554_Customerlnvoice_________Final Pricing May 2024_________x_

where 0445544 is the account number and May 2024 is the program date.

1.) both my pdfs are run one by one, (order is not fixed)in the same sequence((using regex)).
2.)for example: once the pdf(B) is there in the sequence → AccountNumber is extracted, Program date is extracted(using regex)and page numbers are counted(using counter).
3.)all this data is added to a datatable DT. (DT has multiple entries for same account number since the data in the pdf is scattered and not present in single page)
if you can suggest a better way here.for step 3.
4.) once i have my DT ready, i am running a for each row in DT loop.
for every row in DT where account number is same , I extract the Programdate, PageNumber and that common Accountnumber and try to create a pdf (using Extract pdf).
the range of the Extract pdf is given as for example(currentrow(pagenumber))
5.) once the complete DT is iterated a single pdf file is created per account number as above naming convention. in sample
6.) the pdf is moved to completed number

now when the second pdf starts running, the datatable is reintialized at the start of sequence because this workflow is invoked again, from the main file
and
all the above steps are repeated
but i donot have the program date for the account number now, because only pdf B has that data.

My doubts:
1.)so how can i utilise the previosuly creatted pdf files naming convention where i can get the program date ??
2.)or else how can i use the previosuly created datatable to extract program date for that account number?
but program date is only available in one of the pdf-(PDF B).

3.) or how can i use your idea or expertise ??

kindly help me to think better

Duplicate

Cheers

Hi @Anil_G
I am trying to seek expertise from someone like you, I was going to delete one of the post.

as soon as I get a little help

thanks.

@samantha_shah

Have replied to the same on other question…

If there is only on eplace it would help you as multiple suggestions can come and multiple people can reference what is already given by other members

Cheers

1 Like

Hi @samantha_shah,

There are a few things you can do to improve your workflow:

  1. Write the datatable into a CSV/Excel file to save it. Process goes like this:
    1.1 At the end of a workflow, use Write CSV or Write Range to save datatable to a file
    1.2 At the beginning of the next workflow, read the CSV/Excel to retrieve the datatable with saved data
    1.3 Proceed with filling out the datatable with more data
    1.4 Back to step 1 - write datatable to CSV/Excel to save it
    1.5 I would write these to a new file each time in a temp folder to use for debugging
    1.6 Add a short workflow to cleanup temp folder after each run so it doesn’t take up excessive storage space
  2. Multiple entries for same account number is potentially going to add confusion. To avoid this issue, each time you’re about to add data to datatable:
    2.1 Use lookup datatable first, find RowIndex of existing account number
    2.2 IF RowIndex is -1:
    2.2.1 it means there is no existing account number
    2.2.2 add data to a new row
    2.3 IF RowIndex is any other number (eg. 2)
    2.3.1 it means there is existing data
    2.3.2 update corresponding row with new data
    2.3.3 consider if you want to overwrite data in a cell or append with new data
1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.