Merging Data from Multiple PDFs into a Single PDF per Account Number in UiPath

samantha_shah · May 3, 2024, 1:48am

Question:

Hello everyone,

I’m working on a UiPath automation where I need to extract and merge data from two PDF files, PDF A and PDF B, into a single PDF file for each account number. The files are input in a non-fixed order and contain scattered data across various pages.

Details:

PDF A: Contains account numbers and related data.
PDF B: Contains account numbers, related data, and a program date.
The data related to account numbers are not consistently located on the same pages in either PDF.

The challenge is how to effectively consolidate this scattered data from both PDFs into one coherent PDF file per account number.

My logic: I use a DataTable to capture the program date from PDF B, intending to merge this with the data from PDF A for the corresponding account number.

However, I am encountering a significant challenge: the DataTable is nullified when the workflow processes the second PDF, which leads to data loss from the first processed PDF.

Is there an alternative approach or strategy that could better handle the data consolidation from these randomly paginated PDFs?

Keegan_Kosasih · May 3, 2024, 4:39am

hi @samantha_shah,

Your general approach of using datatable to capture information from PDF B and then PDF A is not wrong.

Can you elaborate more on your step by step approach?

If your datatable is a variable, check the scope.
If your datatable is an argument, make sure it’s set to in/out.

samantha_shah · May 3, 2024, 4:59am

Hi @Keegan_Kosasih thanks for the reply

please see my workflow steps:

The data in the two PDFs is different and we want all of it; the only thing they have in common is the account number.
**Naming convention (sample):**
FD_044554_Customerlnvoice_________Final Pricing May 2024_________x_

where 0445544 is the account number and May 2024 is the program date.

1.) Both PDFs are run one by one (the order is not fixed) through the same sequence (using regex).
2.) For example, once PDF B is in the sequence, the AccountNumber is extracted, the program date is extracted (using regex), and the page numbers are counted (using a counter).
3.) All of this data is added to a DataTable, DT. (DT has multiple entries for the same account number, since the data in the PDF is scattered and not on a single page.)
If you can suggest a better way for step 3, please do.
4.) Once DT is ready, I run a For Each Row loop over DT.
For every row in DT where the account number is the same, I extract the ProgramDate, PageNumber and that common AccountNumber and try to create a PDF (using Extract PDF).
The range for Extract PDF is given as, for example, currentrow(pagenumber).
5.) Once the complete DT has been iterated, a single PDF file is created per account number, named as in the sample above.
6.) The PDF is moved to the completed folder.
→
Now, when the second PDF starts running, the DataTable is reinitialized at the start of the sequence, because this workflow is invoked again from the main file,
and
all the above steps are repeated.
But I no longer have the program date for the account number, because only PDF B has that data.
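
For step 3, one way to avoid acting on many rows per account is to collapse the per-page rows into one entry per account number before creating the PDFs. The following is only a rough sketch in Python, not UiPath activities; the row layout, the function name `group_pages`, and the account number `099001` are made up for illustration.

```python
from collections import defaultdict

def group_pages(rows):
    """Collapse per-page rows into one entry per account number."""
    grouped = defaultdict(lambda: {"program_date": None, "pages": []})
    for row in rows:
        entry = grouped[row["account"]]
        entry["pages"].append(row["page"])
        # Keep the first program date seen; rows without a date leave it unset.
        if entry["program_date"] is None and row.get("program_date"):
            entry["program_date"] = row["program_date"]
    return dict(grouped)

# Hypothetical rows, one per page where an account number was found.
rows = [
    {"account": "044554", "page": 1, "program_date": "May 2024"},
    {"account": "099001", "page": 2, "program_date": "May 2024"},
    {"account": "044554", "page": 5, "program_date": "May 2024"},
]

grouped = group_pages(rows)
# One comma-separated page list per account, built once instead of per row.
page_range = ",".join(str(p) for p in grouped["044554"]["pages"])
print(page_range)
```

Whether a comma-separated list like this can be passed directly as the page range depends on the PDF activity and version in use, so treat that part as an assumption to verify.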

My doubts:
1.) How can I use the naming convention of the previously created PDF files to get the program date?
2.) Or else, how can I use the previously created DataTable to look up the program date for that account number?
The problem is that the program date is only available in one of the PDFs (PDF B).

3.) Or how can I use your idea or expertise?
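
For doubt 1, a file name in the sample convention can probably be parsed back with a regular expression. This is a minimal sketch in Python; the pattern is inferred from the single sample name above and is an assumption, as is the helper name `parse_output_name`. The same pattern idea should carry over to a regex match in UiPath, but that has not been checked here.

```python
import re

# Assumed pattern, inferred only from the one sample file name in this post:
# "FD_" + digits (account number) + "_" ... "Final Pricing " + "<Month> <Year>".
NAME_PATTERN = re.compile(
    r"^FD_(?P<account>\d+)_.*?Final Pricing (?P<program_date>[A-Za-z]+ \d{4})"
)

def parse_output_name(file_name):
    """Return (account, program_date) from a file name, or None if it does not match."""
    match = NAME_PATTERN.search(file_name)
    if match is None:
        return None
    return match.group("account"), match.group("program_date")

sample = "FD_044554_Customerlnvoice_________Final Pricing May 2024_________x_"
parsed = parse_output_name(sample)
print(parsed)
```

The account number is kept as text so that a leading zero is not lost.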

Parvathy · May 3, 2024, 5:04am

Hi @samantha_shah

BlankProcess21.zip (1.1 MB)

Please replace the invoice with your invoice.

Hope it helps!!

samantha_shah · May 3, 2024, 5:38am

Hi @Parvathy
Thanks for the reply. I checked the workflow; however, the only concern I have is this:

How can I maintain data that sits on different pages? I saw that we can merge them,

but the question that arises is the program date, which is found in only one PDF in my workflow.

How can I keep it constant throughout my complete run of the two PDFs? Or how can I use UiPath in such a way that the program date is not a problem?

samantha_shah · May 3, 2024, 5:39am

Hi @Keegan_Kosasih

thanks for the reply

kindly help me to think better

Anil_G · May 3, 2024, 5:41am

Duplicate

Cheers

samantha_shah · May 3, 2024, 5:44am

Hi @Anil_G
I am trying to seek expertise from someone like you. I was going to delete one of the posts as soon as I get a little help.

thanks.

Anil_G · May 3, 2024, 5:45am

@samantha_shah

I have replied to the same on the other question…

If it is in only one place, it would help you, as multiple suggestions can come in and multiple people can reference what other members have already given.

Cheers

Keegan_Kosasih · May 3, 2024, 6:13am

Hi @samantha_shah,

There are a few things you can do to improve your workflow:

1. Write the datatable into a CSV/Excel file to save it. Process goes like this:
1.1 At the end of a workflow, use Write CSV or Write Range to save datatable to a file
1.2 At the beginning of the next workflow, read the CSV/Excel to retrieve the datatable with saved data
1.3 Proceed with filling out the datatable with more data
1.4 Back to step 1 - write datatable to CSV/Excel to save it
1.5 I would write these to a new file each time in a temp folder to use for debugging
1.6 Add a short workflow to cleanup temp folder after each run so it doesn’t take up excessive storage space
2. Multiple entries for same account number is potentially going to add confusion. To avoid this issue, each time you’re about to add data to datatable:
2.1 Use lookup datatable first, find RowIndex of existing account number
2.2 IF RowIndex is -1:
2.2.1 it means there is no existing account number
2.2.2 add data to a new row
2.3 IF RowIndex is any other number (eg. 2)
2.3.1 it means there is existing data
2.3.2 update corresponding row with new data
2.3.3 consider if you want to overwrite data in a cell or append with new data
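
The two suggestions above can be sketched together. This is only an illustration in Python, not UiPath activities: the CSV file stands in for Write CSV / Read CSV, `find_row_index` stands in for the lookup that returns -1 when nothing is found, and the column names, file name, and page numbers are assumptions.

```python
import csv
import os
import tempfile

FIELDS = ["AccountNumber", "ProgramDate", "Pages"]  # assumed column names

def load_table(path):
    """Read the saved rows, or start with an empty table on the first run."""
    if not os.path.exists(path):
        return []
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

def save_table(path, rows):
    """Write all rows back so the next invocation can pick them up."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

def find_row_index(rows, account):
    """Return the index of the row for this account, or -1 if there is none."""
    for index, row in enumerate(rows):
        if row["AccountNumber"] == account:
            return index
    return -1

def upsert(rows, account, program_date, page):
    """Add a new row, or update the existing row for this account."""
    index = find_row_index(rows, account)
    if index == -1:
        rows.append({"AccountNumber": account,
                     "ProgramDate": program_date or "",
                     "Pages": str(page)})
        return
    row = rows[index]
    if program_date and not row["ProgramDate"]:
        row["ProgramDate"] = program_date       # fill in a missing date only
    row["Pages"] = f'{row["Pages"]};{page}'     # append rather than overwrite

# Simulate two separate invocations of the same workflow.
path = os.path.join(tempfile.mkdtemp(), "dt_state.csv")

table = load_table(path)                 # first run: a PDF without program date
upsert(table, "044554", None, 3)
save_table(path, table)

table = load_table(path)                 # second run: a PDF with program date
upsert(table, "044554", "May 2024", 7)
save_table(path, table)

final = load_table(path)
print(final)
```

Because the table is reloaded at the start of each run, the order in which the two PDFs arrive no longer matters for the program date. A real table would likely also need a column recording which source PDF each page came from.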

system · May 6, 2024, 6:13am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
