How to split and extract from huge PDF file

JPOkawa · March 22, 2018, 3:50am

Hi Experts,

I would like to ask for help automating the Split PDF function of Acrobat Pro XI in UiPath. I need to split a huge PDF file (around 7,000 pages) into individual PDF files based on tracking number. I am thinking of first splitting the huge PDF into chunks of maybe 100 pages, then looping through the pages of each chunk to extract the pages for each tracking number (which may span 2 or more pages), and saving the extracted pages to a new folder.

I have already tried sending the hotkeys ALT+VTP and the arrow-down key to reach the Split function, but it does not seem to work. I am still new to UiPath and not sure how to do this. Looking forward to hearing from you. Thanks.

Cheers,
JPOkawa

ClaytonM · March 22, 2018, 4:24pm

Hi,

This is just my quick thinking.
If the Read PDF Text activity takes too long or crashes (as I think it will), you can open the file and do a Select All, which is Ctrl+A followed by Ctrl+Shift+End. Then press Ctrl+C to copy all the text to the clipboard. This can be tricky, however, because you need to verify that all the text actually made it to the clipboard, or you will miss some information.

Once you have the text stored in a string variable, you can use .Split() to create an array, splitting on your tracking number or on a keyword that marks where each record begins. The text might also need some additional massaging, in which case I would recommend LINQ. For example, if you wanted to format it as a comma-delimited CSV file, you could do that.

With your array, you should be able to run it through a For Each and write each part to a separate file.
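
To make the idea concrete, here is a rough sketch in Python rather than a UiPath workflow. It assumes the text has already been copied out of the PDF and that each record starts with a line like "Tracking No: ABC123"; that label, the regex, and the function names are made up for illustration, so adjust them to whatever your document actually contains.

```python
import re
from pathlib import Path

# Assumption: each record starts with a line such as "Tracking No: ABC123".
# The real label and number format in the PDF may differ.
TRACKING_PATTERN = re.compile(r"^Tracking No:\s*(\S+)", re.MULTILINE)


def split_by_tracking_number(text):
    """Return {tracking_number: text_chunk} for each record in the text."""
    matches = list(TRACKING_PATTERN.finditer(text))
    chunks = {}
    for i, match in enumerate(matches):
        # A chunk runs from this label up to the next label (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunk = text[match.start():end].strip()
        number = match.group(1)
        # The same tracking number can appear on several pages, so append.
        if number in chunks:
            chunks[number] = chunks[number] + "\n" + chunk
        else:
            chunks[number] = chunk
    return chunks


def write_chunks(chunks, out_dir):
    """Write each chunk to <out_dir>/<tracking_number>.txt."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for number, chunk in chunks.items():
        target = out_path / f"{number}.txt"
        target.write_text(chunk, encoding="utf-8")
        written.append(target)
    return written
```

Note that this only produces text files. It does not preserve the PDF layout, so it fits best when the extracted data is what you need rather than the original pages.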

I hope this helps spark some ideas.

Regards.

JPOkawa · March 23, 2018, 11:20am

Hi @ClaytonM,

Thanks for sharing your thoughts. Actually, we just want to split the huge PDF into individual PDF files based on tracking number, which is why we are thinking of automating Acrobat Pro XI. Each page has a tracking number and information in tabular format, which we also want to keep as is.

Regarding LINQ, I don’t have any experience with it. Could you please provide a sample workflow that uses LINQ for this purpose?

Could you also kindly share a sample XAML using the approach you mentioned? Thanks in advance!

Regards,
JPOkawa

Girish · April 12, 2018, 6:47am

Hi @JPOkawa,

As you mentioned, you are splitting the PDF based on tracking ID, and each page is in tabular format.

If you want to extract the data, you can make use of data scraping; see the link below:
Extract table data from multiple pages of pdf - #2 by Lavinia
If the data is in a proper tabular format with a consistent delimiter, you can try screen scraping and generate the data table.
Read PDF -> store it in an output variable -> split using a string as the delimiter (in your case, the tracking ID) -> analyze the pattern to split further -> process each item page by page
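
The "process page by page" step could look roughly like the Python sketch below. It assumes you already have the text of each page in a list (page_texts is a hypothetical input) and that each page carries a label like "Tracking ID: 12345"; the label and the helper name are illustrative, not something from the PDF activities. The result is a list of page ranges per tracking ID, which you could then feed to whatever page-extraction step you use so the original tabular layout is kept.

```python
import re

# Assumption: every page carries a label such as "Tracking ID: 12345".
TRACKING_ID = re.compile(r"Tracking ID:\s*(\w+)")


def group_pages(page_texts):
    """Group consecutive pages that share a tracking ID.

    Returns a list of (tracking_id, first_page, last_page) tuples with
    1-based page numbers. A page without an ID is treated as a
    continuation of the previous group; leading pages without an ID
    are skipped.
    """
    groups = []
    for page_number, text in enumerate(page_texts, start=1):
        match = TRACKING_ID.search(text)
        current = match.group(1) if match else None
        if groups and (current is None or current == groups[-1][0]):
            # Same record as the previous page: extend its range.
            tracking_id, first, _ = groups[-1]
            groups[-1] = (tracking_id, first, page_number)
        elif current is not None:
            # A new tracking ID starts a new range.
            groups.append((current, page_number, page_number))
    return groups
```

Working with page ranges instead of raw text also avoids loading all 7,000 pages into one string before splitting.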

Thanks
Girish
