Extract PDF pages based on keywords

Navya_Nadakuduti · November 20, 2022, 6:18am

Hi All,

I have a PDF of about 160 pages (merged invoices; the page count may vary). I want to split the PDF into separate single invoices based on keywords present in the PDF.
I'm currently using For Each loops, which take a long time: looping over each PDF page and searching for the keyword takes around 25 minutes.

Is there any way to reduce the time by using a regex instead of loops?

Please help me.

Thanks & Regards,
Navya.

Anil_G · November 20, 2022, 9:50am

One thing you can try is to read the whole PDF at once. If there is a reliable value, such as an invoice header, a footer, or a page number field, that appears on every page, split the string on that value, then use System.Text.RegularExpressions.Regex.Match(EachpageString, "Regex for the string you are searching for").ToString and delete the pages that are not needed. This way you interact with the PDF only once, to read it, and everything else is done on the string you have already read.
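The read-once idea above could be sketched roughly as follows, in Python rather than as a UiPath workflow. This is only an illustration: `pages_matching` is a hypothetical helper, and it assumes the PDF text has already been extracted into one string in which pages are separated by a form feed character. If your extraction uses a different marker (an invoice header, a footer, a page number field), pass that as `page_delimiter` instead.

```python
import re

def pages_matching(full_text, keyword_pattern, page_delimiter="\f"):
    """Return 1-based page numbers whose text matches keyword_pattern.

    Assumption: full_text is the whole PDF read once, with pages
    separated by page_delimiter (form feed by default).
    """
    pattern = re.compile(keyword_pattern, re.IGNORECASE)
    pages = full_text.split(page_delimiter)
    return [number for number, page in enumerate(pages, start=1)
            if pattern.search(page)]

# Made-up sample text standing in for the extracted PDF content.
sample = "Tax Invoice\nOrder 1\fcontinued items\fTax Invoice\nOrder 2"
print(pages_matching(sample, r"Tax Invoice"))  # [1, 3]
```

The search still visits every page, but it works on strings that are already in memory, so the PDF itself is read only once.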

cheers

Navya_Nadakuduti · November 22, 2022, 11:36am

Hi Anil,

Below is the workflow I have created, which takes a long time to run. In some cases an invoice may span 2 pages.
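For the multi-page case, one possible way to decide which pages belong to the same invoice is sketched below in Python. It is an assumption-laden illustration, not the attached workflow: `group_pages_into_invoices` is a hypothetical helper, and it assumes each invoice's first page contains a keyword (here "Tax Invoice") that continuation pages do not.

```python
import re

def group_pages_into_invoices(page_texts, start_pattern):
    """Group 1-based page numbers into invoices.

    Assumption: a page matching start_pattern begins a new invoice;
    any other page is a continuation of the previous invoice.
    """
    pattern = re.compile(start_pattern)
    invoices = []
    for number, text in enumerate(page_texts, start=1):
        if pattern.search(text) or not invoices:
            invoices.append([number])      # start a new invoice
        else:
            invoices[-1].append(number)    # continuation page
    return invoices

# Made-up page texts: the first invoice spans two pages.
pages = ["Tax Invoice A", "continued items", "Tax Invoice B"]
print(group_pages_into_invoices(pages, r"Tax Invoice"))  # [[1, 2], [3]]
```

Each resulting group is the page range you would then extract into its own PDF.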

Main.xaml (63.4 KB)
flipkart_invoices.pdf (465.9 KB)
project.json (1.5 KB)

Please help me.

Thanks & Regards,
Navya

william_joe · November 24, 2022, 7:02am

Step 1: Import all libraries. Step 2: Convert the PDF file to text format and read the data. Step 3: Use the ".findall()" function of regular expressions to extract keywords.
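Step 3 might look something like the Python sketch below, using `re.findall` from the standard library. It assumes step 2 is already done, so `text` holds the extracted PDF text. The helper name `find_keywords` and the "Invoice No" pattern are hypothetical and would need adjusting to the actual invoice layout.

```python
import re

def find_keywords(text, pattern=r"Invoice\s+No[.:]?\s*(\w+)"):
    """Return every value captured by pattern in text.

    The default pattern is only an example; it captures the token
    that follows "Invoice No", "Invoice No." or "Invoice No:".
    """
    return re.findall(pattern, text)

# Made-up text standing in for the converted PDF.
text = "Invoice No: FA123\nTotal 10\nInvoice No: FA124\nTotal 20"
print(find_keywords(text))  # ['FA123', 'FA124']
```

Because the pattern has a single capture group, `re.findall` returns just the captured values rather than the full matches.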

Regards,
Will

Navya_Nadakuduti · November 26, 2022, 6:28pm

Hi William,

Thanks for the solution.

Could you please elaborate on the above solution?

Best Regards,
Navya.

Anil_G · November 27, 2022, 6:04am

Hi @Navya_Nadakuduti

Can you try this?

trypdf.xaml (10.1 KB)

cheers

arjun.vijayakumar · June 6, 2024, 4:33am

Hi Anil,

I have a PDF with around 80 pages that contain descriptions of different topics. Could you provide a new workflow, using the latest UiPath version, to extract the pages that contain particular keywords?
