I have to process a list of PDF files and check whether each one contains more than one invoice. If so, I must extract, by page range, the pages that correspond to each invoice and save them as separate PDF files. My idea is to identify a keyword that is unique to each payment and note the number of the page where it appears, so that I extract only the page ranges I need. This depends on finding an activity that gives me the total number of pages in each PDF; I would then search for the keyword and output the required page ranges. The problem is that I do not know whether there is an activity that returns the total number of pages. Any recommendations?
Hi @Renenobal
Try strvar = Directory.GetFiles("")
For each item in strvar
Use Read PDF Text with a page range, then test its output variable with a condition such as stroutput.ToString.Contains("keyword")
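To make the idea concrete, here is a rough sketch of the page-range logic in Python rather than as a workflow. It assumes you already have the text of each page in order (for example from reading the PDF one page at a time) and that the keyword appears on the first page of every invoice. The function name and the sample data are made up for illustration.

```python
def invoice_page_ranges(page_texts, keyword):
    """Return 1-based (start, end) page ranges, one per invoice.

    page_texts is a hypothetical input: the text of each page, in order.
    Assumption: a page containing the keyword starts a new invoice.
    Pages before the first keyword hit are not assigned to any invoice.
    """
    # 1-based numbers of the pages where an invoice begins
    starts = [number for number, text in enumerate(page_texts, start=1)
              if keyword in text]

    ranges = []
    for position, start in enumerate(starts):
        if position + 1 < len(starts):
            # The invoice ends on the page before the next one starts
            end = starts[position + 1] - 1
        else:
            # The last invoice runs to the end of the document
            end = len(page_texts)
        ranges.append((start, end))
    return ranges


pages = ["Invoice No 1", "details", "Invoice No 2", "more", "end"]
print(invoice_page_ranges(pages, "Invoice No"))
```

Each tuple can then be used as the page range when extracting that invoice into its own PDF. If there is only one tuple, the file contains a single invoice.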
Thanks
Ashwin.S
Create an instance of StreamReader:
sr = New StreamReader(path.Trim)
matchesVar = Regex.Matches(sr.ReadToEnd(), "/Type\s*/Page[^s]")
matchesVar.Count.ToString
You can try something like this. It will give you the page count.
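For anyone who wants to see the same trick outside a workflow, here is a minimal Python sketch. It counts the "/Type /Page" entries in the raw file and skips "/Type /Pages", which is the page-tree node rather than a page. The function names are my own. Note that this is a heuristic: in PDFs that store their objects in compressed object streams, the page dictionaries are not visible as plain text, so the count can come out as 0 or too low.

```python
import re

# Matches "/Type /Page" followed by anything except "s",
# so "/Type /Pages" (the page-tree node) is not counted.
PAGE_OBJECT = re.compile(rb"/Type\s*/Page[^s]")


def count_pages(pdf_bytes):
    """Heuristic page count from the raw bytes of a PDF."""
    return len(PAGE_OBJECT.findall(pdf_bytes))


def count_pages_in_file(path):
    """Read the file as bytes and apply the same heuristic."""
    with open(path, "rb") as handle:
        return count_pages(handle.read())


# Tiny hand-written fragment, not a complete PDF
sample = (b"<< /Type /Pages /Count 2 >> "
          b"<< /Type /Page /Parent 1 0 R >> "
          b"<</Type/Page\n/Parent 1 0 R>>")
print(count_pages(sample))
```

Reading the file as bytes avoids decoding errors, since most of a PDF is binary data.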
Thank you very much, I will try this solution. I think it could work perfectly; I will let you know how it goes.
I do not think I have the skills to do it this way. If you have an example I could try it, but honestly I do not understand the solution.
I have been trying, but I still cannot identify the page number where the keyword is located. Is there any activity that gives me the total number of pages in the document?
Sorry, but I am not aware of it or how to deal with it … The workflow is running fine for me.
Thanks, it works perfectly. Now I just need to identify the page number that contains my keyword.
Unfortunately the PDF activity package is now out of date.
If you believe there should be an activity for finding the total page count, please vote here: PDF Page Count Activity
Hello Tushar
The solution below searches for one keyword at a time, from the start to the end of the PDF:
sr = New StreamReader(path.Trim)
matchesVar = Regex.Matches(sr.ReadToEnd(), "/Type\s*/Page[^s]")
Could you please help me code a dynamic solution,
where the search criteria are an array of multiple elements that are all searched in one go while the document is read once from start to end?
My problem is explained in detail in the post below …
I would be really thankful if we could come up with a solution.
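One possible way to search for several keywords in a single pass is to combine them into one regular expression with alternation. Here is a rough Python sketch of that idea, assuming the page texts have already been extracted. The function name and sample data are hypothetical. The same pattern string could be built in a workflow by joining the escaped array elements with "|".

```python
import re
from collections import defaultdict


def find_keywords(page_texts, keywords):
    """Map each keyword that occurs to the 1-based pages containing it.

    page_texts is a hypothetical input: the text of each page, in order.
    Caveat: matches do not overlap, so a keyword that is a substring of
    another keyword may be missed where the longer one matches.
    """
    if not keywords:
        return {}

    # Escape each keyword so it is matched literally; try longer ones first
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in ordered))

    hits = defaultdict(list)
    for page_number, text in enumerate(page_texts, start=1):
        # set() so a keyword repeated on one page is recorded once
        for found in set(pattern.findall(text)):
            hits[found].append(page_number)
    return dict(hits)


pages = ["INV-001 total", "nothing here", "INV-002 and INV-001"]
print(find_keywords(pages, ["INV-001", "INV-002", "INV-003"]))
```

Each page is scanned once no matter how many keywords are in the array, and keywords that never occur simply do not appear in the result.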