Matching PDF extracted names to create count then write PDF

I have a PDF document that contains names. For example, the name Frank Smith appears on pages 1-3 but not on page 4. I am able to extract the name from the PDF into a variable, strCurrentName = Frank Smith. I started by getting the page count of the PDF document into the variable inTotalPageCount.


This is the workflow I have set up: I extract the name (strCurrentName = Frank Smith) and use an If condition, and this is probably where I'm getting the process messed up. I wonder if I should use a While loop instead, so the process looks at page 2 to see if the name matches and, if it does, looks at page 3, and so on, and then creates the PDF from the matching pages. This is where I need help with the logic.

Hi @jeff.shubzda,

Can you walk us through your thinking behind the If condition there? Why use it?
Looking at your snippet, say you have already extracted strCurrentName. Why then check strPDFExtract.Contains(strCurrentName)? If the name could be extracted, you know it must exist in strPDFExtract. And if strOut(1).ToString fails because there is no index 1, you know the name does not exist.

I would also avoid relying only on the Split method to extract values from PDF text; a regular expression is a better approach. A regular expression lets you set anchors (before and after) around your required pattern, so you do not depend on the array index always being 1 (strOut(1).ToString). Here is an in-depth Regex usage tutorial from @Steven_McKeering
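To illustrate the anchor idea, here is a minimal sketch in Python rather than a UiPath workflow. The labels "Employee Name:" and "Employee ID:" and the sample page text are assumptions made up for the example; you would substitute whatever text reliably sits before and after the name in your own PDF.

```python
import re

# Hypothetical page text. The surrounding labels are assumed anchors,
# not taken from the original PDF.
page_text = "Statement\nEmployee Name: Frank Smith\nEmployee ID: 10442\nTotal: 120.00"

# The name is whatever sits between the "before" and "after" anchors.
# DOTALL lets the match span a line break between the name and the next label.
NAME_PATTERN = re.compile(r"Employee Name:\s*(.+?)\s*Employee ID:", re.DOTALL)

def extract_name(text):
    """Return the text between the two anchors, or None if they are not found."""
    match = NAME_PATTERN.search(text)
    return match.group(1).strip() if match else None

str_current_name = extract_name(page_text)
```

Because a missing name comes back as None instead of an out-of-range index, the caller can test for it directly rather than relying on position 1 of a split array.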

Nonetheless, the forum members need more information about what you are trying to do here.

Thanks. I used Regex the first time I did this to get the name, but tried it this way for this example. I'm looking for a process that gets the name (which I can already do) and then compares it to the name on the next page of the PDF. It should continue down the line until it no longer finds a match, create the PDF, and then move to the next PDF page that has a different name, where the process begins all over again. I'm just testing this process to see if it can be done.

Have a look at this flow:

We check the page count and read the PDF page by page.
Now you can continue and add the next parts.

Create a Dictionary(Of Int32, String) and, for each iteration, add the pgNo and the extracted text to it.

Afterwards you can loop over the dictionary or post-process it with another approach, e.g. LINQ.
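As a rough sketch of that post-processing step, the Python below (not UiPath or LINQ) takes a page-number-to-name dictionary and collapses consecutive pages with the same name into runs. It assumes the name has already been extracted for each page; the second name, "Jane Doe", and the function name group_pages_by_name are hypothetical and exist only for this example.

```python
from itertools import groupby

# Assumed result of reading the PDF page by page: page number -> extracted name.
page_names = {
    1: "Frank Smith",
    2: "Frank Smith",
    3: "Frank Smith",
    4: "Jane Doe",   # hypothetical second name
    5: "Jane Doe",
}

def group_pages_by_name(names_by_page):
    """Collapse consecutive pages that share a name into (name, [pages]) runs."""
    runs = []
    # Sort by page number so that only neighbouring pages are compared.
    ordered = sorted(names_by_page.items())
    for name, items in groupby(ordered, key=lambda pair: pair[1]):
        runs.append((name, [page for page, _ in items]))
    return runs

for name, pages in group_pages_by_name(page_names):
    # Each run corresponds to one output PDF covering these pages.
    print(name, pages[0], pages[-1])
```

Each run gives the first and last page for one name, which is the range you would hand to whatever step writes the new PDF. A name that reappears later in the document starts a new run rather than being merged with the earlier one.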

Ok, thanks I will take a shot at it later this week.
