Looking for keywords in PDFs, renaming the file based on keywords found

Hi guys

I’m currently building an automation that loops through a folder of PDFs and renames each file based on the keywords found in it.

I have created a CSV that maps each set of keywords to the name the file should be given when those keywords are found.

So far I have created a loop to go through and read each PDF (using the snippet For each File in Folder).

Can anyone tell me what the next step would be, in comparing the PDF contents to my CSV to see which (if any) document name would be applicable?

Would appreciate any assistance, thanks!

Inside the For Each loop, read the PDF text into a string, then add a second For Each loop that goes through each keyword from the CSV file.
Search for each keyword in the string read from the PDF. If it is found, exit the inner loop and rename the file.

For each file in folder
    boolFound = False
    strOutput = output of the Read PDF Text activity for this file

    For each value in csv
        If strOutput.Contains(value)
            boolFound = True
            Exit the inner loop
        End If
    End For
    If boolFound = True
        Rename the file
    End If
End For
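
For anyone who finds it easier to read as ordinary code, here is a rough Python sketch of the same inner loop. It is not UiPath code, only an illustration of the logic above. The function name `first_matching_keyword` is made up for this example, and the case-insensitive comparison is my assumption (the pseudocode above does not say either way).

```python
from typing import Iterable, Optional


def first_matching_keyword(pdf_text: str, keywords: Iterable[str]) -> Optional[str]:
    """Return the first keyword found in pdf_text, or None if none match."""
    haystack = pdf_text.lower()  # assumption: matching ignores case
    for keyword in keywords:
        needle = keyword.strip().lower()
        if needle and needle in haystack:
            # Same as setting boolFound = True and exiting the inner loop.
            return keyword.strip()
    return None


if __name__ == "__main__":
    text = "Invoice number 1234 for services rendered"
    print(first_matching_keyword(text, ["purchase order", "invoice"]))
```

Returning the keyword instead of a flag means the caller can both test for a match and see which keyword triggered the rename.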

Thanks for this. I’m a little lost when you move into the For Each for the CSV.

I have 3 columns in the CSV - DocNo, DocType, and Keywords

How do I get it to look at the third column for the keywords, but then rename the doc to the corresponding DocNo?

This is my first real venture into For Loops and CSV reading.

Please use the Read CSV activity; its output is stored in a DataTable.

For Each row in Datatable

   DocNo = row(0).ToString
   DocType = row(1).ToString
   Keyword = row(2).ToString

Next

row(ColumnIndex) takes a zero-based column index: 0 is the first column, 1 is the second, and so on.
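
As an illustration only, the same idea in plain Python (not UiPath): read the rows by column index, test the third column against the PDF text, and return the DocNo from the first column of the matching row. The names `load_rules`, `doc_no_for` and `SAMPLE_CSV` are hypothetical, the sample data is invented, and the header row is an assumption about the CSV layout.

```python
import csv
import io
from typing import List, Optional, Tuple


def load_rules(csv_file) -> List[Tuple[str, str, str]]:
    """Read (DocNo, DocType, Keywords) rows by column index, skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # assumption: the first line is a header row
    return [(row[0], row[1], row[2]) for row in reader if len(row) >= 3]


def doc_no_for(pdf_text: str, rules: List[Tuple[str, str, str]]) -> Optional[str]:
    """Return the DocNo of the first row whose keyword appears in pdf_text."""
    for doc_no, _doc_type, keyword in rules:
        if keyword and keyword in pdf_text:
            return doc_no
    return None


# Invented sample data with the three columns described in the question.
SAMPLE_CSV = "DocNo,DocType,Keywords\nD001,Invoice,Invoice\nD002,Contract,Agreement\n"

if __name__ == "__main__":
    rules = load_rules(io.StringIO(SAMPLE_CSV))
    print(doc_no_for("This Agreement is made between ...", rules))
```

Keeping DocNo and the keyword together in one row tuple is what lets the match on column 2 hand back the value from column 0.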

Regards,
Karthik Byggari

Thank you!

I’ve almost got this but I am running into an issue, which I think would be solved with a counter but I am not quite getting my head around it.

I currently have: For each file in folder -> Check the CSV -> For Each Row in the CSV -> Get the keywords into an array -> For each Item in the Array -> Check it against the document.

My problem is, if I have 3 words in my array, it searches for the first word 3 times instead of moving on to the second word and then the third.

Any thoughts on how to solve this?

^ Figured this out, all good! :)
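
The thread does not say what the fix was. One common cause of this symptom is testing the array itself (or its first element) inside the inner loop instead of the current loop item. A small Python sketch of the intended behaviour, assuming the keywords sit in one CSV cell separated by semicolons (the delimiter and the function names are assumptions, not something stated above):

```python
from typing import List


def split_keywords(cell: str, delimiter: str = ";") -> List[str]:
    """Split one CSV cell into trimmed, non-empty keywords."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


def found_keywords(pdf_text: str, cell: str) -> List[str]:
    """Check every keyword in the cell against the text, one at a time."""
    keywords = split_keywords(cell)
    hits = []
    for item in keywords:
        # Test the loop variable `item`. Writing keywords[0] here instead
        # would check the first word once per iteration.
        if item in pdf_text:
            hits.append(item)
    return hits


if __name__ == "__main__":
    print(found_keywords("total amount due on receipt", "invoice; amount; receipt"))
```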

@Sheri

I have a similar problem. I need to search for specific keywords in a PDF document. Based on the search, I want to write the words that were found to one column of an Excel sheet and the words that were not found to another column of the same sheet.

Please let me know if your workflow could help.
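
A possible starting point for the found / not found split, sketched in Python rather than as a UiPath workflow. It writes a two-column CSV that Excel can open, which is a stand-in for writing to a real Excel sheet. The function names and the column headers are invented for this example.

```python
import csv
import io
from itertools import zip_longest
from typing import Iterable, List, Tuple


def partition_keywords(pdf_text: str, keywords: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split keywords into those present in pdf_text and those absent."""
    found, not_found = [], []
    for keyword in keywords:
        (found if keyword in pdf_text else not_found).append(keyword)
    return found, not_found


def write_two_columns(out_file, found: List[str], not_found: List[str]) -> None:
    """Write found words in column 1 and missing words in column 2."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Found", "Not found"])
    # zip_longest pads the shorter column with empty cells.
    for left, right in zip_longest(found, not_found, fillvalue=""):
        writer.writerow([left, right])


if __name__ == "__main__":
    found, not_found = partition_keywords(
        "invoice total amount", ["invoice", "contract", "amount"]
    )
    buffer = io.StringIO()
    write_two_columns(buffer, found, not_found)
    print(buffer.getvalue())
```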