Need to automate keyword search from multiple PDF docs and write into excel

Hi All, I am looking for the best solution for the requirement below.

  1. Read multiple PDF files (the PDFs are not tagged, and the data is not structured)
  2. Search by keyword (around 6 fields) and retrieve data from a paragraph, e.g.: Annual Fee Increase (keyword): The Fees for the Services under this contract shall increase rounded to the nearest RS. 2000 (data to be captured), on the first day of each contract…
  3. Write the data into Excel: Column 1: each file name, Column 2: Keyword 1, Column 3: Keyword 2, so an…
    Please suggest. Thanks in advance.
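
To make the target layout in step 3 concrete, here is a minimal sketch of one row per file and one column per keyword. It uses Python's standard `csv` module and writes CSV (which Excel opens) rather than a native workbook; that substitution, the `rows_to_csv` helper, the field names, and the sample file name are all assumptions for illustration, not something from the thread.

```python
import csv
import io

# Hypothetical column names; the thread mentions about six keywords.
FIELDS = ["Keyword 1", "Keyword 2", "Keyword 3"]


def rows_to_csv(results):
    """Turn {file name: {keyword: extracted value}} into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File name"] + FIELDS)
    for file_name in sorted(results):
        found = results[file_name]
        # A keyword that was not found becomes an empty cell,
        # so the columns stay aligned across files.
        writer.writerow([file_name] + [found.get(field, "") for field in FIELDS])
    return buffer.getvalue()


# Hypothetical extraction result for a single file.
sample = {"contract_a.pdf": {"Keyword 1": "value 1", "Keyword 3": "value 3"}}
csv_text = rows_to_csv(sample)
print(csv_text)
```

Keeping the extraction result as a plain dictionary per file means the output step does not need to know how the values were found.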

Hey @cschevuri, is there a way you can share a dummy PDF text and keywords, to better understand what you are trying to do?

Hello @RedMoon, please find the dummy data document attached. Thank you.
SOW for ABC INC.pdf (98.5 KB)

  1. Keyword 1 - SOW date, data to be retrieved: October 1, 2022
  2. Keyword 2 - Annual Fee, data to be retrieved: $10.00
  3. Keyword 3 - Minimum Fee, data to be retrieved: $ 6,000
  4. Keyword 4 - Termination, data to be retrieved: 90 working days
  5. Keyword 5 - Address, data to be retrieved: ABC INC,
    1215 BIG street, LU ITALIA.
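
For the first four keywords listed above, a keyword-anchored regular expression per field is one possible approach. The sketch below is only an illustration: the wording of `SAMPLE_TEXT` is made up (the actual text of the attached PDF is not shown in the thread), and the patterns and the `extract_fields` helper are assumptions that would need tuning against real documents.

```python
import re

# Hypothetical text; only the keywords and target values come from the list above.
SAMPLE_TEXT = (
    "SOW date: October 1, 2022\n"
    "Annual Fee: The fee per transaction is $10.00 for the first year.\n"
    "Minimum Fee: A monthly minimum of $ 6,000 applies.\n"
    "Termination: Either party may terminate with 90 working days notice.\n"
)

# Amount such as "$10.00" or "$ 6,000" (optional space after the dollar sign).
MONEY = r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?"

# Each pattern finds the keyword, skips as little text as possible,
# and captures the first value of the expected shape.
PATTERNS = {
    "SOW date": r"SOW date\b.*?([A-Z][a-z]+ \d{1,2}, \d{4})",
    "Annual Fee": r"Annual Fee\b.*?(" + MONEY + r")",
    "Minimum Fee": r"Minimum Fee\b.*?(" + MONEY + r")",
    "Termination": r"Termination\b.*?(\d+\s+(?:working\s+)?days)",
}


def extract_fields(text):
    """Return {keyword: captured value or None} for every pattern."""
    found = {}
    for keyword, pattern in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)
        found[keyword] = match.group(1) if match else None
    return found


fields = extract_fields(SAMPLE_TEXT)
for keyword, value in fields.items():
    print(keyword, "->", value)
```

Because `re.DOTALL` lets the lazy `.*?` cross line breaks, a value that wraps onto the next line after its keyword can still be found, at the cost of occasionally picking up a value that belongs to a later field when the expected one is missing.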

Hi @cschevuri ,

Could you also provide another variation of the PDF document, since you mentioned it is not structured? We would like to understand how much the document structure differs from one file to another.

We might be able to use Regex for the first 4 keywords, but the Address is a bit tricky, and we would want to identify the different variations that you could get.
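
To illustrate why the Address differs from the other fields: it has no fixed shape and can span several lines, so one heuristic is to capture everything after the keyword up to the next blank line. This is only a sketch under that assumption; the sample layout and the `extract_address` helper are hypothetical, and documents that do not separate the address with a blank line would need a different stop condition.

```python
import re

# Hypothetical layout; only the address value itself comes from the thread.
SAMPLE_TEXT = (
    "Address: ABC INC,\n"
    "    1215 BIG street, LU ITALIA.\n"
    "\n"
    "Termination: 90 working days\n"
)


def extract_address(text):
    """Capture the text after 'Address' up to the first blank line (or the end)."""
    match = re.search(
        r"Address\s*[:\-]?\s*(.+?)(?:\n\s*\n|\Z)",
        text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return None
    # Collapse line breaks and indentation left over from the PDF layout.
    return " ".join(match.group(1).split())


address = extract_address(SAMPLE_TEXT)
print(address)
```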

I am able to handle it with Regex and read the PDF with OCR… This is working fine. Thank you all.

