Data Extraction from PDFs of Different Formats

Hi guys,
I would like to extract the name, date of completion, and course duration from different course certificates and store the details in Excel. I tried to extract them using regular expressions, but the extraction didn't work properly. Please help.
Please refer to the sample PDFs.
Thanks in advance.

23_Infosys_Java.pdf (158.0 KB)
22_LinkedIn_Agile Software.pdf (216.8 KB)
certificate_ Bootstrap.pdf (91.7 KB)
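A minimal sketch of the regex approach, assuming the text has already been pulled out of each PDF (for example with a PDF text-extraction library) and using made-up certificate wording, since the real layouts of the attached files aren't reproduced here. Keeping a list of alternative patterns per field is one way to cope with certificates in different formats: the patterns are tried in order and the first match wins. The patterns, the field keys, and `extract_fields` are all hypothetical and would need tuning against the actual certificates.

```python
import re
from typing import Dict, Optional

# Hypothetical duration format, e.g. "8 hours" or "1 hour 30 minutes".
DURATION = (r"\d+(?:\.\d+)?\s*(?:hours?|hrs?|weeks?|minutes?|mins?)\b"
            r"(?:\s+\d+\s*(?:minutes?|mins?)\b)?")

# Hypothetical date formats: "12 March 2023", "March 5, 2024", "12/03/2023".
DATE = (r"\d{1,2}\s+[A-Za-z]+\s+\d{4}"
        r"|[A-Za-z]+\s+\d{1,2},?\s+\d{4}"
        r"|\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")

# One list of alternative patterns per field, tried in order, so that
# certificates with different wording can each be matched by some pattern.
FIELD_PATTERNS = {
    "name": [
        # The keyword is case-insensitive; the name must be capitalised words
        # on a single line, so the following line is not swallowed.
        re.compile(r"(?i:awarded to|presented to|certifies that)\s+"
                   r"([A-Z][A-Za-z.'-]*(?:[ \t]+[A-Z][A-Za-z.'-]*)+)"),
    ],
    "completion_date": [
        re.compile(r"(?:date of completion|completion date|completed on|issued on)"
                   r"\s*:?\s*(" + DATE + r")", re.IGNORECASE),
    ],
    "duration": [
        # Prefer an explicitly labelled duration...
        re.compile(r"duration\s*:?\s*(" + DURATION + r")", re.IGNORECASE),
        # ...and fall back to the first number followed by a time unit.
        re.compile(r"(" + DURATION + r")", re.IGNORECASE),
    ],
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when nothing matched."""
    result: Dict[str, Optional[str]] = {}
    for field, patterns in FIELD_PATTERNS.items():
        result[field] = None
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                # Collapse line breaks and extra spaces left by PDF text extraction.
                result[field] = " ".join(match.group(1).split())
                break
    return result


if __name__ == "__main__":
    sample = (
        "This certifies that Jane Doe\n"
        "has successfully completed the course Java Programming\n"
        "Date of completion: 12 March 2023\n"
        "Duration: 8 hours"
    )
    print(extract_fields(sample))
    # {'name': 'Jane Doe', 'completion_date': '12 March 2023', 'duration': '8 hours'}
```

When a new certificate layout fails to match, adding another pattern to the relevant list is usually easier than making one pattern cover every format.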

Hi @Sweetlin_D

Can you share the expected output?


This is the expected output:
Exceldate.xlsx (11.9 KB)

Hi @Sweetlin_D

Can you please specify what should be extracted from each PDF, so that the requirement is easier to understand?


Name of the person, date of completion, and duration of the course (from each certificate PDF).

Hi @Sweetlin_D

Where in each PDF should we look for the date of completion and the duration of the course?


Data to be extracted:

  1. Name of the person
  2. Duration of the course
  3. Date of Completion

Expected Output:

The data should be stored in the Excel file I sent earlier.
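For the Excel step, one option is the third-party openpyxl library (an assumption; the thread doesn't say which tool is being used). This sketch appends one row per certificate to the active sheet of an existing workbook, or creates a new workbook with a header row if the file doesn't exist yet. The column headers, dict keys, and `append_rows` function are hypothetical, since the layout of Exceldate.xlsx isn't shown here.

```python
from pathlib import Path
from typing import Dict, List, Optional

from openpyxl import Workbook, load_workbook

# Hypothetical header row and matching dict keys; change both to match the
# columns actually used in the template workbook.
HEADERS = ["Name", "Date of Completion", "Duration"]
KEYS = ["name", "completion_date", "duration"]


def append_rows(rows: List[Dict[str, Optional[str]]], path: Path) -> None:
    """Append one row per certificate to the active sheet of the workbook at path.

    If the file does not exist yet, a new workbook with a header row is created.
    """
    if path.exists():
        # Reuse the existing workbook so its header row and sheet are kept.
        wb = load_workbook(str(path))
        ws = wb.active
    else:
        wb = Workbook()
        ws = wb.active
        ws.append(HEADERS)
    for row in rows:
        # ws.append writes below the last used row; missing fields become empty cells.
        ws.append([row.get(key) for key in KEYS])
    wb.save(str(path))
```

A call would look like `append_rows(rows, Path("Exceldate.xlsx"))`, where `rows` holds one dict per certificate. Dates are written as text here; if the template expects real Excel dates, parse them first, e.g. with `datetime.strptime`.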
