Extracting words from presentation PDFs and storing them in Excel

Hi,

I have a folder of presentations in PDF format, and I want to extract certain parts of each PDF (e.g. title, presenter name, date) and store them in Excel. I’m not sure how to achieve this, since the anchor base doesn’t apply to PDFs and I can’t do keyword matches either (e.g. the word ‘Title’ doesn’t appear on the slide). Thanks

Hi @Ii_Mariko

You can consider the approach below.

  1. Read the PDF with the Read PDF With OCR activity. This saves the contents of the PDF in a string variable.
  2. Then use regex on that string variable to extract the details you want, provided those details always appear in the same place or follow the same pattern.
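
To illustrate step 2, here is a minimal Python sketch of the idea. It is not UiPath workflow code, and the same logic would need to be rebuilt with whatever regex support your workflow has. It assumes the extracted text puts the title on the first non-empty line and the presenter on the second, and that the date is written like "12 March 2024". The sample text, the function names and the column names are all made up for illustration, so adjust them to how your slides actually come out.

```python
import csv
import io
import re

# Hypothetical text as it might come out of the PDF read step.
SAMPLE_TEXT = """Quarterly Sales Review
Jane Doe
12 March 2024
Agenda
"""

# Assumed date format: day, full English month name, four-digit year.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4})\b"
)


def extract_fields(text):
    """Pick fields by position (title, presenter) and by pattern (date)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    presenter = lines[1] if len(lines) > 1 else ""
    match = DATE_PATTERN.search(text)
    date = match.group(1) if match else ""
    return {"title": title, "presenter": presenter, "date": date}


def to_csv(rows):
    """Serialize the extracted rows as CSV text, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "presenter", "date"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([extract_fields(SAMPLE_TEXT)]))
```

Since there is no keyword like "Title" on the slide, the title and presenter are taken by line position, and only the date relies on a pattern. If the layout differs between presentations, the position-based part is the first thing that will break.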

Hope this helps.

Regards
Sonali

Hi Sonali, thanks for the suggestion. Is there a way to do this without the Read PDF With OCR activity, since I don’t have the credentials to connect to Adobe PDF Services?

Hi @Ii_Mariko

I don’t think you need any Adobe licenses to use this activity.

Please refer to the doc below for this activity.

@Ii_Mariko

Welcome to the community

Try the generative extractor: pass the PDF to it, then ask a question or give it a prompt to get the relevant data.

cheers