Problem Extracting PDF Text and Summarizing

I’m trying to extract text from multiple PDF files of police reports, stored in an individual directory for each arrestee, and to create a folder inside each directory containing a summary of each police report.

When I test-run the first iteration, it successfully creates the file, but the output is this text: “I’m sorry, but there is no text provided for me to summarize. Please provide the text you would like me to summarize.”

So I gather the problem must be in the Extract PDF Text activity, or maybe further up the line with my variables. Here’s my sequence for reference:

Hi @justindward, you can try Document Understanding if you have multiple PDFs; it’s better to use a different extractor depending on your requirements.

Thanks

@justindward

  1. Run in debug mode and check whether any data is extracted from the PDF. You can do that by setting a breakpoint and then checking the Locals panel.
  2. Also, you used many loops, and in every loop the common variable is CurrentItem. Try renaming it: with multiple nested For Each loops, it might be interpreted incorrectly.
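To illustrate point 2 outside of UiPath, here is a minimal Python sketch (an analogue of the workflow logic, not UiPath code) of the nested loops with a distinct name at each level. The folder layout root/<arrestee>/<report>.pdf, the Summaries subfolder, and the function name plan_summaries are hypothetical assumptions, not taken from your sequence.

```python
from __future__ import annotations

from pathlib import Path


def plan_summaries(root: Path) -> list[tuple[Path, Path]]:
    """Pair each report PDF with the summary file it should produce.

    Assumed (hypothetical) layout: root/<arrestee>/<report>.pdf, with each
    summary written to root/<arrestee>/Summaries/<report>.txt.
    """
    plan: list[tuple[Path, Path]] = []
    # Outer loop: one folder per arrestee. A specific name such as
    # arrestee_dir (instead of a generic current_item) shows which level we are on.
    for arrestee_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary_dir = arrestee_dir / "Summaries"
        # Inner loop: only the PDFs directly inside *this* arrestee's folder.
        # Its own name (report_pdf) cannot be confused with the outer item.
        for report_pdf in sorted(arrestee_dir.glob("*.pdf")):
            summary_file = summary_dir / (report_pdf.stem + ".txt")
            plan.append((report_pdf, summary_file))
    return plan
```

This only plans the paths; it does not create folders or read any PDFs. In the workflow, the same idea is to rename each For Each’s item variable (for example, something like arresteeFolder for the outer loop and reportFile for the inner one; both names hypothetical) so every activity refers unambiguously to the right level.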

Cheers

Could you add some logs to show the data you are about to assign?

I will try that, but I need to learn Document Understanding first, and it is also limited to two pages in the Community Edition. Everything I need is generally on pages 2-3. Is there some way around that? I know there’s an activity to extract specific pages from a PDF. Do you have any videos you can recommend for learning DU?

Sorry, I’m only about a month into this. How do I run in debug mode? I know how to use “Run to this Activity”, and I tried that with a “Write Line” after the Extract PDF Text activity. As far as I could tell, it wasn’t producing any output. Should I try a different OCR engine or another OCR activity to get the text?

Also, regarding your second point: I specified CurrentItem from the For Each File activity for that specific folder, so that should point to the correct file, right? Is there some custom expression I should use instead?

Hi @justindward, for DU you can watch the video below to get an idea of how it works.

Thanks

@justindward

If the printed value is blank, then nothing was extracted.

Yes, you might need to try a different OCR engine.
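The check described here (log what was extracted, and treat a blank value as “nothing extracted” instead of sending it to the summarizer) can be sketched as a small guard. This is a Python analogue of the workflow logic, not UiPath code; the function name text_to_summarize, the logger name, and the log messages are hypothetical.

```python
from __future__ import annotations

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("report_summaries")


def text_to_summarize(report_name: str, extracted_text: str | None) -> str | None:
    """Return the text to hand to the summarizer, or None if nothing was extracted."""
    text = (extracted_text or "").strip()
    # Log the length (not the whole report) so you can see what is about to be assigned.
    log.info("%s: %d characters extracted", report_name, len(text))
    if not text:
        # A blank result often means the PDF is a scanned image with no text layer,
        # so skip summarizing and try an OCR engine/activity for this file instead.
        log.warning("%s: no text extracted; try OCR before summarizing", report_name)
        return None
    return text
```

In the workflow itself, the equivalent would be an If activity before the summarizing step with a condition such as String.IsNullOrWhiteSpace(extractedText), where extractedText stands for whatever variable holds the extraction output.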

Cheers