I need help with a solution to read and understand PDF content and give each file a proper name

Hi,

Problem and requirements description
I have thousands of different PDFs that are not named properly. They are stored in an e-file application. I want to rename them based on their content so that users can easily understand what each PDF is about.

First thoughts on solution
I need a solution that can actually read and understand what each PDF is about and summarize it in a proper new title for the document. Or, at the least, one that can identify the document's header and use it to rename the file.

Thanks for your help.

Cheers
SL

Hi @SL26 ,

Are the PDFs you are referring to always digital, always scanned images, or a mixture of both?

If the PDFs come in different formats, we cannot rely on a common pattern to extract the headers, so the task becomes summarising the content into a few words to use as the file name. In that case, the easiest option available is to use the Gen AI Content Generation activity to summarise the data read from the PDF into 2 or 3 words.

But you should also consider that different documents could be summarised into the same few words, so you will need to handle duplicate file names in those cases.
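
To illustrate the duplicate handling, here is a minimal sketch in Python (used only for illustration; the same logic can be ported to whatever the workflow uses). It assumes the summary text has already been produced elsewhere. The function names `safe_name` and `unique_name` are hypothetical, as are the `_2`, `_3` suffix scheme and the 3-word limit.

```python
import re


def safe_name(summary: str, max_words: int = 3) -> str:
    """Turn a short summary into a file-system-friendly base name."""
    # Keep only alphanumeric runs so characters like / : * ? are dropped.
    words = re.findall(r"[A-Za-z0-9]+", summary)
    return "_".join(words[:max_words]) or "untitled"


def unique_name(base: str, taken: set[str], ext: str = ".pdf") -> str:
    """Return a file name not yet in `taken`, adding _2, _3, ... if needed."""
    candidate = f"{base}{ext}"
    counter = 2
    # Compare in lower case, since some file systems ignore case.
    while candidate.lower() in taken:
        candidate = f"{base}_{counter}{ext}"
        counter += 1
    taken.add(candidate.lower())
    return candidate


# Example: two documents that were summarised into the same words.
taken_names: set[str] = set()
first = unique_name(safe_name("Leave Request Form 2023"), taken_names)
second = unique_name(safe_name("Leave request form"), taken_names)
```

Here `first` is `Leave_Request_Form.pdf` and `second` is `Leave_request_form_2.pdf`, so neither file overwrites the other. In a real run, `taken_names` would need to be seeded with the names that already exist in the target folder.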

Let us know your thoughts on this suggestion.

Hi @supermanPunch, they are scanned paper forms.