Is there a way to sort PDFs based on keywords within the PDF?

I’m looking for a way to sort PDF bank statements from different banks. The file names are not consistent, so I would like to sort by looking at the logo within each PDF. I have a ForEach iterating through the folder, so it grabs each file individually. I have tried the Keyword Based Classifier, but that seems to be meant more for data extraction than for sorting these files.

Does anyone have an idea on an efficient way to do this?

Directory.GetFiles into an array, then convert that into a datatable which also has a sort column. Then For Each through the datatable, reading each file and detecting the image. Based on the image found, update the value of CurrentRow’s sort column. Then sort the datatable on that column.
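To make the flow above concrete, here is a rough sketch of the same logic in Python rather than as a UiPath workflow. It only illustrates the "table with a sort column" idea. The names `build_sorted_table` and `detect_bank` are made up for this example, and `detect_bank` stands in for whatever image or text detection you end up using.

```python
from pathlib import Path
from typing import Callable, Iterable


def build_sorted_table(
    paths: Iterable[Path],
    detect_bank: Callable[[Path], str],
) -> list[dict]:
    """Build one row per file, fill in a 'bank' sort column, then sort on it."""
    # One row per file, with an empty sort column to start with
    # (the equivalent of the datatable with an extra column).
    rows = [{"path": Path(p), "bank": ""} for p in paths]

    # For each row, run the detection and update the sort column.
    for row in rows:
        row["bank"] = detect_bank(row["path"])

    # Sort on the sort column; file name is used as a tie-breaker.
    rows.sort(key=lambda r: (r["bank"], r["path"].name))
    return rows
```

You would call it with something like `build_sorted_table(Path(folder).glob("*.pdf"), my_detector)`, where `folder` and `my_detector` are your own folder path and detection function.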

Thank you for the quick response! Could you attach a sample workflow by chance? I’m relatively new to UiPath still so it would be a big help for me to learn!

You can try the steps below:

  1. Install the UiPath.AI.ComputerVision.Activities package: If you haven’t already, you’ll need to install this package in UiPath Studio.
  2. Create a workflow: Create a new workflow in UiPath Studio and add a For Each activity to iterate through each file in the folder.
  3. Use the Computer Vision activity to extract the logo: Within the For Each activity, add a Computer Vision activity to extract the logo from each PDF file. You can use the “Extract Logo” activity from the UiPath.AI.ComputerVision.Activities package to do this.
  4. Compare the extracted logo with a known set of logos: After extracting the logo from each PDF file, compare it with a known set of logos to determine which bank the statement belongs to. You can use a Switch activity to handle each bank separately based on the logo that was detected.
  5. Move the PDF files to the appropriate folder: Finally, use the “Move File” activity to move each PDF file to the appropriate folder based on the bank it belongs to.
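
As a rough illustration of the classify-then-move part (steps 4 and 5), here is a minimal Python sketch. Note that it swaps logo comparison for a simple keyword match on text that is assumed to have already been extracted from the PDF (for example by a PDF text reader or OCR), since that is usually easier than matching images. The bank names, keywords, and function names are all placeholders for this example, not anything from UiPath.

```python
import shutil
from pathlib import Path

# Hypothetical keyword table: bank name -> phrases expected in its statements.
BANK_KEYWORDS = {
    "Example Bank A": ["example bank a"],
    "Example Bank B": ["example bank b", "ebb online"],
}


def classify_by_keyword(
    text: str,
    bank_keywords: dict[str, list[str]],
    default: str = "Unknown",
) -> str:
    """Return the first bank whose keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    for bank, keywords in bank_keywords.items():
        if any(keyword.lower() in lowered for keyword in keywords):
            return bank
    return default


def move_to_bank_folder(pdf_path: Path, bank: str, output_root: Path) -> Path:
    """Move the PDF into output_root/<bank>/, creating the folder if needed."""
    target_dir = output_root / bank
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / pdf_path.name
    shutil.move(str(pdf_path), str(target))
    return target
```

Files that match no keyword end up in an `Unknown` folder, so nothing is silently dropped and you can review those by hand.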
