How to extract specific data from multiple PDFs?

Good afternoon! How do I extract the CNPJ from 20 PDFs in a row? Do I have to do it one by one, or can I extract this data from all the PDFs at once?

You have to do this for each PDF, but you can do it in a For Each loop to iterate over all of the PDFs in a directory.
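
For anyone who wants to see the loop outside UiPath, here is a rough Python sketch of "iterate over all of the PDFs in a directory". It is only an illustration: the helper name list_pdfs and the folder name are made up, and it only collects the file paths. Reading each PDF would still need a PDF library of your choice.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by path.

    `list_pdfs` is a hypothetical helper name, not part of any library.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # "YourfolderPath" is a placeholder; point it at the real folder.
    folder = Path("YourfolderPath")
    if folder.is_dir():
        for pdf_path in list_pdfs(folder):
            # One iteration per PDF, like the body of a For Each loop.
            print(pdf_path.name)
```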


Hello, @Brenosants

  • I assume all the PDFs are in one folder. If so, use an Assign activity like this:
    ArrayVar = Directory.GetFiles("YourfolderPath")

  • Use a For Each loop and pass ArrayVar as its input.

  • If the PDFs are scanned, use Read PDF with OCR (any OCR engine will do). If the PDFs are digital, use Read PDF Text.

  • Pass Item as the input to Read PDF Text and store the output in outStr. The output type is System.String.

  • Then use the Matches activity to extract the CNPJ.
    I can only help with the CNPJ pattern once I see how the data appears in the PDF.

  • Pass outStr as the input to the Matches activity and store the output in outMatches. The output is a collection of matches, so you can print the first result with Write Line like this: outMatches(0).Value
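
The matching step above can be sketched in Python like this. It assumes the CNPJ shows up as 14 digits, optionally punctuated as 00.000.000/0000-00. The pattern and the names extract_cnpjs and extract_from_many are illustrative only, and the input is assumed to be text already read from each PDF (the outStr of the steps above). The real pattern may need adjusting once you see how the data appears in your PDFs.

```python
import re

# Assumed layout: 14 digits with optional ".", "/" and "-" separators,
# e.g. 12.345.678/0001-95 or 12345678000195.
CNPJ_PATTERN = re.compile(r"\b\d{2}\.?\d{3}\.?\d{3}/?\d{4}-?\d{2}\b")


def extract_cnpjs(text):
    """Return every CNPJ-looking string found in `text`, in order."""
    return CNPJ_PATTERN.findall(text)


def extract_from_many(texts_by_file):
    """Map each file name to the CNPJ candidates found in its text.

    `texts_by_file` is a dict of {file name: extracted text}; how the
    text is extracted from the PDF is outside this sketch.
    """
    return {name: extract_cnpjs(text) for name, text in texts_by_file.items()}


if __name__ == "__main__":
    sample = {"invoice1.pdf": "CNPJ: 12.345.678/0001-95", "invoice2.pdf": "no id"}
    for name, found in extract_from_many(sample).items():
        # The first match plays the role of outMatches(0); guard against none.
        print(name, found[0] if found else "not found")
```

Note that this only checks the shape of the number, not the two check digits, so a match is a candidate and not a validated CNPJ.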

Let me know if this helps.

