Dynamic PDF Automation

Good afternoon,

I need to retrieve two fields from different PDFs fully dynamically, since the files come in different formats.

For example:

Field Name: 890272-908

Field Name 890272-908

Field Name

Field Name.

Field Name: 890272-908

In the example above, I would need to extract 890272-908 (sometimes the number has no dash and is just, for example, 890272), while making sure it is the right field, taking into consideration that the field names also come in different variations (see the examples below).

The fields can appear in many ways; for instance, one of them could be Po No, Purchase Order Number, Purchase Order No, Order Number, Your Reference, etc. The same goes for my other field, which also has multiple variations.
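One way to make the matching independent of where the field sits on the page is a single regular expression that accepts any of the label variants and then captures a number with an optional dash-separated suffix. Below is a minimal Python sketch, assuming the PDF text has already been extracted into one string (for example with a PDF-reading activity). `PO_LABELS`, `build_field_regex` and `extract_field` are illustrative names, not part of any UiPath API, and the value pattern `\d+(?:-\d+)?` assumes the number is digits with at most one dash.

```python
import re

# Label variants mentioned in the post; extend the list as new ones turn up.
PO_LABELS = [
    "Purchase Order Number",
    "Purchase Order No",
    "Order Number",
    "Your Reference",
    "PO Number",
    "Po No",
    "PO#",
]

# The value: digits, optionally followed by a dash and more digits
# (890272-908 or just 890272).
VALUE_PATTERN = r"\d+(?:-\d+)?"


def build_field_regex(labels):
    """Compile one case-insensitive regex that matches any label variant,
    skips separators (spaces, line breaks, ':', '.', '#') and captures the value."""
    # Longest first, so a longer label is tried before a shorter one
    # that starts at the same position.
    ordered = sorted(labels, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in ordered)
    return re.compile(rf"(?:{alternatives})[\s:.#]*({VALUE_PATTERN})", re.IGNORECASE)


def extract_field(text, labels):
    """Return the first value that follows any of the labels, or None."""
    match = build_field_regex(labels).search(text)
    return match.group(1) if match else None
```

For a line like `Purchase Order No: 890272-908`, `extract_field(pdf_text, PO_LABELS)` returns `"890272-908"`. Because `[\s:.#]*` also matches line breaks, layouts where the value sits on the line below the label are covered too. The other field would get its own label list and, if needed, its own value pattern; the constructs used here (`\d`, `\s`, `(?:...)`, alternation, case-insensitive matching) also exist in .NET's `Regex`, so the pattern should carry over to a workflow with little change.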

I tried pasting the entire PDF into Excel, but the pasted data is also laid out differently depending on the format of the file.

Any suggestions on how to approach this?


Hi @rjackson

You can either use the Get Text activity, or read the PDF with the Read PDF Text activity, store the result in a string, and then do the string manipulation.

Nived N
Happy automation

The position of these fields changes every time because, as I mentioned before, the format of the files is different, so the Get Text activity would not work in this scenario.

I did try reading the PDF, but again, the format differs, so it is hard to work with the strings when they are not always the same, as you can see from the examples in my original post.

Any other recommendations?

What I meant is that you can use string manipulation for this: split the text using the field name as the delimiter and extract the value that follows it.
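The split-on-delimiter idea can be sketched without regular expressions: find the first label variant present, take the text after it, and keep the first token only if it looks like the number. `extract_after_label` below is a hypothetical Python helper, assuming ASCII text (it searches a lowercased copy and reuses the index on the original string, which only lines up when lowercasing keeps the length the same).

```python
def extract_after_label(text, labels):
    """Use each label variant as a delimiter and return the number after it, or None."""
    lowered = text.lower()  # case-insensitive search; assumes ASCII text
    for label in labels:
        start = lowered.find(label.lower())
        if start == -1:
            continue  # this variant does not appear in the document
        # Everything after the label, like taking the second part of a split on it.
        rest = text[start + len(label):]
        # Treat ':' and '#' as whitespace, then look at the first token only.
        tokens = rest.replace(":", " ").replace("#", " ").split()
        if not tokens:
            continue
        candidate = tokens[0].strip(".")
        # Accept digit groups joined by dashes, e.g. 890272-908 or 890272.
        if all(part.isdigit() for part in candidate.split("-")):
            return candidate
    return None
```

Only the first occurrence of each label is checked, and the list order decides which variant is tried first. A label that can also appear inside other words (for example "Po No" inside "Repo Note") is rejected only because the token after it fails the number check.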

The field name is also dynamic; it is not always the same.
Could you provide an example of what you mean?

Can you share a sample PDF file?

I cannot share the files because they contain sensitive data, but here are some samples that should help you visualize my issue.

As I mentioned earlier, I have multiple files, all with different formats, so I don’t have a specific template. For these samples, I would have to retrieve the PO Number field. (In these samples the field title is the same, “PO Number”, but in my real data it comes in different ways, like “Purchase Order Number”, “Po. Number.”, “PO#” and many other variations.)
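To cope with punctuation variants like “Po. Number.” or “PO#”, another option is to normalize each label before comparing it: lowercase it, drop everything that is not a letter or digit, and look the result up in a table that also records which field the label belongs to. Below is a Python sketch under these assumptions: each label is followed by its value on the same line or the next one, and because the second field is never named in the thread, `invoice_number` and its labels are hypothetical placeholders.

```python
import re

# Normalized label -> field it belongs to. Keys are lowercase with all
# non-alphanumeric characters removed (see normalize_label).
# The invoice_number entries are hypothetical placeholders for the second field.
LABEL_TO_FIELD = {
    "ponumber": "po_number",             # "PO Number", "Po. Number."
    "pono": "po_number",                 # "Po No", "PO No."
    "po": "po_number",                   # "PO#"
    "purchaseordernumber": "po_number",
    "purchaseorderno": "po_number",
    "ordernumber": "po_number",
    "yourreference": "po_number",
    "invoicenumber": "invoice_number",
    "invoiceno": "invoice_number",
}

# A label (no digits, single line) followed by a number with an optional
# dash suffix; the \s* before the value lets it sit on the next line.
LABEL_VALUE = re.compile(
    r"^\s*(?P<label>[^\d\n]+?)\s*(?P<value>\d+(?:-\d+)?)\b", re.MULTILINE
)


def normalize_label(label):
    """Lowercase and strip punctuation/spaces: 'Po. Number.' -> 'ponumber'."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def extract_fields(text):
    """Return {field: value} for the first recognized label of each field."""
    found = {}
    for match in LABEL_VALUE.finditer(text):
        field = LABEL_TO_FIELD.get(normalize_label(match.group("label")))
        if field and field not in found:
            found[field] = match.group("value")
    return found
```

Label/number pairs whose normalized label is not in the table (addresses, totals, dates) are ignored, and supporting a new variant only means adding one table entry.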

Sample 1:

Sample 2:

Sample 3: