How to validate the account numbers in a PDF file?

Hello All,

I have to verify a PDF that contains 4 pages, as below:

I need to validate whether the account number present on all 4 pages is the same.

The account number starts with CIF: followed by 12 to 14 digits.

If all pages contain the same account number, I need to mark it as successful; if there is any mismatch, the bot needs to throw a Business Rule Exception.

Could someone please help me achieve this?

@naveen.s

Try this approach

  1. Use the Get PDF Page Count activity to get the page count.
  2. Use a While loop to iterate over all the pages.
  3. Inside the While loop, use Read PDF Text to read the page currently being iterated.
  4. Use regex to get the account number. You can ask any LLM for a regex pattern for it.
  5. Store the extracted account number in a list or array.
  6. Once the account numbers have been extracted from all pages, outside the While loop, get the unique account numbers from that list or array.
  7. If the count of unique account numbers is one, there is only one account number in the PDF; otherwise there are multiple.
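
Outside UiPath, the same idea can be sketched in a few lines of Python. This is only an illustration of the "collect, then count unique values" logic, not the UiPath workflow itself. The function name `unique_account_numbers`, the sample page texts, and the pattern are assumptions; in particular the pattern assumes the colon after CIF is optional and keeps only the digits for comparison.

```python
import re

# Assumed format: "CIF", an optional colon and spaces, then 12 to 14 digits.
# Only the digits are captured so "CIF:123..." and "CIF 123..." compare equal.
CIF_PATTERN = re.compile(r"CIF:?\s*(\d{12,14})")


def unique_account_numbers(page_texts):
    """Return the set of distinct account numbers found across the pages.

    page_texts is a list with one string per page (hypothetical input,
    standing in for the output of Read PDF Text on each page).
    """
    found = []
    for text in page_texts:
        match = CIF_PATTERN.search(text)
        # A page without any CIF is recorded as None so it counts as a mismatch.
        found.append(match.group(1) if match else None)
    return set(found)


# Made-up sample pages for illustration.
pages = [
    "Statement page 1 CIF: 123456789012",
    "Page 2 CIF: 123456789012",
    "Page 3 CIF: 123456789012",
    "Page 4 CIF: 123456789012",
]
unique = unique_account_numbers(pages)
all_same = len(unique) == 1 and None not in unique
```

Recording `None` for a page with no match is a design choice of this sketch: without it, a page that has no account number at all would be silently ignored and the PDF could still pass.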

@naveen.s

  1. Create a Boolean flag and set its value to True.
  2. Read only page 1 of the PDF, then use regex to get the CIF: CIF\d{12,14}
  3. Get the PDF page count and loop over the remaining pages, skipping the first: Enumerable.Range(2,Pagecount-1)
  4. Inside the loop, use Read PDF Text on the current page.
  5. Use an If condition with pdfvalue.contains(extractedvalueinstep2). On the True side do nothing; on the False side set the flag to False and end the loop.
  6. After the loop, check whether the flag is True or False. If it is False, the CIF does not match.

This helps avoid extra looping if the CIF is already missing on the second page.
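
As a rough illustration of the flag-and-early-exit idea, here is a minimal Python sketch (not UiPath code). The function name `all_pages_contain_first_cif` and the list-of-strings input are assumptions, and the pattern assumes the colon after CIF is optional.

```python
import re

# Assumed format: "CIF", an optional colon and spaces, then 12 to 14 digits.
CIF_PATTERN = re.compile(r"CIF:?\s*\d{12,14}")


def all_pages_contain_first_cif(page_texts):
    """Return True if the CIF found on page 1 appears in every later page.

    page_texts is a hypothetical list with one string per PDF page.
    """
    if not page_texts:
        return False
    first = CIF_PATTERN.search(page_texts[0])
    if first is None:
        # No CIF on page 1: there is nothing to compare against.
        return False
    cif = first.group(0)
    for text in page_texts[1:]:
        if cif not in text:
            # Early exit on the first page that lacks the CIF.
            return False
    return True
```

One thing to keep in mind with a plain contains check: a 12-digit CIF from page 1 would also be "contained" in a later page that shows a 13- or 14-digit number starting with the same digits. If that can happen in your documents, extracting the CIF on each page and comparing for equality is the safer check.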

cheers

Hi @naveen.s

Follow the steps below to achieve the solution:
→ Use the Get PDF Page Count activity to get the number of pages in the PDF. Store it in a variable called PdfPageCount.
→ Then use the For Each activity to iterate over each page number. In For Each, give the expression below; the iteration variable of For Each is currentNumber.

Enumerable.Range(1, PdfPageCount)

→ Inside For Each, insert the Read PDF Text activity and open its properties. Give currentNumber.toString in the Range field, and create a variable called EachPageText in the Text field.
→ After that, use the If activity and give the condition below:

currentNumber=1

→ Inside the Then block, insert an Assign activity and create a variable called FirstPageAccountNumber, then write the regex expression as below:

- Assign -> FirstPageAccountNumber = System.Text.RegularExpressions.Regex.Match(EachPageText, "CIF\d{12,14}").value.toString

→ After the If activity, insert one more If activity to check the condition below:

System.Text.RegularExpressions.Regex.Match(EachPageText, "CIF\d{12,14}").value.toString.equals(FirstPageAccountNumber)

→ Create a Boolean variable called Bool_MatchAccountNumber.
→ Inside the Then block, insert an Assign activity and set the Boolean variable to True.

- Assign -> Bool_MatchAccountNumber = True

→ Inside the Else block, insert an Assign activity and set the Boolean variable to False.

- Assign -> Bool_MatchAccountNumber = False

→ In the Else block, after the Assign activity, insert the Break activity to break the loop.

After the loop, the bot comes out of the For Each. Then check the Boolean variable Bool_MatchAccountNumber: if it is True, the account numbers on all pages match; otherwise they do not.
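
To show how the per-page comparison can lead to the Business Rule Exception asked about in the question, here is a minimal Python sketch. It is not UiPath code: the `BusinessRuleException` class below is a hypothetical stand-in for UiPath's exception type, and `extract_cif`, `validate_pdf_pages`, and the list-of-strings input are assumed names. The pattern assumes the colon after CIF is optional.

```python
import re


class BusinessRuleException(Exception):
    """Hypothetical stand-in for UiPath's BusinessRuleException."""


# Assumed format: "CIF", an optional colon and spaces, then 12 to 14 digits.
CIF_PATTERN = re.compile(r"CIF:?\s*(\d{12,14})")


def extract_cif(text):
    """Return the digits of the first CIF in text, or "" if none is found."""
    match = CIF_PATTERN.search(text)
    return match.group(1) if match else ""


def validate_pdf_pages(page_texts):
    """Compare every page's CIF with page 1; raise on the first mismatch."""
    first = ""
    for page_number, text in enumerate(page_texts, start=1):
        current = extract_cif(text)
        if page_number == 1:
            first = current
            if not first:
                # Without this guard, pages with no CIF at all would all
                # "match" the empty string and the PDF would wrongly pass.
                raise BusinessRuleException("No account number found on page 1")
        elif current != first:
            raise BusinessRuleException(
                f"Account number mismatch on page {page_number}"
            )
    if not first:
        raise BusinessRuleException("No pages to validate")
    return first
```

The guard for an empty first-page value is worth carrying over to the workflow as well: Regex.Match(...).Value is an empty string when nothing matches, so comparing two empty values would report a match.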

Check the workflow below for a better understanding.

Hope it helps!!
