How to extract the Name and DoB from a PAN Card

Hi,

I want to extract the name and date of birth from a scanned image of a PAN card that has been converted to PDF.
sample-pan-card-front.pdf (31.5 KB)

I have attached a sample PAN card for your reference.

Regards,

Hi @raju_alakuntla

You can use the Read PDF With OCR activity and store its output in a String variable.
Then use a regular expression to extract the name and DOB from that text.
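If it helps to see the idea outside UiPath, here is a minimal Python sketch (an illustration, not the workflow shared in this thread). `match_value` is a hypothetical helper that mimics .NET's `Regex.Match(...).Value` by returning an empty string when nothing matches, and `ocr_text` is a made-up stand-in for the String variable filled by Read PDF With OCR.

```python
import re


def match_value(pattern: str, text: str) -> str:
    """Return the first match of pattern in text, or "" if there is none.

    Hypothetical helper mirroring .NET's Regex.Match(text, pattern).Value,
    which is an empty string when the pattern does not match.
    """
    m = re.search(pattern, text)
    return m.group(0) if m else ""


# Made-up stand-in for the String output of Read PDF With OCR.
ocr_text = "INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA SURESH GUPTA 23/11/1974"

dob = match_value(r"\d{2}/\d{2}/\d{4}", ocr_text)
print(dob)  # 23/11/1974
print(repr(match_value(r"\d{2}/\d{2}/\d{4}", "no date here")))  # ''
```

Returning "" instead of None keeps the check the same as in a UiPath If condition (`String.IsNullOrEmpty`).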

Note: we are using OCR because we are extracting the data from a scanned document.

Hope it helps!!

You can use regex to extract this.

Regards,

Can you please share the regular expressions for these?

Regards,

Yes, @raju_alakuntla.

Let me take some time.

Hey @raju_alakuntla,
Main.xaml (12.6 KB)
I have attached the workflow here.



Hope you find it useful!

Can you please share it in zip format?

Regards,

Hi @raju_alakuntla,
You can use Document OCR to get all the text from the PDF,
then use a regex to find the date in dd/MM/yyyy format.
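A regex alone also accepts OCR noise that has the right shape but is not a real date, such as 31/02/1974. Here is a minimal Python sketch, assuming the OCR text is already in a string, that keeps only dd/MM/yyyy matches that parse as calendar dates. `first_valid_date` and `DATE_PATTERN` are hypothetical names; `%d/%m/%Y` is the `strptime` spelling of dd/MM/yyyy.

```python
import re
from datetime import date, datetime
from typing import Optional

# dd/MM/yyyy as a regex; \b keeps it from matching inside longer digit runs.
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def first_valid_date(text: str) -> Optional[date]:
    """Return the first dd/MM/yyyy substring that is a real calendar date, else None."""
    for candidate in DATE_PATTERN.findall(text):
        try:
            # %d/%m/%Y is strptime's spelling of dd/MM/yyyy.
            return datetime.strptime(candidate, "%d/%m/%Y").date()
        except ValueError:
            # Right shape, wrong calendar (e.g. 31/02/1974); try the next candidate.
            continue
    return None


print(first_valid_date("DOB 31/02/1974 23/11/1974"))  # 1974-11-23
```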

In the Read PDF With OCR activity,
use UiPath Document OCR as the OCR engine.

Regex for DOB:
System.Text.RegularExpressions.Regex.Match(Out_Pdf, "\d{2}.\d{2}.\d{4}").Value

Regex for Name:
System.Text.RegularExpressions.Regex.Match(Out_Pdf, "(?<=LERARRCT4.)\w+\s\w+").Value
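For checking the patterns outside Studio, here are the two .NET expressions above translated to Python's `re` module and run against the sample OCR text posted later in this thread. This is only a sketch; `value` is a hypothetical helper. Note that on that text the name pattern returns SURESH GUPTA, the second name printed on the card (on this PAN card layout it appears to be the father's name), not RAHUL GUPTA.

```python
import re

# Sample OCR output posted later in this thread (UiPath Document OCR).
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA "
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)


def value(pattern: str, text: str) -> str:
    # Same semantics as Regex.Match(text, pattern).Value: "" when there is no match.
    m = re.search(pattern, text)
    return m.group(0) if m else ""


# "Regex for DOB": each "." matches any single character, so 23-11-1974 would match too.
dob = value(r"\d{2}.\d{2}.\d{4}", OCR_TEXT)

# "Regex for Name": the lookbehind is fixed width (LERARRCT4 plus one character),
# so Python accepts it unchanged.
name = value(r"(?<=LERARRCT4.)\w+\s\w+", OCR_TEXT)

print(dob, "|", name)  # 23/11/1974 | SURESH GUPTA
```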

forum.zip (3.5 KB)

If the PDF activities are not installed, please install the UiPath.PDF.Activities package from Manage Packages.

Steps to follow:
-> Use the Read PDF With OCR activity with UiPath Document OCR.
-> Use a regex (with the Find Matching Patterns activity) to match the name: (?<=LERARRCT4\s+)[A-Za-z]+\s+[A-Za-z]+
-> Use a regex to match the date: (\d{2}/\d{2}/\d{4})
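The lookbehind `(?<=LERARRCT4\s+)` relies on .NET's support for variable-width lookbehind. Python's `re` module rejects it, so this sketch matches the anchor itself and captures the name in a group instead. The text is the sample OCR output posted later in this thread; `LERARRCT4` looks like whatever the OCR engine produced on that one card, so it is an assumption that it would appear on other cards.

```python
import re

# Sample OCR output posted later in this thread.
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA "
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)

# re.compile(r"(?<=LERARRCT4\s+)...") raises re.error
# ("look-behind requires fixed-width pattern"), so consume the anchor
# and capture the two words that follow it.
NAME_AFTER_ANCHOR = re.compile(r"LERARRCT4\s+([A-Za-z]+\s+[A-Za-z]+)")
DATE = re.compile(r"(\d{2}/\d{2}/\d{4})")

name_match = NAME_AFTER_ANCHOR.search(OCR_TEXT)
date_match = DATE.search(OCR_TEXT)

name = name_match.group(1) if name_match else ""
dob = date_match.group(1) if date_match else ""

print(name, "|", dob)  # SURESH GUPTA | 23/11/1974
```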

Regards,

The name is not being extracted; only the DoB is.

Regards,

@raju_alakuntla

Use UiPath Document OCR in the Read PDF With OCR activity.
It extracted the following text:

CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA 
LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F SAMPLE IMMIHELP.COM - Signature

Regular expression to extract the DOB

Regular expression to extract the name

Hope it helps!!

Try this, @raju_alakuntla:
(?<=INDIA\s+)[A-Za-z]+\s+[A-Za-z]+
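Here is a Python version of the pattern above. It again uses a capture group because Python's `re` does not allow a variable-width lookbehind. On the sample OCR text it returns RAHUL GUPTA. The "GOVT OF INDIA" header is printed on the card, so it is probably a steadier anchor than OCR noise like LERARRCT4, though that assumes the OCR reads the header consistently.

```python
import re

# Sample OCR output posted earlier in this thread.
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA "
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)

# Python equivalent of (?<=INDIA\s+)[A-Za-z]+\s+[A-Za-z]+:
# consume "INDIA" plus whitespace, then capture the next two words.
m = re.search(r"INDIA\s+([A-Za-z]+\s+[A-Za-z]+)", OCR_TEXT)
cardholder_name = m.group(1) if m else ""

print(cardholder_name)  # RAHUL GUPTA
```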

Regards,

Will this code work if I want to use it for multiple PAN cards?

Regards,

It depends on the format too,
but you can follow the same method.
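For multiple cards, one option is to run the same patterns over each card's OCR text and return an empty value when a field is not found, so those cards can be reviewed by hand. This Python sketch assumes every card shares the layout of the sample; `extract_fields`, `extract_all`, and both OCR strings are hypothetical.

```python
import re
from typing import Dict, List

NAME_RE = re.compile(r"INDIA\s+([A-Za-z]+\s+[A-Za-z]+)")
DOB_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def extract_fields(ocr_text: str) -> Dict[str, str]:
    """Pull name and DoB from one card's OCR text; "" marks a field that was not found."""
    name = NAME_RE.search(ocr_text)
    dob = DOB_RE.search(ocr_text)
    return {
        "name": name.group(1) if name else "",
        "dob": dob.group(1) if dob else "",
    }


def extract_all(ocr_texts: List[str]) -> List[Dict[str, str]]:
    # One result per card, in input order, so misses can be flagged for manual review.
    return [extract_fields(t) for t in ocr_texts]


# Hypothetical OCR strings for two cards; the second has a misread header ("1NDIA").
cards = [
    "GOVT OF INDIA RAHUL GUPTA LERARRCT4 SURESH GUPTA 23/11/1974",
    "GOVT OF 1NDIA ANITA SHARMA XYZQ RAVI SHARMA 05/06/1990",
]
for fields in extract_all(cards):
    print(fields)
```

The second card shows why the empty-value check matters: one misread character in the anchor is enough to lose the name.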

If this method helped you find the solution, please mark it as the solution.

Regards,

I am working on a web portal. When I click on the document, it opens in a new tab where the PAN card is visible. What should I pass in the "File Name" field of the "Read PDF with OCR" activity?


Regards,

If your PAN card is a PDF file, you can pass the path of that PDF file in the File Name property.

If you want to scrape the data from the website, you cannot use the Read PDF with OCR activity. In that case, use the Get Text activity to scrape the data from the website.

Hope you understand!!

Is the PAN card in PDF format?

It is in PDF format, but we are not downloading the documents. It opens as a PDF in a new browser tab.

Regards,
