Hi,
I want to extract the name and date of birth from a scanned image of a PAN card that has been converted to PDF.
sample-pan-card-front.pdf (31.5 KB)
Attached is a sample PAN card for your reference.
Regards,
You can use the Read PDF With OCR activity and store its output in a String variable.
Then use regular expressions to extract the name and DOB from that string.
Note: we use OCR because the data is being extracted from scanned documents.
Hope it helps!!
You can use regex to extract this.
Regards,
Can you please share the regular expressions for these?
Regards,
Yes, @raju_alakuntla.
Let me take some time.
Can you please share it in zip format?
Regards,
Hi @raju_alakuntla,
You can use Document OCR -> get all the text from the PDF
-> use regex to match dates in the dd/MM/yyyy format
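A minimal sketch of the dd/MM/yyyy step, written in Python rather than UiPath/VB.NET so it can run on its own. `OCR_TEXT` is assumed to be the OCR output string (here, the text posted later in this thread), and `extract_dob` is a hypothetical helper name. Parsing the match with `strptime` is an extra check not in the original advice: it rejects OCR misreads that look like dates but are not valid ones.

```python
import re
from datetime import date, datetime
from typing import Optional

# Stand-in for the OCR output string; this is the text posted later in this thread.
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA\n"
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)

# dd/MM/yyyy as a regex; \b keeps it from matching inside a longer digit run.
DOB_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def extract_dob(text: str) -> Optional[date]:
    """Return the first dd/MM/yyyy match that is also a valid calendar date."""
    for match in DOB_PATTERN.finditer(text):
        try:
            # "%d/%m/%Y" is the strptime counterpart of .NET's "dd/MM/yyyy".
            return datetime.strptime(match.group(1), "%d/%m/%Y").date()
        except ValueError:
            # Looks like a date but is not one (e.g. an OCR misread such as 31/02/1974).
            continue
    return None


if __name__ == "__main__":
    print(extract_dob(OCR_TEXT))  # 1974-11-23
```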
In the Read PDF With OCR activity:
use UiPath Document OCR.
Regex for DOB:
System.Text.RegularExpressions.Regex.Match(Out_Pdf, "\d{2}.\d{2}.\d{4}").Value
Regex for Name:
System.Text.RegularExpressions.Regex.Match(Out_Pdf, "(?<=LERARRCT4.)\w+\s\w+").Value
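The two expressions above can be tried outside UiPath with a small Python sketch. `regex_match_value` is a hypothetical helper that mimics .NET's `Regex.Match(...).Value` (first match, or an empty string), and `OCR_TEXT` stands in for `Out_Pdf`, assumed here to be the OCR text posted later in this thread. On that text the name pattern returns "SURESH GUPTA", the second name line (on a standard PAN card, the father's name), and because "LERARRCT4" is OCR noise from this particular scan, the pattern may return an empty string on other scans.

```python
import re

# Assumed stand-in for Out_Pdf: the OCR text posted later in this thread.
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA\n"
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)


def regex_match_value(text: str, pattern: str) -> str:
    """Mimic .NET Regex.Match(text, pattern).Value: first match, or "" if none."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


# DOB: the unescaped "." matches any single character, so "/", "-" or "." separators all pass.
dob = regex_match_value(OCR_TEXT, r"\d{2}.\d{2}.\d{4}")

# Name: "LERARRCT4." is a fixed-width lookbehind (the token plus one character),
# which Python's re module accepts just like .NET does.
name = regex_match_value(OCR_TEXT, r"(?<=LERARRCT4.)\w+\s\w+")

if __name__ == "__main__":
    print(dob)   # 23/11/1974
    print(name)  # SURESH GUPTA
```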
forum.zip (3.5 KB)
If the PDF activities are not installed, please install the PDF activities package from Manage Packages.
Steps to follow:
-> Use the Read PDF With OCR activity with UiPath Document OCR.
-> Use regex (with the Find Matching Patterns activity) to match the name: (?<=LERARRCT4\s+)[A-Za-z]+\s+[A-Za-z]+
-> Regex to match the date: (\d{2}/\d{2}/\d{4})
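The steps above rely on a variable-width lookbehind, `(?<=LERARRCT4\s+)`, which .NET supports but Python's `re` module does not. This sketch swaps in a capture group to get the same text, and returns every match (the way Find Matching Patterns yields all matches rather than only the first). `find_matching_patterns` is a hypothetical name chosen to echo the activity, and `OCR_TEXT` is assumed to be the OCR output posted later in this thread.

```python
import re
from typing import List

# Assumed OCR output: the text posted later in this thread.
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA\n"
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)

# Python's re rejects variable-width lookbehind such as (?<=LERARRCT4\s+),
# so match the anchor normally and keep only capture group 1.
NAME_AFTER_ANCHOR = re.compile(r"LERARRCT4\s+([A-Za-z]+\s+[A-Za-z]+)")
DATE = re.compile(r"(\d{2}/\d{2}/\d{4})")


def find_matching_patterns(pattern: re.Pattern, text: str) -> List[str]:
    """Return capture group 1 of every match, not just the first one."""
    return [m.group(1) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_matching_patterns(NAME_AFTER_ANCHOR, OCR_TEXT))  # ['SURESH GUPTA']
    print(find_matching_patterns(DATE, OCR_TEXT))               # ['23/11/1974']
```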
Regards,
The name is not being extracted; only the DOB is.
Regards,
Use UiPath Document OCR in the Read PDF With OCR activity.
It extracted the text as follows:
CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA
LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F SAMPLE IMMIHELP.COM - Signature
Regular expression to extract the DOB
Regular expression to extract the name
Hope it helps!!
Try this @raju_alakuntla
(?<=INDIA\s+)[A-Za-z]+\s+[A-Za-z]+
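A Python sketch of the pattern above, run against the OCR text posted in this thread (assumed input). As in the previous case, Python's `re` cannot use the variable-width lookbehind `(?<=INDIA\s+)`, so the anchor is matched normally and the name is taken from a capture group. `extract_holder_name` is a hypothetical helper. Note that `[A-Za-z]+\s+[A-Za-z]+` takes exactly two words, so a three-word name would be cut short.

```python
import re
from typing import Optional

# Assumed OCR output: the text posted earlier in this thread.
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA\n"
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)

# Python version of (?<=INDIA\s+)[A-Za-z]+\s+[A-Za-z]+ : the variable-width
# lookbehind becomes a plain match on "INDIA" plus a capture group.
HOLDER_NAME = re.compile(r"INDIA\s+([A-Za-z]+\s+[A-Za-z]+)")


def extract_holder_name(text: str) -> Optional[str]:
    """Return the two words following "INDIA", or None if the anchor is missing."""
    match = HOLDER_NAME.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_holder_name(OCR_TEXT))  # RAHUL GUPTA
```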
Regards,
Will this work if I want to use it for multiple PAN cards?
Regards,
It depends on the format too,
but you can follow the same method.
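Applied to several cards, the same method could be looped over each card's OCR text, as in this Python sketch. It assumes each scan has been OCR'd into a string first, reuses the "INDIA" anchor and dd/MM/yyyy patterns from this thread, and flags any card where a field is missing so it can be checked by hand. `extract_fields`, `extract_batch`, and the second sample string (a hypothetical scan where OCR misread "INDIA") are illustrative only.

```python
import re
from typing import Dict, List, Optional, Tuple

NAME = re.compile(r"INDIA\s+([A-Za-z]+\s+[A-Za-z]+)")
DOB = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")

# Assumed OCR output: the text posted earlier in this thread.
OCR_TEXT = (
    "CidTR TO4131 TITRA MIxOEX INCOMETAX DEPARTMENT GOVT OF INDIA RAHUL GUPTA\n"
    "LERARRCT4 SURESH GUPTA 23/11/1974 Permanent Account Number ABCDE1234F "
    "SAMPLE IMMIHELP.COM - Signature"
)


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Extract name and DOB from one card's OCR text; None marks a missing field."""
    name = NAME.search(text)
    dob = DOB.search(text)
    return {
        "name": name.group(1) if name else None,
        "dob": dob.group(1) if dob else None,
    }


def extract_batch(texts: List[str]) -> Tuple[List[Dict[str, Optional[str]]], List[int]]:
    """Process every card; also return the indexes of cards that need manual review."""
    results: List[Dict[str, Optional[str]]] = []
    needs_review: List[int] = []
    for index, text in enumerate(texts):
        fields = extract_fields(text)
        results.append(fields)
        if None in fields.values():
            needs_review.append(index)
    return results, needs_review


if __name__ == "__main__":
    # Hypothetical second scan where OCR read "INDIA" as "1ND1A".
    cards = [OCR_TEXT, "GOVT OF 1ND1A RAHUL GUPTA 23/11/1974"]
    results, review = extract_batch(cards)
    print(results)
    print(review)  # [1]
```

A layout that differs from this sample (a different anchor word, or a different date format) would need its own patterns, which is why the answer says it depends on the format.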
If this method helped you find the solution, please mark it as the solution.
Regards,
I am working on a web portal. When I click on a document, it opens in a new tab where the PAN card is visible. What should I pass in the "File Name" property of the "Read PDF With OCR" activity?
Regards,
If your PAN card is a PDF file, you can pass the path of that PDF file in the File Name property.
If you want to scrape the data from the website, you can't use the Read PDF With OCR activity. In that case you have to use the Get Text activity to scrape the data from the website.
Hope you understand!
Is the PAN card in PDF format?
It's in PDF format, but we are not downloading the documents. It opens as a PDF in a new browser tab.
Regards,