String manipulation in PDF

Hi,
I need to extract only [SSUMA PRABHA SAVANUR VESHWANATHYEERAPPABULLA] from the string below,
but I am not able to get the name. Can anyone help me resolve this?

INCOMETAXDEPARTMINT 'irgila
SSUMA PRABHA SAVANUR
VESHWANATHYEERAPPABULLA
01/00/1113
Permanent’Account
Abcd12345
Signature Unna. GOVT. OF INDIA
Scanned by CamScanner

(?<=INCOMETAXDEPARTMINT)(.|\n)*(?=\d{1,2}.\d{1,2}.\d{1,4})
Use this in the Matches activity!
Cheers @Kavyashreems
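To see where extra characters can come from, here is a minimal sketch that runs the same pattern with Python's `re` module. This is an assumption: `re` stands in for the .NET engine behind the Matches activity, and both treat these constructs the same way here. `OCR_TEXT` is the text from the question. Because `(.|\n)*` is greedy, the match runs to the *last* position where the lookahead succeeds. Since `.` matches any character, `\d{1,2}.\d{1,2}.\d{1,4}` also accepts the digits `12345` in `Abcd12345`.

```python
import re

# OCR text copied from the question
OCR_TEXT = (
    "INCOMETAXDEPARTMINT 'irgila\n"
    "SSUMA PRABHA SAVANUR\n"
    "VESHWANATHYEERAPPABULLA\n"
    "01/00/1113\n"
    "Permanent\u2019Account\n"
    "Abcd12345\n"
    "Signature Unna. GOVT. OF INDIA\n"
    "Scanned by CamScanner"
)

# First suggested pattern: everything after the header, up to a date-like run of digits
PATTERN = r"(?<=INCOMETAXDEPARTMINT)(.|\n)*(?=\d{1,2}.\d{1,2}.\d{1,4})"

match = re.search(PATTERN, OCR_TEXT)
# The greedy (.|\n)* backtracks only as far as the rightmost spot where the
# lookahead succeeds, so the result keeps the OCR noise and runs past the DOB.
print(repr(match.group(0)))
```

The captured text starts with the OCR noise `'irgila` and still contains the date. These are the extra characters reported later in the thread.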

Hi @Kavyashreems

Please check this:

(?<=INCOMETAXDEPARTMINT)(.|\n)*(?=\d{2}.\d{2}.\d{4})

Thanks
Ashwin S

With this I am getting extra characters, such as ['irgila] and the DOB.

That's because of an OCR issue.
Did you try using a different OCR engine?

Here I have used Microsoft OCR.

Please use the regular expression below:

Regex:
\W*((?i)SSUMA PRABHA SAVANUR
VESHWANATHYEERAPPABULLA(?-i))\W*
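This pattern finds the name only because the name itself is written into it, so it will not generalize to other cards. The following is a minimal Python sketch of the same idea with two adaptations that are not part of the original regex. Python's `re` does not accept a bare `(?-i)` switch, so the case-insensitive part is written as a scoped `(?i:...)` group. Also, `\s+` replaces the literal line break between the two name lines.

```python
import re

OCR_TEXT = (
    "INCOMETAXDEPARTMINT 'irgila\n"
    "SSUMA PRABHA SAVANUR\n"
    "VESHWANATHYEERAPPABULLA\n"
    "01/00/1113"
)

# Scoped case-insensitive group; \s+ tolerates a newline or spaces between the lines
PATTERN = re.compile(r"\W*((?i:SSUMA PRABHA SAVANUR\s+VESHWANATHYEERAPPABULLA))\W*")

match = PATTERN.search(OCR_TEXT)
# Collapse the line break inside the captured name into a single space
name = " ".join(match.group(1).split())
print(name)
```

Any OCR misread inside the name itself (for example `SAVANOR` instead of `SAVANUR`) makes this pattern fail.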

Hi @Kavyashreems

Check this

(?<=INCOMETAXDEPARTMINT 'irgila)(.|\n)*(?=\d{2}.\d{2}.\d{2})
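Here is a minimal Python sketch of this pattern, run against the question's text, with Python's `re` standing in for the Matches activity. It returns only the two name lines, but it does so only because the lookbehind spells out the OCR noise `'irgila` literally. The hypothetical variant `'lrgila` shows what happens when a scan reads slightly differently.

```python
import re

def ocr_text(noise: str) -> str:
    # Question's OCR text, with the noise after the header made configurable
    return (
        f"INCOMETAXDEPARTMINT {noise}\n"
        "SSUMA PRABHA SAVANUR\n"
        "VESHWANATHYEERAPPABULLA\n"
        "01/00/1113\n"
        "Permanent\u2019Account\n"
        "Abcd12345"
    )

PATTERN = r"(?<=INCOMETAXDEPARTMINT 'irgila)(.|\n)*(?=\d{2}.\d{2}.\d{2})"

match = re.search(PATTERN, ocr_text("'irgila"))
name = " ".join(match.group(0).split())  # collapse the newlines into single spaces
print(name)

# Different OCR noise: the fixed lookbehind no longer finds its anchor
print(re.search(PATTERN, ocr_text("'lrgila")))
```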

Thanks
Ashwin S

@Kavyashreems Will 'irgila' always be present when scraping data from different PDFs of the same format?

No, it's not the same.
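Since the noise after the header changes from scan to scan, one option is to skip whatever follows `INCOMETAXDEPARTMINT` on its line and capture the lines up to the first date-shaped line. This is a different technique from the patterns suggested above and is only a sketch. It assumes the name always sits between the header line and a DOB line of the form two digits, separator, two digits, separator, four digits. `extract_name` is a hypothetical helper name.

```python
import re
from typing import Optional

# Header line (any trailing OCR noise), then the name lines, then a DOB-shaped line
NAME_PATTERN = re.compile(r"INCOMETAXDEPARTMINT[^\n]*\n([\s\S]*?)\n\d{2}\W\d{2}\W\d{4}")

def extract_name(ocr_text: str) -> Optional[str]:
    """Hypothetical helper: return the name block as one line, or None if not found."""
    match = NAME_PATTERN.search(ocr_text)
    if match is None:
        return None
    # Join the name lines with single spaces
    return " ".join(match.group(1).split())

for noise in ["'irgila", "'lrgi1a", ""]:
    text = (
        f"INCOMETAXDEPARTMINT {noise}\n"
        "SSUMA PRABHA SAVANUR\n"
        "VESHWANATHYEERAPPABULLA\n"
        "01/00/1113\n"
        "Abcd12345"
    )
    print(repr(noise), "->", extract_name(text))
```

The pattern uses only constructs that .NET regex also supports, so it should carry over to the Matches activity. There, the name would be read from capture group 1 rather than from the whole match.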

Thank you so much!