Need REGEX code for extracted PDF info

gustavo_marrufo · September 20, 2021, 1:38pm

Hello
I scraped information from a PDF, and this Word doc has the output. I need to use regex to extract certain fields. The words highlighted in blue are the names of the “fields” in the PDF. The words highlighted in yellow are the info I need to extract with regex. Can someone help me build the patterns? Any assistance will be appreciated.

extracted PDF info.docx (15.9 KB)

NIVED_NAMBIAR · September 20, 2021, 1:52pm

Hi @gustavo_marrufo

Just to confirm: some fields, such as the Transferee name, occur twice. Do you need the data from those occurrences as well, or only the highlighted data?

gustavo_marrufo · September 20, 2021, 2:04pm

Yes, I did notice, but only the text highlighted in yellow is the information I need to extract using regex.

NIVED_NAMBIAR · September 20, 2021, 2:32pm

Hi @gustavo_marrufo
Below are the regex patterns for the fields in the Word document:

U.S. US Fish & Wildlife Service/Region 7 ------ (?<=DEPARTMENT OR AGENCY, BUREAU OR SERVICE, AND LOCATION SHOWN ON SUBVOUCHERS BUR. VOU. NO.\s+).*

CARRIER'S BILL NUMBER ------ (?<=CARRIER'S BILL NUMBER )\w+

Transferee ------ (?<=^Transferee: )\w+ (you have to set the Multiline option here, otherwise ^ only matches at the start of the whole text)

GBL Number ------ (?<=PAYEE’S CERTIFICATE\s+GBL Number: ).*

TA Number ------ (?<=TA Number: ).*

Total Claimed ------ (?<=TOTAL CLAIMED . )\$[\d\.]+

Invoice Number ------ (?<=Invoice Number: )\w+

Total Charges ------ (?<=Total Charges\s+)[\d\.]+

These patterns are written for the specified document.

Please ensure the Multiline option is set for all of them.
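
For anyone who wants to try the same idea outside a .NET-style regex engine: several of the patterns above use variable-length lookbehind, such as (?<=Total Charges\s+), which .NET accepts but Python's re module rejects. A minimal Python sketch, assuming capture groups are an acceptable substitute for lookbehind, could look like the block below. The sample text, the dictionary keys, and the extract_fields helper are all made up for illustration and are not taken from the attached document.

```python
import re

# Hypothetical sample text; the real extracted PDF output will differ.
SAMPLE = """\
CARRIER'S BILL NUMBER AB12345
Transferee: Smith
TA Number: TA0001
Invoice Number: INV789
Total Charges   1234.50
Transferee: Smith
"""

# Field name -> pattern with one capture group for the value.
# Capture groups replace the lookbehinds used in the patterns above.
PATTERNS = {
    "carrier_bill_number": r"CARRIER'S BILL NUMBER (\w+)",
    "transferee": r"^Transferee: (\w+)",  # ^ needs re.MULTILINE
    "ta_number": r"TA Number: (.*)",
    "invoice_number": r"Invoice Number: (\w+)",
    "total_charges": r"Total Charges\s+([\d.]+)",
}


def extract_fields(text):
    """Return the first match for each field, or None if it is absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        # re.MULTILINE lets ^ match at the start of every line.
        match = re.search(pattern, text, flags=re.MULTILINE)
        result[name] = match.group(1).strip() if match else None
    return result


fields = extract_fields(SAMPLE)
print(fields)
```

Because re.search returns only the first occurrence, a field that appears twice (like Transferee) yields a single value; re.findall would return every occurrence if those are needed.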

Regards,
Nived N

NIVED_NAMBIAR · September 20, 2021, 7:07pm

Hi @gustavo_marrufo

If this resolves your query, kindly mark the appropriate answer as the solution.
