Need REGEX code for extracted PDF info

Hello
I scraped information from a PDF, and this Word doc has the output. I need to use REGEX to extract certain fields. The words highlighted in blue are the names of the “fields” in the PDF. The words highlighted in yellow are the information I need to extract using REGEX. Can someone help me build the patterns? Any assistance will be appreciated.

extracted PDF info.docx (15.9 KB)

Hi @gustavo_marrufo

Just to confirm: some fields, such as the Transferee name, occur twice. Do you need the data from those occurrences as well, or only the highlighted data?

Yes, I did notice, but only the information highlighted in yellow is what I need to extract using REGEX.

Hi @gustavo_marrufo
Below are the regex patterns, one per field:

Department or agency (value such as U.S. US Fish & Wildlife Service/Region 7)  ------  (?<=DEPARTMENT OR AGENCY, BUREAU OR SERVICE, AND LOCATION SHOWN ON SUBVOUCHERS BUR. VOU. NO.\s+).*

CARRIER'S BILL NUMBER ------------------  (?<=CARRIER'S BILL NUMBER )\w+

Transferee: -------------------------------  (?<=^Transferee: )\w+   (you have to set the Multiline option for this one)

GBL Number: ----------------------------  (?<=PAYEE’S CERTIFICATE\s+GBL Number: ).* 

TA Number ------------------------------- (?<=TA Number: ).*

Total Claimed ---------------------------  (?<=TOTAL CLAIMED . )\$[\d\.]+

Invoice Number ------------------------ (?<=Invoice Number: )\w+

Total Charges ------------------------------  (?<=Total Charges\s+)[\d\.]+

These regex patterns are written for the specified document.
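
For anyone trying these outside an engine that allows variable-width lookbehind: patterns such as `(?<=Total Charges\s+)` are rejected by Python's `re` module, which requires fixed-width lookbehind. Here is a minimal sketch, assuming Python and using capture groups in place of the lookbehinds. The sample text, its values, and the dictionary keys are hypothetical and are not taken from the attached document.

```python
import re

# Hypothetical sample text; the real layout of the extracted PDF output may differ.
SAMPLE = """\
CARRIER'S BILL NUMBER ABC123
Transferee: Smith
TA Number: TA0001
Invoice Number: INV42
Total Charges   1234.56
"""

# Capture-group equivalents of some of the lookbehind patterns above.
# The value of interest is whatever the parentheses capture.
PATTERNS = {
    "carrier_bill_number": r"CARRIER'S BILL NUMBER (\w+)",
    "transferee": r"^Transferee: (\w+)",
    "ta_number": r"TA Number: (.*)",
    "invoice_number": r"Invoice Number: (\w+)",
    "total_charges": r"Total Charges\s+([\d.]+)",
}


def extract_fields(text):
    """Return the first captured value for each field, or None if it is absent."""
    results = {}
    for name, pattern in PATTERNS.items():
        # re.MULTILINE lets ^ match at the start of every line, not only the text.
        match = re.search(pattern, text, flags=re.MULTILINE)
        results[name] = match.group(1).strip() if match else None
    return results


fields = extract_fields(SAMPLE)
```

Each value in `fields` is the text captured for that field, so a missing field shows up as `None` rather than raising an error.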

Please ensure the Multiline option is set for all of them.
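
To illustrate why the Multiline option matters for the `^Transferee: ` pattern, here is a small sketch in Python, where the equivalent setting is the `re.MULTILINE` flag. The two-line input and its values are hypothetical.

```python
import re

# Hypothetical input where the Transferee line is not the first line of the text.
text = "Invoice Number: INV42\nTransferee: Smith"

pattern = r"(?<=^Transferee: )\w+"

# Without the flag, ^ only matches at the very start of the whole text.
without_flag = re.search(pattern, text)

# With the flag, ^ also matches right after each newline.
with_flag = re.search(pattern, text, flags=re.MULTILINE)
```

Here `without_flag` is `None`, while `with_flag.group()` gives the word that follows `Transferee: `.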

Regards,
Nived N

Hi @gustavo_marrufo

If this resolves your query, kindly mark the appropriate answer as the solution.