Grab specific info in PDF text with Regex


This is the result after using Read PDF Text and Write Text File.

I had to block out most of the information, otherwise our customer information would be leaked.
But is there a way to get the text under specific keywords using Regex?
I know there are ways to get the words in front of or behind a keyword, but I don’t see a way to grab the words below it.
For example, I would like to get the blue column’s data under B/L #, or the blue column under Container #.

The number of container numbers also changes from file to file: sometimes there’s only one, sometimes several, sometimes none.

Please help, I would appreciate it a lot. Thank you.

Hi,

It may be better to use the Document Understanding framework.

https://docs.uipath.com/document-understanding/standalone/2022.4/user-guide/introduction

If you need to extract them with regex, can you share the specific input text and the expected output as a file? Dummy data is fine.

Regards,

Yes, it is possible. Get all the PDF text, and then you can create a match for each value.

Hi @Joanne_Chang_LAX

=> Use Read PDF Text or Read PDF with OCR to read the PDF and store the output in a variable, say str_text.
=> Use the Write Text File activity to write the PDF text to a text file.
=> After writing the text file, you can use regular expressions to extract the text.

Share the sample text to extract so that we can help you with the regex.

I think you have created a duplicate post with the same question.
https://forum.uipath.com/t/grab-specific-info-in-pdf-text/572712?u=parvathy
Please check that one.
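
As a rough illustration of the last step, here is a minimal Python sketch of grabbing the values below a keyword with a regex. The sample text, the LABELS list and the values_under helper are all made up for this example. It assumes each label sits on its own line with its values on the lines below it, which may not match the real PDF output. The same pattern idea should carry over to a regex activity in UiPath, but that is not tested here.

```python
import re

# Hypothetical text standing in for the output of Read PDF Text.
sample_text = """B/L #
ABCD123456789
CONTAINER #
TEST1234567
TEST7654321
LAST FREE DAY
08/01/2023
"""

# Assumed list of labels that can appear in the file.
LABELS = ["B/L #", "CONTAINER #", "LAST FREE DAY"]


def values_under(label, text, labels):
    """Return the non-empty lines between `label` and the next known label."""
    others = "|".join(re.escape(name) for name in labels if name != label)
    # Capture lazily until a line that is exactly another label, or end of text.
    pattern = (
        re.escape(label)
        + r"[ \t]*\n(.*?)(?=^(?:" + others + r")[ \t]*$|\Z)"
    )
    match = re.search(pattern, text, flags=re.DOTALL | re.MULTILINE)
    if match is None:
        return []  # label not present in this file
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


print(values_under("CONTAINER #", sample_text, LABELS))
```

Because the capture stops at the next label, the helper returns an empty list when a label has nothing under it, and a longer list when there are several values.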

Hope it helps!!

Take the entire line that contains the data you need, then extract the particular value by splitting the string and picking the item by its index.
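
A small sketch of that idea in Python, in case it helps. The sample text and the field_from_line helper are invented for illustration, and it assumes the value shares a line with a known keyword and that fields are separated by whitespace.

```python
# Hypothetical text; the real file layout is not shown in this thread.
sample_text = "SHIPPER  ACME LOGISTICS\nVESSEL  EVER EXAMPLE  VOY 012E\n"


def field_from_line(text, keyword, index):
    """Find the first line containing `keyword`, split it on whitespace,
    and return the item at `index` (None if the line or index is missing)."""
    for line in text.splitlines():
        if keyword in line:
            parts = line.split()
            if -len(parts) <= index < len(parts):
                return parts[index]
            return None
    return None


print(field_from_line(sample_text, "VESSEL", -1))
```

This only works while the position of the value within the line stays the same from file to file.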

Hope this helps
Usha

Hey @Joanne_Chang_LAX ,

  1. You can make use of regex to retrieve the data after using the ‘Read PDF Activity’ provided that the data has specific patterns that make it unique.

  2. Or you can use ‘Document Understanding’ which will require you to use certain intelligent packages. Here is a playlist below to help you get started.
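
To illustrate point 1, here is a minimal Python sketch that relies on the shape of the value itself rather than its position. It assumes the container numbers follow the ISO 6346 layout (three owner letters, then U, J or Z, then seven digits). The thread does not show real container numbers, so that format, the sample text and the find_container_numbers helper are assumptions. The check digit is not validated.

```python
import re

# Assumed ISO 6346 shape: 3 owner letters + U/J/Z + 6-digit serial + check digit.
CONTAINER_RE = re.compile(r"\b[A-Z]{3}[UJZ]\d{7}\b")


def find_container_numbers(text):
    """Return every substring that looks like a container number, in order."""
    return CONTAINER_RE.findall(text)


# Hypothetical text standing in for the Read PDF Text output.
sample_text = "CONTAINER #\nABCU1234567\nXYZU7654321\nPICKUP #\n"
print(find_container_numbers(sample_text))
```

Since findall returns a list, the same code covers files with no containers, one container, or many.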

Sorry, I deleted that post because I wanted to be more detailed in my question. I’ve provided a sample text below. I appreciate the help.

TEST.txt (2.5 KB)

Again, I need the info after “LOAD PICKUP POOL ADDRESS”, and under “B/L #”, “CONTAINER #”, “LAST FREE DAY”, and “PICKUP #”. The number of items under the last three varies: there might be none, or there might be several.

Hi @Joanne_Chang_LAX

Could you please mark the required output in bold so that we can give you a regex?

Regards

Sorry, I didn’t understand what you meant. Can you state it again?

Mark the output that you need in bold, @Joanne_Chang_LAX.

Regards

未命名文件 (2).docx (14.0 KB)

Hey @Joanne_Chang_LAX ,

Check this workflow out.

AI_Forum.zip (11.5 KB)
Input file:
未命名文件 (2).pdf (30.9 KB)

In the Form Extractor, make sure you copy and paste the API key that’s available in:

cloud.uipath.com > Admin > License > Robots & Services > Document Understanding > Copy API Key


Paste the API key above

Expected output:


For the address, try this.

Can this handle a changing number and position of container numbers and pickup numbers?

Hey @Joanne_Chang_LAX ,

It depends on the custom area we provide in the ‘Form Extractor’.


The area shaded in grey is the custom area provided, so it will retrieve all the data within that area.

That is why we see ‘Container1 Container2’ in the output below.

I would urge you to take a look at this playlist as well.