Doubt in PDF automation

In a PDF, how do I capture a field whether it is blank or contains text? If the field is blank I need to capture the blank, and if it has text I need to capture the text.

@anjani_priya

Please post the input text so that we can extract from it.

I can't capture the blank field in the PDF. How do I capture it?

Hi @anjani_priya

You can use Read PDF Text to read the PDF and extract the text. If it's a scanned PDF, you can use Read PDF with OCR instead. After that, you can use regular expressions to extract a particular field.

Regards

Capture6
I am unable to capture the empty spaces in the PDF.

Hi @anjani_priya

If the PDF is a structured one, use the Read PDF Text activity to extract the text and store it in a String variable.

Then use regular expressions to extract the required data from the String variable.

Hope it helps!!

I am unable to capture the empty field with the Get Text activity.

I am unable to capture the empty field. I need to detect when the field is empty, and otherwise get the text from the field.

Are you using the UI activities to extract the data from the PDF, @anjani_priya?

I have used Anchor Base and Get Text to get the text from the PDF.

Okay @anjani_priya

Can you share your PDF file as an image? That would make it easier for us to understand.


Assume the due date is empty. The condition is: if the field has text, the text should go to Excel; if the field is empty, the Excel cell should also be empty.
I am unable to capture the empty field with the Get Text activity. How do I do it?

Hi @anjani_priya

You can try this approach if you want to know whether the field holds a value or is just blank:

  1. Read the PDF using the Read PDF Text activity
  2. Use regex to extract the value of the field and store it in a String variable
  3. Use an If condition to check whether the value is null or empty, or whether it contains something.
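
The three steps above can be sketched outside UiPath roughly as follows. This is a minimal Python illustration, not the UiPath workflow itself: the plain strings stand in for the text returned by Read PDF Text, and `extract_field` plus the sample invoice text are hypothetical names and data made up for the example. It also assumes the value sits on the same line as its label.

```python
import re

def extract_field(pdf_text: str, label: str) -> str:
    """Return the text that follows `label` on the same line, or "" if blank."""
    # [ \t:]* skips spaces, tabs and an optional colon, but never a line break,
    # so a blank field cannot pick up text from the next line.
    pattern = re.escape(label) + r"[ \t:]*([^\r\n]*)"
    match = re.search(pattern, pdf_text)
    if match is None:
        return ""  # label not found at all
    return match.group(1).strip()

# Hypothetical invoice text; the second sample has an empty due date.
filled = "Invoice Date January 25, 2016\nDue Date January 31, 2016\nTotal 100"
blank = "Invoice Date January 25, 2016\nDue Date\nTotal 100"

for text in (filled, blank):
    due_date = extract_field(text, "Due Date")
    if due_date == "":
        print("Due Date is empty")  # write an empty cell in this case
    else:
        print("Due Date:", due_date)
```

In a UiPath workflow the same decision would probably be an If activity on the extracted String, for example with `String.IsNullOrWhiteSpace`, but that part is an assumption about how the workflow is built.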

Can you send sample code showing how to extract the empty field from the PDF?

Okay @anjani_priya

Send me the invoice PDF in a private message. Then I'll extract the PDF to text and give you the regular expression to extract the field, along with the condition.

wordpress.pdf (42.6 KB)

@anjani_priya

Read the above invoice using Read PDF with OCR.

Use the Write Text File activity to write the String variable data to a text file.
Use a regular expression to extract the due date.

Or post the text file and I can help you get the regex for the due date.

Hi @anjani_priya

Use the Read PDF Text activity and the regular expression below:

(?<=Due Date\s*)([A-Z].*)

Regards

Can you send the sample code?

Hi @anjani_priya

Please check the below flow:

System.Text.RegularExpressions.Regex.Match(str_Text,"(?<=Due Date\s?)([A-Z].*)").Value
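
One thing to watch with this pattern when the field is blank: `\s` also matches a line break, so if nothing follows `Due Date` on its line, the match can start on the next line instead of coming back empty. The Python sketch below illustrates that under stated assumptions. Python's `re` does not allow a variable-width lookbehind, so a capture group stands in for `(?<=...)`, and the sample text and the names `loose`, `same_line` and `first_group` are hypothetical.

```python
import re

# Hypothetical extracted text where the due date is blank.
text = "Due Date\nTotal 100"

# Capture-group stand-in for the pattern above: \s? may consume the line break.
loose = re.compile(r"Due Date\s?([A-Z].*)")
# Variant that only allows spaces or tabs after the label, so it stays on one line.
same_line = re.compile(r"Due Date[ \t]*([A-Z][^\r\n]*)")

def first_group(pattern: re.Pattern, s: str) -> str:
    """Return group 1 of the first match, or "" when there is no match."""
    m = pattern.search(s)
    return m.group(1) if m else ""

print(repr(first_group(loose, text)))      # 'Total 100' (text from the next line)
print(repr(first_group(same_line, text)))  # '' (field treated as empty)
```

If the blank case matters, the same restriction could be tried in the .NET expression by replacing `\s?` with `[ \t]*`. That suggestion has not been tested against the actual invoice. Note also that `[A-Z]` assumes the value starts with an uppercase letter, such as a month name.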

Regards